I listened in on the Boulder BI Brain Trust briefing from Aster some weeks back and then got a follow-on update last week. Aster was founded in 2005 based on research performed by a team at Stanford. The initial plan was to develop a data management platform based on commodity hardware, and this was released in 2007. In 2008 they adopted the MapReduce framework and created In-Database MapReduce (SQL-MR) – essentially integrating MapReduce and SQL. The system is not Hadoop-based, though it integrates with Hadoop: Aster Data is an MPP database that offers a MapReduce implementation for in-database analytics.
Like many “big data” companies, Aster started with Web 2.0 customers and has expanded into mainstream enterprise companies. Its customers have data problems that Aster classifies as advanced analysis on big data volumes. These companies are not just looking for big data storage (Aster’s highly scalable MPP database running on commodity hardware) but also for integrated analytics (where Aster offers an engine that uses both SQL and MapReduce). Adoption of the platform is driven by a need for speed, scale and increased richness of queries and data (relational and non-relational data, for instance). Traditional data warehouses still hold only a fraction of the data most companies have (especially once you consider web logs or semi-structured data, like text), and deeply analyzing that data is increasingly critical.
Hadoop/MapReduce could be used, since it handles the data volumes (in a batch-processing mode), but it is fundamentally a programming-centric approach. Aster’s model is layered on MapReduce to allow SQL to be the access mechanism. Aster also provides a large set of pre-packaged functions (1,000 or more) for things like statistics, graph analysis and so on, so that an enterprise user does not have to code as much from scratch. Like Hadoop, Aster uses commodity hardware to minimize the cost of storage and, while it doesn’t support quite the range of data types that Hadoop does, it supports most of them and provides more real-time interactivity plus all the usual database functions on top. Some customers use both, relying on Hadoop for the data that Aster does not handle. Unlike Hadoop, which tends to be batch-oriented, Aster’s analytics support interactive scenarios as well as batch. In general Aster complements, rather than replaces, an existing data warehouse.
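To make the “programming-centric” contrast concrete, here is a minimal, generic sketch of the map/reduce model in plain Python (illustrative only – this is neither Hadoop’s nor Aster’s API). The point is that the developer hand-writes the map and reduce steps for every analysis, which is the per-query effort that a SQL layer like SQL-MR is meant to hide:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every log line.
    for line in records:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Toy "web log" input; in a real cluster each phase runs in parallel
# across many machines over far larger data.
logs = ["error disk full", "error network down", "warning disk slow"]
result = reduce_phase(map_phase(logs))
# result counts each word, e.g. result["error"] == 2
```

In a SQL-based layer the same aggregation is a one-line `GROUP BY`-style query, with the parallel map and reduce plumbing generated automatically.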
Aster’s customers are looking for specific capabilities around data storage and query processing. Data storage capabilities include speed, linear scalability, optimization for storage and query performance, and the use of commodity hardware. In query processing, customers want high, scalable performance for demanding queries, support for complex/advanced queries, and in-database analytics (pushing analytic code into the database without rewriting it).
Aster sees its unique features as its support for automatic parallelization using MapReduce, delivering 100% of processing in-database, its extensive suite of pre-built functions (clustering, time series, etc.) and the fact that anyone with SQL skills can use MapReduce through Aster’s approach. Aster is also working with companies like SAS to bring their analytics closer to the data for significant performance improvement – 3-10x is typical – while continuing to invest in its own analytic functions.
Aster customers typically:
– Need to store and process masses of data inside and outside their EDW
– Want a platform for rich, complex analytics on large volumes of data that is not limited by traditional SQL systems
– Want high performance in-database analytics beyond UDFs and stored procedures
– Need cost-effective linear scaling on commodity hardware
These customers tend to fall into one of three categories: those attempting deep data analysis to understand customers, those trying to decide and act rapidly in event-centric situations like fraud detection or marketing, and those trying to understand complex systems and networks. Most of these customers are also doing this kind of analysis every day, not just once in a while.
This week Aster made a new product announcement – nCluster 4.6. The key news in this release is the addition of native support for a column store within their MPP DBMS. This allows for a hybrid row/column DBMS, which should result in higher performance for specific use cases. Aster feels it already has great performance for ad-hoc cases, for instance, and the support for columnar storage will drive improvement in more predictable, reporting-oriented cases that lend themselves to column stores. Companies can decide per table, or even per table partition, whether to use a row or column store.
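The row-versus-column trade-off the release targets can be sketched with a toy example (illustrative Python only – nothing to do with nCluster’s internals). A columnar layout stores each column contiguously, so an analytic scan that aggregates one column touches only that column’s data, which is why predictable reporting workloads tend to benefit:

```python
# The same toy table in two layouts.
rows = [
    {"user": "a", "region": "east", "spend": 10},
    {"user": "b", "region": "west", "spend": 25},
    {"user": "c", "region": "east", "spend": 5},
]

# Column store: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Row-store scan: must walk every row (and, on disk, every field).
total_row = sum(r["spend"] for r in rows)

# Column-store scan: reads only the 'spend' column.
total_col = sum(columns["spend"])

assert total_row == total_col  # same answer, different I/O pattern
```

Per-table or per-partition storage choice, as described above, simply means the DBMS picks whichever layout fits that table’s dominant access pattern.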
What’s nice about the approach is that a single analytic application can access data from a row store, a column store or a mix – even within a single analytic technique. The column store is a first-class citizen and offers all the data management capabilities – workload management, compression, indexing, etc. – that the row store supports. All 1,000+ functions are also supported for row, column and hybrid queries – the user of the functions does not have to worry about where the data is stored, as Aster handles queries across row and column stores individually or together.
Aster says this combination of row/column store support with an MPP implementation and support for SQL/MapReduce is unique. Companies using Aster will now have a layer of analytic functionality that uses the SQL/SQL-MapReduce interface to hit a mix of row- and column-stored data, all running on commodity hardware. The 4.6 release offers more than just the addition of a column store. An additional new tool – Data Model Express – will analyze historical query patterns and make column/row store implementation recommendations based on workloads. Aster has also added some new analytic functions – decision trees, histograms and SAX.