I recently got my first chance to catch up with Calpont, a privately held company based in Frisco, TX, with 25 employees. They have been developing their analytic database technology for the past 2.5 years. The GPL version of InfiniDB was launched in October 2009, and InfiniDB Enterprise, the commercial product, was released in February 2010. They just released a new version on November 4th.
InfiniDB is a scale-out MPP, columnar, analytic database designed for analytics on large data volumes, with fast data loading and fast queries. Calpont has a commercial Open Core model with a core open source engine around which enterprise features and services are wrapped. The open source Community Edition has had 15,000 downloads since April, with demand from web analytics groups, media and telecommunications companies. In addition, they have a growing, though much smaller, number of production customers. Community feedback is driving development and the community version is updated regularly – the enterprise edition and the community edition have basically the same release and patch schedule. Alpha and beta versions are pushed into the community too and, apart from reserved features (primarily support for multiple servers), everything goes into the community edition.
Calpont is targeting the well established problem of exploding data volumes, increasingly complex organizations, more complex analytics and a need for better decisions to be made faster. Market drivers today are interactive marketing, ad serving, social media and web analytics with telecommunications also a focus. They see future demand in retail/eCommerce, financial services, insurance and government. In general their customers are those with multi-dimensional data, large data volumes, complex variables and near real-time needs.
Overall, they see that the need to understand and serve customers more deeply and more rapidly is making large datasets the norm. Companies need to do more granular analysis, micro-targeting and more advanced simulation. Customers are often looking for a combination of horsepower and familiarity (the latter thanks to the MySQL front end). For instance, Aviation Software had a serious data load problem: 187M records had to be loaded and ad-hoc queries then had to run quickly. Their comparison of the available columnar databases settled on Calpont because of its overall speed and the ease of bringing it into their environment.
Both editions of InfiniDB are fully parallel/multi-threaded, column-oriented and terabyte-capable. Both offer scalable performance optimized for analytic workloads and fast data load. No indexing or traditional tuning is required and the database is designed to be self-learning and self-managing. Interestingly, InfiniDB uses a familiar MySQL interface.
There is a user module and a performance module – two processes running in software on the same or (in the Enterprise Edition) separate servers.
- The user module handles MySQL, abstracts physical and logical storage/metadata and controls work distribution and results aggregation. The use of MySQL means there is a familiar DBMS interface for developers and users. The user module offers shared-nothing/shared-everything storage and built-in failover.
- The performance module handles the cache, distributed scan/join, resource management and data/schema. The core of the performance and user modules is multi-threaded to take advantage of multi-core hardware.
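The split between the two modules can be pictured as a scatter/gather pattern. The following is only a minimal sketch of that general idea, not InfiniDB's actual implementation: the function names (`partition`, `scan_partition`, `run_query`), the threshold filter and the use of Python threads are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_workers):
    """Split one column of values into n_workers roughly equal chunks."""
    return [values[i::n_workers] for i in range(n_workers)]


def scan_partition(values, threshold):
    """Hypothetical 'performance module' step: scan a chunk, filter it,
    and return a partial aggregate (sum and count of matching values)."""
    matched = [v for v in values if v > threshold]
    return sum(matched), len(matched)


def run_query(values, threshold, n_workers=4):
    """Hypothetical 'user module' step: distribute the scans to workers,
    then combine the partial results into the final answer."""
    chunks = partition(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: scan_partition(c, threshold), chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count
```

Calling `run_query(list(range(10)), 4, n_workers=3)` scans three chunks independently and merges them; because sum and count combine cleanly across chunks, the answer does not depend on how many workers are used.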
Users can scale up with the community edition – using a single server supporting multiple cores and threads. They can scale out with the enterprise edition – supporting multiple user and performance modules on different servers with shared storage. New servers can be registered for user or performance modules in real time, making the Enterprise Edition easy to scale out over time.
They highlight some critical benchmark items:
- Their load rate is consistent over time and scale – the billionth row loads at the same speed as the first
- Performance is predictable independent of the fact table size
- MPP enables linear scale using commodity hardware – for instance, adding performance modules results in a proportional reduction in query time
- The software does not care if you use more servers or more cores – about the same gain in each case.
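To make the linear-scale claim concrete, here is a minimal sketch of the idealized arithmetic: if scaling is perfectly proportional, query time is the single-module time divided by the number of performance modules. The function names and the 120-second baseline used in the note afterwards are hypothetical and are not taken from Calpont's benchmarks; real systems will deviate from this ideal.

```python
def ideal_query_time(single_module_seconds, n_modules):
    """Idealized linear scaling: time shrinks in proportion to module count."""
    if n_modules < 1:
        raise ValueError("n_modules must be at least 1")
    return single_module_seconds / n_modules


def scaling_table(single_module_seconds, module_counts):
    """Return (module count, ideal query time) pairs for each count given."""
    return [(n, ideal_query_time(single_module_seconds, n)) for n in module_counts]
```

Under this model, a query assumed to take 120 seconds on one performance module would take 60 seconds on two and 30 seconds on four.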
InfiniDB 1.0 was released in February 2010 – fully columnar, scale-out MPP, with integrated map reduction operations tuned for SQL operations and high-speed data load. Version 1.5 came in June 2010 and added sub-query, UTF-8 and Windows support. Version 2.0 arrived on November 4th this year and added UDFs for in-database analytics, real-time compression and enhanced partitioning and parallelization.