First Look – Pervasive DataRush

June 4, 2009

in Advanced Analytics, Product News

Pervasive is a global software company with 200+ employees that has been profitable for the last 8 years. It is best known for Btrieve (now Pervasive PSQL) and its data integration products. The company is busy expanding into new markets, and Pervasive DataRush is one of its new products. They see a new generation of data-intensive applications that are Smart, Green, Scalable and Efficient.

  • Smart means applications that handle lots of data, use personalized data, focus on predictive analytics and enable time-critical decisions. Today they see software choking on large data volumes, applications hampered by unclean or incomplete data, limits on accuracy and simulation, and days of processing time for complex problems.
  • Green means applications with smaller carbon footprints that get higher utilization rates (e.g. of multi-core servers) so companies can limit the number of data centers. Currently, data processing software tends to need lots of blades and cannot utilize existing hardware as well as it might.
  • Scalable means leveraging multi-core, scaling dynamically to match the hardware and running on any platform. Current software, they say, loses scalability after about 4 cores and requires rewrites or adjustments to use the full capacity of a given piece of hardware. Such a rewrite may also need to be written and maintained for every operating system.
  • Efficient means finding ways to limit software and hardware costs and free up time on your existing hardware. Current software often limits the number of jobs, requires additional databases for analytic work and requires more blades or servers to scale. In other words, hardware is being thrown at the problem.

Obviously they feel that Pervasive DataRush will address these issues, and that it can change the way a company does business. When a company can process and understand data on the fly, or at least very rapidly rather than overnight, it can take different approaches and pursue new opportunities. Taking Netflix as an example, they ran more than 100M ratings from 480,000 users (the public data set for the Netflix Prize challenge) in 16.31 minutes with a reasonably competitive result. In contrast, most teams spend days running their algorithms. And this was run on a standard 8-core box, not a massive piece of hardware. Clearly, if you can run something several times an hour instead of over the weekend you would use it differently, and it is easy to see how this could create new opportunities for analytic applications.

DataRush is a data processing engine and software platform with a family of embeddable software solutions for data-intensive applications. It includes patent-pending parallel processing technology, so low-level parallel processing issues are handled by the framework and application developers can simply focus on their problem. Developers write Java in Eclipse or NetBeans against the APIs in the JRush framework, and scalability (exploiting additional cores and so on) is then automatic: the JRush framework handles it without recoding. DataRush has a large and growing library of operators. It handles locking, memory, threading and so on, enabling much easier development and more design-time productivity than attempting to use the standard Java libraries. Monitoring is done through JMX, and the Pervasive DataRush Engine runs on a standard JVM/JMX setup. The engine supports an execution API and an operator library. On top of this they have some applications such as transforms, join/sort/aggregate, predictive analytics, matching and a data profiler.
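
To make the idea of "write the operators, let the framework use the cores" a little more concrete, here is a minimal sketch in plain Java. It does not use the DataRush or JRush APIs, which are not shown in this post; it relies only on the standard java.util.stream library, and the class, method and field names (OperatorPipelineSketch, Rating, averageStarsPerMovie) are made up for illustration. The pipeline is a filter step followed by a group-and-aggregate step, and the parallel stream spreads the work across the JVM's common fork/join pool, which by default is sized from the number of available processors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OperatorPipelineSketch {

    /** Hypothetical record shape: one user's rating of one movie. */
    public static final class Rating {
        public final int userId;
        public final int movieId;
        public final int stars;

        public Rating(int userId, int movieId, int stars) {
            this.userId = userId;
            this.movieId = movieId;
            this.stars = stars;
        }

        public int getMovieId() { return movieId; }
        public int getStars() { return stars; }
    }

    /**
     * Filter -> group -> aggregate, expressed as a chain of operators.
     * The code contains no explicit threads or locks; the stream library
     * decides how to split the work across cores.
     */
    public static Map<Integer, Double> averageStarsPerMovie(List<Rating> ratings) {
        return ratings.parallelStream()
                // "clean" operator: drop ratings outside the 1..5 range
                .filter(r -> r.getStars() >= 1 && r.getStars() <= 5)
                // "aggregate" operator: average stars per movie id
                .collect(Collectors.groupingByConcurrent(
                        Rating::getMovieId,
                        Collectors.averagingInt(Rating::getStars)));
    }

    public static void main(String[] args) {
        List<Rating> ratings = Arrays.asList(
                new Rating(1, 10, 4),
                new Rating(2, 10, 2),
                new Rating(3, 20, 5),
                new Rating(4, 20, 9)); // invalid, removed by the filter

        System.out.println("Cores visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Average stars per movie: "
                + averageStarsPerMovie(ratings));
    }
}
```

The same code runs unchanged on a 2-core laptop or an 8-core server, which is the property the post attributes to DataRush. A dedicated dataflow engine would presumably add things this sketch lacks, such as pipelined operators, memory management for data sets larger than the heap, and monitoring.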

They have a blog you can check out at http://cs.pervasive.com/blogs/datarush/.
