
First Look: Revolution Analytics Update


Since I last wrote about Revolution Analytics (back when they announced their Netezza relationship), they have added new management and are growing fast. In particular, they see a lot of big data experimentation evolving into actual projects. To recap, Revolution Analytics is a commercial analytics company based on the open source R statistics language. The core product is Revolution R Enterprise, and they provide support, consulting and training for companies adopting R. They are still hiring and have more than 200 Global 2000 customers. They have opened offices in Singapore and London, in addition to their headquarters in Palo Alto, CA. Industries of focus include digital media, financial services, government, health and life sciences, manufacturing, retail and telco. All of this is underpinned by the explosive growth of R in terms of both users and new R packages.

Revolution R Enterprise is focused on providing distributed high-performance analytics, improved productivity and enterprise platform support. Revolution continues to focus on the data scientist experience, making sure data scientists can use the full range of R's capabilities. 100% of R code will always run, and run faster, on the multi-threaded engine. However, R code only takes advantage of the scaling and distribution in Revolution's implementation when a few specific Revolution routines are used in place of the standard R routines.

The product still includes the core open source R libraries, but Revolution has ported these libraries to platforms such as IBM Netezza and Microsoft HPC Server. They have added further connectors, including connectors to SAS and Hadoop. They have also added scalability support (replacing R's in-memory, single-threaded execution with distributed, multi-threaded execution) to deliver R performance that is both faster and scales linearly to millions or billions of rows. They have also improved both their development and deployment tools. Revolution also has a large partner ecosystem spanning systems integrators (SIs), data infrastructure, deployment platforms, ETL and more. Specific changes since I last wrote about them include:

  • Multiple enhancements to the ScaleR library of high-performance analytics including support for Decision Trees and Generalized Linear Models
  • Support in ScaleR for big data files (XDF) to overcome R's in-memory limitation
  • The ability to run ScaleR high-performance analytics on data in the Hadoop Distributed File System (HDFS), and to write custom Map-Reduce algorithms for Hadoop in the R language
  • A high-speed Teradata connector that uses the Teradata Parallel Transporter utility
  • Improvements to the DeployR APIs for repository and script management and for R integration
  • Support for distributing computations across multiple processors or multiple nodes, including Microsoft HPC Server, which enables Azure cloud deployments
  • An R "Data Step" function for preprocessing data from external data sources such as SAS files
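
The XDF, HDFS and Map-Reduce items all rest on one general idea: an analytic routine can scale past memory when it is computed chunk by chunk and the partial results are combined. The Python sketch that follows shows this for a linear regression: a map step computes X'X and X'y for each chunk, a reduce step adds them, and the normal equations are solved once at the end. It is a hypothetical illustration of the technique, not Revolution's ScaleR or XDF implementation, and the function names (`read_in_chunks`, `map_chunk`, `combine`, `chunked_ols`) are invented for this example.

```python
from functools import reduce
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]               # (features, target)
Partial = Tuple[List[List[float]], List[float]]   # (X'X, X'y)


def read_in_chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows; stands in for blocks read from disk or HDFS."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def map_chunk(chunk: Iterable[Row], n_features: int) -> Partial:
    """Map step: accumulate X'X and X'y for one chunk (intercept column of 1s prepended)."""
    p = n_features + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in chunk:
        x = [1.0, *features]
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: partial sums from different chunks simply add element-wise."""
    return (
        [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])],
        [u + v for u, v in zip(a[1], b[1])],
    )


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][k] * beta[k] for k in range(i + 1, n))
        beta[i] = (m[i][n] - rest) / m[i][i]
    return beta


def chunked_ols(chunks: Iterable[Iterable[Row]], n_features: int) -> List[float]:
    """Fit [intercept, coef_1, ...] while holding only one chunk in memory at a time."""
    partials = (map_chunk(chunk, n_features) for chunk in chunks)
    xtx, xty = reduce(combine, partials, map_chunk([], n_features))
    return solve(xtx, xty)
```

For example, with `rows = [([float(x)], 2.0 + 3.0 * x) for x in range(10)]`, `chunked_ols(read_in_chunks(rows, 3), 1)` returns approximately `[2.0, 3.0]`, the same answer as fitting all rows at once. Routines whose results cannot be assembled from fixed-size partial summaries (an exact median, for instance) don't decompose this simply, which helps explain why getting scalability means swapping in purpose-built routines rather than running arbitrary R code unchanged.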

From a Decision Management perspective, one of the most important components is DeployR, the web services component that allows real-time, interactive execution of R models. Revolution also supports PMML, allowing models to be deployed to rule engines that consume PMML and to platforms like Zementis ADAPA. A key advantage of this is that it separates the data analytics work from the deployment work. With plain R, a web services deployment would require both R and web services skills; DeployR lets the IT group focus on the web service, the analytics team focus on R, and business users and systems simply consume the predictions from the web service. Often this means R is used to build the model while a separate real-time engine does the scoring. You can also call R code as part of the deployed web services, as sometimes that makes sense for a particular model, and the web services also support interactive access to models that might involve pulling in more data and re-modeling.

Later this year, they plan to release the ability to train predictive analytic models inside Hadoop deployments and, ultimately, inside other data warehouse infrastructures (they already support Netezza and plan to add Teradata and Greenplum).

Revolution Analytics is one of the vendors in our Decision Management Systems Platform Technologies Report, and you can get more information on them here.

