I recently did some research on the requirements for enterprise-scale analytics and the challenges of using open source R in this context. In this first post I outline the requirements I see for enterprise-scale analytics; in a second post I will discuss the challenges of R in that context.
As advanced analytics, especially predictive analytics, has gone mainstream, more organizations are focusing on scaling analytics to the enterprise: moving beyond research projects and pilots to broad adoption and deployment. Success with such broad adoption requires an industrial-scale approach and an environment with several distinct characteristics:
- A clear sense of the business and analytics objectives.
Broad and deep adoption of analytics requires a clear understanding of the business problem to be solved with those analytics – a clear definition of the decision-making the analytics are going to improve. We are seeing growing success with Decision Modeling in this context as it clearly specifies where the analytics will be used and how they will help.
- Powerful data exploration tools.
With the explosion of Big Data, analytics can only scale to the enterprise if a large volume of fast-moving data can be explored effectively. This exploration has to handle a wider variety of data than in the past, and it must be possible to tell whether the data is of good enough quality to use.
- Scalable data preparation, modeling, and evaluation.
At the heart of building an analytic model is the data preparation-modeling-evaluation cycle. Only if lots of data can be integrated, manipulated (ideally without having to select a sample first) and modeled quickly can an iterative and scalable analytic process be sustained.
- Seamless, scalable deployment.
Finally, a model is of no use unless it is deployed at scale. For broad adoption, this means rapid deployment on scalable IT infrastructure that supports both batch and real-time usage.
Only if an analytic environment meets these requirements will it deliver enterprise scale. In the second post I will discuss the challenges of open source R in this context.
If you can’t wait for the second post or want more detail, check out this white paper sponsored by Teradata, Enterprise Scale Analytics with R: Scaling for R with Teradata Aster. It discusses these requirements and the challenges of R in more depth and describes how Teradata Aster R addresses them. I also recorded a webinar (Up Your R Game: Break Through R Limitations) with Bill Franks of Teradata. You can also check out the recent Teradata announcement of Teradata Aster R, read Scott Gnau’s blog about Teradata’s embrace of R, and read my First Look on Teradata Aster R.