First Look – Rapid-I


Rapid-I provides open source software for predictive analytics, data mining and text mining. Incorporated in 2006, they are based in Dortmund, Germany, and have been working on RapidMiner since 2001. They have over 35,000 production deployments and more than 400 customers in 40 countries. Banking and financial services is their largest market, followed by pharma and, interestingly, manufacturing. Customers include large companies such as Siemens, PepsiCo and Allianz. Their RapidAnalytics suite consists of RapidMiner and RapidReporting, which offers traditional business intelligence reports and dashboards in addition to predictive analytics. Built on top of these are several solutions: RapidLab, which is designed to be prescriptive (for instance, helping people configure a machine based on its predicted or simulated behavior); RapidNet for network analysis; and RapidSentilyzer, which analyzes web text for sentiment.

RapidMiner is a classic data mining workbench that allows a set of nodes to be linked to create data mining processes. Rapid-I think of RapidMiner as a process execution engine for the processes involved in data mining and analytics, and provide a workbench to manage these processes as well as a server, web-based interfaces and an API. RapidMiner can be extended and has a large number of extensions built by third parties. Rapid-I have also extended it themselves, for instance to handle Hadoop with a product called Radoop.

Key features of RapidMiner include:

  • A GUI for analytics that handles more than 1,500 basic operations in the predictive analytic process with numerous extensions.
  • Everything is considered a process, so everything built in RapidMiner can be extended and reused. There are no breaks between functions such as ETL, data transformation and modeling.
  • Supports in-database, streaming and Hadoop data
  • Extensive automated analysis of the processes being defined, with problems and potential problems brought to the attention of the analyst
  • Connectors for R, Weka and others
  • A marketplace for extensions
  • Standards support including PMML

RapidMiner provides a classic Windows UI that allows processes to be defined from a wide range of operators (both standard and extensions) using drag and drop. All programs are written in Java and hence can be executed on all major operating systems. Processes and other artifacts can be stored in a variety of repositories, both local and shared. Each process node can be inspected to examine its characteristics and metadata without executing it. Because metadata propagates through the process, you can work with very large amounts of data without constantly processing it. Any problems or inconsistencies (such as trying to apply a method that relies on numeric data to a text field) are flagged as the process is designed, and the tool will suggest fixes such as adding a discretization node.
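To make the idea of design-time checking concrete, here is a minimal sketch of metadata propagation in Python. It is not RapidMiner's API: the `Operator` class, the type names, the `SUGGESTED_FIXES` table and the `parse_numbers` fix are all hypothetical, chosen only to illustrate how a checker might walk a process using column types alone, without touching any rows.

```python
from dataclasses import dataclass


# Hypothetical operator description; not a RapidMiner class.
@dataclass(frozen=True)
class Operator:
    name: str
    column: str    # column the operator reads
    requires: str  # column type it needs, e.g. "numeric"
    produces: str  # type of that column after the operator runs


# Assumed lookup of fixes to suggest: (type found, type required) -> operator name.
SUGGESTED_FIXES = {
    ("numeric", "nominal"): "discretize",
    ("text", "numeric"): "parse_numbers",
}


def check_process(schema, operators):
    """Propagate column types through the operators and collect problems.

    Only the schema (column name -> type) is inspected, never the data.
    """
    types = dict(schema)
    problems = []
    for op in operators:
        found = types.get(op.column)
        if found != op.requires:
            fix = SUGGESTED_FIXES.get((found, op.requires), "no suggestion")
            problems.append((op.name, op.column, found, op.requires, fix))
        # Carry the declared output type forward for later operators.
        types[op.column] = op.produces
    return problems


schema = {"age": "numeric", "comment": "text"}
process = [
    Operator("normalize", "age", "numeric", "numeric"),
    Operator("regression_input", "comment", "numeric", "numeric"),
]
problems = check_process(schema, process)
for problem in problems:
    print(problem)
```

In this sketch the second operator is flagged because it expects a numeric column but would receive a text one, and the checker proposes a fix from its lookup table. A real tool would presumably track far richer metadata (value ranges, missing values, roles) than a single type label.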

Process steps can also be added based on operator recommendations. The recommendation engine continually examines the current process and suggests possible additional nodes such as cross-validation or champion/challenger testing. Any suggested node can be added directly from the short list of recommendations rather than from the long list of operators, helping analysts rapidly build typical modeling processes.

Many operator nodes contain a nested process, which is configured by drilling down into the node. All configuration of nodes is handled this way. Multiple subprocesses can be defined for a node to handle things like training and testing. This decomposition can also be viewed as a tree or hierarchical view of the operator nodes. The view in the GUI can also be switched to a results view that allows all the interim results to be viewed and analyzed.
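The nesting described here can be pictured as a simple tree. The following Python sketch is purely illustrative: the `Node` class and the operator names are assumptions, not RapidMiner's internal representation. It shows a node holding named subprocesses (such as training and testing) and renders the hierarchical view as indented text.

```python
from dataclasses import dataclass, field


# Hypothetical node type: each node may hold named subprocesses.
@dataclass
class Node:
    name: str
    subprocesses: dict = field(default_factory=dict)  # label -> list of Node


def tree_lines(node, depth=0):
    """Return the hierarchical view of a node as indented lines."""
    lines = ["  " * depth + node.name]
    for label, children in node.subprocesses.items():
        lines.append("  " * (depth + 1) + f"[{label}]")
        for child in children:
            lines.extend(tree_lines(child, depth + 2))
    return lines


validation = Node("cross_validation", {
    "training": [Node("decision_tree")],
    "testing": [Node("apply_model"), Node("performance")],
})
process = Node("process", {"main": [Node("read_data"), validation]})

lines = tree_lines(process)
print("\n".join(lines))
```

Drilling down into a node in the GUI corresponds, in this picture, to moving from a parent to one of its labeled child lists.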

The operators provide access to many data sources including all the major databases, a large number of standard transformations and all the major modeling techniques. The GUI also provides a wide range of visualization and analysis tools to view the data being manipulated.

The server product RapidAnalytics shows recent changes to the repository and allows processes to be browsed as XML files (these files underlie the graphical view in the editor). Processes can be run from the server and can be scheduled, either on the server or from within the editing environment. The RapidAnalytics server offers several options for integrating the processes and models into other infrastructures. Any process can be exposed as a web service, either headless or with parameters. Processes can be configured to take parameters, and these can be supplied through an API or entered in the editor or on the server. Results can be previewed through the server interface. This automation allows models to be built or rebuilt on a schedule and also allows a scoring process to be defined as a service for use in real-time scoring.

Processes can also be pushed into the database using an in-database extension that supports both DBMS-specific functions and a generic scoring operation that can be generated to execute as SQL on almost any database. Models can also be exported as PMML. A stream mining extension allows for model re-tuning based on streaming data. Model monitoring can be implemented by building a second process that monitors the performance or behavior of a deployed model process.
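As a rough illustration of what such a second, monitoring process might compute, the Python sketch below tracks accuracy over a sliding window of scored records and flags the model when accuracy falls below a threshold. The class name, window size and threshold are assumptions made for the example; the product itself may monitor quite different measures.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical monitor: accuracy over the most recent scored records."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True where the prediction was right
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing scored yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
scored = [("yes", "yes"), ("no", "no"), ("yes", "no"), ("no", "yes")]
for predicted, actual in scored:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_attention())
```

A flag like this could then be used to trigger a scheduled rebuild of the model, in line with the scheduling described above.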

Rapid-I is one of the vendors listed in our Decision Management Systems Platform Technologies report.

