I got a chance to chat with the folks from KNIME recently to discuss their workbench. KNIME is essentially a workbench for defining the pipeline of operations you want to throw at your data, typically as part of doing analytic work. It allows you to do complex things to the data and document your process – joining, normalizing, cleaning, visualizing and building predictive models. In some ways it could be considered an ETL tool, but it also supports the definition of a modeling process in a similar way to SPSS Clementine, SAS Enterprise Miner or FICO’s Model Builder. KNIME is an open source platform and many other companies are adding their own modules to the product. In addition, KNIME has pre-integrated R and WEKA, though the tool also includes core algorithms natively, such as support vector machines, neural networks and decision trees. Most of their R and WEKA users are actually migrating to KNIME, though some plan to use KNIME and R together.
The product is getting lots of use in life sciences, especially in early-stage drug discovery, in part because some ten specialty companies have integrated their offerings (such as chemistry-aware libraries for analysis) with KNIME.
As a company, KNIME already sells commercial support for the open source platform and is building out a server/grid execution environment and a reporting tool. They have added generation of PMML (the open Predictive Model Markup Language) for model definitions and are partnering with Zementis so that these models can be easily deployed, even to the cloud.
Today most KNIME customers are focused on using analytics to gain new insights into the information they have, rather than on embedding those analytics in operational systems. While the KNIME server will allow you to run pre-defined workflows, it is not really designed as a high-performance scoring engine (hence the integration with Zementis and the PMML export).
Like any tool offering a more graphical, declarative approach to this kind of work, they have to deal with lots of folks who have their scripts – in SAS or whatever – and don’t want to use anything “less flexible”. However, they are also beginning to see companies in which managers are overriding this preference, determined to bring visibility to their data integration and analytic processing.
One of the interesting things about our discussion was that they have found many companies looking for a more visible, declarative approach to data integration. Some of these are concerned only with reporting, rather than analytics, and so KNIME is adding a reporting engine. While their pharma and life sciences customers come looking for an integrated platform, their other commercial prospects often arrive from this ETL perspective, moving to reporting and analytics only afterwards.
One of the interesting angles I see in KNIME is that it allows a company to create a common infrastructure – and a clear, well-documented one – for providing both reporting data and analytic data. As I have noted before, the need for analytic and reporting infrastructures to return compatible results, even though the level of aggregation can be quite different, is real. Serious adopters of analytic decision making need it, and KNIME would give you an interesting, and open source, way to approach this.