KNIME is an open source data analytics product from a company based in Zurich, Switzerland, which I last wrote about a couple of years ago. Development started in 2004 and the enterprise components were released in 2010; since my last post the company has kept working on the product while refining its business plan. KNIME now has 9,000 registered users/organizations and over 500,000 downloads. About 50% of these are in life sciences and 50% in BI and analytics, which represents a significant increase in non-life sciences usage. As an open source product it has 50 or more very active community developers, while the company itself has 15 people (about half of them at the University of Konstanz) and describes itself as "small but profitable". Since I last spoke with them they have added a number of enterprise components and continued to develop the core open source platform.
The platform is based on Eclipse and has, in particular, been extended with many new node types. Like most modern analytic workbenches, KNIME allows a modeler to define the sequence of steps used to produce a model. The actions that can be performed as part of the modeling process are determined by the available node types: nodes to retrieve data, to process and transform that data, to apply modeling algorithms (both established and more esoteric) and to output results.
In addition, a new decision tree viewer has been developed and data support (various databases, SAS and SPSS files, and flat files) has been improved. KNIME supports PMML generation for models, including the most frequently used preprocessing operations a model requires. This was added to PMML in 4.0, and KNIME is one of the few vendors, if not the only one, that fully supports it. There are many nodes for preprocessing and many algorithms, some of them KNIME's own and others from the R Project, Revolution Analytics and Weka, as well as many new visualizations.
One of KNIME's strengths is a well-defined node interface that is nicely self-contained, allowing new node types to be integrated easily without destabilizing the whole environment. Because adding nodes is easy, community members and partners can extend the tool with their own node types. A community site manages community submissions and makes it easy to add any you want. Data management and execution control are also handled through well-defined interfaces, allowing companies like Pervasive (with DataRush) to add their own integrations.
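To illustrate why a self-contained node interface makes extension low-risk, here is a minimal sketch of the general plug-in pattern in Python. It is not KNIME's actual API (KNIME's own extension mechanism is Java-based); the `Node` class, `register` decorator, `NODE_TYPES` registry and the two example node types are all hypothetical. The point it shows is that each node only agrees to take a table in and hand a table out, so a new node type can be registered without changing the engine that runs the workflow.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Node(ABC):
    """Hypothetical node contract: take a table (list of dict rows), return a table."""

    @abstractmethod
    def execute(self, rows: List[dict]) -> List[dict]:
        ...


# Hypothetical registry: new node types are added here without touching run_workflow.
NODE_TYPES: Dict[str, Callable[..., Node]] = {}


def register(name: str):
    """Class decorator that adds a node type to the registry under `name`."""
    def decorator(cls):
        NODE_TYPES[name] = cls
        return cls
    return decorator


@register("filter")
class RowFilter(Node):
    def __init__(self, column: str, minimum: float):
        self.column = column
        self.minimum = minimum

    def execute(self, rows):
        # Keep only rows whose column value is at least `minimum`.
        return [r for r in rows if r[self.column] >= self.minimum]


@register("scale")
class Scale(Node):
    def __init__(self, column: str, factor: float):
        self.column = column
        self.factor = factor

    def execute(self, rows):
        # Build new rows instead of mutating the input, so nodes stay independent.
        return [{**r, self.column: r[self.column] * self.factor} for r in rows]


def run_workflow(steps, rows):
    """Run a linear workflow given as (node_type, params) pairs."""
    for node_type, params in steps:
        node = NODE_TYPES[node_type](**params)
        rows = node.execute(rows)
    return rows


data = [{"x": 1.0}, {"x": 5.0}, {"x": 10.0}]
result = run_workflow(
    [("filter", {"column": "x", "minimum": 5.0}),
     ("scale", {"column": "x", "factor": 2.0})],
    data,
)
print(result)  # [{'x': 10.0}, {'x': 20.0}]
```

A partner adding a new capability would only write another `@register(...)` class; the workflow runner and the existing nodes are untouched, which is the stability property described above.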
Because KNIME is built on Eclipse, it also integrates with other Eclipse projects such as BIRT (an open source reporting framework). This allows BIRT's open source reporting components to be added to KNIME workflows.
Improvements and new node types are often sponsored by companies that use the product, though KNIME continues to push its own development plan. KNIME is also an integration platform, with 15 technology vendors integrating with it. This is especially true in life sciences, where companies are paying their technology vendors to integrate with KNIME so that they can work in a single environment.
The KNIME workbench is open source, with registration encouraged but not required. This open source client version is a complete environment. Data access, transformation, statistics and data mining, visualization, reporting and workflow are all available in the open source version, with 1,000+ native and embedded nodes covering text processing, social network analytics, image processing and more. Companies can buy support for the open source product or adopt the server or enterprise components, which are proprietary and licensed.
KNIME Team Space allows a small group to share a repository. KNIME Server adds full server-based capabilities, including remote and scheduled execution of modeling flows, a workflow repository, shared data in a managed repository, shared metanodes or component workflows for reuse, and web browser access. The browser interface exposes workflows to users, who can enter the parameters defined for a workflow and see the result displayed in the browser. An API also allows "headless" execution of workflows. With the full suite, companies can write new nodes and wrap legacy software; power users can then develop templates and shared workflows and provide them to scientists or others who adapt and use them, as well as to managers who simply run pre-configured ones.
The company is shifting from selling directly to life sciences companies and a small number of others to an approach based on partner channels with domain expertise. Target markets are, unsurprisingly, life sciences as well as data mining/predictive analytics (especially customer intelligence). KNIME's value proposition lies in its openness and in its ability to reduce costs for existing heavy users of analytic workbenches, while also serving smaller companies that can't afford the large vendors.
Besides the Pervasive and BIRT integrations noted above, KNIME has a strong partnership with Zementis for PMML execution in databases and on servers, and with Dymatrix, a Europe-based partner offering model monitoring and automated updates.
KNIME will be one of the vendors listed in the forthcoming report on Decision Management Systems platform technologies.