One of the announcements at IBM’s World of Watson is the new Watson Machine Learning Service. I got a chance to ask a few questions about this new capability, and a couple of key points emerged about the current platform and the announcement itself.
First, some context. The Watson Data Platform (also announced at World of Watson) is designed to allow multiple roles to collaborate in developing advanced analytics. It provides a shared environment as well as capabilities such as annotations, versioning, collaboration and edit history across all the elements of the platform. One of the roles is the Data Scientist. Within the platform, the Data Scientist’s primary environment is a notebook, based on Jupyter. The notebook environment in the Data Science Experience supports open source coding as well as some IBM extensions, such as support for CPLEX.
The Watson Machine Learning Service is designed both to extend this notebook metaphor and to provide a canvas similar in style to SPSS Modeler, a more classic predictive analytic / data mining environment in which a set of nodes are linked together to define how an analytic model should be built. This drag-and-drop environment will open up these Machine Learning capabilities to audiences that are not programmers or data scientists. The canvas is a new web-based component. It leverages all the data integration and connection work done in the Watson Data Platform, so those developing analytic models can share the data ingestion and analysis already being done on the platform. Initially the service and canvas offer Apache SparkML algorithms. Other open source algorithms, as well as SPSS/IBM algorithms, will be added over time.
Analytic models developed on the Watson ML service can be accessed throughout the Watson Data Platform, deployed as a Bluemix service with a REST API, or exported as a PMML model for deployment to other environments. When a model is deployed, any pre-processing or characteristic generation defined in the canvas is included, so the service accepts the raw data as defined and completes all the analytic processing required. These deployment options ensure that the analytic models developed are not limited to use in other analytic tools but can be deployed for use in systems and processes throughout the organization.
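To make the point about bundled pre-processing concrete, here is a minimal sketch in plain Python. It is not the Watson Machine Learning API: every name in it (`derive_features`, `DeployedModel`, `handle_request`, the field names and the weights) is hypothetical, and a simple logistic scorer stands in for whatever model the canvas would actually build. What it illustrates is the idea that the deployed unit carries its own characteristic generation, so a caller sends raw fields and gets a score back.

```python
import json
import math
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical characteristic generation step: raw fields in, model inputs out.
def derive_features(raw: Dict[str, object]) -> Dict[str, float]:
    income = max(float(raw["income"]), 1.0)  # guard against division by zero
    return {
        "debt_to_income": float(raw["debt"]) / income,
        "is_homeowner": 1.0 if raw["housing"] == "own" else 0.0,
    }

@dataclass
class DeployedModel:
    """Illustrative bundle of pre-processing plus a logistic scorer."""
    preprocess: Callable[[Dict[str, object]], Dict[str, float]]
    weights: Dict[str, float]
    intercept: float

    def score(self, raw: Dict[str, object]) -> float:
        # Pre-processing runs inside the deployed unit, not in the caller.
        features = self.preprocess(raw)
        z = self.intercept + sum(self.weights[name] * value
                                 for name, value in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def handle_request(self, body: str) -> str:
        # Stand-in for a REST handler: JSON of raw data in, JSON score out.
        raw = json.loads(body)
        return json.dumps({"score": round(self.score(raw), 4)})

# Made-up weights purely for illustration.
model = DeployedModel(
    preprocess=derive_features,
    weights={"debt_to_income": 2.0, "is_homeowner": -1.0},
    intercept=0.0,
)

request_body = json.dumps({"income": 50000, "debt": 25000, "housing": "own"})
response_body = model.handle_request(request_body)
```

The calling system never computes `debt_to_income` itself, so the operational system and the model cannot drift apart on how a characteristic is calculated. This is the practical benefit of including pre-processing in the deployed service.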