SAP has recently announced some new predictive analytics offerings. At the core are in-database predictive analytics and R integration based on SAP HANA, and a new predictive modeling tool, SAP Predictive Analysis. The latter aims to close the gap between “pure” data miners/modelers and the business analysts who might typically use tools like SAP Visual Intelligence.
SAP Predictive Analysis is a Windows client application that is integrated into the SAP Visual Intelligence client. The basic process involves connecting to data sources, preparing data, analyzing data and visualizing the results. Analysts begin by creating a new document and connecting to SQL databases, SAP HANA, Hadoop (via HANA), Excel or a BusinessObjects Universe (SAP BW to follow). Once the data is loaded, the user can work in the Prepare tab, where data can be automatically enriched by detecting date fields, fields with specific values and so on. A wide range of visualization tools is available to investigate the data, and any available metadata is also used, for instance, to create hierarchies.
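To make the idea of automatic enrichment concrete, here is a minimal Python sketch of date-field detection. It is not SAP's implementation: the function names, the list of accepted date formats and the sample rows are all assumptions made for illustration. A column is treated as a date field only if every non-empty value parses, and derived year, quarter and month fields are then added.

```python
from datetime import datetime

# Hypothetical list of formats to try; a real tool would likely support more.
# Ambiguous values such as 05/06/2012 are read with the first matching format.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def parse_date(value):
    """Return a datetime if value matches one of DATE_FORMATS, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def detect_date_columns(rows):
    """Return the columns in which every non-empty value parses as a date."""
    if not rows:
        return []
    date_columns = []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] != ""]
        if values and all(parse_date(v) is not None for v in values):
            date_columns.append(name)
    return date_columns


def enrich(rows):
    """Add derived year/quarter/month fields for each detected date column."""
    date_columns = detect_date_columns(rows)
    enriched = []
    for row in rows:
        new_row = dict(row)
        for name in date_columns:
            parsed = parse_date(row[name]) if row[name] != "" else None
            new_row[name + "_year"] = parsed.year if parsed else None
            new_row[name + "_quarter"] = (parsed.month - 1) // 3 + 1 if parsed else None
            new_row[name + "_month"] = parsed.month if parsed else None
        enriched.append(new_row)
    return enriched


# Made-up sample data, all values as strings as they might arrive from a file.
rows = [
    {"order_date": "2012-11-05", "region": "EMEA", "amount": "120.50"},
    {"order_date": "2012-12-17", "region": "APJ", "amount": "80.00"},
]
enriched_rows = enrich(rows)
```

Derived fields like these are what make it possible to build a year, quarter, month hierarchy over a single date column.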
Once data is prepared, the analyst uses the Predict tab to work on the data. Data sources and other elements are laid out in a classic data mining process diagram, with different nodes for different elements. In the designer view the document created shows as a node, and the user can then add data preparation, algorithm and data writing nodes. A reasonable though not extensive set of nodes is already available; these can be dropped onto the diagram and configured. Processing nodes include various analytic algorithms: regression, outlier detection, time series, decision trees, neural networks, clustering and association algorithms. Some of these are custom, developed by SAP; others are R-based.
At any point the user can flip from the designer to a results tab, where the data identified earlier can be analyzed with scatter plots, variable graphs and so on. Once algorithms have been applied, additional charts and analysis tools allow the user to review the results of the algorithm (showing the members of each cluster, the distance between clusters and so on after applying K-Means clustering, for instance). Multiple algorithms can be run against the data set and the results included in graphs, compared and so on.
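For readers unfamiliar with K-Means, the following is a minimal, self-contained Python sketch of the standard algorithm, producing the two outputs mentioned above: the members of each cluster and the distance between cluster centers. It is a textbook version written for illustration, not the SAP or R routine used by the tool, and the sample points are made up.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: returns (centers, assignments) for a list of tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments match the returned centers
        centers = new_centers
    return centers, assignments


# Made-up two-dimensional data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignments = kmeans(points, k=2)

# Members of each cluster, and the distance between the two cluster centers.
clusters = {
    c: [p for p, a in zip(points, assignments) if a == c] for c in range(2)
}
center_distance = math.dist(centers[0], centers[1])
print(clusters)
print(round(center_distance, 2))
```

Which group ends up labelled 0 and which 1 depends on the random starting centers, so only the grouping itself should be relied on, not the label values.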
When internally developed nodes are used they can be run on HANA, in memory, using the HANA Predictive Analysis Library. Other R routines can be run in memory on HANA, though the performance gain is not as dramatic; it is still better than typical SQL-driven queries, as HANA QL is used. Without access to HANA, the algorithms run locally or on a standard R server. The tool is currently limited to a specific set of R routines, though SAP is extending this. In addition, as new algorithms are added to the HANA libraries they are also being added to the tool.
Finally, the Share tab allows you to share data as a file or as a dataset to HANA or Explorer, send a visualization by email and so on. The model itself can be saved as PMML and, soon, as SQL (which can be deployed to HANA as a stored procedure or as a calculated field in a BusinessObjects Universe).
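As a rough illustration of what saving a model as PMML or as SQL means, the Python sketch below serializes a simple linear regression both ways. Everything in it is an assumption made for the example: the function names, the field names (ad_spend, store_count, revenue) and the coefficient values are hypothetical, and the output SAP Predictive Analysis actually produces may well be structured differently. The PMML element names follow the PMML 4.1 RegressionModel layout.

```python
import xml.etree.ElementTree as ET


def linear_model_to_pmml(target, intercept, coefficients):
    """Serialize a linear regression as a minimal PMML 4.1 document (string)."""
    pmml = ET.Element(
        "PMML", {"version": "4.1", "xmlns": "http://www.dmg.org/PMML-4_1"}
    )
    ET.SubElement(pmml, "Header", {"description": "Illustrative linear regression"})

    # The data dictionary declares every field the model reads or predicts.
    dictionary = ET.SubElement(pmml, "DataDictionary")
    for name in list(coefficients) + [target]:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    # The regression table carries the intercept and one term per input field.
    table = ET.SubElement(model, "RegressionTable", {"intercept": str(intercept)})
    for name, coef in coefficients.items():
        ET.SubElement(
            table, "NumericPredictor", {"name": name, "coefficient": str(coef)}
        )
    return ET.tostring(pmml, encoding="unicode")


def linear_model_to_sql(intercept, coefficients):
    """Render the same model as a SQL expression usable as a calculated field."""
    terms = [str(intercept)]
    terms += [f'({coef} * "{name}")' for name, coef in coefficients.items()]
    return " + ".join(terms)


# Hypothetical model: revenue predicted from ad spend and store count.
coefficients = {"ad_spend": 0.8, "store_count": 3.0}
pmml_text = linear_model_to_pmml("revenue", 12.5, coefficients)
sql_text = linear_model_to_sql(12.5, coefficients)
print(pmml_text)
print(sql_text)
```

The point of both formats is the same: the model leaves the modeling tool as a portable artifact, so scoring can happen wherever the data lives rather than only on the analyst's desktop.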
You can get more information on SAP Predictive Analysis here: www.sap.com/predictive-analysis. SAP is also one of the vendors in our Decision Management Systems Platform Technology Report.