SAS has been rolling out its in-database and in-warehouse analytics over the last few months and I thought it was time to get an update. SAS sees its in-database strategy as one way to address some of the challenges in putting analytics to work, especially in high-volume operational environments. In-database analytics can reduce the time and labor needed to develop and deploy analytic applications, cut unnecessary data movement and latency, and make better use of companies’ increasingly powerful database and data warehouse infrastructure. Generally, putting analytic models into production involves analytic data preparation, the modeling itself and then scoring of data in the production environment. In the in-database scenario, SAS sees most of the analytic data preparation moving to the database/warehouse, along with some of the modeling tasks themselves and all of the actual scoring. Essentially, the database increasingly becomes the next host for SAS deployment.
Obviously SAS does a lot of integration work with databases to provide connectivity using native access engines. This level of support is offered for all the major RDBMS vendors, the leading hardware and software warehouse appliances as well as several columnar databases. Through these native access engines SAS is also translating Base SAS procedures into SQL for execution in the database. SAS also provides implicit SQL pass-through (mapping SAS to vendor-specific SQL) and ensures that the relevant SAS/ACCESS engine provides rapid data access.
SAS pushes some Base SAS procedures into the Teradata database (e.g. the FREQ, SUMMARY, MEANS and RANK procedures) and integrates the SAS Format library within the database so that it can support processing of SAS formats. SAS is expanding the support for Base SAS procedures to include IBM DB2 and Oracle in the upcoming SAS 9.2 release.
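To make the idea of pushing a procedure into the database a little more concrete, here is a minimal sketch in Python, using SQLite purely as a stand-in database. It is not SAS code and it does not show the SQL that SAS actually generates; the `customers` table, its columns and the function names are all hypothetical. It simply contrasts pulling every row out to count values with expressing the same frequency count (roughly the kind of one-way table a FREQ-style procedure produces) as a `GROUP BY` query, so that only the summary rows leave the database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    rows = [(1, "East"), (2, "West"), (3, "East"),
            (4, "North"), (5, "East"), (6, "West")]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def freq_client_side(conn):
    """Pull every row out of the database and count in the client."""
    counts = {}
    for (region,) in conn.execute("SELECT region FROM customers"):
        counts[region] = counts.get(region, 0) + 1
    return counts


def freq_in_database(conn):
    """Let the database aggregate; only one row per region comes back."""
    sql = "SELECT region, COUNT(*) FROM customers GROUP BY region"
    return dict(conn.execute(sql).fetchall())


conn = build_demo_db()
client_counts = freq_client_side(conn)
db_counts = freq_in_database(conn)
```

Both functions give the same counts. The difference is where the work happens and how much data crosses the wire, which is the saving in-database processing is after on large tables.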
The in-database integration effort is divided into baseline and advanced capabilities. The baseline integration allows SAS formats to be used inside the database and integrates the SAS scoring code generated from SAS Enterprise Miner. The more advanced integration adds support for SAS analytical processes (data preparation, data exploration and analytic modeling) running end-to-end inside the database.
Baseline in-database support (i.e. SAS Scoring Accelerator) is currently provided for Teradata, as part of the SAS and Teradata partnership, with support for Netezza and IBM DB2 in limited availability. SAS Scoring Accelerator enables customers to translate models created in SAS Enterprise Miner into database-specific functions to be deployed and then executed directly within the database environment. SAS Model Manager provides integration with the SAS Scoring Accelerator for publishing and validating SAS Enterprise Miner models as in-database scoring functions for organizations that need to manage a large repository of models throughout the lifecycle.
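The general pattern of publishing a model as a database function can be sketched as follows. This is an illustration only, assuming SQLite as a stand-in and Python's `sqlite3.Connection.create_function` as the registration mechanism; it is not how the SAS Scoring Accelerator is implemented. The logistic model, its coefficients, the `score_churn` function and the `accounts` table are all invented for the example.

```python
import math
import sqlite3

# Hypothetical coefficients standing in for a model exported from a mining tool.
INTERCEPT = -1.0
COEF_TENURE = -0.5
COEF_COMPLAINTS = 1.0


def score_churn(tenure_years, complaints):
    """Logistic scoring function: returns a probability between 0 and 1."""
    z = INTERCEPT + COEF_TENURE * tenure_years + COEF_COMPLAINTS * complaints
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")

# "Publish" the model: register it under a name that SQL statements can call.
conn.create_function("score_churn", 2, score_churn)

conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, tenure_years REAL, complaints INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 2.0, 2), (2, 6.0, 0), (3, 0.0, 3)],
)

# Scoring is now an ordinary query; no rows are exported to a separate scoring job.
scores = conn.execute(
    "SELECT id, score_churn(tenure_years, complaints) FROM accounts ORDER BY id"
).fetchall()
```

Because SQLite is embedded, the function here still runs in the client process, so the sketch shows only the calling pattern. In a warehouse such as Teradata the published function would execute on the database nodes, next to the data.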
The advanced in-database support is currently limited to Teradata. The SAS Analytic Accelerator for Teradata embeds a select set of SAS analytic procedures inside the Teradata warehouse. This ensures that a significant proportion of analytic data preparation tasks, as well as some modeling tasks that require significant resources, are performed in the data warehouse itself. These tasks remain integrated with the analytic steps performed outside the warehouse, managed by an overall SAS Enterprise Miner or SAS/STAT process, but take advantage of the performance and scalability of Teradata while eliminating much of the data movement required in model development.
Depending on customer demand and partnership dynamics, SAS has made it clear that it expects to expand the range of capabilities available in the databases and data warehouses it supports, and that it will add more database and data warehouse vendors to the list in order to offer both baseline and advanced in-database support.