SAS is upgrading its in-memory analytics products with SAS® Visual Statistics (forthcoming) and SAS® In-Memory Statistics for Hadoop. SAS In-Memory Statistics for Hadoop is available now, and SAS Visual Statistics is scheduled to ship in July 2014.
SAS Visual Statistics is based on the SAS® LASR™ Analytic Server for in-memory processing and is aimed at both data exploration/discovery and predictive modeling. It will include SAS® Visual Analytics and likewise offers an interactive, drag-and-drop, 100% web-based interface. It will run in single-machine mode or, for scale, in distributed environments (Teradata, Greenplum and Hadoop, both Cloudera and Hortonworks).
Besides the SAS Visual Analytics capabilities, key techniques supported in the first release of SAS Visual Statistics include:
- k-means clustering for segmentation
- Classification and prediction using logistic and linear regression, interactive decision trees, and generalized linear models
- Plus support for data discovery
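For readers less familiar with the first technique in that list, here is a minimal sketch of k-means segmentation. It is plain Python, not SAS code, and it says nothing about how SAS Visual Statistics implements clustering. The customer data, the `kmeans` and `assign` helper names, and the naive "first k points" initialization are all assumptions made for illustration.

```python
# Illustrative only: plain Python, not SAS code or a SAS API.
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    centroids = list(points[:k])  # naive initialization (an assumption for this sketch)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers described by (monthly spend, monthly visits).
customers = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
centroids, segments = kmeans(customers, k=2)
print(segments)
```

Each customer ends up with a segment index, which is the kind of label that can then be used to explore or model each segment separately.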
Data scientists and business analysts can move directly from SAS Visual Analytics to SAS Visual Statistics and work collaboratively. The in-memory environment is designed to help them quickly discover relationships between variables, accelerate the model building process and iterate frequently. Building a model simply requires dragging and dropping the candidate predictors and target variables into the UI. Analytic models can be built immediately using various methods and against all of the data stored in memory. SAS Visual Statistics also allows you to build numerous models (by customer segment or product, for instance) with one click or by dropping a variable on the palette. Variable ranking, visual exploration techniques (e.g. box plots, heat maps) and model comparison and assessment using lift charts, ROC charts, etc. are supported. Once developed, models can be generated as score code for deployment to various SAS products.
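To make the model comparison step concrete, here is a rough sketch of the numbers that sit behind a lift chart and a ROC chart. It is plain Python rather than SAS code, and the two sets of scores, the outcomes and the function names are invented for illustration; it is not a description of how SAS Visual Statistics computes these measures.

```python
# Illustrative only: plain Python, not SAS code or a SAS API.

def cumulative_lift(scores, labels, fraction):
    """Response rate in the top `fraction` of scored records divided by the overall rate."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n
    overall_rate = sum(ranked) / len(ranked)
    return top_rate / overall_rate


def roc_auc(scores, labels):
    """Area under the ROC curve: the chance a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = responded) and scores from two candidate models.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
model_b = [0.4, 0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.5, 0.1, 0.05]

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, cumulative_lift(scores, outcomes, 0.2), roc_auc(scores, outcomes))
```

On this toy data model A ranks responders higher than model B on both measures, which is the sort of side-by-side comparison the charts are meant to support.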
SAS In-Memory Statistics for Hadoop also leverages SAS in-memory technology, but this time to provide an interactive programming environment based on Hadoop. It’s aimed at data scientists and data miners who like to program but who want to both use Hadoop and focus on exploration. It is based on LASR in-memory processing and supports data preparation, transformations, exploratory analysis, statistical modeling and machine learning as well as analytic model comparison and scoring. This is a broad-based tool with extensive data manipulation, exploration and visualization capabilities, and it supports a wide range of modeling techniques. There is an external API, and it can also produce score code.
SAS In-Memory Statistics for Hadoop supports multiple users accessing the same in-memory data set and works with data persisted in Hadoop. It supports a range of front-ends, including the browser-based SAS Studio. It is unambiguously aimed at programmers – SAS code is the primary interface – but it works interactively, allowing specific pieces of the code to be run immediately against potentially very large datasets stored in Hadoop. The results can be viewed right alongside the code, and each piece of code runs against the same in-memory installation, so responses are very fast even against very large datasets. This allows a modeler (or a group of modelers) to experiment with variables, models, calculated characteristics, etc. and get immediate feedback on them. All the various measures of a model are likewise calculated, allowing various models to be developed and compared. Models can be developed very quickly running against the data in memory, and output options include score code and re-applying scores as usual.
You can get more details on these products here – SAS Visual Statistics and SAS In-Memory Statistics for Hadoop. SAS is one of the vendors in our Decision Management Systems Platform Technology Report.