I first got a briefing on Dulles Research Carolina back in 2010 and I recently got an update. Dulles Research was founded in 2005 and came to market with Carolina, its SAS to Java converter, in 2009. A U.S. patent was awarded to Dulles last year for the SAS to Java conversion process.
At its core, Carolina takes all aspects of Base SAS (Macros, Data Steps, PROCs, SAS data sets) and generates Java code from them. Dulles Research offers an execution engine that runs SAS programs in batch as a Java executable (Carolina), a Java generator that creates a JAR for processing a single record in Java (Carolina for Integration), and generators for Hadoop MapReduce jobs and database UDFs (Carolina for Hadoop and Carolina for In-Database). All the products rely on the same, patented, core Java generation technology.
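To make the conversion idea concrete, here is a minimal sketch of the kind of Java that a generator like Carolina might emit for a trivial SAS DATA step that derives a score from two fields. The class, field names, and scoring formula are my own illustration and are not actual Carolina output.

```java
// Hypothetical illustration only; not actual Carolina-generated code.
// Original SAS DATA step being mimicked:
//   data scored;
//     set customers;
//     risk_score = 0.3 * age + 0.7 * balance / 1000;
//   run;

import java.util.ArrayList;
import java.util.List;

public class ScoredDataStep {

    /** One observation; mirrors the variables of the SAS data set. */
    public static class Customer {
        double age;
        double balance;
        double riskScore;

        Customer(double age, double balance) {
            this.age = age;
            this.balance = balance;
        }
    }

    /** Applies the DATA step logic to every observation in the input, in batch. */
    public static List<Customer> run(List<Customer> customers) {
        List<Customer> scored = new ArrayList<>();
        for (Customer c : customers) {
            c.riskScore = 0.3 * c.age + 0.7 * c.balance / 1000.0;
            scored.add(c);
        }
        return scored;
    }
}
```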
Carolina addresses one of the key challenges of operationalizing analytics by helping organizations avoid the need to re-write analytic models for deployment. Many Base SAS programs are executed in batch, but companies that want to use these programs to score a single record (a single customer, say) often find themselves re-writing the Base SAS in a development language. The ability to generate Java from SAS code allows customers to essentially deploy a SAS model to a Java server and execute it in real time as part of a Java deployment. In addition, it allows customers to execute SAS models on servers that don't have SAS licenses. Using Carolina allows SAS customers to continue to develop in SAS, continue to use all their existing assets, and get deployment of their models to an open enterprise environment. This is particularly powerful as companies move to real-time scoring where they may already be using a Java server or Java-based rules engine, for instance.
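The sketch below shows how that single-record, real-time pattern might look inside a Java service. The CustomerScorer and ScoreResult classes stand in for whatever a Carolina-style generated JAR would actually expose; all names and the scoring logic here are invented for illustration.

```java
// Hypothetical sketch of single-record, real-time scoring in a Java service.
// In a real deployment the scorer class would come from the generated JAR;
// here a stand-in is defined inline so the example is self-contained.

public class RealTimeScoringService {

    /** Stand-in for a class from the generated JAR (assumed, not real). */
    static class CustomerScorer {
        ScoreResult score(double age, double balance) {
            double risk = 0.3 * age + 0.7 * balance / 1000.0;
            return new ScoreResult(risk);
        }
    }

    static class ScoreResult {
        final double riskScore;
        ScoreResult(double riskScore) { this.riskScore = riskScore; }
    }

    private final CustomerScorer scorer = new CustomerScorer();

    /** Called once per request, e.g. from a REST endpoint or a rules engine action. */
    public double scoreCustomer(double age, double balance) {
        return scorer.score(age, balance).riskScore;
    }

    public static void main(String[] args) {
        RealTimeScoringService service = new RealTimeScoringService();
        System.out.println(service.scoreCustomer(42, 15000));
    }
}
```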
In addition, Dulles Research sees more interest in running SAS models on Hadoop/Big Data infrastructure. Running the generated Java as MapReduce jobs or as in-database UDFs delivers dramatic performance gains. Because Carolina supports the whole Base SAS environment, this allows an organization to get in-Hadoop, or in-database, execution of their existing models. Carolina for Hadoop generates a mix of Java code and HiveQL queries for execution as MapReduce jobs, allowing parallel execution of Base SAS models against Hadoop data. Carolina for Hadoop can also convert SAS datasets to HDFS. Carolina for Hadoop deployments typically involve some tweaking of the analytic model to maximize performance on Hadoop, due to the differences between the way SAS and MapReduce process data, and Carolina for Hadoop identifies opportunities for this kind of tweaking as it generates the code. Carolina for In-Database takes a SAS DATA step program and generates JAR files for either an Oracle or a Teradata UDF. The engine generates the necessary SQL scripts to create the UDFs and invoke them.
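For a sense of how DATA step logic ends up running in parallel against Hadoop data, here is a minimal sketch of a Hadoop MapReduce map task applying the same kind of derivation to records stored as comma-separated lines in HDFS. This is my own illustration of the pattern, not Carolina-generated code; the class name, input format, and scoring formula are all assumptions.

```java
// Hypothetical illustration of DATA step logic running as a MapReduce map task.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RiskScoreMapper
        extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is assumed to be one observation: customerId,age,balance
        String[] fields = value.toString().split(",");
        String customerId = fields[0];
        double age = Double.parseDouble(fields[1]);
        double balance = Double.parseDouble(fields[2]);

        // The same derivation the original DATA step would perform,
        // now applied in parallel across HDFS splits.
        double riskScore = 0.3 * age + 0.7 * balance / 1000.0;

        context.write(new Text(customerId), new DoubleWritable(riskScore));
    }
}
```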
Dulles Carolina is sold both directly to large companies and through partners such as Experian, FICO and Provenir. Dulles Research is one of the vendors in our Decision Management Systems Platform Technologies Report and you can get more information on Carolina here.