First Look – SQLstream

December 3, 2012

in Advanced Analytics, BI, Product News


The move over recent years has been towards increasingly distributed processing of data, both in terms of the underlying model and in terms of the processing architectures available. More and more of this data is also streaming, and SQLstream focuses on effective access to streaming big data. Founded in 2003 and headquartered in San Francisco, SQLstream delivers a streaming / continuous query platform based on ISO/ANSI SQL. The platform is designed to respond immediately to Big Data in an operational context. This means collecting sensor, system and service data as real-time streams, integrating and analyzing this data, and delivering dashboards or applications that incorporate new answers from the analysis.

The key challenges that SQLstream targets are:

  • Data explosion (too expensive to analyze volume of real-time data)
  • Agility (too slow to respond to changing requirements)
  • Complexity (too hard to manage real-time analytic applications)

Between 2004 and 2012 they focused on delivering streaming big data applications in some specific verticals, and they are now moving into delivering a streaming big data management platform. The platform allows you to add and remove feeds, and the applications that use those feeds, without dealing with many-to-many point-to-point connections between sources and applications. It supports streaming analytics, event correlation, alerts, alarms and so on.

The products are s-Server (the core platform and integration layer, running on Linux), s-Cloud (s-Server on Amazon EC2), s-Transport (a module for geospatial analytics, movement and clustering over GPS and other location-based data) and s-Analyzer (for building reports and queries). Sources can include Google BigQuery and Hadoop as well as traditional data warehouses and information systems that produce real-time data. SQLstream parallelizes the ETL process so that collection, cleaning, aggregation and analysis are handled for multiple streams in parallel and continuously. This is the logical extension, if you like, of drip-updates. It is achieved by executing a fine-grained directed acyclic graph dataflow using pipelining and superscalar processing.
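To make the idea of a pipelined dataflow concrete, here is a minimal sketch in Python rather than in SQLstream itself. It is not how s-Server is implemented. The stage names (collect, clean, running_mean) and the run_pipeline helper are hypothetical, chosen only to mirror the collection, cleaning and aggregation steps described above. Each stage is a generator, so a record flows through every stage as soon as it arrives instead of waiting for a batch to complete.

```python
from typing import Iterable, Iterator, List


def collect(raw_lines: Iterable[str]) -> Iterator[str]:
    """Collection stage: hand over raw records one at a time as they arrive."""
    for line in raw_lines:
        yield line


def clean(records: Iterable[str]) -> Iterator[float]:
    """Cleaning stage: parse each record as a number, dropping ones that do not parse."""
    for record in records:
        try:
            yield float(record.strip())
        except ValueError:
            continue  # skip malformed input instead of stopping the stream


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Aggregation stage: emit the updated mean after every incoming value."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count


def run_pipeline(raw_lines: Iterable[str]) -> List[float]:
    """Chain the stages; each record passes through all of them before the next is read."""
    return list(running_mean(clean(collect(raw_lines))))


if __name__ == "__main__":
    print(run_pipeline(["1", " 3 ", "bad", "5"]))
```

A real engine would run many such chains in parallel across streams; the sketch only shows the continuous, record-at-a-time flow through a chain of stages.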

Queries all run continuously and use the WINDOW construct from the SQL standard, plus a STREAM keyword for streaming queries (the STREAM keyword maintains SQL standard compliance, so that any SQL query without STREAM executes as a normal static query). For instance, this allows you to return a stream of data with records joined and analyzed over a window of time such as the preceding hour. Queries can be based on simple conditions or can use more complex constructs such as the rate at which standard deviations vary (showing accelerations in errors, for instance). In either case, the queries are highly parallelized and are designed to run fast on commodity hardware. In addition, the streams can be joined to stored data where that makes sense.
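The behavior of a windowed continuous query can be illustrated with a short Python sketch. This is not SQLstream syntax, and the function name sliding_window_mean and its parameters are assumptions made for illustration. For every incoming event it emits the count and mean of all events in the preceding hour, which is roughly what an aggregate over a one-hour time window produces on a stream. Events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value)


def sliding_window_mean(
    events: Iterable[Event], window_seconds: float = 3600.0
) -> Iterator[Tuple[float, int, float]]:
    """For each event, yield (timestamp, count, mean) over the trailing window.

    Assumes events arrive ordered by timestamp. An event stays in the window
    while its timestamp is no older than `window_seconds` before the current one.
    """
    window: Deque[Event] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the trailing window.
        while window[0][0] < ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, len(window), total / len(window)


if __name__ == "__main__":
    readings = [(0.0, 10.0), (1800.0, 20.0), (3600.0, 30.0), (5400.0, 40.0)]
    for row in sliding_window_mean(readings):
        print(row)
```

The result is itself a stream: one output row per input row, each reflecting only the most recent hour of data, so a downstream consumer never has to re-scan history.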

SQLstream does not develop analytic models itself, but analytic models can be developed on stored historical data and then deployed as SQL queries in the stream. For most models you can generate standard SQL from your data mining workbench, and minor changes would then allow you to connect these queries to the streams rather than to tables. Because SQLstream supports standard SQL and because these queries are not compiled as programs, you can rapidly and easily add models and other queries directly to the stream without bringing it down. For other kinds of models you could also use its ability to extend SQL using the standard extension mechanisms and bring other kinds of processing into the environment.

SQLstream is not a transactional database but is designed to process data without side effects. This means it can be highly distributed and parallel, because its queries just transform incoming data to outgoing data. Users can add and remove data sources without stopping the server, and the use of SQL creates a familiar and easy-to-adopt environment. The basic idea is to square the circle – allowing a high-level declarative language (SQL) in a continuous, high-performance environment where you might otherwise have written low-level code.

Further information is at www.sqlstream.com.
