I last wrote about SQLstream back in 2012 and got an update from them recently. Recent news includes a partnership with Oracle, new performance benchmarks (against Storm, for instance), and their latest release, SQLstream 4.0. New 4.0 features include:
- Performance optimization of the streaming integration layer and the UDX mechanism for SQL extensions
- Support for in-memory tables for integration with external reporting tools
- Real-time streaming web agent – a REST interface that takes streaming data and updates web apps
- Real-time visualization tools – s-Visualizer (for enterprise users) and s-Dashboard (HTML5)
- Integration with Storm for deploying continuous SQL queries in Storm topologies
- StreamApps – libraries of components for building streaming analytics for specific industry applications such as telecommunications
SQLstream describes itself as a stream processor for operational intelligence. For most companies, operations happen in real time, and the systems and machinery they run increasingly generate data about what is happening in real time as well. Yet analytics and Business Intelligence are generally done offline, against some extract or summary of this data. Operational Intelligence, they say, means applying the same analytical techniques in real time to the data streaming continuously through the operational environment. The progression toward real-time action requires collecting and assembling data in real time, running analytics (especially predictive analytics) against this data in real time, and using the results to drive decision-making. SQLstream sees its sweet spot as moderate to complex analytics against very fast-moving data.
The core SQLstream environment is SQL-based, while also supporting Java as a first-class language. SQLstream collects data using various push-based remote agents and access tools, such as optimized JDBC connectors based on SQLstream’s streaming data protocol. Data can also be pulled in using adapters, for instance to poll and process log files or updates from external databases. All of these collection mechanisms create real-time data streams that are processed continuously.
Streaming SQL queries run against these records as they arrive. The nature of these queries and the length of the time windows over which data are processed determine how many records are kept in memory: new records are added as they arrive, and older records are dropped once they are no longer relevant. Streaming queries may also use Java, in the form of User Defined Functions (UDXs), to reach out to other systems. SQLstream runs its own scheduling and job management to optimize the throughput of this streaming environment, and because the platform is designed for streaming analysis there is no need to stop and restart the server when queries or data-loading mechanisms change.
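In SQLstream this windowed retention is expressed in streaming SQL; as a language-neutral illustration of the idea (not SQLstream's actual implementation, and with all names hypothetical), here is a minimal Python sketch of a time-windowed rolling average, where new records are appended as they arrive and records older than the window are evicted:

```python
from collections import deque

class SlidingWindowAverage:
    """Illustrative sketch of a time-windowed streaming aggregate:
    only records inside the window are kept in memory."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.records = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        # New records are added as they arrive ...
        self.records.append((timestamp, value))
        self.total += value
        # ... and older records are dropped once they fall out of the window.
        while self.records and self.records[0][0] < timestamp - self.window:
            _, old_value = self.records.popleft()
            self.total -= old_value

    def average(self):
        return self.total / len(self.records) if self.records else None
```

With a 60-second window, an event from t=0 still counts at t=30 but is evicted when an event arrives at t=90 — which is also why the window length, not the total stream volume, determines the memory footprint.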
SQLstream allows you to deploy predictive analytics, as SQL or as Java code, and apply them to the stream as it flows by. Characteristics, including those that summarize data over time, could be added to the queries and fed into the model. In principle, decision-making logic could be added too, certainly within the Java environment, so that decisions could be applied to the stream. Building the model itself would still be done in the traditional way, though one could use the in-memory tables and general SQL access to pull data from the streaming engine for modeling. Updated data sets could also be generated continuously and fed in if you wanted to refresh your model regularly.
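The scoring half of this — a model built offline, then applied record by record as the stream flows past — can be sketched as follows. This is an illustrative Python analogue, not SQLstream code; the model weights, field names, and threshold are all hypothetical:

```python
import math

def make_scorer(weights, bias):
    """Return a scoring function for a fixed, pre-trained logistic model.
    The model itself is assumed to have been built offline."""
    def score(record):
        z = bias + sum(w * record.get(field, 0.0) for field, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))  # probability-like score in (0, 1)
    return score

def score_stream(records, scorer, threshold=0.5):
    """Apply the model to each record as it arrives, attaching a decision."""
    for record in records:
        p = scorer(record)
        yield {**record, "score": p, "flagged": p >= threshold}
```

In a deployed streaming system the decision logic would run inline (in SQLstream's case, within the SQL query or a Java UDX), and swapping in a refreshed model would just mean replacing the scorer.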
You can get more information on SQLstream here.
Great to see SQLstream is “still in the game” after the changes in the CEP market last year. I liked their description of “real-time ETL” as one of their tool’s roles (where decisions may be made for data acceptance in-process).