
Real Time Big Data #TD3PI


Another Think Big employee came up to talk about real-time big data, especially around event analytics. Real-time, he says, generally means something happening in a second or so, not minutes or hours. It might be push or pull, but what matters is the time from data in to data out. Real-time responses, he pointed out, need not require real-time analytical processing. A rendering engine, for instance, might pick up stored recommendations built from a statistical model that is not rebuilt in real-time but simply scored in real-time.
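To make the "rebuilt in batch, scored in real-time" split concrete, here is a minimal sketch. It assumes a toy popularity-per-category "model" as a stand-in for a real statistical model; the names `build_model`, `recommend` and `timed_recommend` are hypothetical and not from the talk. The point is that the expensive step runs offline, while the per-request step is a cheap lookup that easily fits a one-second budget.

```python
import time
from collections import Counter


def build_model(history):
    """Hypothetical batch step: count purchases per item within each category.

    history is a list of (user_id, category, item_id) events. In practice
    this would run on a batch schedule (hours or days), not per request.
    """
    model = {}
    for _user, category, item in history:
        model.setdefault(category, Counter())[item] += 1
    return model


def recommend(model, category, k=2):
    """Real-time step: score the stored model for the request context."""
    counts = model.get(category, Counter())
    # most_common() orders ties by first insertion, so results are deterministic
    return [item for item, _ in counts.most_common(k)]


def timed_recommend(model, category, k=2, budget_s=1.0):
    """Return (recommendations, elapsed seconds, whether we met the budget)."""
    start = time.perf_counter()
    recs = recommend(model, category, k)
    elapsed = time.perf_counter() - start
    return recs, elapsed, elapsed <= budget_s


history = [("u1", "books", "b1"), ("u2", "books", "b2"),
           ("u3", "books", "b1"), ("u1", "music", "m1")]
model = build_model(history)  # batch: rebuilt occasionally
print(recommend(model, "books"))  # real-time: ['b1', 'b2']
```

Rebuilding `model` can take as long as the batch window allows; only `recommend` sits on the data-in to data-out path.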

He discussed the so-called Lambda architecture, where separate batch and streaming layers are combined to deliver an answer that is both real-time and stable/reliable. Think Big, he says, prefers a Mu architecture that feeds data into both batch and streaming processing environments, not for reliability but because some things are better done in batch and some in streaming.
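A rough way to see the difference: in Lambda, a query merges a complete-but-stale batch view with a speed view covering events since the last batch run. In the Mu style described here, every event is fed to both sides but each side does a different job. The sketch below is a hypothetical illustration of that reading, not Think Big's design; the threshold alert (streaming) and per-customer totals (batch) are assumed example workloads.

```python
from collections import Counter


def lambda_query(batch_view, speed_view, key):
    """Lambda-style serving: the batch view holds counts up to the last batch
    run, the speed view holds counts for events since then; merge at query time."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)


class MuPipeline:
    """Hypothetical Mu-style fan-out: each event goes to both environments,
    but streaming and batch do different work rather than duplicating it."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.batch_store = []  # batch path: keep everything, process later in bulk
        self.alerts = []       # streaming path: results produced immediately

    def ingest(self, event):
        self.batch_store.append(event)
        self._stream(event)

    def _stream(self, event):
        # Streaming is good at reacting to a single event right away
        if event["amount"] > self.alert_threshold:
            self.alerts.append(event["id"])

    def run_batch(self):
        # Batch is good at aggregating over the full history
        totals = Counter()
        for e in self.batch_store:
            totals[e["customer"]] += e["amount"]
        return dict(totals)


pipeline = MuPipeline(alert_threshold=100)
for ev in [{"id": 1, "customer": "c1", "amount": 50},
           {"id": 2, "customer": "c1", "amount": 150},
           {"id": 3, "customer": "c2", "amount": 20}]:
    pipeline.ingest(ev)
print(pipeline.alerts)       # [2]
print(pipeline.run_batch())  # {'c1': 200, 'c2': 20}
```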

Real-time infrastructure is particularly used for event analytics, where events flow in from various sources and have to drive reporting, dashboarding and actual analytical decision-making. Think Big takes the events as they come in and stores them both in an HBase event repository and in an HDFS batch store. The event store handles event correlation and allows real-time, up-to-the-minute access, while the batch store supports reporting as well as building analytical models (that might then be applied to the event context as part of a personalization engine).
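The dual-write pattern can be sketched in memory. Here `EventRepository` is only a stand-in for an HBase table keyed by entity and `BatchStore` a stand-in for an append-only HDFS landing area; neither uses the real HBase or HDFS APIs. The correlation rule (one event type followed by another for the same entity within a time window) and the event names are assumed examples.

```python
from collections import defaultdict


class EventRepository:
    """Stand-in for an HBase-style event store: events grouped by entity key."""

    def __init__(self):
        self._by_entity = defaultdict(list)

    def put(self, event):
        self._by_entity[event["entity"]].append(event)

    def recent(self, entity, now, window_s):
        """Up-to-the-minute access: this entity's events in [now - window_s, now]."""
        hits = [e for e in self._by_entity[entity]
                if now - window_s <= e["ts"] <= now]
        return sorted(hits, key=lambda e: e["ts"])


class BatchStore:
    """Stand-in for an append-only HDFS store used later for reporting/modeling."""

    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)


def ingest(event, repo, batch):
    # Every incoming event is written to both stores
    repo.put(event)
    batch.append(event)


def correlated(repo, entity, now, first, then, window_s):
    """True if a `first` event is followed by a `then` event inside the window."""
    seen_first = False
    for e in repo.recent(entity, now, window_s):
        if e["type"] == first:
            seen_first = True
        elif e["type"] == then and seen_first:
            return True
    return False


repo, batch = EventRepository(), BatchStore()
for ev in [{"entity": "u1", "ts": 100, "type": "cart_add"},
           {"entity": "u1", "ts": 130, "type": "checkout_error"},
           {"entity": "u2", "ts": 120, "type": "checkout_error"}]:
    ingest(ev, repo, batch)
print(correlated(repo, "u1", 150, "cart_add", "checkout_error", 60))  # True
```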

Think Big has released some of its work around the batch reporting element of this architecture as a dashboard engine for Hadoop. The events are loaded in and the engine uses Spark to pre-aggregate data in every way you can think of to drive reporting and dashboards. The engine is highly scalable and works with lots of potentially unstructured data. It supports dimensional drill down and various dashboard tools. It's productized and supported but not a SKU – an interesting hybrid.
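Pre-aggregating "in every way you can think of" is essentially computing a data cube: a total for every subset of the dimensions, so that any drill-down becomes a lookup instead of a scan. The talk did not say how the engine does this. The pure-Python sketch below only illustrates the idea; the `cube` function, the `"*"` roll-up marker and the sample rows are hypothetical. In Spark itself, `DataFrame.cube` (or `GROUP BY CUBE` in Spark SQL) produces the same groupings.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2 ** len(dims) groupings)."""
    totals = defaultdict(float)
    for row in rows:
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple(row[d] if d in subset else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


rows = [{"region": "east", "product": "a", "sales": 10},
        {"region": "east", "product": "b", "sales": 5},
        {"region": "west", "product": "a", "sales": 7}]
t = cube(rows, ("region", "product"), "sales")

# Drill down: grand total -> one region -> one product in that region
print(t[(ALL, ALL)], t[("east", ALL)], t[("east", "a")])  # 22.0 15.0 10.0
```

The trade-off is storage: the number of groupings doubles with each added dimension, which is why this kind of pre-aggregation leans on a scalable engine such as Spark.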

A pretty standard Tableau demo follows, except that it is supported by the Hadoop-based infrastructure Think Big developed rather than by a Tableau server.

