In-database analytics – a white paper

Syndicated from BeyeNetwork

My co-author on Smart (Enough) Systems, Neil Raden, has written a great white paper on in-database analytics that is available from Sybase – Analytics from the start. This paper introduces the key concepts, discusses some of the key issues (our book contains more tips in this area) and describes some strong case examples. Well worth a read. As Neil says:

Advanced analytics will be adopted by most organizations and attain the status of “must have.” While the majority of people in organizations will not become quantitative experts and modelers, the effects of predictive models will be felt across the organization and beyond. They already are. It would be wise to take steps now, and a good first step is to begin evaluating technology solutions that will be suitable for the development and implementation of analytics. From a technology perspective, one clear requirement is an analytic engine embedded in your analytical database technology.

The approach Neil describes is one we see more and more as in-database and in-warehouse analytics become more common. This particular paper discusses the Fuzzy Logix libraries embedded in Sybase IQ. Fuzzy Logix is one of the sponsors (along with SAS, Oracle, Adaptive and Aha!) of the operational analytics research I am doing for BeyeNetwork. Look for it on the BeyeResearch site in a couple of months and, meanwhile, participate by taking the survey.
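
For readers who want to picture the mechanics, here is a minimal sketch of what “running the analytics where the data lives” looks like from client code. Everything in it is illustrative: the table and columns are invented, SQLite is just a stand-in for an analytic database like Sybase IQ, and AVG() stands in for the kind of statistical UDF an embedded analytics library would provide. The point is simply that only the answer, not the data, crosses the wire.

    # Minimal sketch of the in-database pattern (illustrative only).
    # SQLite stands in for an analytic database; AVG() stands in for a
    # vendor-supplied statistical UDF such as a regression function.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE daily_sales (day INTEGER, revenue REAL);
        INSERT INTO daily_sales VALUES (1, 100.0), (2, 103.5), (3, 110.2), (4, 114.8);
    """)

    # Move-the-data approach: every row travels to the client for modeling.
    rows = conn.execute("SELECT day, revenue FROM daily_sales").fetchall()

    # In-database approach: the computation runs next to the data and only
    # the scalar result comes back to the client.
    (avg_revenue,) = conn.execute("SELECT AVG(revenue) FROM daily_sales").fetchone()
    print(avg_revenue)  # 107.125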

  • Josh Hemann February 25, 2010, 9:39 am

    In-database analytics will certainly be a growing area of interest as businesses get swamped with data yet need to make real-time decisions, so it’s nice to see related posts and white papers.

    One aspect that deserves more attention is how teams actually deploy in-database analytics. I have a little experience working with people using the Sybase RAP product, which has Sybase IQ at its core. Along with the Fuzzy Logix libraries, Sybase RAP includes functions from Visual Numerics for doing in-database time series analysis. In my scenario, I was the stats guy who knew about time series modeling but not much about writing complex SQL queries or calling UDFs. Luckily, I worked with someone who did, but who in turn did not know much about time series analysis.

    I was able to prototype an analysis in Python calling the exact same functions available to my DB colleague in Sybase. So, when I had iterated to an acceptable analysis, I simply handed my Python function calls to him. The API he had was of course a bit different, but it was a one-to-one mapping of functionality using the exact same underlying algorithms, so we could be confident that the analysis I had done was actually repeatable once deployed in the database. A toy version of the handoff is sketched below.
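
    To make that concrete, here is a toy version of the workflow. The Python function and the MOVING_AVERAGE() UDF named in the comments are made up for illustration, not the real Visual Numerics or Sybase RAP API; what matters is the shape of the handoff: iterate locally, then hand over a one-to-one call for in-database deployment.

        # Hypothetical sketch of the prototype-to-production handoff.
        def moving_average(values, window):
            """Client-side prototype of the same algorithm the database UDF wraps."""
            return [sum(values[i - window + 1 : i + 1]) / window
                    for i in range(window - 1, len(values))]

        # The statistician iterates locally until the analysis looks right...
        prices = [10.0, 10.5, 10.2, 10.8, 11.1, 10.9]
        prototype_result = moving_average(prices, window=3)
        print(prototype_result)  # [10.23..., 10.5, 10.7, 10.93...]

        # ...then the DB colleague deploys the one-to-one SQL equivalent:
        #     SELECT MOVING_AVERAGE(price, 3) FROM ticks;  -- hypothetical UDF
        # Because both sides wrap the same underlying library, the prototype
        # output can be checked row for row against the in-database output.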

    How do others deal with similar scenarios? Analytical software work strikes me as different from other software development because there are often two different groups of people (statisticians/modelers vs. developers and DB admins) using different tools (scripting languages vs. C, Java, SQL, etc.). So, it can be especially challenging to identify a useful modeling approach that answers some business need and then actually deploy that same analysis in an existing enterprise system. When people use the decisioning tools discussed on this blog, is there typically any prototyping work that happens before putting the tools “into production”, and are the same tools used for both aspects?

    Thanks in advance!

  • Will Dwinnell February 25, 2010, 11:56 am

    “While the majority of people in organizations will not become quantitative experts and modelers, the effects of predictive models will be felt across the organization and beyond. They already are. It would be wise to take steps now, and a good first step is to begin evaluating technology solutions that will be suitable for the development and implementation of analytics.”

    Industry’s experience with spreadsheets, database querying tools and OLAP suggests that the much larger gap is on the human side. Decades of experience demonstrate clearly that gee-whiz technologies like this are ineffective (and, yes, downright dangerous) when placed in the hands of anyone without the expertise to use them.

    When a business analyst runs excitedly into the room, shouting that “Two out of three of our customers use brand X!”, does he mean 67% of thousands of customers, or literally 2 out of 3? Statistical significance is only one of the pitfalls awaiting naive users; the quick sketch below shows how different those two claims really are.
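
    A back-of-the-envelope way to see the difference is a Wilson score interval for a proportion. The sample sizes here are invented for illustration:

        # 95% Wilson score interval for a proportion: the same 2-in-3 rate
        # carries almost no information at n = 3 and quite a lot at n = 3000.
        from math import sqrt

        def wilson_interval(successes, n, z=1.96):
            p = successes / n
            denom = 1 + z * z / n
            center = (p + z * z / (2 * n)) / denom
            half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
            return center - half, center + half

        print(wilson_interval(2, 3))        # ~(0.21, 0.94): could be almost anything
        print(wilson_interval(2000, 3000))  # ~(0.65, 0.68): a solid estimate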

    There are plenty of analytical tools available today, on a variety of platforms, which hold fantastic promise. The successful company will hire or train people skilled enough to fulfill that promise.