
New Rexer Analytics survey


Syndicated from BeyeNetwork

I recently got the results of the annual data mining survey that Karl Rexer of Rexer Analytics runs. You can get the summary here or the full results from Karl, but here are my thoughts:

  • Data mining is everywhere. The most cited areas are CRM / Marketing and Financial Services with a big lead over Retail and Telecom. Healthcare did poorly, no surprise.
  • The departments data miners most frequently work in are Marketing & Sales, Research & Development, and Risk.
  • Data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis – way ahead of the others. Text mining was back in the pack, which is interesting given the number of text mining presentations we saw at Predictive Analytics World.
  • Half of data miners say their results are helping to drive operational processes.
    This is encouraging as I think this is by far the most effective way to use predictive analytics.
  • Batch scoring with the results getting stored in the database came top of deployment approaches at 30%, with interactive real-time scoring at 21% and 16% putting the model into some overall software project.
  • 60% of respondents say the results of their modeling are deployed always or most of the time.
    This is still not good enough – 40% say their results are not reliably getting deployed.
  • The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data. However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year. 34% also have problems with IT.
  • Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.

There’s lots more in the survey so go get it and read it.


Comments on this entry are closed.

  • Hans Gilde March 30, 2010, 6:48 am

    Not sure I agree about 60% deployment being bad; I think more digging would be needed to make a conclusion here. It’s important to manage expectations in this area, so as not to give the false impression that data mining is a magical tonic that will quickly turn your data into actionable information.

    Surely if 40% of clearly actionable information were ignored, this would be very strange. But good analytics processes are iterative – create a goal, gather data, analyze the data, then learn that additional data must be gathered or the goal refined, repeat. In this process, are we worried that the intermediary models are not deployed?

    Similarly with exploratory projects – if two statisticians explore data in two directions, maybe only one produces the finally chosen model. One of the two, or 50%, might then correctly report that all their hard work was not deployed.