I started with an interesting breakfast this morning with Ian Ayres and Larry Rosenberger. Ian is the author of Super Crunchers (reviewed here in the wiki) and Larry is a research fellow and ex-CEO of Fair Isaac. The two of them were great conversationalists and we ranged across randomized testing (adaptive control), the power of analytic decision making, the role of judgmental decisions in an era of massive amounts of data and much more. Mark Green, CEO of Fair Isaac, went first (blogged here) and now it’s Ian.
Ian starts off talking about the movie The Interpreter – which was the subject of an Epagogix prediction that the movie would take $69M (it actually took $73M) using only a neural network and without actually seeing the movie. They even extended this by predicting how adding a sidekick or focusing more on New York would have boosted the take! This is, he says, the New Wisdom of Crowds. He talks next about Pandora and how its algorithms drive new selections and new music based on what lots of data tells you. What is new is the speed, size and scale of the data being used, not the techniques.
One of the challenges with this kind of thinking is that it tends to push data in front of people’s judgment. For instance, Ashenfelter’s equation for predicting the quality of wine from winter rainfall, growing temperature and harvest rainfall is very accurate yet very unpopular with wine critics! Everyone tends to think that what they do is too subtle to be replaced with analytics but almost everyone is wrong!
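To make the shape of that kind of equation concrete, here is a rough sketch of a linear regression on the three inputs mentioned above. The coefficient symbols are placeholders I am assuming for illustration; the talk did not give the fitted values, and the exact form of the quality measure on the left-hand side is likewise an assumption.

```latex
\[
\text{quality} \approx \beta_0
  + \beta_1 \,(\text{winter rainfall})
  + \beta_2 \,(\text{growing temperature})
  + \beta_3 \,(\text{harvest rainfall})
\]
```

Each coefficient is estimated from historical vintages, so the weight on every variable comes from the data rather than from a critic's palate.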
Data mining is becoming more and more accessible: there is more data, there are better tools for handling it, and the techniques are easier to get at (such as data mining inside Excel). These techniques also allow you to assess the accuracy of the prediction – to give a confidence interval. An example is Farecast, which mines available data on air fares to tell you which ones will rise and which will fall before your flight, with a measure of confidence in the prediction. The world is moving from descriptive analytics and dashboards to predictive analytics.
Ian talked about daring to diversify – the next book he is going to write, about using data mining (super crunching) to manage retirement investments. People are good at diversifying across asset types but poor at diversifying across time. He suggests that you could use leveraged investment when you are young to diversify over time, and he has been crunching the numbers on this – mortgaging your retirement. Even in a world of changing risk you can manage risk using data mining.
Randomized trials are also critical. It is not enough to do analytics; you must create your own data by running randomized trials, so that you gather data about alternatives that seem, perhaps, worse. This mindset is the drive behind champion/challenger testing and adaptive control (wiki). This is a powerful technique, particularly when combined with the power of the Internet to display multiple alternatives and collect large volumes of data instantly. A website could see if context-driven content was more effective than standard content, and the number of people involved means that the results can be compared without worrying about who, exactly, saw each one. Google AdWords, for instance, will allow this kind of randomized testing as the ads run and Google is now offering a free web optimizer. Web-based testing allows you to test hypotheses quickly and effectively, and the data can then be mined to see how different segments behaved.
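As a rough illustration of the website example, here is a minimal Python sketch of a randomized test comparing standard content against context-driven content. Everything in it is an assumption for illustration and not something from the talk: the function names, the 50/50 split, the made-up conversion rates in the simulation, and the choice of a pooled two-proportion z-test to compare the results.

```python
import math
import random


def assign_variant(rng):
    """Randomly assign a visitor to one of two variants with equal probability."""
    return "context" if rng.random() < 0.5 else "standard"


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test.

    Returns (z, two-sided p-value) for the difference in conversion
    rate between group B and group A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def simulate(n_visitors=20000, rate_standard=0.030, rate_context=0.036, seed=42):
    """Simulate a test with hypothetical (made-up) true conversion rates.

    Returns {variant: [conversions, visitors]}.
    """
    rng = random.Random(seed)
    counts = {"standard": [0, 0], "context": [0, 0]}
    for _ in range(n_visitors):
        variant = assign_variant(rng)
        true_rate = rate_context if variant == "context" else rate_standard
        counts[variant][1] += 1
        if rng.random() < true_rate:
            counts[variant][0] += 1
    return counts


if __name__ == "__main__":
    counts = simulate()
    conv_s, n_s = counts["standard"]
    conv_c, n_c = counts["context"]
    z, p = two_proportion_z(conv_s, n_s, conv_c, n_c)
    print(f"standard: {conv_s}/{n_s}  context: {conv_c}/{n_c}")
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Because assignment is random, the two groups should look alike on average, which is why the results can be compared without tracking who, exactly, saw each version. The same counts could then be broken out by segment and tested in the same way.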
Overall he likes the combination of data mining of historical data, randomized testing to create new data and mining of the data collected by this randomized testing. If you are not using randomized testing and predictive analytics (regressions) then his presumption is that you are screwing up. He gave numerous examples of ways to do this in everything from quitting smoking to professional sports to government welfare programs (where the randomized testing showed that kids grew faster where it was applied).
One hard-to-stomach fact is how much better machine predictions are than judgmental ones in study after study. Lots of these are documented in the book but again and again, experts come out poorly if any kind of statistical prediction is possible. People do poorly because we tend to put the wrong weights on important variables – even when we know something is important we tend to undercall just how important it is. Now, over-relying on analytics can go wrong too but data mining is going to continue to win out. People can provide useful information that improves an analytic model but the model is going to drive because it is simply more effective. Humans’ ability to come up with hypotheses and intuit new ideas is a critical one, but now the data exists and can be used to verify these hunches.