Predictive Models are not Statistical Models

April 11, 2011

in Advanced Analytics, Decision Management

A reader of my company’s newsletter recently emailed me and asked if I could

highlight the difference in how one approaches building a predictive model compared to more “traditional” descriptive models, and why the approaches are different.

He went on to say that he had

a colleague who insists that developing aggregate statistical models is critical to developing predictive models. I think the process is fundamentally different… the approach to building predictive models can actually be hurt if you “try to explain why” rather than prove that you can come up with a good prediction that you can act on.

To answer this I reached out to my analytical brain trust – my friends Dean Abbott, John Elder and Eric Siegel.

Dean responded that in statistics the mindset is that “the model is king”, whereas in predictive analytics “the data is king”. He went on to say:

When the model is king, the typical procedure is that we first have a hypothesis about which fields will be useful and what form of a model is the true form. For the model-centric folks, it’s as if there is a model in the ether that we as modelers must find, and if we get coefficients in the model “wrong”, or if the model errors are “wrong”, we have to rebuild the model to get it right, which may mean transforming inputs and/or outputs so that we conform to the model assumptions and model form. Of course, this is important if your model must explain the behavior.

On the other hand, for many data miners, predicting the target variable accurately is paramount. The models are non-parametric and distribution-free so that we don’t care about model forms. We won’t have to explain precisely why individuals behave as they do so long as we can explain how they will behave.

Dean provided a link to a great article by David Hand that summarizes the two disciplines – http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.5404&rep=rep1&type=pdf. Reading it, I came away with some key points:
  1. Statistics and data mining have common aims in that both are concerned with discovering structure in data
  2. But data mining is not a subset of statistics (despite the opinion of some statisticians to the contrary), nor is statistics part of mathematics
  3. However, both data mining and statistics rely on math
  4. Because statistics has a strong historical tie to mathematics, there is a tendency for statisticians to require a proof
  5. Because statistical techniques grew up working on samples and subsets, statistics tends to focus a lot of energy on how to extrapolate or infer about a population from a small sample
  6. In contrast, data miners are often awash in data – where a statistician may have 1,000 points, a data miner may be working with hundreds of millions of transactions
  7. “Model” means something different to each group – to a statistician a model is something that explains relationships in the data, whereas to a data miner or predictive modeler it means something that explains how to combine data elements to get a useful result

So I guess the key thing is that for a predictive model to be useful we don’t need to understand WHY it is useful, only HOW it is useful – we must understand how we can use it to make better decisions, but we don’t need an explanation of why it works. Predictive analytic models may be more or less explicable (with a decision tree it is easy to see how the result was achieved, less so with a neural network), but we don’t need a real-world explanation of why a model has a particular coefficient, say, or why the split is at a particular value.
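To make the point about split values concrete, here is a minimal sketch in Python of a one-split decision tree (a “stump”). Everything in it is invented for illustration: the synthetic customers, the “months since last purchase” field, the assumed defection rates and the function names. The split value is chosen only because it produces the fewest errors on the training records, and the model is then judged on records it has not seen.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_customer():
    """Hypothetical record: (months since last purchase, 1 if the customer defected)."""
    months_idle = rng.uniform(0, 24)
    p_defect = 0.85 if months_idle > 9 else 0.15  # assumed, noisy relationship
    return months_idle, 1 if rng.random() < p_defect else 0

def predict(stump, x):
    """Apply a (threshold, label_above) rule to one input value."""
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above

def error_count(stump, records):
    """Number of records the rule gets wrong."""
    return sum(predict(stump, x) != y for x, y in records)

def fit_stump(records):
    """Try every midpoint between adjacent values; keep the split with fewest errors."""
    values = sorted({x for x, _ in records})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct input values")
    stumps = [(t, label) for t in candidates for label in (0, 1)]
    return min(stumps, key=lambda s: error_count(s, records))

records = [make_customer() for _ in range(600)]
train, holdout = records[:400], records[400:]

stump = fit_stump(train)  # the data decides where the split goes
holdout_accuracy = 1 - error_count(stump, holdout) / len(holdout)

print(f"split at {stump[0]:.1f} months, holdout accuracy {holdout_accuracy:.2f}")
```

Nothing in the fitting step encodes a theory of why customers leave; the threshold is simply whatever value predicts best, and the holdout accuracy tells us whether it is good enough to act on.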

Eric gave a great summary comment:

Since a predictive model’s objective is nice and clear – it has a specific prediction goal such as “Will this customer defect?” – its performance and value can be measured without opening the “try to explain why” can of worms, that is, causality. Despite this, some kinds of predictive models can be transparent, e.g., composed of business rules that may be broadly “understood”, even if causality is not conclusive. “People in Texas buy this product more” is easy to understand even if you don’t know why it is true.
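Eric’s point that a rule’s value can be measured without settling causality can be sketched in a few lines of Python. The records and the `purchase_rate`, `lift` and `in_texas` names below are hypothetical, made up for this illustration; lift here simply means the purchase rate among customers the rule selects divided by the purchase rate overall.

```python
def purchase_rate(records):
    """Share of records in which the customer bought."""
    return sum(bought for _, bought in records) / len(records)

def lift(records, rule):
    """Purchase rate among customers the rule selects, relative to everyone."""
    selected = [r for r in records if rule(r)]
    return purchase_rate(selected) / purchase_rate(records)

# Hypothetical held-out records: (state, 1 if the customer bought the product).
holdout = [
    ("TX", 1), ("TX", 1), ("TX", 0), ("TX", 1),
    ("CA", 0), ("CA", 1), ("CA", 0),
    ("NY", 0), ("NY", 0), ("NY", 1),
]

def in_texas(record):
    """The transparent business rule: the customer is in Texas."""
    state, _ = record
    return state == "TX"

texas_lift = lift(holdout, in_texas)
print(f"Texas lift: {texas_lift:.2f}")  # 0.75 / 0.50 = 1.50
```

A lift above 1 on held-out records says the rule is worth acting on, whatever the reason Texans buy more.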

Hope that helps.
