A reader of my company’s newsletter recently emailed me and asked me if I could

highlight the difference in how one approaches building a predictive model when compared to more “traditional” descriptive models. And why the approaches are different.

He went on to say that he had

a colleague who insists that developing aggregate statistical models is critical to develop predictive models. I think the process is fundamentally different… the approach to building predictive models can actually be hurt if you “try to explain why” versus prove that you can come up with a good prediction that you can act on.

To answer this I reached out to my analytical brain trust – my friends Dean Abbott, John Elder and Eric Siegel.

Dean responded by saying in statistics the mindset is that “the model is king” where in predictive analytics “the data is king”. He went on to say

When the model is king, the typical procedure is that we first have a hypothesis about which fields will be useful and what form of a model is the true form. For the model-centric folks, it’s as if there is a model in the ether that we as modelers must find, and if we get coefficients in the model “wrong”, or if the model errors are “wrong”, we have to rebuild the model to get it right, which may mean transforming inputs and/or outputs so that we conform to the model assumptions and model form. Of course, this is important if your model must explain the behavior.

On the other hand, for many data miners predicting the target variable accurately is paramount. The models are non-parametric and distribution-free so that we don’t care about model forms. We won’t have to explain precisely why individuals behave as they do so long as we can explain how they will behave.

- Statistics and data mining have common aims in that both are concerned with discovering structure in data
- But data mining is not a subset of statistics (despite the opinion of some statisticians to the contrary) nor is statistics part of mathematics
- Both data mining and statistics rely on math, however
- Because statistics has a strong historical tie to mathematics there is a tendency for statisticians to require a proof
- Because statistical techniques grew up working on samples and subsets, statistics tends to focus a lot of energy on how to extrapolate or infer about a population from a small sample
- In contrast data miners are often awash in data – where a statistician may have 1,000 points a data miner may be working with hundreds of millions of transactions
- “Model” means something different to each group – to a statistician a model is something that explains relationships in the data, whereas to a data miner or predictive modeler it means something that explains how to combine data elements to get a useful result

So I guess the key thing is that for a predictive model to be useful we don’t need to understand WHY it is useful only HOW it is useful – we must understand how we can use it to make better decisions but we don’t need an explanation of how it works. Predictive analytic models may be more or less explicable (with a decision tree it is easy to see how the result was achieved, less so with a neural network) but we don’t need a real-world explanation of why a model has a particular coefficient, say, or why the split is at a particular value.

Eric gave a great summary comment:

Since a predictive model’s objective is nice and clear — it has a specific prediction goal such as “Will this customer defect?” – its performance and value can be measured without opening the can-of-worms “try to explain why”- causality. Despite this, some kinds of predictive models can be transparent, e.g., composed of business rules that may be broadly “understood”, even if causality is not conclusive.

People in Texas buy this product more. Easy to understand even if you don’t know why it is true.

Hope that helps.

Absolutely agree with the viewpoint that getting a useful result out of data is the primary goal of Predictive Analytics – regardless of the model. An interesting subset is one where a Predictive Analytic deploys a statistical model to predict – for example say a Demand Forecasting Predictive analytic. What happens is that when the Predictive Model starts to go astray for whatever reason over time, familiarity with the Statistical Model behind the analytic comes into play – what causal factors are modeled incorrectly etc. Now in a true neural network the model should perpetually adjust itself to course correct but I find many predictive models are “hard coded” to a certain extent to conform to a business’ unique beliefs and do require revisiting periodically. At such times transparency in how the model actually works is handy…

Great topic! I think predictive analytics in spirit is truer to George Box’s original comment that “all models are wrong, some are useful”. The data deluge forces PA to focus on finding a solution that is useful, rather than attempting to find general conclusions, which is what I suspect many modelers are fundamentally after.

Kedar:

Your comments on models going astray remind me of the post on my blog by Will Dwinnell on this issue: http://abbottanalytics.blogspot.com/2011/03/analyzing-results-of-analysis.html. When predictions go astray, he examines inputs to the model to find those that are changing (he uses correlations).

One problem I find with model inputs is that if they aren’t “indexed” in some way, they do go astray naturally. I prefer to use inputs that are self-correcting in some way, such as by using percentiles or deviations from the current norm, but of course this all depends on the kind of model you are building.
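To make the idea of “indexed” inputs concrete, here is a minimal sketch of the two transformations mentioned above: a percentile rank and a deviation from the current norm. The function names, the reference windows and the spend figures are all hypothetical, made up for illustration; this is one plausible reading of the comment, not the commenter’s own method.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def percentile_rank(value, reference):
    """Fraction of the reference window that is <= value (0.0 to 1.0)."""
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)


def deviation_from_norm(value, reference):
    """Number of standard deviations between value and the window mean."""
    spread = pstdev(reference)
    if spread == 0:
        return 0.0  # a constant window carries no information about deviation
    return (value - mean(reference)) / spread


# Hypothetical monthly spend for the same five customers in two periods.
# Every raw value doubles in the second period (say, because prices doubled).
old_window = [10.0, 20.0, 30.0, 40.0, 50.0]
new_window = [20.0, 40.0, 60.0, 80.0, 100.0]

# The raw input drifts from 40 to 80, but the indexed versions stay put.
old_rank = percentile_rank(40.0, old_window)
new_rank = percentile_rank(80.0, new_window)
old_dev = deviation_from_norm(40.0, old_window)
new_dev = deviation_from_norm(80.0, new_window)

print(old_rank, new_rank)
print(old_dev, new_dev)
```

A model trained on the raw spend would see the second-period customer as an extreme value, while a model trained on either indexed version would see the same customer as before. Whether that is what you want depends, as noted, on the kind of model you are building.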

Jerome Friedman makes some interesting comments on the broader contrast between data mining and statistics in his essay, Data Mining and Statistics: What’s the Connection?.

Will –

Enjoyed reading the essay you mentioned. Even though it is dated 1997, it makes very smart observations about the emergence of Data Mining and high performance computing. Not sure if the “hype” component has gotten weaker or stronger since then. I do feel as if there has been a sudden explosion of DM companies in the last 2 years, almost giving a sense of a “bubble” in this industry… Friedman’s comment rings true: “The largest profits are made by selling tools to the miners, rather than in doing the actual mining.”

You are absolutely right about the bubble now in the industry. I would quibble, though, with Friedman’s assessment now. In fact, in some ways, it’s the opposite now. In courses and workshops I teach, most of those who come are doing data mining, but with tools like Excel or by writing their own SQL code rather than with data mining tools. They would benefit greatly from even low-cost data mining tools like R or RapidMiner (free), or sub-$1000 tools like JMP, mind you, but they are improving their business decisions even without the tools.

There are of course various subspecies of statistician and they inhabit different niches (ranging from the data-deserted areas of mathematical statistics to the tropical data forests inhabited by frequentist pygmies hunting monkeys), and there is pretty well no difference between the forest dwellers and data miners.

The true difference is that statisticians study variability and how to take it into account properly in an analysis. Variability and bias. Bias (i.e. consistently incorrect results) is a product of bad experiments, data or procedures.

The insight from statistics is that there are no ‘pure algorithms’: they all act on conceptual samples and they generate something even when there are no effects. Statistics studies how to use such results and evaluate them.

So in my view the concept that data mining is somehow ‘statistics free’ is a dangerous illusion.

Many non-parametric or distribution-free methods still make assumptions that the data are identically distributed and independent. Failing these assumptions does have consequences for the inferences.

James,

While I agree that predictive models are not necessarily statistical models, there is also good reason sometimes to understand the causes of the outcomes that we wish to predict. For example, I do not claim to know much about niacin, but last week it was reported in the media that they finally realized after a very large NIH experimental study that niacin did not lower the amount of “good cholesterol”. Previous correlational studies had shown a predictive association. A very large amount of consumer money has been spent on niacin and it also has side effects related to stroke, so understanding that it does not play a causal role is important here.

There are many similar examples in medicine and business where understanding the cause of behavior is fundamentally important. We believe that the beta-amyloid protein is the cause of the progression of Alzheimer’s, but unless we can intervene with a drug and show that stopping that protein from developing also stops Alzheimer’s, we will not be able to reap that benefit for society.

In my experience with business models, there are many classic cases where a business was losing a large number of customers – so it was easy to predict that their business would be sliding downward – but they wanted to know why. Was it due to poor loyalty? Poor Products? Poor Service? Their partner’s business declining? The introduction of a new competitive product? For example, in 1998 a large international brewer lost half of their market in Japan and they came to me and asked why? It turned out to be a simple causal reason, but it was easy to predict that their business was heading south fast unless something was done quickly. The causal model is the gold standard. It is just much more difficult to get causal understanding. In many cases though, I agree that a predictive model may be perfectly fine. In many cases like in marketing applications though, my experience has been that causal understanding is fundamentally important because it tells decision makers what products they should make or why people are not buying their products.

Dan

Thanks for the thoughtful reply. I agree that causal relationships can be both powerful and necessary. That said, most predictive models need to be explicable so that business people can see what is driving the model more than they need to be understood from a causal perspective.

DaveG brought out some important considerations that are pretty unique from a statistical perspective.

There is nothing wrong with borrowing from any science to apply conceptual and experiential thinking that consistently explains a phenomenon. Statistical/Mathematical/Computational/Econometric sciences all have a role in the business of prediction and ensuing application.

That is the reason why Predictive Analytics is a science on its own.

Not all statistics is logistic regression or even statistical modeling, just as not all predictive analytics is statistics. The moment one understands the concept of random error, even without appreciating it, one is in the statistical sciences.

Statistical or not, we need proof before we can believe that a process will always work consistently. Working with predictive models, I see how computer scientists struggle to establish the superiority of one algorithm over another only because they do not understand type I and type II errors, and start ranking algorithms on the basis of prediction error however small the differences in prediction error are. In the end they are not really able to say what matters and come away empty-handed; talk about uncertainty. Statistics is a science of making certainty statements in the world of uncertain phenomena (C.R. Rao – National Medal of Science laureate), albeit with the need for additional language to state uncertain statements as tricky, certain-looking statements. It is eye-opening to see how people end up arguing about random phenomena because their type I and type II errors are so wide at the tail end of the decision tree, and/or different people have different bounds for them node to node, without knowing the differences. This scientific culture led to famous decisions like how Netflix decided to identify the first and second winners.
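As a rough illustration of the point about ranking algorithms on tiny differences in prediction error, the sketch below applies McNemar’s test (normal approximation, no continuity correction) to two classifiers scored on the same holdout rows. The technique is a swapped-in example of a paired significance test, not something named in the comment, and the counts are hypothetical.

```python
import math


def mcnemar_p_value(only_a_correct, only_b_correct):
    """Two-sided p-value for McNemar's test on paired predictions.

    only_a_correct: rows model A classified correctly and model B did not.
    only_b_correct: rows model B classified correctly and model A did not.
    Uses the normal approximation without a continuity correction.
    """
    discordant = only_a_correct + only_b_correct
    if discordant == 0:
        return 1.0  # the models never disagree, so there is nothing to test
    z = (only_a_correct - only_b_correct) / math.sqrt(discordant)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical holdout of 10,000 rows. Model A wins 120 disagreements and
# model B wins 100, so A's accuracy is only 0.2 points higher.
p_close = mcnemar_p_value(120, 100)

# Same holdout size, but a much more lopsided set of disagreements.
p_clear = mcnemar_p_value(300, 100)

print(f"close race:  p = {p_close:.3f}")
print(f"clear winner: p = {p_clear:.2e}")
```

In the first case the 0.2-point lead is well within what random error alone could produce, so ranking A above B on that basis says very little. Only the rows where the two models disagree carry information about which one is better.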

Advances in computing made possible, through data mining, what was statistically not possible before. If you innocently use a million observations without proper sampling to build a decision tree, almost all the variables you use in your predictive equation will become significant. There have been a lot more false-positive explanations of marketing models over the years only because we do not know where to draw the line regarding the significance or importance of a variable when blindly using 100- to 200-variable predictive models. This is great if you are predicting using “predictive models” and will pass the science, but one important area that is not fully utilized is what is called “insight models”, as against the predictive models.
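The claim that almost every variable becomes “significant” with a million rows can be checked with a little arithmetic. The sketch below uses a Pearson correlation test as a stand-in for the variable-selection step (the comment itself talks about decision trees), and approximates the t statistic with a normal distribution, which is reasonable at these sample sizes. The correlation of 0.01 is a hypothetical value chosen to be practically negligible.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r observed on n rows.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a large-sample normal
    approximation to the t distribution.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


weak_r = 0.01  # hypothetical: the input explains 0.01% of the variance

p_small = correlation_p_value(weak_r, 1_000)
p_large = correlation_p_value(weak_r, 1_000_000)

print(f"n = 1,000:     p = {p_small:.3f}")
print(f"n = 1,000,000: p = {p_large:.2e}")
```

The effect size is identical in both cases; only the row count changes. With a thousand rows the variable looks like noise, and with a million it passes any conventional significance threshold, which is why significance alone is a poor guide to importance at that scale.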

Over the years I have come to appreciate the importance of balancing these two types of models for any given predictive situation, especially in a world where data with millions of rows and thousands of data elements has become common, and where there is either too much correlation among many of the variables or too little structure in many of the elements.

Now here is the kicker: For every predictive model there is an insight model which performs as well as the predictive model.

You may see an updated version of my notes on my blog, http://predictive-models.blogspot.com. Mr Taylor, I appreciate the opportunity to keep this addendum as part of my full response. Thanks – nethraS