
Predictive modeling and today’s growing data challenges


Syndicated from Smart Data Collective

Matt Kramer of Acxiom and Jun Zhong of Wells Fargo discussed some of the challenges that data presents for predictive models. Matt began with some of the reasons for modeling: reducing costs, avoiding simplistic decisioning, predicting attrition, optimizing marketing spend and so on. Predictive models help by ranking customers according to their predicted probabilities.

Creating a suitable modeling sample requires enough records (10,000-15,000) that are recent enough to be relevant, especially when conditions are changing fast. Acxiom’s data shows that 1,200 or more instances of the outcome you are trying to predict give you a robust model. Focusing on mature, complete data is important, but this has to be balanced against timeliness. Appending internal or external data can make a big difference to model quality. These samples must be checked carefully for problems. Sample bias can be damaging, so it is key to apply a model only to data that looks like the sample it was built on.
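The size guidelines above can be turned into a simple pre-modeling check. This is a minimal sketch, assuming you already know the row count and the number of positive outcomes in the sample; `SampleSummary`, `sample_warnings` and the constant names are hypothetical, and only the thresholds (10,000 records, 1,200 positives) come from the session. It does not check recency, maturity or sample bias, which still need human judgment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SampleSummary:
    """Hypothetical summary of a candidate modeling sample."""
    n_records: int    # total rows in the sample
    n_positives: int  # rows where the predicted event (e.g. a purchase) occurred


# Rules of thumb quoted in the session: at least 10,000 records and
# roughly 1,200 or more instances of the outcome being predicted.
MIN_RECORDS = 10_000
MIN_POSITIVES = 1_200


def sample_warnings(s: SampleSummary) -> list[str]:
    """Return human-readable warnings; an empty list means no size problems found."""
    warnings = []
    if s.n_records < MIN_RECORDS:
        warnings.append(f"only {s.n_records} records; aim for at least {MIN_RECORDS}")
    if s.n_positives < MIN_POSITIVES:
        warnings.append(
            f"only {s.n_positives} positive cases; aim for at least {MIN_POSITIVES}"
        )
    if s.n_positives > s.n_records:
        # More positives than rows means the target was defined or joined wrongly.
        warnings.append("more positives than records; check the target definition")
    return warnings
```

For example, `sample_warnings(SampleSummary(12_000, 900))` returns a single warning about too few positive cases, even though the record count is fine.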

Jun discussed some marketing business issues related to data. The two main challenges he described are recognizing likely purchasers and then recognizing which of those likely purchasers will be influenced by an action or treatment: who is proactive (will buy anyway) and who is reactive (will buy only in response).

Propensity to Purchase models predict who is likely to buy, so that offers can be made only to those people. A second model, Propensity to Influence, predicts how likely someone is to be influenced by a specific promotion.
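Acting on a Propensity to Purchase model usually comes down to ranking people by predicted probability and making offers to the top of the list. This is a minimal sketch, assuming the scores come from some already-fitted model; `select_targets`, the `budget` cap and the optional `min_prob` cutoff are hypothetical illustrations, not Wells Fargo's process.

```python
from __future__ import annotations


def select_targets(scores: dict[str, float], budget: int, min_prob: float = 0.0) -> list[str]:
    """Rank customers by predicted purchase probability and keep the best ones.

    `scores` maps a customer ID to a probability from a (hypothetical)
    propensity-to-purchase model. At most `budget` customers are returned,
    and anyone scoring below `min_prob` is left out entirely.
    """
    # Highest probability first; Python's sort is stable, so ties keep input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if scores[c] >= min_prob][:budget]
```

With `scores = {"a": 0.12, "b": 0.67, "c": 0.05, "d": 0.41}`, `select_targets(scores, 2)` picks `["b", "d"]`. Ranking plus a budget, rather than a hard yes/no rule, is what lets the marketing spend be pointed at the most promising people first.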

To develop this second model you need both a treatment group and a control group so you can see what response you get. This allows you to find the buyers in both groups and then see what kind of people did not buy in the control group but did buy in the treatment group: these are the people who purchased only because of the treatment. From this you can build the propensity to influence model. Building these models requires all the usual data cleansing, transformation, and initial and ongoing validation.
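The treatment/control comparison described above can be approximated at the segment level by comparing response rates. This is a minimal sketch of a simple uplift estimate, assuming each record carries a segment label, a group (`"treatment"` or `"control"`) and a purchase flag; `uplift_by_segment` and the record layout are hypothetical, and this is not the Wells Fargo propensity to influence model itself. A segment with clearly positive uplift looks reactive; one near zero looks proactive (its members buy at the same rate either way).

```python
from __future__ import annotations

from collections import defaultdict


def uplift_by_segment(records: list[tuple[str, str, bool]]) -> dict[str, float]:
    """Estimate uplift per segment from a treatment/control experiment.

    Each record is (segment, group, bought), where group must be "treatment"
    or "control". Uplift is the treatment response rate minus the control
    response rate. Segments missing either group are skipped.
    """
    # counts[segment][group] = [buyers, total]
    counts = defaultdict(lambda: {"treatment": [0, 0], "control": [0, 0]})
    for segment, group, bought in records:
        cell = counts[segment][group]  # KeyError for an unknown group label
        cell[0] += int(bought)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_buy, t_n), (c_buy, c_n) = groups["treatment"], groups["control"]
        if t_n == 0 or c_n == 0:
            continue  # cannot compare without both groups
        uplift[segment] = t_buy / t_n - c_buy / c_n
    return uplift
```

Comparing rates per segment sidesteps the fact that no individual is in both groups; more refined approaches model uplift per customer, but the underlying comparison is the same.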

Matt came back to talk about some challenges he sees. One is demonstrating incremental value and persuading business users that modeling is necessary, i.e. that simply specifying the rules explicitly would not be as useful. Another is the growing restrictions on the use of certain data because of legal concerns.

  • As other speakers have noted, it is really important to have clean control groups so that comparisons are both real and believable. If you can’t show real business benefits in terms of total value versus total cost, then the “lift” of the model doesn’t matter. It can be hard for companies to hold people out to keep a really clean control group, because they want to market to everyone.
  • Criteria (rules) tend to be easier to understand and implement, so showing the value of the model is critical. Models tend to produce better results and generally do not exclude whole groups but rather rank them based on weighted attributes. Matt was trying to make it seem like these are either/or, but of course they are not: they can and should be used in conjunction. Models can help make decision criteria better.
  • Restrictions on the use of certain attributes are designed to prevent discrimination. There is not much you can do about this except keep working on the modeling to find other ways to build the model. One example he gave had 55% of attributes removed, dropping the lift from 3x to 1.2x. New attributes and more careful segmentation rebuilt the lift somewhat, but not completely.
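Two numbers come up repeatedly in these points: the model’s lift and the campaign’s net value measured against a control group. This is a minimal sketch of one common way to compute each, assuming lift means top-decile lift and that value and cost are flat per sale and per contact; the function names and parameters are hypothetical, and the session did not say exactly how the 3x and 1.2x figures were measured.

```python
from __future__ import annotations


def top_decile_lift(scored: list[tuple[float, bool]]) -> float:
    """Lift of the top 10% of customers ranked by model score.

    `scored` is a list of (score, responded) pairs. Lift is the response rate
    in the top decile divided by the overall response rate, so 3.0 means the
    top decile responds three times as often as the average customer.
    Raises ZeroDivisionError if nobody responded.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)  # size of the top decile (at least one customer)
    overall_rate = sum(r for _, r in ranked) / len(ranked)
    top_rate = sum(r for _, r in ranked[:k]) / k
    return top_rate / overall_rate


def incremental_value(n_treated: int, rate_treat: float, rate_control: float,
                      value_per_sale: float, cost_per_contact: float) -> float:
    """Net value of a campaign relative to a clean control group.

    Only sales above the control group's response rate are credited to the
    campaign; every contact in the treated group is charged its cost.
    """
    extra_sales = n_treated * (rate_treat - rate_control)
    return extra_sales * value_per_sale - n_treated * cost_per_contact
```

For instance, contacting 10,000 people at a cost of 0.40 each, with a 3% response rate against 2% in the control group and 50 of value per sale, yields about 100 extra sales and a net value of about 1,000. Without the control group, all 300 sales would be credited to the campaign and the value would be badly overstated, which is why a clean holdout matters more than the lift figure on its own.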

Not much new in this session, sadly.