Syndicated from Smart Data Collective
This session was a panel discussion on the cross-industry challenges and solutions in predictive analytics. Panel sessions are tough to blog so here are some highlights.
- More and more analysts are having to do their own extract, transform, load work to access databases so having modeling tools that handle this, rather than requiring IT to do it, is helpful.
- It’s really important to match how people work to how they can work with predictive models – incorporate the predictive scores into decisions they already make. Use them to prioritize or assign, for instance, to start with.
- Experience in one industry, like credit card fraud, may not play well in another industry, so the techniques used, as well as the way success is described and reported, must vary appropriately.
- Never underestimate the problems in data or the value of cleaning it up before modeling. Clean, valid data is hugely valuable and doing a good job of linking and matching records is particularly important.
- There can be an over-focus on algorithm selection when simple, structured, disciplined techniques will often work as well. Not only that, but the hunt for new techniques causes problems with overfitting and with a lack of validation rigor.
- Outliers and extreme events can really throw off measures – if a large outlier is predicted well then it can make the model look more predictive than it really is.
- Essential to challenge your assumptions. Don’t get caught out by a single failed assumption.
- Putting models to work – putting them into decisions – requires organizational change and management to make sure people aren’t threatened by the models and understand what to do with them. Essential to wrap business rules around the models and make them work in a business context.
- Always be suspicious of any model you build – challenge it, disprove it, try and uncover problems. Why, why, why.
- Implicit assumptions can be tough to find, and most are found when a test fails. So when a test fails, figure out why, as there could be a bad assumption in there that caused the failure.
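
The panel's point about outliers can be made concrete with a small sketch. This is not something the panel showed. The numbers are made up, and R-squared is used here only as one common example of a fit measure. The idea is to score the same predictions twice, once on the ordinary cases alone and once with a single large, well-predicted outlier added.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical data: ten ordinary cases that the model predicts poorly.
actual = [10, 12, 9, 11, 13, 8, 10, 12, 9, 11]
predicted = [11, 10, 11, 9, 11, 10, 12, 10, 11, 9]

# One extreme case that the model happens to predict fairly well.
outlier_actual, outlier_predicted = 500, 480

r2_without = r_squared(actual, predicted)
r2_with = r_squared(actual + [outlier_actual], predicted + [outlier_predicted])

print(f"R^2 on ordinary cases only: {r2_without:.3f}")
print(f"R^2 with one outlier added: {r2_with:.3f}")
```

With these made-up numbers the score on the ordinary cases is negative (worse than always predicting the mean), yet adding the one outlier pushes it above 0.99. The model has not become any better at the ordinary cases; the outlier simply dominates the total variance. Reporting the measure with and without extreme cases is one simple way to spot this.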