A few months back, Scott Adams posted a great Dilbert that I have been meaning to write about for a while.
In the strip, Dilbert says “You don’t go to war with the data you need. You go to war with the data you have.”
Now, Scott Adams was being funny, but there is a kernel of truth here. We come across many companies that are failing to apply data to their decision-making, delaying building predictive analytic models, or postponing their adoption of machine learning because they don’t have the data they “need”. It’s not integrated enough, clean enough, or precise enough, or it’s just not as good as it “will be soon”. This is a mistake. You should do as Dilbert advises, and “go to war with the data you have”.
The trick is to start with the decision you want to improve, rather than with the data. Understand the decision, model how you think you make that decision today, and work with those who make the decision every day to capture your current approach. This decision-making is possible with the data you have; it must be, because this is how you decide right now.
Now you can ask some interesting questions like:
- What would help you make this decision more accurately?
- Which pieces of the decision give you the most trouble?
- Where do you spend your time in this decision?
- Is the data you need to make this decision presented the way you use it in this decision?
- Which pieces of this decision are data analysis – places where you decide something about the data so you can base some other decision on that analysis?
Sometimes the answer to these questions will lead you to new data or identify that your data needs to be improved. If it does, at least you can show exactly WHY you need that new data and so calculate an ROI. But often it reveals that you need to use the data you have in different ways.
The biggest benefit comes from identifying possible predictive models. Because you know how the decision is made, you will be able to see how accurate a predictive model must be to be useful. Often this is a lot lower than you think. We have had clients realize they only needed a model that was a little better than a coin flip, and others who only needed 70-80% accuracy. You might need 99.99%, but you probably don’t.
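One way to make "how accurate is accurate enough" concrete is a simple break-even calculation. The sketch below is only an illustration under stated assumptions, not a method from any particular client engagement: it assumes a yes/no decision where acting on a correct prediction is worth a fixed gain, acting on a wrong one costs a fixed loss, and today's approach is worth some baseline value per decision. The function name and parameters are hypothetical.

```python
def break_even_accuracy(gain_if_right: float,
                        loss_if_wrong: float,
                        baseline_value: float = 0.0) -> float:
    """Smallest accuracy p at which a model matches the current approach.

    Assumed value per decision at accuracy p:
        p * gain_if_right - (1 - p) * loss_if_wrong
    Setting that equal to baseline_value and solving for p gives
        p = (baseline_value + loss_if_wrong) / (gain_if_right + loss_if_wrong)
    A result above 1.0 means no model can beat the baseline under these numbers.
    """
    total = gain_if_right + loss_if_wrong
    if total <= 0:
        raise ValueError("gain_if_right + loss_if_wrong must be positive")
    return (baseline_value + loss_if_wrong) / total


# Symmetric stakes, baseline worth nothing: anything better than a coin flip pays.
coin_flip_case = break_even_accuracy(gain_if_right=100, loss_if_wrong=100)

# Same stakes, but today's process already earns 40 per decision.
stronger_baseline_case = break_even_accuracy(100, 100, baseline_value=40)

# Mistakes are 100 times more costly than successes: the bar is far higher.
costly_error_case = break_even_accuracy(gain_if_right=10, loss_if_wrong=1000)
```

With these made-up numbers the three cases come out at roughly 0.5, 0.7 and 0.99, which is the point: the required accuracy falls out of the economics of the decision, not out of the data.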
Until you know, you can’t say whether your data is good enough. Without a business-driven target for accuracy, your data team will assume a model must be really accurate to be useful, and they could easily overshoot. Plus, many predictive models cope with missing and bad data quite well, or can at least degrade gracefully when the data is of poor quality, allowing reasonable predictions even when the data is less than ideal.
So don’t wait for the data you think you need. Start improving decisions with the data you have. It’s noble, it’s heroic, and it works.
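To illustrate what degrading gracefully can look like, here is a minimal sketch of a scoring function that uses whatever fields are present and falls back to a base rate when they are not. Everything in it is an assumption made for illustration: the field names, the weights, and the 20% base rate are invented, and skipping a missing field amounts to treating it as neutral evidence, which is only one of several ways to handle gaps.

```python
import math
from typing import Mapping, Optional

# Hypothetical log-odds weights per field; in practice these would be learned.
WEIGHTS = {"late_payments": 0.8, "years_as_customer": -0.3}

# Assumed base rate of 20%, expressed as log-odds.
BASE_LOG_ODDS = math.log(0.2 / 0.8)


def predict_risk(record: Mapping[str, Optional[float]]) -> float:
    """Return a probability using only the fields that are actually present."""
    log_odds = BASE_LOG_ODDS
    for field, weight in WEIGHTS.items():
        value = record.get(field)
        # Skip absent or NaN values instead of failing; the prediction
        # simply leans more heavily on the base rate.
        if value is None or math.isnan(value):
            continue
        log_odds += weight * value
    return 1.0 / (1.0 + math.exp(-log_odds))


full_record = predict_risk({"late_payments": 2, "years_as_customer": 5})
partial_record = predict_risk({"late_payments": 2, "years_as_customer": None})
empty_record = predict_risk({})  # falls back to the assumed base rate
```

A record with no usable fields still gets an answer (the base rate), and each field that does arrive moves the prediction away from it, so the output gets less informative rather than breaking as data quality drops.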