Dean Abbott recently wrote a great post, “Why Defining the Target Variable in Predictive Analytics is Critical,” in which he referenced the CRISP-DM approach to building predictive analytic models and discussed the importance of target variable selection in building an effective model. The thrust of Dean’s post was the crucial point that because
The target variable carries with it all the information that summarizes the outcome we would like to predict
we must be careful how we select it, and we must recognize that the target variable we select will have a material impact on our predictive analytic modeling. But how do we make sure we have the right context for this question? What must we do first, before we can pick the right target variable?
Decision Management Solutions is working with a number of analytic modeling teams that use CRISP-DM (or something very similar) to manage their analytic process. The first step in this process is Business Understanding, in which the analytic team is supposed to develop an understanding of the business problem to guide its effort. To Dean’s point, this business understanding had better result in a good target variable selection or our modeling will not go well. But what can we use to build business understanding? Most teams and companies have no real standard for this, so we have been working on some ideas here at Decision Management Solutions and have come up with something that looks like this:
- A statement of the analytic insight that is to be developed
In Dean’s example, for instance, Claim Fraud Likelihood.
- An assessment of the information available to develop that insight
Claims databases, customer databases, convictions and litigation information, case management files, etc.
- An explicit list of the business decisions that will be made differently because of that analytic insight
This is critical. Understanding the various business decisions that should be improved will drive the selection of target variables and much more. In this case, for instance, we might develop very different models if our objective was to improve the initial fraud determination decision, the case manager assignment decision or the decision to litigate. These decisions should use the same information we are using to build the model, so that we can be sure the data needed to power the model will be available when we make these decisions.
- A set of business objectives or measures that will be met or improved by changing these decisions
Any decision has an impact on one or more business metrics or objectives, and any change to decision-making will likely change this impact. An analytic effort must understand which objectives/measures are being targeted for improvement, which are being held steady and which are potentially being traded off for improvement elsewhere. The set of decisions determines which objectives and measures should be considered.
- An impact analysis showing the systems and business processes that rely on or implement all those decisions impacted by the analytic insight
Decisions are implemented in systems such as the Claims Processing System or Case Management System and support business processes such as Process Claims. Understanding this helps ensure that the model developed will work in the real-world environment of decision-making. If the system is a COBOL batch system, for instance, this may prevent real-time scoring of a model or the execution of more complex scoring logic.
- The organizational units that own the definition of how these decisions should be made, those that must make these decisions day to day, and those that are impacted by any change in the decision-making.
This determines who needs to buy into the results of the model, who needs to understand and approve it, and who needs to be able to access it, and when.
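To make the checklist above more concrete, here is a minimal Python sketch that records it as a simple data structure, assuming a team wanted to keep it alongside its modeling code. The class names, fields and the `missing_data` check are hypothetical illustrations, not part of CRISP-DM or any Decision Management Solutions tool, and the data available at each decision in the example is invented. The check reflects the point that the data powering the model must be available when each decision is made.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Decision:
    """One business decision the analytic insight is meant to improve (hypothetical)."""
    name: str
    available_data: set[str]  # data sources available at the moment this decision is made
    systems: list[str] = field(default_factory=list)     # systems that implement the decision
    objectives: list[str] = field(default_factory=list)  # business measures the decision affects
    owners: list[str] = field(default_factory=list)      # org units that own its definition


@dataclass
class BusinessUnderstanding:
    """Hypothetical record of the Business Understanding outputs in the checklist."""
    insight: str
    data_sources: set[str]  # information used to develop the insight (i.e., build the model)
    decisions: list[Decision] = field(default_factory=list)

    def missing_data(self) -> dict[str, set[str]]:
        """For each decision, return modeling data sources NOT available when it is made."""
        gaps: dict[str, set[str]] = {}
        for d in self.decisions:
            missing = self.data_sources - d.available_data
            if missing:
                gaps[d.name] = missing
        return gaps


# Illustrative example based on the claim fraud scenario; availability is invented.
fraud = BusinessUnderstanding(
    insight="Claim Fraud Likelihood",
    data_sources={"claims", "customers", "litigation", "case_files"},
    decisions=[
        Decision("Initial fraud determination", {"claims", "customers"},
                 systems=["Claims Processing System"]),
        Decision("Case manager assignment",
                 {"claims", "customers", "litigation", "case_files"},
                 systems=["Case Management System"]),
    ],
)
gaps = fraud.missing_data()
```

Here `gaps` flags only the initial fraud determination decision, because the litigation and case file data in this invented example would not yet exist when that decision is made. That is exactly the kind of finding that might push a team toward a different target variable or a separate model for that decision.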
Taken together, this information will focus an analytic project effectively, help select the right target variable and the right modeling approach, and ensure an effective deployment plan later. Building this information requires:
- An approach to decomposing the design of the decision-making being targeted so that these decisions are understood in some detail.
- A mapping of this decomposition to the objectives, metrics, processes and systems of the business.
- A collaborative software environment to allow business, IT and analytics teams to work together and ensure these models become part of an ongoing model of the business, linking projects and reducing effort in subsequent projects.
I wrote about such a process in my recent book and we are now using it in our consulting projects. Results look good so far, with some great quick wins where analytic teams changed their approach once they understood the decision-making they were trying to influence.
Interested in our approach? Drop me a line at email@example.com. Got a different approach? Post a comment, as I am always keen to learn what works.