Stuart Crawford from Fair Isaac’s R&D group presented on New Approaches to Creating, Simplifying and Visualizing Rules. While decision trees can be very clear, they can also become very complex. His group has been working on algorithms for simplifying decision trees. Because decision trees often have repeating sub-trees – pieces of the tree that are identical but in different paths – a Directed Action Graph is often much simpler. The graph can re-use the logic by linking to it in multiple ways. Building one of these requires some fairly complex math to find the right nodes and ordering.
While these are simpler they can still be complex so the next step is to develop what is known as an Exception-based Directed Action Graph or EDAG. This takes the most common outcome and puts it at the top and then only has nodes for the other paths. In other words the action at the top is the default.
However this is not always enough. He showed an example of a 492 node decision tree reduced to 89 nodes in an EDAG. A 30,000 node (!) tree from a bank was reduced to a 480 node graph. These are clearly simpler but not simple.
An Action Graph is the next simplification – take an action and find all the logic that could result in that action. This becomes a single action graph. Complex trees and graphs can have many of these action graphs extracted from them and each is focused and easy to read.
Of course you could in fact author this way. Each action graph can be built separately, ordering can be changed and different variables used. Once you have the individual fragments you can merge them into a single decision tree or EDAG. The action graphs can be stitched together but you have the potential for errors when they are built separately like this. They can overlap – have two paths in different graphs that are the same but have different actions – or have gaps – where certain values are not covered across the various action graphs. Unless you can manage these overlaps and gaps, you cannot use the individual graphs for development.
Finally this approach allows you to generate comparisons between trees. Instead of showing structural differences, which can be confusing, logical differences can be shown as an EDAG. Much simpler.
I saw Stuart present this technology before and blogged it here.