I got an update on Modellica’s Decision Engine recently. I was referred to Modellica by the folks at GDS Link, who use the Modellica rules capability in their DataView 360 product. Modellica is a European rules and decision management solution with 20 or so projects for 10 clients, including one large European bank (one client is in the US, the rest are in Europe). The projects span credit cards, consumer credit, motor loans, leasing, mortgages and commercial credit insurance.
Modellica is a typical decision management solution, consisting of a Decision Studio for managing policies and models and a Decision Engine suitable for both batch and online decisioning.
Decision Studio, the IDE, opens to a standard project list. Opening a project displays a list of all the objects in it. Modellica manages the data dictionary, processes (decision flows), scorecards, scripts (sequential formulas), segmentation or decision trees, rulesets (sequential rule lists) and decision tables. Every project begins with a Main process whose decision flow overview can be expanded one layer at a time.
Like many decision management solutions with a focus on the credit space, Modellica assumes that all the data it needs will be passed in and the results passed out, with no data access occurring while the decision service executes (I discuss the various kinds of decision services here). The data dictionary displays the variables in the project: system variables, input variables, internal variables, decision variables that will be returned, and simulation variables used to assess the results of the decision. The system variables are things like date and time and are not changed. All the other blocks of variables have sub-blocks that give the variables a structure. Variables can be numbers, strings, dates and so on. Variables can be given allowed values, and continuous variables can be binned by identifying the range of values for each bin. These bins can then be combined into useful groups such as “low income”, which simplifies writing rules and building trees and tables later. Multiple groupings of allowed values or bins can be defined, and a variable can use different groupings in different parts of the decision. The data has to be flattened from a relational structure into a flat dataset for input, and the data dictionary has to be defined for each project, which is a bit of a limitation from a reuse perspective.
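To make the binning and grouping idea concrete, here is a minimal sketch in plain Python. It is not Modellica’s data dictionary format; the variable, the bin edges and the group names are all hypothetical. It shows continuous values falling into exactly one bin, and bins being combined into named groups that later rules can refer to.

```python
from bisect import bisect_right

# Hypothetical bin edges for a continuous "income" variable (illustrative values only).
# Bins are half-open: [edge_i, edge_i+1), so every value lands in exactly one bin.
INCOME_EDGES = [20_000, 40_000, 80_000]
INCOME_BINS = ["<20k", "20k-40k", "40k-80k", "80k+"]

# One hypothetical grouping of the bins; a variable could carry several such groupings.
INCOME_GROUPS = {
    "low income": {"<20k", "20k-40k"},
    "high income": {"40k-80k", "80k+"},
}


def bin_of(value: float) -> str:
    """Return the label of the single bin containing value."""
    return INCOME_BINS[bisect_right(INCOME_EDGES, value)]


def group_of(value: float, groups: dict) -> str:
    """Map a value to the one group whose bins contain it, rejecting gaps or overlaps."""
    label = bin_of(value)
    matches = [name for name, bins in groups.items() if label in bins]
    if len(matches) != 1:
        raise ValueError(f"bin {label!r} belongs to {len(matches)} groups")
    return matches[0]
```

Because rules are then written against “low income” rather than raw numbers, changing a bin boundary does not require touching every rule that uses it.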
The decision itself is managed with a Process, or what I would call a decision flow. These are built using a nice drag-and-drop metaphor. These flows define a sequence of execution objects (scripts, scorecards, trees, rulesets and tables), and clicking on an object in the process opens the editor for that object (though you can also navigate to related objects from within the object editors). The flows are all linear and sequential, but a segmentation tree could, for instance, invoke one of several sub-processes and so branch the execution (though this would not be shown on the flow diagram). Each object opens in a new tab, and multiple tabs can be open at once.
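The execution model described above, a linear list of steps where one step can send the record into a sub-process, can be sketched as follows. This is an illustrative assumption, not Modellica’s engine: the process names, step functions and the choice to resume the parent flow after a sub-process finishes are all hypothetical.

```python
from typing import Callable, Optional

# A step updates the record in place and may return the name of a sub-process to run next.
Step = Callable[[dict], Optional[str]]


def run_process(name: str, processes: dict, record: dict) -> dict:
    """Run a process's steps in order; a returned name runs that sub-process, then continues."""
    for step in processes[name]:
        branch = step(record)
        if branch is not None:
            run_process(branch, processes, record)
    return record


# Hypothetical steps standing in for scripts, scorecards and trees.
def score(r):
    r["score"] = 700 if r["income"] > 50_000 else 550


def segment(r):
    # Plays the role of a segmentation tree whose leaves name the next step.
    return "high_risk" if r["score"] < 600 else "low_risk"


def decline(r):
    r["decision"] = "decline"


def approve(r):
    r["decision"] = "approve"


def log(r):
    r.setdefault("trace", []).append(r["decision"])


PROCESSES = {
    "Main": [score, segment, log],
    "high_risk": [decline],
    "low_risk": [approve],
}
```

Note that the branch into `high_risk` or `low_risk` is invisible if you only look at the `Main` list, which mirrors the point that such branching does not appear on the flow diagram.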
Having specified the data to use, the next step is to define a set of calculations, linking variables to each other through formulas. The flow has to call these formulas explicitly, so they are recalculated only when you specify. The formulas can be re-used in multiple objects, which makes this fairly easy though not automatic. A fairly comprehensive formula editor validates each formula as it is built. I appreciated the ability to manage these formulas but would have liked to see some automated recalculation as data changes during a decision. Because the scripts can be re-used, you can configure formulas to be recalculated at several points in a decision, but you have to say when you want this to happen.
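The explicit-recalculation behaviour is easy to see in a small sketch. Here a "script" is modelled, as an assumption, as an ordered list of (target variable, formula) pairs; the variable names are hypothetical. Derived values stay stale after an input changes until the flow runs the script again.

```python
# Hypothetical script: ordered (target variable, formula) pairs, evaluated top to bottom.
DTI_SCRIPT = [
    ("monthly_income", lambda v: v["annual_income"] / 12),
    ("dti", lambda v: v["monthly_debt"] / v["monthly_income"]),
]


def run_script(script, variables: dict) -> None:
    """Evaluate each formula in order and store the result; nothing recalculates on its own."""
    for target, formula in script:
        variables[target] = formula(variables)
```

Usage: after `run_script(DTI_SCRIPT, v)` with an annual income of 60,000 and monthly debt of 1,000, `v["dti"]` is 0.2. Raising the monthly debt to 2,000 leaves `dti` at 0.2 until the script is invoked again, which is exactly the "you need to say when" trade-off discussed above.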
Scorecards are built from the variables, assigning weights to the variables and their subgroups. These additive scorecards are a classic way to implement predictive models in a decision system. The predictive model determines which variables are predictive and how much each contributes to a prediction. The model is then described in terms of data elements and the weight, or contribution to the score, of a specific value (or bin) of each data element. The model is created outside Modellica and, once its statistical validity is confirmed, the scorecard is specified by hand in the IDE. Adverse reason codes can also be specified to explain which factors influenced a particular score. I would like to have seen PMML import here, but the interface was friendly and would be easy for those used to predictive scorecards, such as the target audience of risk managers in this case.
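An additive scorecard with adverse reason codes can be sketched in a few lines. The characteristics, points and base score below are hypothetical, and the reason-code rule (rank characteristics by points lost relative to the best attribute) is one common convention assumed for illustration, not necessarily the one Modellica uses.

```python
# Hypothetical scorecard: points per (characteristic, group). Values are illustrative only.
SCORECARD = {
    "income_group": {"low income": 10, "high income": 45},
    "years_at_address": {"<2": 5, "2-5": 20, "5+": 35},
    "delinquencies": {"0": 50, "1-2": 20, "3+": 0},
}
BASE_POINTS = 300


def score_applicant(groups: dict, n_reasons: int = 2):
    """Sum the points for each characteristic; return the score and adverse reason codes.

    Assumed convention: a characteristic's shortfall is the best available points minus
    the points awarded, and the largest non-zero shortfalls become the reason codes.
    """
    total = BASE_POINTS
    shortfalls = []
    for characteristic, table in SCORECARD.items():
        points = table[groups[characteristic]]
        total += points
        shortfalls.append((max(table.values()) - points, characteristic))
    shortfalls.sort(reverse=True)
    reasons = [c for gap, c in shortfalls if gap > 0][:n_reasons]
    return total, reasons
```

For a low-income applicant with 2 to 5 years at the address and one or two delinquencies, this returns a score of 350 with `income_group` and `delinquencies` as the top two reasons.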
Segmentation trees can also be defined. These decision trees have branches based on the groups defined for variables, with each group becoming a branch. Branches can also be based on an expression, built using the standard formula editor. Using the groups ensures completeness, as every case is handled once and only once. If formulas are used, an Else branch is forced to make sure every case is handled, but overlap between formula-based branches cannot be checked at this time. The action nodes (leaves) can set any number of variables to specified values and add reason codes or decision keys. These nodes can also specify the next step in the decision process, allowing you to use a decision tree to branch your execution into, say, a high-risk and a low-risk path.
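The two kinds of split described above can be contrasted in a sketch. Everything here is hypothetical (names, leaf structure, first-true-wins ordering for expression branches); the point is that a split on groups can be checked for completeness up front, while an expression split needs a mandatory Else and cannot rule out overlapping conditions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Leaf:
    sets: dict                                    # variables to assign
    reason_codes: list = field(default_factory=list)
    next_step: Optional[str] = None               # hypothetical sub-process name


def group_branch(variable, branches: dict, all_groups: set):
    """Split on a grouped variable; dict keys make each group map to exactly one leaf."""
    if set(branches) != set(all_groups):
        raise ValueError(f"branches {sorted(branches)} do not cover {sorted(all_groups)}")
    return lambda record: branches[record[variable]]


def expression_branch(conditions: list, else_leaf: Leaf):
    """Split on formulas with a mandatory Else; assumed rule: the first true condition wins."""
    def choose(record):
        for predicate, leaf in conditions:
            if predicate(record):
                return leaf
        return else_leaf
    return choose


def apply_leaf(leaf: Leaf, record: dict) -> Optional[str]:
    """Apply a leaf's assignments and reason codes; return the next step, if any."""
    record.update(leaf.sets)
    record.setdefault("reason_codes", []).extend(leaf.reason_codes)
    return leaf.next_step
```

A group split with a missing group raises immediately, which is the completeness guarantee; nothing comparable is possible for arbitrary predicates, hence the forced Else.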
A ruleset can be defined, specifying the conditions and actions for each rule in a list. Multiple actions can be specified for a rule, and the editor uses the same interface as the tree, except that every rule in the list is evaluated, in sequence. I would like to have seen a non-sequential execution mode, but it was a nice enough interface.
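Sequential ruleset execution, where every rule is checked in list order and actions take effect immediately, can be sketched like this. The rules and variable names are hypothetical; the sketch shows why order matters: the last rule only fires because earlier rules have already updated the record.

```python
def run_ruleset(rules: list, record: dict) -> list:
    """Evaluate every rule in order (no short-circuit); return the names of rules that fired.

    Each action is applied as soon as its condition holds, so later rules see its effects.
    """
    fired = []
    for name, condition, action in rules:
        if condition(record):
            action(record)
            fired.append(name)
    return fired


# Hypothetical rules: (name, condition, action).
RULES = [
    ("flag_high_dti", lambda r: r["dti"] > 0.4,
     lambda r: r.update(flags=r.get("flags", 0) + 1)),
    ("flag_thin_file", lambda r: r["accounts"] < 2,
     lambda r: r.update(flags=r.get("flags", 0) + 1)),
    ("refer_if_flagged", lambda r: r.get("flags", 0) >= 2,
     lambda r: r.update(decision="refer")),
]
```

A non-sequential (inference-style) mode would instead re-evaluate rules whenever their inputs change, so rule order would no longer affect the outcome.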
Decision tables are supported: “real” decision tables, not rule sheets, with each cell being a rule. The cells can contain values or formulas (using the standard formula editor), and the table is multi-dimensional. As with the decision tree, groups drive the initial row and column creation, making it easy to ensure that the table is both exclusive and exhaustive (each combination of values identifies one cell, and therefore one result, and only one). Decision tables and scorecards must be invoked from a script, which specifies the variable in which to store the result.
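The exclusive-and-exhaustive property can be illustrated with a table keyed by one group from each dimension. This is a hypothetical sketch, not Modellica’s table model: exclusivity comes from each key appearing once, and exhaustiveness is checked against the Cartesian product of the groups.

```python
from itertools import product


def build_table(dimensions: dict, cells: dict) -> dict:
    """Check that the cells cover every combination of groups exactly once."""
    expected = set(product(*dimensions.values()))
    present = set(cells)
    missing = expected - present
    unknown = present - expected
    if missing or unknown:
        raise ValueError(f"missing cells: {missing}; unknown cells: {unknown}")
    return cells


def lookup(dimensions: dict, table: dict, record: dict):
    """Find the single cell for a record's combination of groups."""
    return table[tuple(record[variable] for variable in dimensions)]


# Hypothetical two-dimensional table.
DIMENSIONS = {
    "income_group": ["low income", "high income"],
    "delinquency_group": ["clean", "delinquent"],
}
TABLE = build_table(DIMENSIONS, {
    ("low income", "clean"): "refer",
    ("low income", "delinquent"): "decline",
    ("high income", "clean"): "approve",
    ("high income", "delinquent"): "refer",
})
```

Leaving out any combination makes `build_table` raise, which is the kind of gap that starting from the groups helps avoid.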
Once you have finished specifying the rulesets and flow, you create a decision system from the specification. Versioning is supported when you do this, so you can roll back to a previous version, fork versions and so on. Backup and recovery are supported too. Testing uses test files, tab-delimited data files that match the input data, which are loaded into the environment. The tool can create an Excel file with the right columns (as well as a PDF defining all the variables) to ease test data creation. Once loaded, the test data can be processed and the various data elements calculated using the rules defined. The results (including all the reason codes) can then be exported to Excel for analysis. It would be nice if more of this analysis could be done within the tool, but Excel gives users good tools, and the flat-file in/out model makes managing the data in Excel practical.
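The flat-file test loop described above can be sketched with Python’s standard `csv` module. The `example_decide` function and the column names are hypothetical stand-ins for a real decision system; the output is tab-delimited text that a spreadsheet such as Excel can open.

```python
import csv
import io


def run_test_file(tsv_text: str, decide) -> str:
    """Read tab-delimited test records, run decide on each, return tab-delimited results."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    results = [decide(dict(row)) for row in reader]
    # Union of output columns in first-seen order, so every result row fits the header.
    fieldnames = list(dict.fromkeys(key for row in results for key in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()


def example_decide(record: dict) -> dict:
    """Hypothetical decision logic standing in for the real rules."""
    approved = float(record["income"]) >= 40_000
    record["decision"] = "approve" if approved else "decline"
    record["reason_codes"] = "" if approved else "R01"
    return record


SAMPLE = "id\tincome\n1\t55000\n2\t18000\n"
```

Keeping input and output as flat rows is what makes this round trip through a spreadsheet workable.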
A QA tool allows comparison of executions as well as detailed analysis of rule execution. Designers can step through an execution one step at a time. A nice expanding hierarchy shows how the various processes, trees, tables and scorecards were executed for a particular record. In production an XML file containing this same data is created, allowing later analysis and reporting for compliance purposes. The comparison tool executes the same records under two configurations and compares the results to show where they differ, allowing two versions of a project to be compared using a common test set. Each difference can be walked through separately, or all the differences can be found in a single run. Results can be organized by object or by process, making it easy to find why an object produced different results in the two runs. This allows users to do impact analysis: what would change in last month’s results if I made this set of changes, for instance.
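The version-comparison idea behind this kind of impact analysis can be sketched as running two decision functions over the same records and reporting field-level differences. The `make_version` helper and the score cutoffs are hypothetical, chosen only to show a single changed decision.

```python
def compare_versions(records: list, version_a, version_b, fields: list) -> list:
    """Run both versions on copies of each record; return (index, {field: (a, b)}) diffs."""
    diffs = []
    for i, record in enumerate(records):
        out_a = version_a(dict(record))
        out_b = version_b(dict(record))
        changed = {
            f: (out_a.get(f), out_b.get(f))
            for f in fields
            if out_a.get(f) != out_b.get(f)
        }
        if changed:
            diffs.append((i, changed))
    return diffs


def make_version(cutoff: int):
    """Hypothetical decision version: approve when the score meets the cutoff."""
    return lambda r: {**r, "decision": "approve" if r["score"] >= cutoff else "decline"}


RECORDS = [{"score": 580}, {"score": 620}, {"score": 700}]
```

Raising the cutoff from 600 to 650 changes only the second record, which is the kind of answer the "what would change in last month’s results" question needs.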
From a technical perspective, this deploys as a stateless decision service that cannot access additional data, though the integration with GDS Link’s DataView 360 gives you many more data access options as well as pre-built connectors. Rather than generating code, the engine loads the project files and uses them to drive execution. The engine is available as a .NET or a Java instance, and the Java version runs on a mainframe as well as on a Linux server. Changes to the rules can be loaded into a running instance without interruption, so rule changes can be put into production regularly and without disruption.
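The "interpret the project files, swap them while running" architecture can be sketched as follows. This is a generic pattern in Python, not Modellica’s .NET or Java engine: the JSON project format with a single cutoff is hypothetical. A new definition is parsed fully before being swapped in, and each request takes one snapshot so it never mixes two versions.

```python
import json
import threading


class DecisionEngine:
    """Interprets a project definition at run time and allows hot replacement."""

    def __init__(self, project_json: str):
        self._lock = threading.Lock()
        self._project = json.loads(project_json)

    def reload(self, project_json: str) -> None:
        # Parse first, so a malformed file raises before the live project is touched.
        new_project = json.loads(project_json)
        with self._lock:
            self._project = new_project

    def decide(self, record: dict) -> dict:
        with self._lock:
            project = self._project  # one snapshot per request
        # Hypothetical project format: a single score cutoff.
        decision = "approve" if record["score"] >= project["cutoff"] else "decline"
        return {"version": project["version"], "decision": decision}
```

Because the engine reads a definition rather than compiled rules, deploying a rule change is just another `reload` call on the running instance.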
Updated: GDS Link acquired Modellica in July 2011.