I recently published a First Look on the open source OpenRules Decision Management System. Along with traditional Business Rules components, OpenRules includes two other important decision management components:
- Rule Solver for solving optimization problems
- Rule Learner for predictive analytics.
OpenRules Rule Solver is based on Constraint Programming (CP) technology, which is typically used for scheduling, resource allocation, or configuration problems with large numbers of possible solutions. CP solvers start with a set of predefined constrained objects, predefined constraints (arithmetic, logical, cardinality, all different, etc.) and various search algorithms. CP solvers do well where business rules run out of steam: when there are too many alternatives, when lots of “rules” must be balanced against each other, or when the goal is to find the solution that violates the smallest number of rules. CP solvers have been used successfully on large combinatorial problems for years, but the field has grown quickly of late, with more offerings and a move towards standardization such as the CP Java API standard “JSR-331”. CP standards should allow different solvers to be used interchangeably and simplify CP integration with decision management environments.
Rule Solver has been a component of OpenRules for a while, but it is now compliant with JSR-331 and thus can utilize different CP solvers. Rule Solver uses the OpenRules rules syntax (decision tables) to define constraint satisfaction problems within an Excel-based editing environment. For scheduling problems, for instance, one simple table lists the Activities and their durations, while a second table defines the Precedence Constraints between Activities. There are also tables for Resources with limited capacities and for Activity-Resource constraints such as “Masonry requires Joe or Bill”. Rule Solver provides rule templates for many frequently used binary and global constraints, allowing a non-technical user to work in Excel to define a business problem in terms of unknown variables and constraints on them.
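To make the two scheduling tables concrete, here is a minimal sketch in plain Python. It is not Rule Solver or the JSR-331 API: the activity names, durations, and the `earliest_starts` helper are all hypothetical, and it only propagates precedence constraints (no resources, no search), which is far less than a real CP solver does.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical "Activities" table: activity name -> duration in days.
durations = {"Masonry": 7, "Carpentry": 3, "Roofing": 1, "Painting": 2}

# Hypothetical "Precedence Constraints" table: (before, after) pairs.
precedence = [
    ("Masonry", "Carpentry"),
    ("Carpentry", "Roofing"),
    ("Masonry", "Painting"),
]

def earliest_starts(durations, precedence):
    """Return the earliest start day of each activity under the precedence pairs."""
    predecessors = {name: set() for name in durations}
    for before, after in precedence:
        predecessors[after].add(before)
    start = {}
    # static_order() yields every activity after all of its predecessors.
    for name in TopologicalSorter(predecessors).static_order():
        start[name] = max(
            (start[p] + durations[p] for p in predecessors[name]), default=0
        )
    return start

starts = earliest_starts(durations, precedence)
makespan = max(starts[name] + durations[name] for name in durations)
print(starts, makespan)
```

With these made-up numbers Masonry starts on day 0, Carpentry and Painting on day 7, Roofing on day 10, and the whole schedule finishes on day 11.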
The Rule Solver supports hard and soft rules/constraints. This allows it to solve over-constrained problems where there is no perfect solution that satisfies all constraints and the “least bad” solution must be found. The user defines hard and soft constraints with relative importance, limits on allowed violations, etc. and provides them as an input to the solver. The Rule Solver reads the information from these various Excel rules tables, creates the proper constraint satisfaction model on the fly, and executes the solver, producing a feasible or an optimal solution. The Rule Solver is integrated with the rest of the OpenRules stack: for instance, rules can generate a set of constraints, invoke the solver, and then hand its outcome to further rules. Documentation and a number of examples are freely downloadable from http://openrules.com/download.htm.
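The idea of hard versus soft constraints can be illustrated with a small brute-force sketch. Everything here is an assumption made for illustration: the workers, tasks, weights, and the `solve` function are hypothetical, and exhaustive enumeration stands in for the search a real CP solver would perform. Hard constraints must hold; soft constraints carry a weight that is paid as a penalty when they are violated.

```python
from itertools import product

workers = ["Joe", "Bill", "Ann"]   # hypothetical resources
tasks = ["Masonry", "Painting"]    # hypothetical activities

# Hard constraints: every one of them must hold.
hard = [
    lambda a: a["Masonry"] in ("Joe", "Bill"),   # "Masonry requires Joe or Bill"
    lambda a: a["Masonry"] != a["Painting"],     # one worker cannot do both tasks
]

# Soft constraints: (weight, predicate); the weight is the cost of a violation.
soft = [
    (3, lambda a: a["Painting"] == "Joe"),   # Joe is the preferred painter
    (2, lambda a: a["Masonry"] == "Joe"),    # Joe is also the preferred mason
    (1, lambda a: a["Painting"] != "Ann"),   # Ann would rather not paint
]

def solve(workers, tasks, hard, soft):
    """Return the hard-feasible assignment with the smallest total soft penalty."""
    best, best_penalty = None, None
    for combo in product(workers, repeat=len(tasks)):
        assignment = dict(zip(tasks, combo))
        if not all(check(assignment) for check in hard):
            continue  # infeasible: a hard constraint is violated
        penalty = sum(weight for weight, check in soft if not check(assignment))
        if best_penalty is None or penalty < best_penalty:
            best, best_penalty = assignment, penalty
    return best, best_penalty

best, penalty = solve(workers, tasks, hard, soft)
print(best, penalty)
```

The two preferences for Joe cannot both be met because of the second hard constraint, so the problem is over-constrained. The “least bad” answer here assigns Masonry to Bill and Painting to Joe, violating only the weight-2 preference.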
Rule Learner is a component of the OpenRules BDMS that can generate business rules based on historical data. It uses supervised machine learning. OpenRules provides a Rule Trainer that automates the labeling of different data instances as “good” or “bad” – the crucial first step in supervised techniques. Rule Trainer is essentially a rule engine that executes “training rules” to generate training data sets. A business analyst uses Rule Trainer to define training rules in Excel rule tables or through a customizable web interface. Training rules specify criteria for selection of “bad” and “good” data instances and also allow further classification of “good” instances into different categories like “Very Good”, “Average”, “Not So Good”, etc. The Rule Trainer uses these training rules to extract instances from enterprise data sources using a highly configurable DB interface and to automatically label them – essentially building an analytic dataset.
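As a rough illustration of what a set of training rules does, the sketch below labels a few records in plain Python. The field names (`audited`, `adjustment`), the thresholds, and the `label` function are hypothetical; in OpenRules these rules would live in Excel rule tables and be executed by Rule Trainer, not written as code.

```python
def label(record):
    """Apply hypothetical training rules; return a category, or None to skip the record."""
    if not record["audited"]:
        return None               # outcome unknown: cannot be used for training
    if record["adjustment"] > 1000:
        return "Bad"
    if record["adjustment"] == 0:
        return "Very Good"
    if record["adjustment"] <= 100:
        return "Average"
    return "Not So Good"

# Made-up records standing in for rows pulled from an enterprise data source.
records = [
    {"id": 1, "audited": True, "adjustment": 5000},
    {"id": 2, "audited": True, "adjustment": 0},
    {"id": 3, "audited": False, "adjustment": 0},
    {"id": 4, "audited": True, "adjustment": 40},
]

# The training set contains only the records that some rule could label.
training_set = [
    {**record, "label": label(record)}
    for record in records
    if label(record) is not None
]
print(training_set)
```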
The generated training records are then fed into the machine learning algorithms, which find rules that predict how a record might be categorized using attributes that were not used to build the training data. For instance, if the total amount of an order is used to generate the training data, then the generated rules will not use the total amount of the order (using it would be a potential “leak from the future”). The result is a set of rules that use other attributes to categorize your records similarly to your explicit categorization. This matters because you will typically have the extra details used for training for only some records, whereas the discovered rules can label all records. So, for instance, records you have not investigated or audited might be flagged as potentially fraudulent.
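The point about keeping labeling attributes out of the learned rules can be sketched with a toy learner. This uses OneR (pick the single attribute whose value-to-majority-label mapping makes the fewest training errors) purely as a simple stand-in; it is not the algorithm Rule Learner uses, and the data, attribute names, and `one_r` function are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, label_key, excluded):
    """Toy OneR learner that never looks at the attributes listed in `excluded`."""
    attributes = [k for k in rows[0] if k != label_key and k not in excluded]
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[label_key]] += 1
        # One rule per attribute value: predict the majority label for that value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rules[row[attr]] != row[label_key])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Made-up training set; "label" was derived from total_amount by training rules.
rows = [
    {"total_amount": 5000, "channel": "web", "new_customer": True, "label": "Bad"},
    {"total_amount": 4000, "channel": "web", "new_customer": True, "label": "Bad"},
    {"total_amount": 100, "channel": "store", "new_customer": True, "label": "Good"},
    {"total_amount": 200, "channel": "store", "new_customer": False, "label": "Good"},
    {"total_amount": 50, "channel": "web", "new_customer": False, "label": "Good"},
    {"total_amount": 6000, "channel": "web", "new_customer": False, "label": "Bad"},
]

# total_amount produced the labels, so it is withheld from the learner.
attribute, rules, errors = one_r(rows, "label", excluded={"total_amount"})
print(attribute, rules, errors)
```

On this data the learner settles on `channel`, with one misclassified training row. The resulting rules can then be applied to records for which the total amount is not yet known.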
Each generated rule has an associated category and a score (hit rate) that reflects how predictive/certain the rule is. Weights can be provided by the user for each category, and the combination of the weights and scores is used to rank new records. Thus, the generated rules can be applied against data that was not used for rule generation. Because the generated rules are presented in Excel tables in a human-readable (and machine-readable) format, a user may analyze, modify, activate, and deactivate the rules that will be used for actual classification of new data records. As part of the verification process for these new rules, the tool generates reports such as one that shows which rules fired on each of the highly ranked records. This allows experts to review why records were included, evaluating them against records with already known results.
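A small sketch of how weights and scores might combine into a ranking follows. The exact formula OpenRules uses is not given here, so summing `weight * score` over the rules that fire is an assumption, as are the rule names, weights, and record fields. The report keeps the list of fired rules per record, in the spirit of the verification report described above.

```python
# Hypothetical discovered rules: a category, a score (hit rate) and a condition.
rules = [
    {"name": "R1", "category": "Bad", "score": 0.9,
     "test": lambda r: r["channel"] == "web" and r["new_customer"]},
    {"name": "R2", "category": "Bad", "score": 0.6,
     "test": lambda r: r["items"] > 10},
    {"name": "R3", "category": "Good", "score": 0.8,
     "test": lambda r: r["channel"] == "store"},
]

# Hypothetical user-supplied category weights: "Bad" raises the rank, "Good" lowers it.
weights = {"Bad": 1.0, "Good": -0.5}

def rank(records, rules, weights):
    """Return (record id, total, fired rule names), highest total first."""
    report = []
    for record in records:
        fired = [rule for rule in rules if rule["test"](record)]
        # Assumed combination: sum of category weight times rule score.
        total = sum(weights[rule["category"]] * rule["score"] for rule in fired)
        report.append((record["id"], total, [rule["name"] for rule in fired]))
    return sorted(report, key=lambda item: item[1], reverse=True)

new_records = [
    {"id": "A", "channel": "web", "new_customer": True, "items": 12},
    {"id": "B", "channel": "store", "new_customer": False, "items": 2},
    {"id": "C", "channel": "web", "new_customer": False, "items": 20},
]

for record_id, total, fired in rank(new_records, rules, weights):
    print(record_id, round(total, 2), fired)
```

Record A ranks first because both “Bad” rules fire on it, C comes next with one, and B ranks last because only the “Good” rule fires.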
The data mining algorithms are pluggable, provided that they generate rules as an output from a training dataset; the OpenRules default is to use Weka classification algorithms such as RIPPER. Rule Learner is well-integrated with other OpenRules components. It includes a graphical interface based on OpenRules Dialog (described in the review of OpenRules), which allows the creation of custom web interfaces for different rules discovery environments. While the discovered rules are by default generated as OpenRules decision tables and placed in different Excel files, the approach and its implementation are generic enough to support other formats and other rule engines. Rule Learner has been successfully used by large organizations such as the IRS.