My friends at Actico recently had me record some videos on Decision Management and decision modeling with DMN. Here’s the first – 3 reasons why financial and insurance companies should adopt Decision Management.
As part of the build-up to Building Business Capability 2017 I gave an interview on transforming the business. Check it out.
If you want to come to BBC 2017, there’s still time to register with code SPKDMS for a 10% discount.
If you are coming to BBC 2017, don’t forget to register for my tutorial Decision-Centric Business Transformation: Decision Modeling. See you there.
I have been working on Decision Management since we first started using the phrase back in 2002 – I’m probably the guilty party behind the phrase – and Decision Management Solutions (the company I run) does nothing but Decision Management. This gives us a unique perspective on new technologies and approaches that show up. One of the most interesting developments in Decision Management recently has been the use of decision modeling – especially the use of decision modeling with the Decision Model and Notation (DMN) standard.
In DMN a decision is the act of determining an output or selecting an option from inputs. In this context we mean a repeatable decision, allowing us to define a decision model for our decision-making approach. If we use DMN then we:
Our experience with DMN is both broad and deep – we have trained nearly 1,000 people and used decision modeling on dozens of real-world projects. We have seen how valuable it is on these projects and we particularly notice how many different kinds of projects it is valuable for.
Unlike some DMN proponents, we don’t think that defining executable decision models is the only reason for using DMN. Here are some other reasons you might use decision modeling and DMN on your projects:
Specific projects we have used decision modeling on have shown us that decision modeling with DMN is:
We have had great success with decision modeling and are helping many organizations adopt it right now by delivering a business value pilot that goes from a business need to a working pilot in a few weeks. Get in touch if we can help you.
Last week we completed the main phase of a proof-of-concept project at a client based in Jakarta, Indonesia. After the report-out, I tweeted:
Loving watching a Dr at one of my clients explain (in Bahasa) the decision model for claims handling they built #dmn #decisionmgt
Tweets are great for this kind of quick heartfelt observation but I thought a blog post to explain why this is such a breakthrough was called for.
First, some background. This project was to develop a proof of concept decision service for claims handling. The client was already processing claims and integrating a Business Rules Management System. Our project was to build on that, introduce decision modeling, and show them how this decision-centric approach could help them maximize the value of their BRMS investment by engaging business owners, clarifying requirements, and enabling continuous improvement of the decision. Two things made this project exciting for me:
I am looking forward to seeing this in production, and to continuing to work with this very impressive group. Now if only I spoke Bahasa…
Drop us a line if this is something you would like to know more about.
Some other notes on the project for the more geeky among you:
Neuro-ID™ started about 7 years ago with the premise that by monitoring how people use their keyboards and mice one could identify the confidence level of the person filling out a form, and that this could be done without any personally identifiable information. They developed a Neuro Confidence Score™ (Neuro-CS™), which they have patented. Their focus is on the questions companies ask of their customers. Many of these questions are risk-related (they are asked to help the business establish how risky someone or something is), and companies lack confidence in the answers they get. Neuro-ID likes to say “Smarter Questions, Better Bottom Line™”.
There is an inherent tension in how organizations design surveys or online forms. Making an online form “frictionless” is a good objective as it makes it more likely people will fill it out. But it is hard to do this if one is also concerned about compliance. A focus on compliance can lead organizations to ask for too much detail and so create friction while a frictionless experience can easily fail to check on someone.
Neuro-ID’s technology delivers prescriptive analytics that score someone’s behavior in terms of the confidence with which those questions are answered (as well as some supporting attributes). As an example, consider declarations in financial applications. The technology monitors the session to see how people answer, how they move their mouse, what options they pick, which things they change. A baseline is created for each person as they interact and subsequent actions are compared to assess their confidence. The confidence of their movements reflects whether they are concealing something or don’t understand the question or are just not sure what the right answer is.
The technology sits behind existing forms and does not collect any personal information or PII. Because it compares the user to themselves, a lack of language skills or poor eyesight does not impact the score. Forms can add baseline questions before asking risk-related questions or can treat all questions as both baseline and risk relevant questions. It also detects meaningful edits, allowing it to ask questions like “did you overstate your income”. Experience is that this often triggers better behavior.
The technology generates a confidence level for each question. It has an interactive mode that allows loan officers or others to replay the interaction, and everything is available programmatically through an API. Neuro-ID uses a decision ID that the company has to match to a particular applicant, allowing the technology to store detailed records without knowing who someone is. While mouse and keyboard are the most common environment, the technology also handles touch screens by assessing hesitations and changes.
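Neuro-ID’s patented scoring isn’t described here, so the following is only a hypothetical sketch of the general idea of comparing a user to their own baseline. It assumes two invented features: hesitation time per question and the number of times an answer was edited. It also uses an invented formula to turn them into a confidence between 0 and 1. It is not the Neuro-CS algorithm.

```python
from statistics import mean, stdev


def confidence_per_question(baseline_times, question_times, edits):
    """Hypothetical scoring: compare each risk question to the SAME user's baseline.

    baseline_times: hesitation (seconds) on neutral baseline questions (2+ values).
    question_times: dict question_id -> hesitation (seconds) on a risk question.
    edits: dict question_id -> number of times the answer was changed.
    Returns dict question_id -> confidence in (0, 1]; lower means less confident.
    """
    mu = mean(baseline_times)
    sigma = stdev(baseline_times) or 1.0  # avoid dividing by zero for a flat baseline
    scores = {}
    for question, seconds in question_times.items():
        # Only hesitating longer than this user usually does lowers confidence.
        z = max(0.0, (seconds - mu) / sigma)
        penalty = z + edits.get(question, 0)
        scores[question] = 1.0 / (1.0 + penalty)
    return scores
```

Because the penalty is measured against the user’s own baseline, a slow but steady typist isn’t scored as less confident than a fast one. That mirrors the point above about language skills and eyesight.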
Neuro-ID can be used for risk mitigation, fraud prevention or user experience design depending on the situation. Initial targets are traditional banks and FIs working in the prime segment. These organizations need a clean, quick online onboarding process that is self-directed yet does not expose them to unnecessary risk. It’s also effective with credit-invisible customers, as it can send additional questions to check for third-party verification when confidence is low.
A really interesting technology in my opinion. You can find more at www.neuro-id.com.
Silvie Spreeuwenburg of LibRT came up after lunch to talk about a rules-based approach to traffic management. Traffic management is a rapidly changing area, of course, thanks to IoT and self-driving cars among others.
When one is considering traffic, there are many stakeholders: not just road users but also businesses reliant on road transport, safety authorities etc. The authorities have a set of traffic priorities (where they need good flow), delays, and restrictions for safety or legal reasons. They manage this today and expect to keep doing so, even as technology evolves.
To manage this they create lots of books about traffic management, incidents and other topics for each road, each containing flow charts and instructions. This creates a lot of overlap, so it’s important to separate problem definition from problem solution and to be specific: differentiate between things you must or may not do and those that are or are not actually possible.
The solution involves:
The logic is all represented in decision tables. And applying the approach has successfully moved traffic to lower priority roads. Plus it fits very well with the way people work and connects changes in policies very directly to changes in behavior.
Marcia Gottgtroy from New Zealand’s tax department presented on their lessons learned and planned development in decision management. They are moving from risk management to a business strategy, supported by analytical decision management. The initial focus was on building a decision management capability in the department, starting with GST (sales tax). It went very well, quickly producing a decision service that demonstrated straight-through processing (STP) and operational efficiency. The service also had a learning loop based on the instrumentation of the service. They automated some of this (where the data was good) but did manual analysis elsewhere, neither trying to over-automate nor waiting for something perfect.
After this initial success, the next step is to focus on business strategy and get to decision management at an enterprise level, with hybrid and integrated solutions supported by a modern analytical culture driven by the overall strategy. This requires a strategy, a data science framework and a methodology, all in the context of an experimental enterprise. They began to use decision modeling with DMN; using decision requirements models to frame the problem improved clarity, understanding and communication. It also documented this decision-making for the first time.
But then they had to stop as the success had caused the department to engage in a business transformation to replace and innovate everything! This has created a lot of uncertainty but also an opportunity to focus on their advanced analytic platform and the management of uncertainty. The next big shift is from decision management to decision optimization. Technology must be integrated, different approaches and an ability to experiment are key.
Nigel Crowther of IBM came up next to talk about business rules and Big Data. His interest is in combining Big Data platforms and AI with the transparency, agility and governance of business rules. Big Data teams tend to write scripts and code that are opaque, something business rules could really help with. Use cases for the combination include massive batches of decisions, simulations on large datasets and detecting patterns in data lakes.
The combination uses a BRMS to manage the business rules and deploy a decision service. It then runs a Map job to fetch this service and run it in parallel on a very large data set. The rules are distributed to many nodes and the data is spread across those nodes, so the rules can be run against it in parallel and very fast. The Hadoop dataset is stored on distributed nodes. Each node’s data is run through the rules in its own Map job before the results are reduced to a single result set: bringing the rules to the data. This particular example uses flat data about passengers on flights and uses rules to identify the tiny number of “bad actors” among them. There are 20M passengers per day, so it’s a real needle-in-a-haystack problem. The batch process is used to simulate and back-test the rules, and then the same rules are pushed into a live feed to make transactional decisions about specific passengers. For instance, a serious setup with 30 nodes could scan 7B records (a year’s worth) in an hour and a half, 1.2M/second.
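The “bring the rules to the data” pattern can be sketched in a few lines. This assumes invented passenger fields and two invented screening rules. Each partition stands in for the data on one node. The map step runs every rule over its partition, and the reduce step merges the per-partition hits into one result set. A thread pool stands in for the cluster, so this shows the shape of the pattern only; it is not Hadoop or the IBM tooling.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical screening rules: each returns a reason string if the record is flagged.
SCREENING_RULES = [
    lambda p: "watchlist" if p["on_watchlist"] else None,
    lambda p: "cash one-way" if p["paid_cash"] and p["one_way"] else None,
]


def map_partition(partition):
    """Map step: run every rule against every record held by one 'node'."""
    hits = []
    for passenger in partition:
        reasons = [reason for reason in (rule(passenger) for rule in SCREENING_RULES) if reason]
        if reasons:
            hits.append((passenger["id"], reasons))
    return hits


def run_batch(partitions, workers=4):
    """Run the map step on all partitions in parallel, then reduce to one result set."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(map_partition, partitions)  # preserves partition order
        # Reduce step: concatenate the per-partition hit lists.
        return reduce(lambda acc, part: acc + part, partial_results, [])
```

The same `SCREENING_RULES` could be applied to one passenger at a time in a live feed, which is the batch-then-transactional reuse described above. For the quoted numbers: 20M passengers a day over 365 days is about 7.3B records. At 1.2M records per second, 7B records take roughly 5,800 seconds, or about 1.6 hours.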
It’s also possible to use Big Data and analytic tools to analyze rules. Customers want, for instance, to simulate the impact of rule changes on large portfolios of customers. The rule logs of rules executed in a year, say, can also be analyzed quickly and effectively using a Big Data infrastructure.
Vijay Bandekar of InteliOps came up to talk about the digital economy and decision models to help companies face the challenges this economy creates. The digital economy is driven by the explosion of data and the parallel explosion in IoT devices. This data is increasingly being stored, but little if any of it is being used effectively. We need applications that can manage this data and take advantage of it, because it’s just not possible for even the best human staff to cope: autonomous, learning, real-time decision-making systems are required. These systems require inferencing, reasoning and deductive decision models. While the algorithms work, it can be cumbersome to manage large rule bases. And while machine learning approaches can come up with the rules, integrating these manually can be time-consuming.
Architecturally, he says, most organizations focus on stateless decisioning with a database rather than a stateful working memory. Yet the stateful approach offers advantages in the era of fast moving, streaming data while also taking advantage of the rapidly increasing availability of massive amounts of cheap RAM. This requires agenda control and transparency, as well as effective caching and redundancy/restoration.
It’s also important to add learning models with both supervised and unsupervised learning engines to handle the increasing volumes of data. These learning models need to be injected into the streams of data, he argues, to make decisions as it arrives rather than being pointed at stored data. In addition, combinations of algorithms – ensembles – are increasingly essential given the variety of data and the value of different approaches in different scenarios.
The combination of these delivers an adaptive decisions framework for real-time decisions. It uses stateful decision agents based on business rules and continuous learning using ensembles of analytic approaches on streaming data.
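Here is a minimal sketch of a stateful decision agent, using an invented IoT scenario. The agent keeps a rolling window of readings per device in memory (its working memory) and applies a rule as each event arrives. A stateless service would instead query a database for history on every event. This is a hypothetical illustration, not InteliOps’ framework, and it leaves out the learning ensembles.

```python
from collections import defaultdict, deque


class StatefulDecisionAgent:
    """Hypothetical agent that keeps working memory in RAM between events."""

    def __init__(self, window=3, threshold=100.0):
        self.window = window
        self.threshold = threshold
        # Working memory: the last `window` readings for each device.
        self.memory = defaultdict(lambda: deque(maxlen=window))

    def on_event(self, device_id, reading):
        """Decide as the data arrives, using state accumulated from earlier events."""
        readings = self.memory[device_id]
        readings.append(reading)  # the oldest reading drops out automatically
        # Rule: alert when the average over a full window exceeds the threshold.
        if len(readings) == self.window and sum(readings) / self.window > self.threshold:
            return "alert"
        return "ok"
```

Keeping the window in RAM is what makes per-event decisions cheap. It is also why the talk stressed caching and redundancy/restoration: if the process dies, this state must be recovered.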
Last up is Tim Stephenson of Omny Link. His recent focus is on smaller companies, and one of the key things about the new digital economy is the way in which it allows companies to punch above their weight. Small companies really need to drive leads to conclusion and manage customers effectively. CRM systems, even if they start free, can be complex and expensive to use. To unlock the value, and to respond faster and more appropriately so you can serve more customers, you need to do a set of things well:
He walked through how these elements allow you to deal with core scenarios, like initial lead handling, so the company can manage leads and customers well. You need to use APIs to record well understood data, decide what to do and make sure you do what you decided to do.
DMN (especially decision tables) allows you to get the business people to define how they want to handle leads and how they want to make decisions. In his case they can’t change the structure of the decisions, but they can tweak thresholds and categories, allowing them to focus and respond to changing conditions. And these decisions are deployed consistently across different workflows and different UIs. The same decision is made everywhere, presenting the standard answer to users no matter where they are working (a key value of formally separating decisions out as their own component). Using Decision Requirements Models to orchestrate the decision tables keeps them simpler and makes the whole thing more pluggable.
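To make this concrete, here is a hypothetical lead-handling decision table written as plain Python rather than DMN. The categories, actions and threshold values are invented for illustration. The structure (the rows and their order) is fixed. Business users only tweak the numbers in `LeadThresholds`, and every channel calls the same `classify_lead` function, so the same answer appears everywhere.

```python
from dataclasses import dataclass


@dataclass
class LeadThresholds:
    """Hypothetical values a business user is allowed to tweak."""
    hot_score: int = 70
    warm_score: int = 40
    big_deal_value: float = 10_000.0


def classify_lead(score, deal_value, t):
    """A First-hit decision table: rows are checked top to bottom, first match wins.

    The rows themselves are fixed; only the thresholds in `t` change.
    """
    rows = [
        (lambda: score >= t.hot_score or deal_value >= t.big_deal_value, "call today"),
        (lambda: score >= t.warm_score, "email this week"),
        (lambda: True, "add to newsletter"),  # default row, so the table is complete
    ]
    return next(action for condition, action in rows if condition())
```

A web form, a CRM workflow and a back-office screen would all call `classify_lead`. Raising `warm_score` changes the outcome everywhere at once without anyone touching the table’s structure.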
The payback for this has been clear. One user found that the time saved was about 75% but in addition, the improvement in response time ALSO means the company closes more work. Even small businesses can get an advantage from this kind of composable, consistent, repeatable, auditable, transparent decision automation.
And that’s a wrap. Next year’s Decision CAMP will probably be in Luxembourg in September, and don’t forget that all the slides are available on the Decision CAMP Schedule page.
Little bit of a late start for me, so I am starting with Geoffrey De Smet from Red Hat talking about constraint planning. He points out that some decisions cannot easily be solved with rules-based approaches. They can be described as decisions (and, in our experience, as a DMN decision model) but not readily made with rules and decision tables alone. His key point is that different decision problems require different technology.
And our experience is that you can do this decision by decision in a decision model too, making it easy to identify the right technology and to combine them.
He went into some detail on the difference between hard and soft constraints and on the interesting way in which the Red Hat planner leverages the Red Hat rules format and engine to handle constraint definition, score solutions etc. They support various approaches to planning too, allowing you to mix and match rules-based constraints and various algorithms for searching for a solution. The integration also allows for some incremental work, taking advantage of the standard rule engine features.
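Here is a minimal sketch of the hard/soft distinction, using a tiny invented shift-rostering problem. Scores are `(hard, soft)` tuples, so Python’s tuple comparison ranks any solution that breaks fewer hard constraints above one with a better soft score. This is not the Red Hat planner’s API. The exhaustive search only works for toy problems; a real planner pairs rule-based score calculation with heuristic search.

```python
from itertools import product

# Hypothetical problem data.
EMPLOYEES = ["ann", "bob"]
SHIFTS = ["mon_am", "mon_pm", "tue_am"]
UNAVAILABLE = {("bob", "mon_am")}  # hard: this employee cannot work this shift
PREFERS_OFF = {("ann", "tue_am")}  # soft: this employee would rather not work it


def score(assignment):
    """Return (hard, soft); both are <= 0, and tuples compare hard-first."""
    hard = soft = 0
    shifts_per_day = {}
    for shift, emp in assignment.items():
        if (emp, shift) in UNAVAILABLE:
            hard -= 1
        if (emp, shift) in PREFERS_OFF:
            soft -= 1
        key = (emp, shift.split("_")[0])  # (employee, day)
        shifts_per_day[key] = shifts_per_day.get(key, 0) + 1
    # Hard: nobody works more than one shift on the same day.
    hard -= sum(n - 1 for n in shifts_per_day.values() if n > 1)
    return hard, soft


def solve():
    """Brute force: score every possible assignment and keep the best."""
    candidates = (dict(zip(SHIFTS, combo)) for combo in product(EMPLOYEES, repeat=len(SHIFTS)))
    return max(candidates, key=score)
```

Here `solve()` puts Ann on Monday morning (Bob is unavailable), which forces Bob onto Monday afternoon. It then gives Bob Tuesday too, which satisfies Ann’s soft preference at no hard cost.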
I wrote about some of the early work around Drools Planner back in 2011.
I went next, presenting on the role of decision models in analytic excellence.
Bastian Steinart of Signavio came up after the break. Like Jan and me, he focused on their experience with DMN on Decision Management projects and the need for additional concepts. Better support for lists and sets, and for handling iteration and multiplicity, for instance, is something they find essential. They have developed some extensions to support these things and are actively working with the committee, both to show their suggestions and to make sure they end up supporting the agreed 1.2 standard.
They have also done a lot of work turning DMN decision models into Drools DRL, the executable rule syntax of Drools. This implies, of course, that DMN models can be turned into any rules-based language, and we would strongly agree that DMN and business rules (and Business Rules Management Systems) are very compatible. From the point of view of a code generator like Signavio, however, having the engine consume the DMN XML generated from a model directly is probably preferable. With support for DMN execution in Drools this becomes practical.
Denis Gagne introduced how elements of DMN can, and perhaps should, be applied in some other standards. He (like us) has seen organizations gradually pull things out of their systems because they have separate lifecycles: data, process, decision-making etc. Extracting these helps with the disjoint change cycles but also with engaging business users in the evolution of operations and systems. Simpler, more agile, smarter operations.
In particular, Denis has been working with BPMN (Business Process Model and Notation), CMMN (Case Management Model and Notation) and DMN (Decision Model and Notation). All these standards help business and IT to collaborate, facilitate analysis and reuse, drive agreement and support a clear, unambiguous definition. BPMN and CMMN support different kinds of work context (from structured to unstructured), and DMN is relevant everywhere because good decisions are important at every level in an organization.
Trisotech wants to integrate these approaches – they want to make sure DMN can be used to define decisions in BPMN and CMMN, add FEEL as an expression language to BPMN and CMMN, harmonize information items across the standards and manage contexts.
The three standards complement each other and have defined, easy-to-use, arm’s-length integration (a process task invokes a decision or a case, for example). Trisotech is working to allow expressions in BPMN and CMMN to be defined in FEEL, making them executable and allowing reuse of their FEEL editor. Simple expressions can then be written this way, while more complex ones can be modeled in DMN and linked. Aligning the information models matters too, so it is clear which data element in the BPMN model corresponds to which data element in DMN, and so on. All of this helps with execution but also helps align the standards through a common expression language. BPMN and CMMN skipped this, so reusing the DMN one is clearly a good idea.
Denis has done a lot of good thinking around the overlap of these standards and how to use them together without being too focused on unifying them. Harmonizing and finding integration patterns, yes, unifying no.
Alan Fish took us up to lunch by introducing Business Knowledge Models. Business Knowledge Models, BKMs, are for reuse of decision logic. Many people (including me) focus on BKMs for reuse and for reuse in implementation in particular. This implies BKMs are only useful for the decision logic level. Alan disagrees with this approach.
Alan’s original book (which started a lot of the discussion of decision modeling with requirements models) introduced knowledge areas and these became BKMs in DMN. BKMs in his mind allow reuse and implementation but this is not what they are for – they are for modeling business knowledge in his mind.
Businesses, he argues, are very well aware of their existing knowledge assets. They need to see how these fit into their decision-making, especially in a new decision-making system. Decision Requirements Models in DMN are great at showing people where specific knowledge is used in decision-making. But Alan wants to encapsulate existing knowledge in BKMs and then link the BKMs into these models. He argues that BKMs can show the functional scope of a decision, especially once they are itemized and categorized.
Each BKM in this approach is a ruleset, table, score model or calculation. The complexity of these can be assessed and estimates/tracking managed. This is indeed how we do estimates too – we just use the decisions not BKMs in this way. He also sees BKMs as a natural unit of deployment. Again, we use decisions for this, though like Alan we use the decision requirements diagram to navigate to deployed and maintainable assets. He thinks that user access and intent do not align perfectly with decisions. He also makes the great point that BKMs are a way for companies to market their knowledge – to build and package their knowledge so that other folks can consume them.
The key difference is that he sees most decisions having multiple BKMs while we generally regard these as separate decisions not as separate BKMs supporting a single decision.
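In DMN a BKM is an invocable, parameterized piece of decision logic. Here is a hypothetical sketch, with invented names and thresholds, of the structural idea both views share: one BKM (a debt-to-income calculation) packaged once and invoked by two different decisions. Plain Python functions stand in for DMN elements.

```python
def bkm_debt_to_income(monthly_debt, monthly_income):
    """Hypothetical BKM: one packaged calculation that many decisions can invoke."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return monthly_debt / monthly_income


def decide_loan_eligibility(applicant):
    """Decision 1 invokes the BKM with its own inputs and applies its own threshold."""
    ratio = bkm_debt_to_income(applicant["debt"], applicant["income"])
    return "eligible" if ratio <= 0.4 else "refer"


def decide_credit_limit(applicant):
    """Decision 2 reuses the same BKM with a different threshold."""
    ratio = bkm_debt_to_income(applicant["debt"], applicant["income"])
    return 5000 if ratio <= 0.2 else 1000
```

On our approach, the ratio would more likely be modeled as its own decision that both decisions require. On Alan’s approach, the BKM is the packaged knowledge asset that gets inventoried, estimated and deployed.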
Jan Vanthienen came up after lunch, not to talk about decision tables for once, but to talk about process and decision integration: in particular, how we can ensure consistency and prevent clashes. Testing, verification and validation are all good, but the best way to obtain correct models is to AVOID incorrect ones! One way to do this, for instance, is to avoid inconsistency, e.g. by using Unique decision tables in DMN.
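With a Unique hit policy, every combination of inputs may match at most one rule. Here is a minimal sketch of checking that, assuming each rule’s conditions are half-open numeric intervals on the same set of inputs. The table itself is invented. Two rules clash when their intervals intersect on every input.

```python
from itertools import combinations

# Hypothetical Unique table: rule -> {input: [low, high)}; outputs omitted.
UNIQUE_TABLE = {
    "r1": {"age": (18, 30), "income": (0, 50_000)},
    "r2": {"age": (25, 65), "income": (40_000, 100_000)},
    "r3": {"age": (30, 65), "income": (0, 40_000)},
}


def intervals_overlap(a, b):
    """Half-open intervals [a0, a1) and [b0, b1) share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


def unique_violations(table):
    """Return every pair of rules that some input combination would match together.

    Assumes all rules constrain the same inputs (the same dict keys).
    """
    return [
        (name1, name2)
        for (name1, cond1), (name2, cond2) in combinations(table.items(), 2)
        if all(intervals_overlap(cond1[k], cond2[k]) for k in cond1)
    ]
```

For this table only `r1` and `r2` clash: a 27-year-old earning 45,000 matches both, which a Unique table forbids.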
Jan introduces a continuum of decision process integrations
In scenario 4 particularly there are some potential mismatches:
Last session before my panel today was Gil Ronen talking about patterns in decision logic in modern technical architectures, specifically those going to be automated. His premise is that technical architectures need to be reimagined to include decision management and business logic as a first class component.
The established use case is one in which policy or regulations drive business logic that is packaged up and deployed as a business rules component. Traditional analytic approaches focused on driving insight into human decision-making. But today big data and machine learning are driving more real-time analytics – even streaming analytics – plus the API economy is changing the boundaries of decision-making.
Many technical architectures for these new technologies refer to business logic, though some do not. In general, though, they don’t treat logic and decision-making as a manageable asset. For instance:
They all vary, but they consistently fail to explicitly identify and describe the decision-making in the architecture. This lowers visibility, allows IT to pretend it does not need to manage decisions and fails to connect the decision-making of the business to the decision logic in the architecture. What is needed is a common pattern or approach to representation, and a set of core features, to make the case to IT to include decision-making in architectures:
All valid points and things that would help. This is clearly a challenge and has been for a decade. Hopefully DMN will change this.
After yesterday’s pre-conference day on DMN, the main program started today. All the slide decks are available on the DecisionCAMP site.
Edson Tirelli started things off with a session to demystify the DMN specification. DMN does not invent anything, he says, but takes existing concepts and defines a common language to express them. To take advantage of it we need to implement it, develop interchange for it and drive adoption.
Edson developed a runtime for DMN that takes DMN XML and executes it on the Drools engine. This takes interchange files from tools and executes the logic from those files. This drives his focus: he’s thinking about execution. He has a set of lessons learned from this:
Jan Purchase and I spoke next, discussing three gaps we see in the specification. Here’s our presentation, and you can get more on our thinking in our book, Real-World Decision Modeling with DMN:
Bruce Silver came next to discuss the analysis of decision tables. DMN allows many things to be put into decision tables that are “bad” – not best practices – because the specification cannot contain methodology, because there are sometimes corner cases and because there are some disagreements, forcing the specification to allow both.
Bruce generally likes the standard’s restrictions on what can be in a decision table and has developed some code to check DMN tables to see how complete they might be. While these restrictions are limiting, they also allow for static analysis. The code checks for completeness (gaps in the logic, for instance). It also compares the hit policy with the rules to make sure they match and to spot problems like masked rules (rules that look valid but will never execute because of the hit policy). The code also recommends collapsing rules that could be combined and makes other suggestions to improve clarity.
It also applies “normalization” based on the work of both Jan Vanthienen and some of the later work done for The Decision Model by von Halle and Goldberg. These are applied somewhat selectively as there are some that are very restrictive.
A clear approach to validating decision tables based on DMN. It is very similar to what BRMS vendors have been doing for years, but nice to see it for DMN.
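As a hypothetical illustration (not Bruce’s code), here is a brute-force sketch of two of these checks: completeness gaps, and rules that can never fire under a First hit policy (masked rules). It assumes inputs with small finite domains that can be enumerated exhaustively, and the table is invented. A real analyzer would reason over the condition intervals symbolically instead.

```python
from itertools import product

# Hypothetical input domains and First-hit table (rules are tried top to bottom).
DOMAIN = {"age": range(0, 121), "region": ["EU", "US"]}
FIRST_HIT_TABLE = [
    ("r1", lambda age, region: age < 18),
    ("r2", lambda age, region: age >= 18 and region == "EU"),
    ("r3", lambda age, region: age >= 65 and region == "EU"),  # masked by r2
    ("r4", lambda age, region: 18 <= age < 65 and region == "US"),
]


def analyze(rules, domain):
    """Enumerate every input combination.

    Returns (gaps, never_fired): input combinations no rule matches, and rules that
    never win under First hit (masked rules, plus any rule that can never match).
    """
    names = list(domain)
    gaps, fired = [], set()
    for values in product(*domain.values()):
        inputs = dict(zip(names, values))
        winner = next((name for name, cond in rules if cond(**inputs)), None)
        if winner is None:
            gaps.append(inputs)
        else:
            fired.add(winner)
    never_fired = [name for name, _ in rules if name not in fired]
    return gaps, never_fired
```

For this table the check finds 56 gaps (US customers aged 65 to 120 match nothing) and reports `r3` as masked, because every input it matches is caught by `r2` first.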
A break here so I’ll post this.
The first day at Decision CAMP 2017 is focused explicitly on the Decision Model and Notation (DMN) standard.
Alan Fish introduced the ongoing work on 1.2. He quickly summarized the new features in 1.1, such as text annotations and a formal definition of a decision service. Then he went through the new features for 1.2, starting with those that are agreed:
In addition, several key topics are being worked on. These three issues have not been voted on yet but we are tracking to get these done:
Bruce Silver then facilitated a discussion on what people liked and disliked about DMN.
Last week, Silicon Valley research firm Aragon Research cited Decision Management Solutions as a visual and business-friendly extension to digital business platforms and named us a 2017 Hot Vendor in Digital Business Platforms. We’re delighted about this and feel pretty strongly that this validates our vision of a federated digital decisioning platform as an essential ingredient in a company’s digital business strategy.
The report’s author, Jim Sinur, said:
Digital Business Platforms combine five major technical tributaries to create a cornerstone technology base that supports the changing nature of business, as well as the work that supports digital. Enterprises that are looking to manage a complex or rapidly changing set of rules that empower outcomes would benefit from decision management as offered by Decision Management Solutions, especially when combined with predictive or real-time analytics.
The report says that what makes us unique is that business people can represent their decisions in a friendly, visual and industry-standard model while managing the logic and analytics for these decisions across many implementation platforms. We’re working with clients to create “virtual decision hubs” that map the complexities of enterprise decision-making to the underlying technologies that deliver the decision logic, business rules, advanced analytics and AI needed to operationalize this decision-making across channels.
Open Data Group is an analytic deployment company. The company was started over 10 years ago and has transitioned from consulting to a product company, applying their expertise in Data Science and IT to create an analytic engine, FastScore.
Successful analytics require organizational alignment (specifically between Data Science and IT) to create coordination of systems and business problem collaboration. In addition to understanding analytics, companies are trying to leverage new technologies and modernize their analytic approach. To address some of these challenges, Open Data Group have developed FastScore.
FastScore is designed to address various analytic deployment challenges to monetize analytic outcomes including:
FastScore provides a repeatable, scalable process for deploying analytic workflows. Open Data Group see the model itself as the asset and emphasize that a model needs to be language and data neutral as well as deployed using micro-services (they are a Docker container) to be a valuable, and future proofed, asset.
FastScore is an analytic deployment environment that connects a wide range of analytic design environments to a wide range of business applications. It has several elements, all within a Docker container. It includes a model abstraction (input and output AVRO schemas, an initialization and the math action) that allows models to be ingested from a wide variety of formats (including Python, R, C, SAS, PFA). It also includes a stream abstraction (input and output, AVRO schema in JSON, AVRO binary or text) to consume and produce a wide range of data (from streaming to traditional databases) using a standard lightweight contract for data exchange.
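Here is a minimal sketch of the model and stream abstractions described above. It assumes simplified schemas (Python types standing in for Avro) and invented field names and scoring math; it is not FastScore’s actual model format or API. The model has an initialization step and an action that does the math, and a stream processor enforces the input and output contracts around it.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical, simplified schemas: field name -> Python type (stand-ins for Avro schemas).
INPUT_SCHEMA = {"income": float, "debt": float}
OUTPUT_SCHEMA = {"score": float}


def conforms(record: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """True if the record has exactly the schema's fields with the right types."""
    return set(record) == set(schema) and all(
        isinstance(record[field], ftype) for field, ftype in schema.items()
    )


class ScoringModel:
    """Hypothetical model: initialization plus an action that does the math."""

    def __init__(self) -> None:
        # Initialization: load coefficients, lookup tables etc. once.
        self.weight = 0.5

    def action(self, datum: Dict[str, Any]) -> Dict[str, Any]:
        # The "math": score a single input record.
        return {"score": 1.0 - self.weight * datum["debt"] / datum["income"]}


def run_stream(model: ScoringModel, records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Stream processor: check each record against the contract before and after scoring."""
    for record in records:
        if not conforms(record, INPUT_SCHEMA):
            raise ValueError(f"input does not match schema: {record}")
        result = model.action(record)
        if not conforms(result, OUTPUT_SCHEMA):
            raise ValueError(f"output does not match schema: {result}")
        yield result
```

Because the model only sees records that satisfy the schema contract, the same model can be fed from a file, a message queue or a database without modification, which is the point of separating the model and stream abstractions.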
The FastScore Engine is a Docker container into which customers can load models for push button deployment. Input streams are then connected to provide data to the model and output streams to push results to the required business applications or downstream environment. Multiple models can be connected into an analytic pipeline within FastScore. Models can be predictive analytic models, feature generators or any other element of an analytic decision. Everything can be accessed through a REST endpoint, with model execution being handled automatically (selecting between runners for R, Python, Java, C for instance). Within the container is the stream processor that will enforce the input and output schemas and a set of sensors that allow model performance to be monitored, tested and debugged.
Besides the core engine, additional features include:
Plus Command Line Interface and REST APIs for everything.
Because all of this is done within a Docker container, the product integrates with the Docker ecosystem for components such as systems monitoring and tuning. The Docker container allows easy deployment to a variety of cloud and on-premise platforms and supports microservices orchestration.
FastScore allows an organization to create a reliable, systematic, scalable process for deploying and using all the analytic models developed by their analytic and data science teams – what might be called AnalyticOps, a “function” created to provide a centralized place to manage, monitor and manipulate enterprise analytics assets.
More information on FastScore.
Avola Decision is a decision model-based decisioning platform that is migrating from the proprietary TDM (The Decision Model) approach to the DMN (Decision Model and Notation) open standard. I reviewed the previous product, and since then the team has been working on a new product. The new Avola Decision has a .NET back end and is available on-premise or on Azure for the SaaS version (public or private clouds). The UI has been rewritten and is completely HTML and browser-based.
Customers begin at a landing page – a dashboard that displays regularly used shortcuts and information such as notifications or tasks. Customers can have many projects within their environment, and projects can be created by non-technical users. Projects sit within a domain (and a domain may have many projects), and individual projects can be linked to dependent domains to bring in shared content if the owner of that content allows it. Multiple Decision Services can be defined for the project, and different members can be added with different roles. A separate identity server supports two-factor authentication and allows custom security approaches for specific customers.
Domains contain business concepts (sets of data elements) and projects. Users work within a project and its related domain(s). They have instant free-text search across the project that shows hits for the search as it is typed. Explicit tags are coming soon, allowing objects to be tagged and managed using these tags.
Data elements can be defined based on a set of allowed types. Specific data types can be constrained to a specific set of values (value lists) or precision. These can be used as a glossary for multiple data elements with a where-used capability to see which data elements are using the definition and which decisions use that data. Documents such as policies can be uploaded to create Knowledge Sources and additional ones can be created that point to websites etc.
The decisions in a model can be viewed, either just those exposed as decision services or all decisions. As before, the diagram is generated from the logic being defined behind the decisions. Plans exist to allow editing of the diagram directly but for now it is based on the executable logic behind it. The diagram is DMN-like, using input data nodes as well as the boxed list of attributes (combining both styles of data presentation). In addition, the decision nodes are divided up to see conditions, operands, and metadata from both data and sub-decisions. Users can zoom in and out, restrict the number of levels being viewed, see the layer “above” a decision – the decisions that require it etc. Future versions will allow the user to hide the data elements, knowledge sources etc.
Behind each decision is decision logic, currently only as a decision table. Other DMN representations are coming soon, as are multiple action columns. However, they plan to keep using the TDM layout for decision tables, along with some decision table features that exist in TDM but are not yet defined in DMN. The decision table editor has been upgraded to support row and column movement, in-line editing and change highlighting. Rules can be cloned and edited, and some checking is built in, such as type conformance and range overlap/underlap. A formula builder is used for calculations, and users can click through to follow inputs to their sources. Importing and exporting FEEL that defines decision logic is also a future possibility, but they don’t plan to expose FEEL as a standard way to edit logic in the tool.
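Avola's checking logic is not described in detail, so here is a hypothetical sketch of what range overlap/underlap checking means for a decision table with one numeric input: sort the rules by lower bound and sweep across the allowed domain, reporting any region covered twice (overlap) or not at all (underlap, or gap). The `Rule` class and the half-open `[low, high)` intervals are assumptions made to keep boundaries unambiguous.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    # Hypothetical single-input rule: applies when low <= value < high.
    low: float
    high: float
    outcome: str


def check_ranges(rules: List[Rule], domain: Tuple[float, float]) -> List[str]:
    """Report overlaps and gaps across a domain [start, end).
    Assumes every rule lies within the domain."""
    issues = []
    cursor = domain[0]  # lowest value not yet covered by any earlier rule
    for rule in sorted(rules, key=lambda r: r.low):
        if rule.low > cursor:
            issues.append(f"gap: [{cursor}, {rule.low})")
        elif rule.low < cursor:
            issues.append(f"overlap: [{rule.low}, {min(cursor, rule.high)})")
        cursor = max(cursor, rule.high)
    if cursor < domain[1]:
        issues.append(f"gap: [{cursor}, {domain[1]})")
    return issues


age_rules = [Rule(0, 18, "Minor"), Rule(18, 65, "Adult"), Rule(60, 120, "Senior")]
print(check_ranges(age_rules, (0, 130)))  # flags 60-65 as overlapping and 120-130 as uncovered
```

A real tool has to do this per combination of conditions across many columns, but the single-column sweep shows the core idea.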
Testing can be done at any point. Test data collections can be defined and used to test the development version or any deployed version. One or many test collections can be run, and the status of each collection and test is shown, making a quick visual check for success easy. An Excel template can be downloaded to bulk-create test cases, or cases can be entered and edited individually. Tests can be viewed in terms of the rows that executed in the various decision tables, and a table can be opened for editing from there. Impact analysis – seeing the impact of a proposed change in terms of overall results – is also planned.
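As a hypothetical sketch of running several test collections against a decision and reporting a per-collection status, the code below treats each collection as a list of `(inputs, expected outcome)` pairs. The `age_band` decision and the collection names are invented for illustration; Avola's test format is not shown here.

```python
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[Dict[str, float], str]  # (inputs, expected outcome)


def run_collections(decide: Callable[[Dict[str, float]], str],
                    collections: Dict[str, List[TestCase]]) -> Dict[str, dict]:
    """Run every test collection and summarize pass/fail per collection."""
    report = {}
    for name, cases in collections.items():
        # Record the index of each case whose actual outcome differs from expected.
        failures = [i for i, (inputs, expected) in enumerate(cases)
                    if decide(inputs) != expected]
        report[name] = {"passed": len(cases) - len(failures),
                        "failed": failures,
                        "status": "OK" if not failures else "FAIL"}
    return report


def age_band(inputs: Dict[str, float]) -> str:
    # Hypothetical decision under test.
    return "Minor" if inputs["age"] < 18 else "Adult"


collections = {
    "boundaries": [({"age": 17}, "Minor"), ({"age": 18}, "Adult")],
    "regression": [({"age": 40}, "Adult"), ({"age": 5}, "Adult")],  # 2nd case deliberately wrong
}
print(run_collections(age_band, collections))
```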
Once logic is tested and confirmed, there is an approval cycle and deployment support as you would expect. Once this process starts, the whole service is packaged up and encapsulated so it cannot be impacted by changes e.g. to a shared value list.
Executions store the version of the model used, the outcome of the decision invoked, the results of each sub-decision and the data used to produce them. This information is available for reporting and analysis.
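To make the stored execution data concrete, here is a hypothetical sketch of an audit record holding the items listed above: model version, decision outcome, sub-decision results and input data. The `ExecutionRecord` fields and the toy eligibility model are assumptions for illustration, not Avola's storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ExecutionRecord:
    # Hypothetical audit record mirroring what the text says is stored.
    model_version: str
    decision: str
    outcome: Any
    sub_decisions: Dict[str, Any] = field(default_factory=dict)
    inputs: Dict[str, Any] = field(default_factory=dict)


def execute(model_version: str, inputs: Dict[str, Any]) -> ExecutionRecord:
    # Toy decision model: an eligibility decision that requires a risk sub-decision.
    risk = "High" if inputs["debt"] > 0.4 * inputs["income"] else "Low"
    eligible = risk == "Low" and inputs["age"] >= 18
    return ExecutionRecord(model_version, "Eligibility", eligible,
                           {"Risk": risk}, dict(inputs))


record = execute("v1.3", {"income": 50000, "debt": 10000, "age": 30})
print(json.dumps(asdict(record)))  # one row ready for a reporting store
```

Keeping sub-decision results alongside the final outcome is what lets later analysis explain why a given execution came out the way it did.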
Future add-ons will allow the data defined to be used to define web forms or surveys to capture the data needed.
More information at Avola Decision.
When the famous nerd webcomic XKCD pokes fun at how you use technology, it’s probably time to try a different approach. Recently it ran a strip on Machine Learning that took a swipe at the mindless way some people approach it. Its characters discuss a “machine learning system” that involves pouring data into a big pile of linear algebra and stirring until you get the answers you want. Humorous though this is, it also represents a definite school of thought when it comes to advanced analytics – that more data and better algorithms are all you need. With enough data and algorithmic power there’s no need to think about the problem, no need to talk to the people who need the output, no need to do anything except “let the data speak”.
In our experience this approach has a number of problems:
In the end there is no substitute for knowing what the business problem is. In machine learning (predictive analytics, data mining, data science), this means:
We have found on multiple projects that this is the biggest single problem – get the problem (decision) definition right and the odds of successful analytic projects (ones that actually improve business results) go way up. Decision discovery and modeling, especially using the Decision Model and Notation (DMN) standard, is tremendously effective at doing this. So much so that we do this as standard now on all our analytic projects.
But don’t just believe me – AllAnalytics identified this as the greatest problem in analytic projects, and research by the Economist Intelligence Unit talked about the Broken Links In The Analytics Value Chain (you can find some posts on this over on our company blog – How To Fix The Broken Links In The Analytics Value Chain and Framing Analytics with Decision Modeling).
If you want to learn more, we have a case study on bringing clarity to data science projects as well as two briefs – Analytics Teams: 5 Things You Need to Know Before You Deploy Your Model and Analytics Teams: 6 Questions to Ask Your Business Partner Before You Model – to show how a focus on decisions, and decision modeling, can really help. Or contact us and we can chat.
DataRobot is focused on automated machine learning and on helping customers build an AI-driven business, especially by focusing on decisions that can be automated using machine learning and other AI technologies. DataRobot was founded in 2012 and currently has nearly 300 staff, including 150+ data scientists. Since it was founded, well over 200M models have been built on the DataRobot cloud.
DataRobot’s core value proposition is that they can speed the time to build and deploy custom machine learning models, deliver great accuracy “out of the box” and provide a simple UI for business analysts so they can leverage machine learning without being a data scientist. The technology can be used to make data scientists more productive as well as to increase the range of people who can solve data science problems.
DataRobot runs either on AWS or on a customer’s hardware. Modeling-ready datasets can be loaded from ODBC databases, Hadoop, URLs or local files – partnerships with companies like Alteryx support data preparation, blending etc. The software then automatically performs the data transformations that machine learning needs – data cleansing and the feature engineering required by the various algorithms, such as scaling and converting data to match each algorithm. It does not currently generate domain-specific potential features/characteristics from raw data, instead making it easy for data and business analysts to create them and feed them into the modeling environment. Once data is loaded, some basic descriptive statistics are generated and the tool recommends a measurement approach (used to select between algorithms) based on the kind of data and target.
DataRobot can apply a wide variety of machine learning algorithms to these datasets, for now almost exclusively supervised learning techniques where a specific target is selected by the user. Multiple algorithms are run, and DataRobot partitions data automatically to keep holdout data for validation (to prevent overfitting), applies smart downsampling to improve the accuracy of algorithms and allows some other advanced parameters to be configured for specific kinds of data. Once started, DataRobot looks at the target variable, the dataset, its characteristics and combinations of characteristics, and selects a set of machine learning algorithms/configurations (blueprints) to run. These are then trained, and more “workers” can be configured to speed up completion, essentially spinning up more capacity for a specific job.
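The holdout partitioning and downsampling described above can be sketched as follows. This is not DataRobot's implementation; it is a hypothetical illustration of one common approach, assuming a binary target where label 1 is the rare class. Rows are shuffled and a fixed fraction is reserved as holdout, then the majority class in the training rows is sampled down, and a class weight is returned so the sampling can be undone when fitting or evaluating.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n: int, holdout_frac: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and reserve a fraction as holdout, never used in training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * holdout_frac)
    return idx[cut:], idx[:cut]  # (training rows, holdout rows)


def downsample(labels: Sequence[int], rows: List[int], ratio: float = 1.0,
               seed: int = 0) -> Tuple[List[int], dict]:
    """Keep every minority-class (label 1) row and sample the majority class
    down to `ratio` times the minority count. Returns kept rows plus per-class
    weights that compensate for the sampling."""
    pos = [i for i in rows if labels[i] == 1]
    neg = [i for i in rows if labels[i] == 0]
    k = min(len(neg), int(len(pos) * ratio))
    kept_neg = random.Random(seed).sample(neg, k)
    weights = {1: 1.0, 0: len(neg) / k if k else 1.0}
    return sorted(pos + kept_neg), weights


# Hypothetical dataset: 1,000 rows with a 10% positive rate.
labels = [1 if i % 10 == 0 else 0 for i in range(1000)]
train, holdout = holdout_split(len(labels))
kept, weights = downsample(labels, train)
print(len(train), len(holdout), len(kept), round(weights[0], 2))
```

Splitting before downsampling matters: the holdout keeps the real class balance, so validation scores reflect the data the model will actually see.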
As the algorithms complete, the results are displayed on a leader board based on the measurement approach selected. DataRobot speeds this process by initially running the blueprints against only a subset of the data and then running the top ones against the full dataset. Users who are data scientists can investigate the blueprints and see exactly the approach taken by each one in terms of algorithm configuration, data transformations etc. Key drivers – the features that make the most difference – are identified, and a set of reason codes is generated for each entry in the dataset. Several other descriptive elements, such as word clouds for text analytics, are also generated to allow models to be investigated.
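The "sample first, then finalists on all the data" strategy can be sketched as below. This is a hypothetical illustration, not DataRobot's scheduler: a "blueprint" here is any function that trains on `n` rows and returns a validation score (higher is better), and `sample_frac` and `top_k` are assumed parameters. The toy blueprints simply improve with more data at different rates.

```python
import math
from typing import Callable, Dict, List, Tuple

# Illustrative stand-in: train on n rows and return a validation score.
Blueprint = Callable[[int], float]


def leaderboard(blueprints: Dict[str, Blueprint], n_rows: int,
                sample_frac: float = 0.2, top_k: int = 2) -> List[Tuple[str, float, str]]:
    """Score every blueprint on a sample, then re-run only the top_k on all rows."""
    sample_n = max(1, int(n_rows * sample_frac))
    first_round = {name: bp(sample_n) for name, bp in blueprints.items()}
    finalists = sorted(first_round, key=first_round.get, reverse=True)[:top_k]
    # Non-finalists keep their sample score; finalists are re-scored on full data.
    board = [(name, score, "sample") for name, score in first_round.items()
             if name not in finalists]
    board += [(name, blueprints[name](n_rows), "full") for name in finalists]
    return sorted(board, key=lambda row: row[1], reverse=True)


# Toy blueprints whose accuracy grows with training size at different rates.
toy = {
    "logistic": lambda n: 0.70 + 0.01 * math.log10(n),
    "random_forest": lambda n: 0.65 + 0.03 * math.log10(n),
    "gbm": lambda n: 0.66 + 0.035 * math.log10(n),
    "naive_bayes": lambda n: 0.60 + 0.01 * math.log10(n),
}
for row in leaderboard(toy, 100_000):
    print(row)
```

The saving comes from never training weak candidates on the full dataset; the trade-off is that a blueprint which only shines with lots of data can be cut early.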
The tool also has a UI for non-technical users. This skips the display of the leader board and internal status information and displays just a summary of the best model with its confusion matrix, lift and key drivers. A word cloud for text fields and a point and click deployment of a scoring UI (for batch scoring of a data file or scoring a single hand-entered record) complete the process. More advanced users can interact with the same projects, allowing the full range of deployment and reuse of projects created this way.
Once a model is done, the best way to deploy it is to use the DataRobot API. A REST API endpoint is generated for each model and can be used to score a record. All the fields used in the sample are used to create the REST API, and the results come back with the reason codes generated. Everything to do with modeling is also available through an API, allowing customers to build applications that rebuild and monitor models. Users can also generate code for models, but this is discouraged.
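DataRobot's actual API and response format are not reproduced here. As a hypothetical sketch of what "a score plus reason codes" for one record can look like, the code below scores a hand-built logistic model and ranks each feature's contribution (coefficient times deviation from the training mean), which is one simple way to produce reason codes for a linear model. The feature names, coefficients and the JSON keys `prediction` and `reasonCodes` are invented for illustration.

```python
import json
import math
from typing import Dict

# Hypothetical fitted logistic model plus training-data means used as the
# baseline for explanations.
COEFS = {"income": -0.00004, "debt_ratio": 3.0, "late_payments": 0.8}
INTERCEPT = -1.0
MEANS = {"income": 50000.0, "debt_ratio": 0.3, "late_payments": 1.0}


def score(record: Dict[str, float], n_reasons: int = 2) -> str:
    """Score one record and return a JSON body like a scoring endpoint might."""
    z = INTERCEPT + sum(COEFS[f] * record[f] for f in COEFS)
    prob = 1 / (1 + math.exp(-z))
    # Reason codes: features whose values move the score furthest from baseline.
    contrib = {f: COEFS[f] * (record[f] - MEANS[f]) for f in COEFS}
    top = sorted(contrib, key=lambda f: abs(contrib[f]), reverse=True)[:n_reasons]
    reasons = [{"feature": f, "strength": round(contrib[f], 3)} for f in top]
    return json.dumps({"prediction": round(prob, 4), "reasonCodes": reasons})


print(score({"income": 30000, "debt_ratio": 0.6, "late_payments": 3}))
```

Returning the reasons with every score is what lets a calling application explain an individual result without access to the model internals.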
You can get more information on DataRobot at http://datarobot.com
The Rexer Data Science survey is one of the best and longest running polls of data mining, analytic and data science professionals. I regularly refer to it and blog about it. It’s time to take this year’s survey – and the survey is aimed at all analytic people, no matter whether they consider themselves to be Data Analysts, Predictive Modelers, Data Scientists, Data Miners, Statisticians, Machine Learning Specialists or any other type of analytic person. Highlights of the 2017 survey results will be unveiled at Predictive Analytics World – NY in October, 2017 and the full 2017 Survey summary report will be available for free download from the Rexer Analytics website near the end of 2017.
The survey should take approximately 20 minutes to complete. Your responses are completely confidential.
SAP BusinessObjects Predictive Analytics 3.1 is the current release of the SAP predictive analytic suite. Like most in the analytics space, SAP sees its clients struggling to make use of massive amounts of data that are newly available while facing ever increasing business expectations, faster business decision cycles and an analytical skill gap. SAP therefore is focused on predictive analytic capabilities that:
The predictive analytics suite then consists of four elements:
These can access data from SAP HANA, SAP VORA, Hadoop/Spark, 3rd party databases and SAP HANA Cloud. And they can be embedded into SAP applications and other custom applications.
Four offerings package this up:
SAP is focused on speed, building models fast, but also on automating techniques. The assumption is that organizations need to manage hundreds or thousands of models and very wide data sets. Plus, for many SAP customers, SAP integration is obviously important. Finally, the suite is designed to support the whole analytic lifecycle.
The tools are moving to a new UI environment, replacing desktop tools with a browser-based environment. Predictive Factory was the first of these, and more and more of the capabilities of the suite are being integrated into it, allowing Predictive Factory to be a single point of entry into the suite. As part of this integration and simplification, everything is being built to be effective with both SAP HANA and Hadoop. There is also an increasing focus on massive automation, e.g. segmented modeling.
One of the most interesting features of the SAP BusinessObjects Predictive Analytics Suite is that there are two integrated perspectives – Automated Modeler and Predictive Composer. This allows data scientists and analytics professionals to build very custom models while also allowing less technical teams, or those with more projects to complete, to use the automation. All the models are stored and managed in Predictive Factory and Predictive Composer can be used to configure nodes for use in Automated Modeler. Predictive Factory also lets you create multiple projects across multiple servers etc. Existing models can be imported from previous tool versions or from PMML, new tasks (such as checking for data deviation or retraining models) can be created and scheduled to run asynchronously. Tasks can be monitored and managed, allowing large numbers of models to be created, supervised and updated.
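The post does not say how Predictive Factory measures data deviation. One widely used measure is the Population Stability Index (PSI), which compares the distribution of a variable at training time with its current distribution. The following is a hypothetical sketch of a deviation check that could trigger a retraining task; the equal-width binning and the 0.25 threshold are rule-of-thumb assumptions, not SAP settings.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between training-time and current values,
    using equal-width bins over the training range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected: Sequence[float], actual: Sequence[float],
                     threshold: float = 0.25) -> bool:
    # 0.25 is a commonly quoted rule of thumb for a significant shift (assumption).
    return psi(expected, actual) > threshold


train = [i / 100 for i in range(1000)]          # roughly uniform over 0-10
same = [i / 100 + 0.005 for i in range(1000)]   # essentially unchanged
shifted = [5 + i / 200 for i in range(1000)]    # all mass moved to 5-10
print(needs_retraining(train, same), needs_retraining(train, shifted))
```

A scheduled task running a check like this across many models is one way to supervise large model portfolios without inspecting each one by hand.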
The same automated algorithms can be accessed from the SAP BusinessObjects Cloud. Users can identify a dataset, identify something they are interested in and run automated modeling algorithms to see, for instance, what influences the data element of interest. This requires some understanding of the power and limitations of predictive analytics but no skill with the analytic technique itself. Data is presented along with some explanation and supporting text. The results can easily be integrated into stories being developed in the BI environment or applied to datasets. Over time, this capability will include all the capabilities of the on-premise solution.
Predictive Analytics Integrator allows these capabilities to be brought into SAP applications such as SAP Fraud Manager. Because SAP applications all sit on SAP HANA, the Predictive Analytics Integrator is designed to make it easy to bring advanced analytics into the applications. Each application can develop a UI and use terminology that works for the application users while accessing all the underlying automation from the suite.
Predictive Analytics 3.2 in July will be the first release where the suite’s components are being integrated into the browser environment and the Predictive Composer name will be used. This release will not have 100% equivalence with the desktop install but will support the building and deployment of models using both the data scientist and automated tools.
You can get more information on the SAP BusinessObjects Predictive Analytics Suite here.
I recently worked with Tho Nguyen of Teradata on a white paper called Illuminate Dark Data for Deeper Insights.
While organizations of all sizes across all industries are keen on becoming data-driven, most focus on only a fraction of the many types of available data. Not accessing a fuller spectrum of data, including those from “dark data”—emails, texts, images, photos, videos, and other documents—along with traditional data sources can limit an organization’s ability to gain a complete picture of their customers and operations, and exclude them from game-changing insights that improve business outcomes.
Dark data is defined by Gartner as “…the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes…” (Gartner IT Glossary), and this paper discusses what makes data dark, what kinds of data go dark and how new technologies are illuminating this dark data.
You can register for the paper here.
If you want to talk about the role of dark data in your analytic and decision management systems, drop me a line.
Reltio Cloud is a modern data management Platform as a Service (PaaS) company focused on delivering data-driven applications, founded in 2011 by folks from Siperian, which was acquired by Informatica. Unlike most data integration and MDM platforms, which are IT-focused, Reltio’s mission is to make it possible for business and IT teams in enterprises to “Be Right Faster” by building data-driven enterprise apps that deliver reliable data, relevant insights and recommended actions. They contrast these applications, based on broadly sourced, cross-functional data, with the traditional approach that delivers process-driven and siloed data. With data-driven applications, contextual, analytical and operational data can all be brought together. This requires a reliable data foundation.
Reltio Cloud is a modern data management Platform as a Service (PaaS) and it includes:
The graph schema is key to Reltio, allowing them to store both entities and their relationships in a semantically rich way. Data is stored in a combination of Apache Cassandra, graph technology and in-memory structures such as Elastic. It offers an extensible structure for an organization’s entities and relationships. Reltio Cloud collects data from multiple sources, then matches, merges and relates it to create these relationship graphs, and these graphs then underpin the data-driven applications being developed.
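Reltio's matching rules and graph store are far richer than anything shown here. As a hypothetical Python sketch of the basic collect, match, merge and relate flow, source records below are matched on a normalized email address, merged into one profile per entity with a list of contributing sources, and then linked by typed relationships in a simple adjacency list. The match rule (exact normalized email), the survivorship rule (first non-empty value wins) and the `householdMember` relationship type are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(email: str) -> str:
    return email.strip().lower()


def match_and_merge(records: List[dict]) -> Dict[str, dict]:
    """Merge source records sharing a normalized email into one entity profile,
    keeping track of which sources contributed."""
    entities: Dict[str, dict] = {}
    for rec in records:
        key = normalize(rec["email"])  # hypothetical match rule: exact email match
        profile = entities.setdefault(key, {"sources": []})
        profile["sources"].append(rec["source"])
        for attr, value in rec.items():
            if attr not in ("source", "email") and value:
                profile.setdefault(attr, value)  # assumed survivorship: first non-empty wins
        profile["email"] = key
    return entities


def relate(edges: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Build an adjacency list of typed relationships between entity keys."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for a, rel, b in edges:
        graph[a].append((rel, b))
        graph[b].append((rel, a))  # stored both ways; fine for symmetric relationships
    return dict(graph)


records = [
    {"source": "CRM", "email": "Ann@Example.com", "name": "Ann Lee", "phone": ""},
    {"source": "Billing", "email": "ann@example.com ", "name": "", "phone": "555-0100"},
    {"source": "CRM", "email": "bob@example.com", "name": "Bob Lee", "phone": ""},
]
people = match_and_merge(records)
graph = relate([("ann@example.com", "householdMember", "bob@example.com")])
print(people["ann@example.com"], graph["bob@example.com"])
```

An insight such as a churn propensity could then be written back as one more attribute on the merged profile, which is the pattern the next paragraph describes.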
Reltio Insights shares objects (built from the profile and transaction data) with Reltio Cloud and analytics environments like Spark (either the Reltio platform or a customer’s own) to create analytic insights. These insights then get integrated with the master data so that they can be made available to data-driven applications. Reltio Insights is designed to rapidly provision master and transactional data into a Spark environment. The resulting analytic insights are available throughout the Reltio environment and are added to the data, e.g. a customer’s churn propensity becomes an attribute of the customer profile.
The applications themselves can offer several different views – for instance, some users such as data stewards might see where the data came from and be able to interact with it to clean it up while others might only see the final, integrated view. A standard feature of the app is to visualize relationships, based on the underlying graph models. Some simple analysis, such as distribution of transactions by channel, can be easily included as can the results of more sophisticated analytics. Anything available in the Reltio data platform can be collaborated upon, managed and updated through data-driven operational applications. The data can then be used to drive analytical model development and provision the data to other operational applications. In addition, everything is tracked for audit and change purposes and the workflow engine can be used to manage requests for updates, changes etc.
Everything in the platform is available as HTML5 widgets so that additional content, like Google Maps, can be easily embedded, and this also means that Reltio content can be easily embedded elsewhere. Many customers take advantage of this to mix and match Reltio content in other environments and vice versa. Similarly, all the data in Reltio Cloud is available from a REST API for use in all legacy operational and analytics systems.
You can get more information on Reltio here.
DecisionCAMP 2017 is coming up July 11-14, 2017 at Birkbeck College, University of London. This is going to be a great opportunity to learn about decision modeling, the Decision Model and Notation (DMN) standard and related topics. In fact the week is full of great things to do if you are in London or can make it there:
You can register for DecisionCAMP here.