I got a briefing this week from my friends at Tibco about their Service Performance Manager product released a couple of months ago. The product is a big step along the road to what some call “autonomic computing” in that it provides dynamic and automated monitoring and correction of service levels in a service-oriented world.
The product combines Tibco’s BusinessEvents rule engine with its SOA products, integrates with ActiveMatrix, and wraps all of this up for monitoring and managing complex, service-oriented environments. SOA has, of course, made monitoring and management much more complex: in an SOA environment, defining Service Level Agreements (SLAs) and then managing to them is difficult.
The product allows the definition of SLAs (for services, parts of services or complete BPEL transactions) and then tracks whether these are being met. It discovers services, measures their observable aspects, analyzes and predicts behavior, monitors and acts. This includes predictive analysis of services trending to failure, as well as monitoring of the SLAs using explicit rules, and it will both report on problems and act in response to them. The response might include provisioning new resources to maintain service levels or borrowing resources from other services. The role-based dashboards let users figure out why they are having problems and do some capacity planning.
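To make the idea of "trending to failure" concrete, here is a minimal sketch of an SLA check that combines an explicit rule with a simple prediction. It is not Tibco's implementation or API; the `Sla` class, the `evaluate` function, the latency metric and the straight-line forecast are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sla:
    """Hypothetical SLA: a service must answer within max_latency_ms."""
    service: str
    max_latency_ms: float


def linear_forecast(samples, steps_ahead):
    """Fit a least-squares line through the samples (x = 0, 1, 2, ...)
    and extrapolate it steps_ahead intervals past the last sample."""
    n = len(samples)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    denom = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples)) / denom
    return y_bar + slope * ((n - 1 + steps_ahead) - x_bar)


def evaluate(sla, samples, steps_ahead=3):
    """Explicit rule first (is the SLA broken now?), then the prediction
    (will it be broken soon if the current trend continues?)."""
    if samples[-1] > sla.max_latency_ms:
        return "breached"
    if len(samples) >= 2 and linear_forecast(samples, steps_ahead) > sla.max_latency_ms:
        return "trending_to_breach"
    return "ok"


sla = Sla(service="order-lookup", max_latency_ms=200)
print(evaluate(sla, [100, 120, 140, 160]))  # still within the SLA, but rising
```

A "trending_to_breach" result is the point at which a product like this could add or borrow resources before the SLA is actually missed. A real system would use a far richer model than a straight line.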
Some time ago I wrote about the need for this kind of “smart” decision-making solution to deliver on autonomic computing. I said that such a solution would look like this:
- It would use business rules to record SLAs, procedures, best practices, rules of thumb from experts as to how to respond to particular failures, how to interpret readings, how to select new routings around failed equipment.
- It would use predictive analytics to turn historical log data into executable predictive models. These might use temperature data to predict the likelihood of a piece of hardware failing, or traffic data to predict a bottleneck. Other models might use neural network technology to “learn” from patterns of data and identify unusual variations and even unusual variations from the usual variations.
- Clearly these predictions create the need for more rules to act on them. I might respond differently to a risk of overheating when traffic is low and I have spare capacity than when traffic is really high and I do not, for instance.
- Lastly, companies might want to run simulations of how a set of decisions might impact uptime, throughput, etc. under different constraints so as to develop optimal strategies over time.
Pretty close to what the folks at Tibco have done for service performance management. Autonomic computing may still be a vision, but people are already using business rules and analytics, that is, enterprise decision management (EDM), to deliver on some of its promises.
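The overheating example from the list above, where the same prediction calls for a different response depending on spare capacity, can be sketched as a small rule. Everything here is hypothetical: the `Reading` fields, the thresholds and the action names are assumptions for illustration, and the probability is taken as given from some predictive model.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    overheat_probability: float  # assumed output of a predictive model, 0..1
    traffic_load: float          # fraction of capacity currently in use, 0..1


def choose_action(reading, risk_threshold=0.7, busy_threshold=0.8):
    """Rule on top of a prediction: the response depends on both the
    predicted risk and how much spare capacity is available."""
    if reading.overheat_probability < risk_threshold:
        return "continue"
    if reading.traffic_load < busy_threshold:
        # Spare capacity: safe to take the machine out of rotation now.
        return "drain_and_cool"
    # No spare capacity: bring up replacement resources before draining.
    return "add_capacity_then_drain"


print(choose_action(Reading(overheat_probability=0.9, traffic_load=0.3)))
print(choose_action(Reading(overheat_probability=0.9, traffic_load=0.95)))
```

Keeping the thresholds as parameters is what would let a simulation try different settings and compare their effect on uptime and throughput.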