The first session today was some folks from Allstate Financial talking about “Impact of Service Oriented Architecture on Data Modeling: A Case Study”. Allstate Financial has the large number of data sources typical of a large corporation. Each line of business has its own administrative systems, and mergers and acquisitions also create new data sources. The same data can appear in several of these systems, and each system may store it in a different kind of database or in MS Office files. Defining the systems of record is challenging and coordinating data is very difficult. The data is also very long-lived (life policies and annuities, for instance, must be kept for a long time), yet it is not stable, as the product mix and the data requirements for those products change constantly. Finally, they must support XBRL and ACORD in a service-oriented architecture. An integration hub brings multiple producers and consumers together to support the various systems.
Complexity tends to increase over time, resulting in more logical and physical data models and more database implementations. To support SOA they rationalized their logical models into a single logical model. From it they built a physical model that defines a common name space but is not implemented as an actual database. This physical model is used by the ETL processes and implements industry standards such as ACORD. Individual physical models for specific databases are derived from it, and these are then implemented on the various DBMSs.
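To make the flow from one shared model to several DBMS-specific physical models more concrete, here is a minimal Python sketch. It assumes a canonical attribute definition that carries a common name-space name and a logical type, and derives a dialect-specific CREATE TABLE statement from it. The class and function names, the two dialects and their type mappings are hypothetical illustrations, not the tooling Allstate described.

```python
from dataclasses import dataclass


# Hypothetical canonical (common name space) attribute definition.
@dataclass(frozen=True)
class CanonicalAttribute:
    name: str          # common name-space name, e.g. "PolicyNumber"
    logical_type: str  # logical type: "string", "decimal" or "date"
    length: int = 0    # only meaningful for string types


# Per-DBMS type mappings; dialects and mappings are illustrative assumptions.
TYPE_MAP = {
    "oracle": {"string": "VARCHAR2({n})", "decimal": "NUMBER(18,2)", "date": "DATE"},
    "sqlserver": {"string": "NVARCHAR({n})", "decimal": "DECIMAL(18,2)", "date": "DATE"},
}


def to_snake(name: str) -> str:
    """Convert a CamelCase common name into a snake_case physical column name."""
    out = []
    for i, ch in enumerate(name):
        if ch.isupper() and i > 0:
            out.append("_")
        out.append(ch.lower())
    return "".join(out)


def physical_ddl(entity: str, attrs: list, dialect: str) -> str:
    """Derive one DBMS-specific CREATE TABLE from the shared canonical model."""
    types = TYPE_MAP[dialect]
    cols = [
        f"    {to_snake(a.name)} {types[a.logical_type].format(n=a.length)}"
        for a in attrs
    ]
    return f"CREATE TABLE {to_snake(entity)} (\n" + ",\n".join(cols) + "\n);"


# Example canonical entity (hypothetical attributes).
LIFE_POLICY = [
    CanonicalAttribute("PolicyNumber", "string", 20),
    CanonicalAttribute("FaceAmount", "decimal"),
    CanonicalAttribute("IssueDate", "date"),
]

if __name__ == "__main__":
    for dialect in TYPE_MAP:
        print(physical_ddl("LifePolicy", LIFE_POLICY, dialect))
```

The point of the design is that names and types are decided once in the shared model; each physical model only applies a mechanical mapping, so the same column is called `policy_number` on every DBMS.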
Building a standard model involved agreeing on the meaning and usage of the data elements in the model. The model is not static – it evolves – but is agreed across different lines of business and subject areas. SOA makes it more necessary than ever to have agreed definitions. They used ACORD as a starting point but just “dated” the standard rather than being “married to it”.
Developing a common name space was critical to bringing common meaning to the data and to driving a rules-based approach to naming attributes. The names in the name space are also referred to as “tags”, which are used, for instance, in the XML for ETL. Of their data, 21.5% is codified (ACORD has 25%) so that it can be managed and shared. One challenge is that enumerations can differ greatly across systems, with both different labels for values and different numbers of values. To resolve this they maintain a repository of the enumerated values exposed by the various systems. Each system registers its enumerated values in the repository and tries to map them to a domain, partly automatically and partly manually with the help of a group of data governance folks. Once this is done, the resulting information is pushed out to an operational database for query support, so that queries pick up common code definitions, allowing them, for instance, to join across systems.
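A minimal Python sketch of the enumerated-values repository, assuming a simple design: each system registers its local codes and labels, an automatic pass maps them to a common domain by matching labels, and anything left over is mapped manually by a data steward. The `EnumRepository` class, its methods, the system names and the sample gender codes are all hypothetical; the session did not describe how the repository was actually built.

```python
from collections import defaultdict


class EnumRepository:
    """Hypothetical registry of per-system enumerated values mapped to a common domain."""

    def __init__(self, domain_values: dict):
        self.domain_values = domain_values      # common domain: code -> label
        self.registered = defaultdict(dict)     # system -> {local code: local label}
        self.mapping = {}                       # (system, local code) -> domain code
        self.unmapped = []                      # left for data governance review

    def register(self, system: str, values: dict) -> None:
        """A system registers its local values; try to auto-map each one by label."""
        by_label = {label.strip().lower(): code for code, label in self.domain_values.items()}
        for local_code, local_label in values.items():
            self.registered[system][local_code] = local_label
            domain_code = by_label.get(local_label.strip().lower())
            if domain_code is not None:
                self.mapping[(system, local_code)] = domain_code
            else:
                self.unmapped.append((system, local_code, local_label))

    def map_manually(self, system: str, local_code: str, domain_code: str) -> None:
        """A data steward resolves a value the automatic pass could not match."""
        if domain_code not in self.domain_values:
            raise ValueError(f"unknown domain code {domain_code!r}")
        self.mapping[(system, local_code)] = domain_code
        self.unmapped = [u for u in self.unmapped if u[:2] != (system, local_code)]

    def publish(self) -> dict:
        """Snapshot that would be pushed to the operational database for queries."""
        return dict(self.mapping)


# Example: two systems with different labels and different numbers of values.
repo = EnumRepository({"M": "Male", "F": "Female", "U": "Unknown"})
repo.register("annuity_admin", {"1": "Male", "2": "Female"})
repo.register("life_admin", {"M": "male", "F": "female", "X": "Not Stated"})
repo.map_manually("life_admin", "X", "U")  # manual governance step
codes = repo.publish()

# Rows from both systems can now be grouped on the common code.
life_rows = [("P-100", "X"), ("P-101", "F")]
annuity_rows = [("A-200", "2")]
by_gender = defaultdict(list)
for system, rows in (("life_admin", life_rows), ("annuity_admin", annuity_rows)):
    for contract_id, local_code in rows:
        by_gender[codes[(system, local_code)]].append(contract_id)
```

Because every local code resolves to exactly one domain code, contracts from different administrative systems end up grouped together, which is the effect the published code definitions are meant to give cross-system queries.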
Derivations, which create new data using transformation logic, are increasingly common for analytics, dashboards and so on. It is critical to ensure that these derivations are shared rather than reinvented. For instance, one group might use an identifier that is not atomic but built from various elements, while another group uses the same identifier with a different derivation, or the same derivation under a different name. A single data mart was built to store pre-calculated business logic, reducing the number of places where any given value is derived. The ETL processes that populate the data marts then call common services to apply this common logic. Users still see reports in their own terminology, but common logic handles the calculations.
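To illustrate one derivation sitting behind many names, the Python sketch below defines a single shared function for a composite (non-atomic) identifier and lets each group attach its own label to it. The `Contract` fields, the identifier format and the group names are hypothetical assumptions, and the sketch covers only the shared-logic idea, not the pre-calculated data mart itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    company_code: str
    product_code: str
    contract_number: int


def contract_key(c: Contract) -> str:
    """The single shared derivation of a composite identifier.

    The company-product-zero-padded-number format is an illustrative assumption.
    """
    return f"{c.company_code}-{c.product_code}-{c.contract_number:08d}"


# Each group keeps its own terminology, but every label points at the same logic.
DERIVATIONS = {
    "finance": {"Policy Key": contract_key},
    "actuarial": {"Contract ID": contract_key},
}


def etl_row(group: str, contract: Contract) -> dict:
    """ETL step for one group's mart: call the shared logic, label it locally."""
    return {label: fn(contract) for label, fn in DERIVATIONS[group].items()}


example = Contract("AF", "VUL", 1234)
finance_row = etl_row("finance", example)
actuarial_row = etl_row("actuarial", example)
```

Changing the identifier format now means editing `contract_key` once, and both groups' reports pick up the change under their own column names.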
Hierarchies are another element that must be standardized. A hierarchy handles the drill-down and aggregation logic that drives trending and forecasting, for instance. As before, hierarchies may be used by multiple groups, and maintaining common hierarchies can be tricky given those groups' different perspectives. To make this work they took the code management structure they had already developed and modeled each level of the hierarchy as a code element. This meant that everything was managed in one location, and a common service could be built on top of it to deliver the hierarchy.
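A minimal Python sketch of treating each hierarchy level as a code element: every node is an ordinary code with a label, a level and a parent code, and a common service derives drill-down children and roll-up totals from those parent links. The codes, levels and premium figures are made up for illustration; the actual structure of Allstate's code management was not described.

```python
from collections import defaultdict

# Hypothetical code elements: code -> (label, level, parent code).
CODES = {
    "ALL": ("All Products", "total", None),
    "LIFE": ("Life", "line", "ALL"),
    "ANN": ("Annuities", "line", "ALL"),
    "TERM": ("Term Life", "product", "LIFE"),
    "UL": ("Universal Life", "product", "LIFE"),
    "FIA": ("Fixed Indexed Annuity", "product", "ANN"),
}


def ancestors(code):
    """Walk up the parent links: the path from a code to the top of the hierarchy."""
    path = []
    while code is not None:
        path.append(code)
        code = CODES[code][2]
    return path


def children(code):
    """Drill-down: the codes one level below the given code."""
    return [c for c, (_, _, parent) in CODES.items() if parent == code]


def roll_up(facts):
    """Aggregation service: add each leaf value to every level above it."""
    totals = defaultdict(float)
    for leaf, value in facts.items():
        for code in ancestors(leaf):
            totals[code] += value
    return dict(totals)


premiums = {"TERM": 120.0, "UL": 80.0, "FIA": 50.0}
totals = roll_up(premiums)
```

Since the hierarchy is just more code data, a group that needs a different view could in principle register another set of parent links through the same code management process rather than maintaining a private copy.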
SOA and common services for hierarchies, codes, etc. help ensure that data quality can be improved across all the services.
I wonder: do you really need to take control of the whole data model to do SOA?