Final session today focused on systems and architecture for Big Data Analytics. It began with the growing friction between business and IT, especially around information and analytics, where business users want to work with data without having to involve IT. This creates specific challenges for IT:
- **The data furball** Data governance and quality initiatives are long-running and complex, yet they don't seem to prevent silos and data inconsistency, both of which undermine analytics. Increasing needs for speed, plus big data volumes and velocity, don't mean that the old problems go away.
- **Becoming agile and incremental** As the world becomes faster moving, especially with respect to data, it becomes essential that data activities like governance become more incremental. It must be possible to do this work gradually, as data is needed or adopted.
- **Deliver lower latency** And all of this needs to be done faster so there is less latency in getting data from input to decision-making. The data, or analytic, supply chain must match the timeliness the business requires. Batch data preparation may be fine in some cases, for instance, but not in others. The speed of decision-making is driven by business need, and the data pipeline has to support it.
- **Reduce IT costs** Eliminate or reduce the costs caused by bad data as well as the "raw" cost of the infrastructure. Costs must also be proportional to the value being created, incurred gradually rather than as a monolithic up-front investment. Cloud, of course, is a big potential driver of cost reduction.
- **Provide security and compliance** Finally, all of this has to remain secure and compliant, especially as companies use the cloud more, access more external data, and adopt hybrid cloud/on-premises architectures.
From a solution perspective, IBM sees the need for a data reservoir (data lake) based on, but not limited to, Hadoop. ING came on stage to join the IBM team and talk about their big data infrastructure. ING wanted to bring all their data together, but it was essential that they could do this under control so they could address their regulatory and compliance needs. They saw a mix of IBM and open source technologies as appropriate and developed an evolving architecture to deliver it. Short, rapid iterations have been key to selling this value, such as 5-week proof-of-value projects, as has working top-down with the board of directors. They have also found that using simple analogies helped the business see the value, and that it was critical to do this incrementally.
From a metadata perspective, they see a move to a more incremental and crowd-sourced approach. Moving to Big Data can bury metadata back in code again if you are not careful. Learning as people do things is likely to be key, given how fast things can move. Of course, some things still need more control and more centralized management.
APIs are another aspect of this infrastructure: data and APIs are increasingly coming together and must be used together in solutions. Analytic digital transformation is generally driving customer engagement, but to get real value this must also transform the operational environment through automation. APIs allow data to be made more available and allow analytics to be driven into code. This is what will drive a cognitive business. Making this work requires old apps to be exposed, new open APIs to be used, and different speeds of interaction to be managed. A hybrid cloud of microservices seems to work best.
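To make the idea of "driving analytics into code" concrete, here is a minimal sketch (not IBM's or ING's actual architecture) of a microservice that exposes a hypothetical analytic, a simple churn-risk score, as an HTTP API so that operational applications can call it directly. The scoring formula, endpoint path, and field names are all illustrative assumptions.

```python
# Illustrative sketch only: a hypothetical churn-risk analytic exposed as
# a tiny HTTP API, so operational code can consume analytics via a service.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def churn_score(months_active: int, support_tickets: int) -> float:
    """Hypothetical analytic: churn-risk score clamped to [0, 1]."""
    raw = 0.5 - 0.02 * months_active + 0.1 * support_tickets
    return max(0.0, min(1.0, raw))


class ScoreHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect a path like /score/24/3 -> months_active=24, tickets=3
        parts = self.path.strip("/").split("/")
        if len(parts) == 3 and parts[0] == "score":
            score = churn_score(int(parts[1]), int(parts[2]))
            body = json.dumps({"churn_score": score}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


if __name__ == "__main__":
    # Start the service on a random free port and call it once.
    server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/score/24/3") as resp:
        print(json.load(resp))
    server.shutdown()
```

The point of the sketch is the shape, not the model: the analytic lives behind a stable API, so it can be retrained or replaced without touching the operational applications that call it.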
IBM sees a final model that has several elements:
- Discovery, insight, search, visualization and self service for analysis
- Collaboration and governance
- Information virtualization fabric that delivers a consistent and holistic view
- Built on a ubiquitous open source model
This has to enable and empower rapid, iterative self-service as well as more formal data science access and rapid deployment of the results.