Assuming you have a program that produces JSON, the process is straightforward. The database administrator adds a JSON column, along with any necessary ETL, and then defines an SQL view on the column that exposes some or all of the name-value pairs as columns. Once this is done, users can query and access the data in the JSON object as though it were a set of fields. As new fields are added to the JSON document they flow in automatically and can simply be added to the view, making them available in whatever query or analytic environment is being used. This should take days or hours instead of weeks or months. The use of a view means that BI or analytic tools that don't understand JSON can still access the data, while Teradata SQL can access the data inside the JSON field directly, without a view. Once the JSON data is loaded, all the standard Teradata capabilities are available: joins, sorts, sums, indexes, and even columnar access and temporal data management (though you probably don't want the latter). The JSON object itself still exists and can be loaded directly into a program (using standard dot notation), taking full advantage of late binding of the JSON schema.
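As a concrete sketch of that workflow (table and column names here are hypothetical, and exact syntax varies by Teradata release; this assumes the JSON type and dot-notation access introduced in Teradata 15):

```sql
-- Add a JSON column to an existing table (hypothetical names).
ALTER TABLE sales_events ADD event_payload JSON(16000);

-- Expose selected name-value pairs as ordinary columns via a view,
-- using Teradata's dot-notation access into the JSON document.
REPLACE VIEW sales_events_v AS
SELECT event_id,
       event_ts,
       CAST(s.event_payload.customer.name AS VARCHAR(100))  AS customer_name,
       CAST(s.event_payload.order.total   AS DECIMAL(10,2)) AS order_total
FROM sales_events s;
```

Users and BI tools then query `sales_events_v` like any other table; when a new field appears in the payload, adding one line to the view is all it takes to expose it.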
- The flexibility of JSON as a format is going to be front and center, as it allows data whose format changes rapidly to be stored and managed effectively. By allowing schema on read, it avoids the need to shred the JSON data into columns up front.
- Complex datasets, such as bills of materials and complex financial instruments, are a strong secondary use case.
- Finally, scenarios where the schema is not known in advance will benefit from JSON.
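Schema on read in practice means a field added to the documents yesterday can be queried today, with no DDL change, directly against the JSON column. A sketch using the same hypothetical table (`JSONExtractValue` is Teradata's path-extraction method on the JSON type; it returns NULL when the path is absent):

```sql
-- Query a newly added attribute directly; no view or schema change needed.
-- Rows loaded before the field existed simply return NULL.
SELECT event_id,
       s.event_payload.JSONExtractValue('$.device.model') AS device_model
FROM sales_events s
WHERE s.event_payload.JSONExtractValue('$.device.model') IS NOT NULL;
```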
Obviously there are challenges in this approach, such as the performance of queries against new data attributes and the need to handle nulls in records that pre-date a change in the JSON schema. Indexing heavily accessed name-value pairs can make a big difference. In addition, some JSON data is likely to duplicate data in the warehouse (parts of a JSON document may well be the same data the warehouse already manages), and this overlap will need to be managed by the DBA. Similarly, schema-on-read access means no referential integrity, data lineage, or cleansing, as well as CPU time spent parsing at read. As with everything, therefore, there are tradeoffs: some JSON data may be better shredded into specific columns at load time, while other data will fit the new JSON column type.
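Where an attribute is hot enough to justify it, shredding at load time avoids the read-time CPU cost and makes standard indexing available. A sketch, again with hypothetical table names:

```sql
-- Shred a heavily accessed attribute into a real column at load time,
-- keeping the full document alongside it for late-binding access.
INSERT INTO sales_events_shredded (event_id, event_ts, order_total, event_payload)
SELECT event_id,
       event_ts,
       CAST(s.event_payload.order.total AS DECIMAL(10,2)),
       s.event_payload
FROM sales_events_staging s;

-- A secondary index on the shredded column then serves high-access queries
-- without touching the JSON parser at all.
CREATE INDEX (order_total) ON sales_events_shredded;
```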
There’s an interesting discussion to be had around JSON and the tradeoffs between schema on read (more agile, less quality control) and the traditional schema-first approach of data warehousing (less agile, more control). Each company, and each project, will have to make this choice thoughtfully going forward.