Flatten nested attributes

Learn about extracting and flattening nested and reference attributes into a custom view.

All Reltio objects - entities, relations, and interactions - are hierarchical. Reltio data pipelines write data related to these objects into tables in destination systems.

This data includes customer data — written as a complex data type in the Attributes column — and Reltio system-generated data for each entity — written as simple data type such as Entity URI, Entity ID, created time, and so on.

In the data warehouses, we maintain data in a tabular form with each entity in a separate row. You can query this data and analyze it as per your requirements.

Some of the attributes are nested attributes, which have several levels of nesting. So, how can one query data in a nested attribute? To achieve this, we flatten the hierarchical data by exploding the nested attribute into separate rows, and associating each row with with simple sub-attributes nested within it or simple attributes outside the nested attribute.

Query your dataset

When you want to associate the exploded data with the sub-attributes of the nested attribute or the simple attributes at the same level as the nested attribute, ensure all these attributes have a single surviving value (single OV) corresponding to each exploded value.

Queries are easy to use, depending on whether the data sets are written with or without the following properties provided by Reltio's data pipelines:

the value of the isOV filter
- If set to true, all non-operational values are skipped at every level of the hierarchy
- If set to false, non-operational values are included along with the operating values.

the value of the flattenAttribute filter

If set to false, the value of each attribute is written as shown below:

array of
	struct
	{
	   numeric Id,  		            // Unique id for current val of this attribute type
	   boolean isOv,		            // Indicates whether the current value is OV or not

           struct<attrType> value	// If current attribute is a nested attribute.
					                    // Each member of struct<attrType> corresponds
					                    // to each sub-attribute nested within current
					                    // attribute, and is in turn an array of the
					                    // self-same struct {Id, isOv, value} above.
		OR
	  <simpleAttrType> value	// If current attribute is a simple attribute.
	}

If set to true, the value of each attribute is written as a simple array, as shown below

	array of struct<attrType>	// If current attribute is a nested attribute.
					                                // Each member of struct<attrType> corresponds
					                                // to each sub-attribute nested within current
					                                // attribute, and is in turn an array of all
					                                // the values of that sub-attribute directly.
	OR
	array of <simpleAttrType> 	            // if current attribute is a simple attribute.

Let's consider the Individual entity type, which has the Address as a nested attribute. In this view the Addresses attribute is flattened to separate columns like AddressLine1, AddressLine2, City, StateProvince, Country, and Zip5. If you have multiple addresses for a single individual, then it's flattened and extracted to separate rows.

Let's take a look at how the Address nested attribute is flattened and exploded in different scenarios.

Datasets without OV Filter

In case of datasets without OV filter, i.e. where both the OV and non-OV values are available at all levels, you flatten the entity dataset by exploding the nested attribute and:

extract the single-OV of AddressLine1, AddressLine2, City, StateProvince attributes from the immediate children. Furthermore, extract the single-OV of Zip4 and PostalCode sub-attributes from the single-OV of the Zip attribute.
associate each address with the top-level columns URI, entityID, and createdtime, along with the single-survived value of the top-level attributes FirstName and LastName. .

Note: In case of datasets without OV filter, the number of rows will be a total of the OV and non-OV addresses for each individual entity.

Here's a sample query (where we have highlighted the address fields just for your reference):

When using a query in GBQ:

SELECT
	Uri AS IndividualEntityURI,
	Id AS IndividualEntityID,
	createdTime AS IndividualEntityCreatedTime,
	FirstName.value AS FirstName,
	LastName.value AS LastName,
	Address.Id AS AddressID, AddressLine1.value AS AddressLine1, AddressLine2.value AS AddressLine2, City.value AS City, StateProvince.value AS StateProvince, Zip5.value AS Zip5, PostalCode.value AS PostalCode
FROM 
	entity_Individual
		LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.FirstName)) AS FirstName
		LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.LastName)) AS LastName
		LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.Address)) AS Address LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.AddressLine1)) AS AddressLine1 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.AddressLine2)) AS AddressLine2 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.City)) AS City LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.StateProvince)) AS StateProvince LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.Zip)) AS Zip LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Zip.value.Zip5)) AS Zip5 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Zip.value.PostalCode)) AS PostalCode
WHERE
	(FirstName IS NULL OR BOOL(FirstName.isOv) = TRUE)
	AND (LastName IS NULL OR BOOL(LastName.isOv) = TRUE)
	AND (AddressLine1 IS NULL OR BOOL(AddressLine1.isOv) = TRUE) AND (AddressLine2 IS NULL OR BOOL(AddressLine2.isOv) = TRUE) AND (City IS NULL OR BOOL(City.isOv) = TRUE) AND (StateProvince IS NULL OR BOOL(StateProvince.isOv) = TRUE) AND (Zip IS NULL OR BOOL(Zip.isOv) = TRUE) AND (Zip5 IS NULL OR BOOL(Zip5.isOv) = TRUE) AND (PostalCode IS NULL OR BOOL(PostalCode.isOv) = TRUE)

When using a query in Databricks:

SELECT
	Uri AS IndividualEntityURI,
	Id AS IndividualEntityID,
	createdTime as IndividualEntityCreatedTime,
	FirstName.value AS FirstName,
	LastName.value AS LastName,
	Address.uri as AddressURI, AddressLine1.value AS AddressLine1, AddressLine2.value AS AddressLine2, City.value AS City, StateProvince.value AS StateProvince, Zip5.value AS Zip5, PostalCode.value AS PostalCode
FROM
	entity_Individual
	LATERAL VIEW OUTER explode(attributes_Individual.FirstName) AS FirstName
	LATERAL VIEW OUTER explode(attributes_Individual.LastName) AS LastName
	LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address LATERAL VIEW OUTER explode(Address.value.AddressLine1) AS AddressLine1 LATERAL VIEW OUTER explode(Address.value.AddressLine2) AS AddressLine2 LATERAL VIEW OUTER explode(Address.value.City) AS City LATERAL VIEW OUTER explode(Address.value.StateProvince) AS StateProvince LATERAL VIEW OUTER explode(Address.value.Zip) AS Zip LATERAL VIEW OUTER explode(Zip.value.Zip5) AS Zip5 LATERAL VIEW OUTER explode(Zip.value.PostalCode) AS PostalCode
WHERE
	(FirstName is null or FirstName.ov = true)
	AND (LastName is null or LastName.ov = true)
	AND (AddressLine1 is null or AddressLine1.ov = true) AND (AddressLine2 is null or AddressLine2.ov = true) AND (City is null or City.ov = true) AND (StateProvince is null or StateProvince.ov = true) AND (Zip is null or Zip.ov = true) AND (Zip5 is null or Zip5.ov = true) AND (PostalCode is null or PostalCode.ov = true)

Datasets with OV filter

In case of datasets having only OVs of attributes at all levels, you flatten the dataset by exploding the nested attribute and:

extract the single-OV of the desired sub-attributes
associate the single-OV of the desired sibling attributes.

Taking the above mentioned Address nested attribute of the Individual entity, if each of the sub-attributes has only 1 surviving value, there's no need to explode any of these sub-attributes. The very first element of each sub-attribute is picked up from the dataset.

Note: In this case, the number of rows will be the number of OV addresses for each individual entity.

For the nested sub-attributes within the Address attribute, for example the Zip sub-attribute, the first element in the Zip array will encapsulate the desired sub-attributes Zip4 and PostalCode.

For the attributes on the same level as the Address attribute, the first element of each sub-attribute array is picked up.

Here's a sample query:

When using a query in GBQ:

SELECT
	Uri AS entityUri,
	Id AS IndividualId,
	createdTime AS entityCreatedTime,
	Address.Id AS AddressId,
	Address.value.AddressLine1[0].value AS AddressLine1,
	Address.value.AddressLine2[0].value AS AddressLine2,
	Address.value.City[0].value AS City,
	Address.value.StateProvince[0].value AS StateProvince,
	Address.value.Zip[0].value.Zip4[0].value AS Zip4,
	Address.value.Zip[0].value.PostalCode[0].value AS PostalCode
FROM
	`customer-facing.dph_test_ceEW27DYnQ6TaSY_json.entity_Individual` AS Individual
	 LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address

When using a query in Databricks:

SELECT
	Uri AS IndividualEntityURI,
	Id AS IndividualID,
	createdTime AS IndividualEntityCreatedTime,
	attributes.FirstName[0].value AS FirstName,
	attributes.LastName[0].value AS LastName,
	Address.uri AS addressURI,
	Address.value.AddressLine1[0].value AS AddressLine1,
	Address.value.AddressLine2[0].value AS AddressLine2,
	Address.value.City[0].value AS City,
	Address.value.StateProvince[0].value AS StateProvince,
	Address.value.Zip[0].value.Zip5[0].value AS Zip5,
	Address.value.Zip[0].value.PostalCode[0].value AS PostalCode
FROM
	entity_Individual
	LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address

Datasets with OV filter and flattenAttribute=true

In this case, the Reltio data pipeline writes the OVs of each sub-attribute as a single value or an array of values of the respective type, without wrapping each value under a wrapper structure of {"Id", "value"}. To get each address, you'll need to explode the 'Address' attribute as outlined in previous examples.

Note: In this case, the number of rows will be the number of OV addresses for each individual entity.

With each address, if the sub-attributes have only 1 surviving value, there is no need to explode any of the sub-nested attributes. Select the first element of each sub-attribute array. For the attributes on the same level as the Address attribute, the first element of each sub-attribute array is picked up.

Here's a sample query:

When using a query in GBQ:

SELECT
	Uri AS IndividualEntityURI,
	Id AS IndividualEntityID,
	createdTime AS IndividualEntityCreatedTime,
	attributes.FirstName[0] AS FirstName,
	attributes.LastName[0] AS LastName,
	Address.AddressLine1[0] AS AddressLine1,
	Address.AddressLine2[0] AS AddressLine2,
	Address.City[0] AS City,
	Address.StateProvince[0] AS StateProvince,
	Address.Zip[0].Zip5[0] AS Zip5,
	Address.Zip[0].PostalCode[0] AS PostalCode
FROM
	`customer-facing.dph_test_ceEW27DYnQ6TaSY_json_ov_flatten.entity_Individual` AS Individual
	LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.Address)) AS Address

When using a query in Databricks:

SELECT
	Uri as IndividualEntityURI,
	Id as IndividualID,
	createdTime as IndividualEntityCreatedTime,
	attributes.FirstName[0] AS FirstName,
	attributes.LastName[0] AS LastName,
	Address.AddressLine1[0] AS AddressLine1,
	Address.AddressLine2[0] AS AddressLine2,
	Address.City[0] AS City,
	Address.StateProvince[0] AS StateProvince,
	Address.Zip[0].Zip5[0] AS Zip5,
	Address.Zip[0].PostalCode[0] AS PostalCode
FROM
	entity_Individual
	LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address

Unify and manage your data

Flatten nested attributes

Query your dataset

Datasets without OV Filter

Datasets with OV filter

Datasets with OV filter and flattenAttribute=true

Footer