Flatten nested attributes
Learn about extracting and flattening nested and reference attributes into a custom view.
All Reltio objects - entities, relations, and interactions - are hierarchical. Reltio data pipelines write data related to these objects into tables in destination systems.
This data includes customer data — written as a complex data type in the Attributes column — and Reltio system-generated data for each entity — written as simple data type such as Entity URI, Entity ID, created time, and so on.
In the data warehouses, we maintain data in a tabular form with each entity in a separate row. You can query this data and analyze it as per your requirements.
Some of the attributes are nested attributes, which have several levels of nesting. So, how can one query data in a nested attribute? To achieve this, we flatten the hierarchical data by exploding the nested attribute into separate rows, and associating each row with with simple sub-attributes nested within it or simple attributes outside the nested attribute.
Query your dataset
When you want to associate the exploded data with the sub-attributes of the nested attribute or the simple attributes at the same level as the nested attribute, ensure all these attributes have a single surviving value (single OV) corresponding to each exploded value.
Queries are easy to use, depending on whether the data sets are written with or without the following properties provided by Reltio's data pipelines:
-
the value of the
isOV
filter-
If set to true, all non-operational values are skipped at every level of the hierarchy
-
If set to false, non-operational values are included along with the operating values.
-
-
the value of the
flattenAttribute
filter-
If set to false, the value of each attribute is written as shown below:
array of struct { numeric Id, // Unique id for current val of this attribute type boolean isOv, // Indicates whether the current value is OV or not struct<attrType> value // If current attribute is a nested attribute. // Each member of struct<attrType> corresponds // to each sub-attribute nested within current // attribute, and is in turn an array of the // self-same struct {Id, isOv, value} above. OR <simpleAttrType> value // If current attribute is a simple attribute. }
-
If set to true, the value of each attribute is written as a simple array, as shown below
array of struct<attrType> // If current attribute is a nested attribute. // Each member of struct<attrType> corresponds // to each sub-attribute nested within current // attribute, and is in turn an array of all // the values of that sub-attribute directly. OR array of <simpleAttrType> // if current attribute is a simple attribute.
-
Let's consider the Individual entity type, which has the Address as a nested attribute. In this view the Addresses
attribute is flattened to separate columns like AddressLine1
, AddressLine2
, City
, StateProvince
, Country
, and Zip5
. If you have multiple addresses for a single individual, then it's flattened and extracted to separate rows.
Let's take a look at how the Address nested attribute is flattened and exploded in different scenarios.
Datasets without OV Filter
In case of datasets without OV filter, i.e. where both the OV and non-OV values are available at all levels, you flatten the entity dataset by exploding the nested attribute and:
-
extract the single-OV of
AddressLine1
,AddressLine2
,City
,StateProvince
attributes from the immediate children. Furthermore, extract the single-OV of Zip4 and PostalCode sub-attributes from the single-OV of the Zip attribute. -
associate each address with the top-level columns URI, entityID, and createdtime, along with the single-survived value of the top-level attributes FirstName and LastName. .
Note: In case of datasets without OV filter, the number of rows will be a total of the OV and non-OV addresses for each individual entity.
Here's a sample query (where we have highlighted the address fields just for your reference):
-
When using a query in GBQ:
SELECT
Uri AS IndividualEntityURI,
Id AS IndividualEntityID,
createdTime AS IndividualEntityCreatedTime,
FirstName.value AS FirstName,
LastName.value AS LastName,
Address.Id AS AddressID, AddressLine1.value AS AddressLine1, AddressLine2.value AS AddressLine2, City.value AS City, StateProvince.value AS StateProvince, Zip5.value AS Zip5, PostalCode.value AS PostalCode
FROM
entity_Individual
LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.FirstName)) AS FirstName
LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.LastName)) AS LastName
LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.Address)) AS Address LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.AddressLine1)) AS AddressLine1 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.AddressLine2)) AS AddressLine2 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.City)) AS City LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.StateProvince)) AS StateProvince LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Address.value.Zip)) AS Zip LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Zip.value.Zip5)) AS Zip5 LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Zip.value.PostalCode)) AS PostalCode
WHERE
(FirstName IS NULL OR BOOL(FirstName.isOv) = TRUE)
AND (LastName IS NULL OR BOOL(LastName.isOv) = TRUE)
AND (AddressLine1 IS NULL OR BOOL(AddressLine1.isOv) = TRUE) AND (AddressLine2 IS NULL OR BOOL(AddressLine2.isOv) = TRUE) AND (City IS NULL OR BOOL(City.isOv) = TRUE) AND (StateProvince IS NULL OR BOOL(StateProvince.isOv) = TRUE) AND (Zip IS NULL OR BOOL(Zip.isOv) = TRUE) AND (Zip5 IS NULL OR BOOL(Zip5.isOv) = TRUE) AND (PostalCode IS NULL OR BOOL(PostalCode.isOv) = TRUE)
-
When using a query in Databricks:
SELECT Uri AS IndividualEntityURI, Id AS IndividualEntityID, createdTime as IndividualEntityCreatedTime, FirstName.value AS FirstName, LastName.value AS LastName, Address.uri as AddressURI, AddressLine1.value AS AddressLine1, AddressLine2.value AS AddressLine2, City.value AS City, StateProvince.value AS StateProvince, Zip5.value AS Zip5, PostalCode.value AS PostalCode FROM entity_Individual LATERAL VIEW OUTER explode(attributes_Individual.FirstName) AS FirstName LATERAL VIEW OUTER explode(attributes_Individual.LastName) AS LastName LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address LATERAL VIEW OUTER explode(Address.value.AddressLine1) AS AddressLine1 LATERAL VIEW OUTER explode(Address.value.AddressLine2) AS AddressLine2 LATERAL VIEW OUTER explode(Address.value.City) AS City LATERAL VIEW OUTER explode(Address.value.StateProvince) AS StateProvince LATERAL VIEW OUTER explode(Address.value.Zip) AS Zip LATERAL VIEW OUTER explode(Zip.value.Zip5) AS Zip5 LATERAL VIEW OUTER explode(Zip.value.PostalCode) AS PostalCode WHERE (FirstName is null or FirstName.ov = true) AND (LastName is null or LastName.ov = true) AND (AddressLine1 is null or AddressLine1.ov = true) AND (AddressLine2 is null or AddressLine2.ov = true) AND (City is null or City.ov = true) AND (StateProvince is null or StateProvince.ov = true) AND (Zip is null or Zip.ov = true) AND (Zip5 is null or Zip5.ov = true) AND (PostalCode is null or PostalCode.ov = true)
Datasets with OV filter
In case of datasets having only OVs of attributes at all levels, you flatten the dataset by exploding the nested attribute and:
-
extract the single-OV of the desired sub-attributes
-
associate the single-OV of the desired sibling attributes.
Taking the above mentioned Address
nested attribute of the Individual
entity, if each of the sub-attributes has only 1 surviving value, there's no need to explode any of these sub-attributes. The very first element of each sub-attribute is picked up from the dataset.
For the nested sub-attributes within the Address attribute, for example the Zip sub-attribute, the first element in the Zip array will encapsulate the desired sub-attributes Zip4 and PostalCode.
For the attributes on the same level as the Address attribute, the first element of each sub-attribute array is picked up.
Here's a sample query:
-
When using a query in GBQ:
SELECT
Uri AS entityUri,
Id AS IndividualId,
createdTime AS entityCreatedTime,
Address.Id AS AddressId,
Address.value.AddressLine1[0].value AS AddressLine1,
Address.value.AddressLine2[0].value AS AddressLine2,
Address.value.City[0].value AS City,
Address.value.StateProvince[0].value AS StateProvince,
Address.value.Zip[0].value.Zip4[0].value AS Zip4,
Address.value.Zip[0].value.PostalCode[0].value AS PostalCode
FROM
`customer-facing.dph_test_ceEW27DYnQ6TaSY_json.entity_Individual` AS Individual
LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address
-
When using a query in Databricks:
SELECT Uri AS IndividualEntityURI, Id AS IndividualID, createdTime AS IndividualEntityCreatedTime, attributes.FirstName[0].value AS FirstName, attributes.LastName[0].value AS LastName, Address.uri AS addressURI, Address.value.AddressLine1[0].value AS AddressLine1, Address.value.AddressLine2[0].value AS AddressLine2, Address.value.City[0].value AS City, Address.value.StateProvince[0].value AS StateProvince, Address.value.Zip[0].value.Zip5[0].value AS Zip5, Address.value.Zip[0].value.PostalCode[0].value AS PostalCode FROM entity_Individual LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address
Datasets with OV filter and flattenAttribute=true
In this case, the Reltio data pipeline writes the OVs of each sub-attribute as a single value or an array of values of the respective type, without wrapping each value under a wrapper structure of {"Id", "value"}. To get each address, you'll need to explode the 'Address' attribute as outlined in previous examples.
With each address, if the sub-attributes have only 1 surviving value, there is no need to explode any of the sub-nested attributes. Select the first element of each sub-attribute array. For the attributes on the same level as the Address attribute, the first element of each sub-attribute array is picked up.
Here's a sample query:
-
When using a query in GBQ:
SELECT
Uri AS IndividualEntityURI,
Id AS IndividualEntityID,
createdTime AS IndividualEntityCreatedTime,
attributes.FirstName[0] AS FirstName,
attributes.LastName[0] AS LastName,
Address.AddressLine1[0] AS AddressLine1,
Address.AddressLine2[0] AS AddressLine2,
Address.City[0] AS City,
Address.StateProvince[0] AS StateProvince,
Address.Zip[0].Zip5[0] AS Zip5,
Address.Zip[0].PostalCode[0] AS PostalCode
FROM
`customer-facing.dph_test_ceEW27DYnQ6TaSY_json_ov_flatten.entity_Individual` AS Individual
LEFT JOIN UNNEST(JSON_EXTRACT_ARRAY(Individual.attributes.Address)) AS Address
-
When using a query in Databricks:
SELECT Uri as IndividualEntityURI, Id as IndividualID, createdTime as IndividualEntityCreatedTime, attributes.FirstName[0] AS FirstName, attributes.LastName[0] AS LastName, Address.AddressLine1[0] AS AddressLine1, Address.AddressLine2[0] AS AddressLine2, Address.City[0] AS City, Address.StateProvince[0] AS StateProvince, Address.Zip[0].Zip5[0] AS Zip5, Address.Zip[0].PostalCode[0] AS PostalCode FROM entity_Individual LATERAL VIEW OUTER explode(attributes_Individual.Address) AS Address