Data Flows in the Azure Data Factory is a very handy tool for ETL (Extract Transform Load) as well as data cleanup work. It helps you load unclean data then clean them up using various in-built steps then finally publish the data into the destinations. It can join data from various sources to one stream then publish data into various destinations.
In this example, I will show you how to filter dataset based on the child array in a json file. So, see the sample of the orders json file below.
[
{
"custId": 43242,
"deliveryAddress": "1 Newton Rd CA 534534",
"orderId": 768,
"orderComments": "Please deliver on time",
"orderAmount": 432.65,
"orderDate": "2021-02-08",
"items": [
{
"name": "USB Powerbank",
"category": "accessories",
"quantity": 1,
"model": "CR432"
},
{
"name": "Computer monitor",
"category": "electronics",
"quantity": 1,
"model": "MT43"
},
{
"name": "Thosiba Laptop",
"category": "laptop",
"quantity": 2,
"model": "US27"
}
]
},
{
"custId": 8764,
"deliveryAddress": "1 Newton Rd CA 534534",
"orderId": 864,
"orderComments": "Please deliver on time",
"orderAmount": 432.65,
"orderDate": "2021-02-08",
"items": [
{
"name": "USB Powerbank",
"category": "accessories",
"quantity": 1,
"model": "CR432"
},
{
"name": "Computer monitor",
"category": "electronics",
"quantity": 1,
"model": "MT43"
}
]
},
{
"custId": 9765,
"deliveryAddress": "1 Newton Rd CA 534534",
"orderId": 324,
"orderComments": "Leave it outside",
"orderAmount": 499.65,
"orderDate": "2021-02-08",
"items": [
{
"name": "Acer Laptop",
"category": "laptop",
"quantity": 1,
"model": "LP542"
},
{
"name": "LCD monitor",
"category": "electronics",
"quantity": 1,
"model": "MT33"
}
]
}
]
From the above orders data, I want to filter the customers who has ordered at least one laptop inside the items array.
Create a Data Set for the the JSON data
You need a service connection to connect to the Azure storage account and upload the json file in the preferred folder.
Create a data flow based on the dataset created in the earlier stage
You can enable the debug mode to see the preview of the data. See below, I have three customer orders and I want to find the customers who has ordered at least one laptop.
As you can see from above, the items child array has the list of items each customer has ordered. So there are three orders displayed in this dataset.
Let’s add a filter to filter the data who has ordered category laptop using contains
function. Add a filter step and add the following expression in the expression section.
contains(items, #item.category == "laptop")
Now click on the preview tab to see the filtered data. In the sample data above we have seen there are two orders that contains laptop category. Examine the filtered rows and expand the items. It should only return the rows that has laptop. See the picture below.