Rohan Nevrikar

Cloud Consultant @ Rapid Circle

Managing Unity Catalog using REST APIs

Zell Am See, Austria (March 2023)

Intro

The purpose of this article is to help readers navigate the steps involved in managing Unity Catalog through REST APIs. There is plenty of really good documentation available, but my personal experience was that it is a bit scattered, and it is confusing at first to find your way through all those docs just to set up a first API request. This article is a quick setup guide, with links to the relevant documents wherever a deeper understanding is needed.

Prerequisites

Unity Catalog is a data governance tool in Databricks. This article won’t cover different concepts related to Unity Catalog, as our focus is more on how to automate Unity Catalog operations. Hence, familiarity with Unity Catalog concepts is a prerequisite. This is a great document to understand Unity Catalog concepts – What is Unity Catalog.

We’ll also need to make sure that Unity Catalog is enabled. Here’s how to do it if it hasn’t been done already – Enabling Unity Catalog.

Setting up a metastore

A metastore is the top-level container for metadata. We’ll need to create a metastore first in order to work with other data assets in Unity Catalog.

Before creating the metastore, we’ll need a few resources set up in Azure-

  1. ADLS Gen2- This is where Unity Catalog will store metadata and managed table data.
  2. Access Connector- This provides a managed identity that can be configured in Databricks to access Azure resources like a data lake.
  3. Azure Databricks workspace- This is the workspace we’ll attach to the Unity Catalog metastore.

Once the above resources are provisioned, one last thing we need to do is assign the appropriate RBAC role on the ADLS account to the managed identity. We’ll go with Storage Blob Data Owner for now, but it is worth evaluating whether a less privileged role would be enough.
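The role assignment can also be scripted with the Azure CLI instead of the portal. Below is a minimal sketch, assuming the object ID of the access connector's managed identity is already known. The helper names storage_account_scope and assign_blob_data_owner are hypothetical, made up for this example, and so are the argument values.

```shell
# Hypothetical helper: builds the ARM resource ID of a storage account,
# which is the format `az role assignment create` expects for --scope.
storage_account_scope() {
  # $1 = subscription ID, $2 = resource group, $3 = storage account name
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s' "$1" "$2" "$3"
}

# Hypothetical helper: grants Storage Blob Data Owner on the given scope
# to the access connector's managed identity.
assign_blob_data_owner() {
  # $1 = object (principal) ID of the managed identity, $2 = scope
  az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Owner" \
    --scope "$2"
}
```

It would be called as assign_blob_data_owner <principal-id> "$(storage_account_scope <subscription-id> <resource-group> <storage-account>)". Swapping the role name is all it takes to try a narrower role.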


Let’s go to the Databricks account console -> Data and click on “Create Metastore”.

Fill out the details as shown in the screenshot below. The Access Connector Resource ID can be found in the resource itself. The metastore’s region is important here: a workspace can only be attached to a metastore in the same region. The ADLS Gen2 path is the location where metadata will be stored and accessed by Unity Catalog.

Once the metastore is created, we can attach our workspace to it. This means that all the data assets tied to that workspace, along with their metadata, will be managed by Unity Catalog in that metastore.
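Once a token is available (authentication is covered in the next section), the attachment can be double-checked from the workspace side. This is a rough sketch, assuming the workspace-level metastore_summary endpoint of the Unity Catalog API. The helper names are hypothetical, and the workspace URL and token are placeholders you supply.

```shell
# Hypothetical helper: builds the metastore summary URL for a workspace.
metastore_summary_url() {
  # $1 = workspace URL; a trailing slash, if present, is stripped
  printf '%s/api/2.1/unity-catalog/metastore_summary' "${1%/}"
}

# Hypothetical helper: fetches the summary of the metastore attached
# to the workspace. Fails with a non-zero exit code on HTTP errors.
show_metastore_summary() {
  # $1 = workspace URL, $2 = Azure AD token
  curl --silent --show-error --fail \
    --header "Authorization: Bearer $2" \
    "$(metastore_summary_url "$1")"
}
```

If the workspace is attached, the response should describe the metastore we just created. An error response would suggest the assignment hasn't gone through.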

Using the REST APIs

Databricks has very neat documentation when it comes to REST APIs. It is just that some minor things change based on which cloud provider is being used, and on whether the APIs are being called at the account level or the workspace level. Here’s the link- Databricks API reference. At the time of writing this article, for some reason, there is no document available for Azure. It is not a problem, as everything is pretty much the same for all the cloud providers.

In order to call the APIs, we’ll need to take care of authentication first. As we are working with Azure in this article, we’ll be using an Azure AD token to authenticate. Personal Access Tokens can also be used, but they only support workspace-level operations, not account-level ones.

We can generate Azure AD tokens for both users and service principals. Either way, the user or the service principal should have the appropriate privileges assigned in order to call the APIs. For example, if you want to create a catalog using a user token, then the user needs to be a metastore admin. Here’s a great document to understand privileges- Unity Catalog privileges and securable objects.

We’ll get an Azure AD token for a user who is both account admin and metastore admin, so that we are able to cover various operations using the same token. (Of course, this is not preferred in production. This is just me being lazy while writing this article.)

Use the code below to get the access token-

az login --tenant <tenant-id> --output table
az account set --subscription <subscription-id>
az account get-access-token \
--resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
--query "accessToken" \
--output tsv

And yes, the resource parameter here is a fixed GUID, and not a dynamic one. This is the resource ID of the Azure Databricks service.
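To reuse the token in later requests, it helps to capture it in a shell variable rather than copy-pasting it. Here is a small sketch that wraps the command above in a function. The names fetch_databricks_token and auth_header are hypothetical helpers, not part of any CLI.

```shell
# Fixed resource ID of the Azure Databricks service (same GUID as above).
DATABRICKS_RESOURCE_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Hypothetical helper: prints an Azure AD token for whichever identity
# is currently logged in through `az login`.
fetch_databricks_token() {
  az account get-access-token \
    --resource "$DATABRICKS_RESOURCE_ID" \
    --query "accessToken" \
    --output tsv
}

# Hypothetical helper: formats the Authorization header for a token.
auth_header() {
  # $1 = Azure AD token
  printf 'Authorization: Bearer %s' "$1"
}
```

After az login, running TOKEN="$(fetch_databricks_token)" stores the token, and curl --header "$(auth_header "$TOKEN")" ... passes it along. Azure AD tokens are short-lived, so the function would need to be called again once the token expires.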

Now that we have the authentication bit sorted, we can start playing with the endpoints. Of course, there are a lot of operations, so we’ll look at just one account-level endpoint and one workspace-level endpoint.

Account-level: Create a new group

To use account-level endpoints, you’ll need-

Host– https://accounts.azuredatabricks.net. This value will change if you are using AWS or GCP.
Authorization header– The token generated above
Account ID– You’ll find this in your Databricks account console


Depending on the operation, other components of the endpoint will change. For groups, it’ll end with /Groups (kinda obvious). The API request for creating a group would be like this-

curl -X POST --location 'https://accounts.azuredatabricks.net/api/2.0/accounts/<Account ID>/scim/v2/Groups' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <Azure AD token>' \
--data '{
  "displayName": "<Group name>"
}'
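To confirm that the group exists, the same Groups endpoint can be queried with a GET request. The sketch below assumes the account-level SCIM API lists groups on that path. The function names are hypothetical, and the account ID and token are placeholders you supply.

```shell
# Hypothetical helper: builds the account-level SCIM Groups URL.
account_groups_url() {
  # $1 = Databricks account ID
  printf 'https://accounts.azuredatabricks.net/api/2.0/accounts/%s/scim/v2/Groups' "$1"
}

# Hypothetical helper: lists the groups defined in the account.
# Fails with a non-zero exit code on HTTP errors.
list_account_groups() {
  # $1 = account ID, $2 = Azure AD token
  curl --silent --show-error --fail \
    --header "Authorization: Bearer $2" \
    "$(account_groups_url "$1")"
}
```

Calling list_account_groups <Account ID> "$TOKEN" should return a JSON document that includes the newly created group.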

The above request with the response in Postman would look like this-


Upon successful completion, this is how it would look in Databricks account-

Workspace-level: Create a catalog

To use workspace-level endpoints, you’ll need-

Host– Your Databricks workspace URL. This is usually of the format “https://adb-xxxxxxxxx.xx.azuredatabricks.net/”
Authorization header– The token generated above

To create a catalog, the API request would look something like this-

curl -X POST --location 'https://adb-xxxxxx.xx.azuredatabricks.net/api/2.1/unity-catalog/catalogs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <Azure AD Token>' \
--data '{
    "name": "uc-demo-catalog"
}'
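The catalog can be read back afterwards with a GET request on the same path plus the catalog name. This is a minimal sketch under that assumption. The helper names catalog_url and get_catalog are hypothetical, and the workspace URL and token are placeholders.

```shell
# Hypothetical helper: builds the URL of a single catalog.
catalog_url() {
  # $1 = workspace URL (trailing slash is stripped), $2 = catalog name
  printf '%s/api/2.1/unity-catalog/catalogs/%s' "${1%/}" "$2"
}

# Hypothetical helper: fetches one catalog's details.
# Fails with a non-zero exit code on HTTP errors (e.g. catalog not found).
get_catalog() {
  # $1 = workspace URL, $2 = Azure AD token, $3 = catalog name
  curl --silent --show-error --fail \
    --header "Authorization: Bearer $2" \
    "$(catalog_url "$1" "$3")"
}
```

For the example above, that would be get_catalog <workspace URL> "$TOKEN" uc-demo-catalog.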

And the Postman view-

And this is how it would look in the workspace-

Conclusion

Unity Catalog is new for a lot of people, and it can be confusing to figure out how to make the most of it. Fortunately, I got a chance to work on it for one of our clients and got good exposure to it. I know this article doesn’t deep dive into Unity Catalog concepts, but hopefully it helps with the hands-on part. Feel free to reach out if I can help with anything else related to Unity Catalog and Databricks administration. Cheers!
