Databricks

Databricks is a data lakehouse that unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics and AI use cases. It’s built on an open and reliable data foundation that efficiently handles all data types and applies one common security and governance approach across all of your data and cloud platforms.

1. Generate a Databricks access token

Metaplane needs an access token to connect to Databricks. It is recommended that you create a service principal for Metaplane and generate an access token for that service principal. Alternatively, you can use a personal access token for your own user.

Generate a personal access token for a service principal

1. Create a service principal for Metaplane

Follow the instructions here to create a service principal using the Databricks API. Note the service principal's application ID and save it somewhere safe; you will use it in place of <application_id> in the grant commands below.
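If you prefer to script this step, the sketch below builds the SCIM request that creates a workspace-level service principal, using only Python's standard library. It assumes the endpoint /api/2.0/preview/scim/v2/ServicePrincipals and a workspace admin token; the workspace URL, token, and display name "metaplane" are placeholders. The request is built but not sent, so you can inspect it first; check the linked instructions for the exact request your workspace expects.

```python
import json
import urllib.request


def build_create_service_principal_request(
    workspace_url: str, admin_token: str, display_name: str
) -> urllib.request.Request:
    """Build (but do not send) a SCIM request that creates a service principal.

    Assumption: the workspace-level SCIM endpoint is
    /api/2.0/preview/scim/v2/ServicePrincipals.
    """
    body = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": display_name,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/preview/scim/v2/ServicePrincipals",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own workspace URL and admin token.
req = build_create_service_principal_request(
    "https://example.cloud.databricks.com", "<admin-token>", "metaplane"
)
# To send it and read the application ID from the JSON response:
# with urllib.request.urlopen(req) as resp:
#     application_id = json.load(resp)["applicationId"]
```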

2. Grant the service principal permission to use tokens in the workspace

Follow the instructions here to give Metaplane's service principal permission to use access tokens.
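As a rough sketch of what this step sends, the helper below builds a Permissions API request granting CAN_USE on tokens to the service principal. The endpoint /api/2.0/permissions/authorization/tokens and the body shape are assumptions to verify against the linked instructions; the workspace URL, admin token, and application ID are placeholders.

```python
import json
import urllib.request


def build_token_usage_request(
    workspace_url: str, admin_token: str, application_id: str
) -> urllib.request.Request:
    """Build (but do not send) a request granting CAN_USE on tokens to a service principal.

    Assumption: the Permissions API object for token usage is
    /api/2.0/permissions/authorization/tokens.
    """
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    # PATCH adds this entry to the existing access control list;
    # PUT would replace the whole list.
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/permissions/authorization/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = build_token_usage_request(
    "https://example.cloud.databricks.com", "<admin-token>", "<application_id>"
)
# urllib.request.urlopen(req) would send it.
```

Using PATCH rather than PUT is deliberate: it avoids removing token permissions that other users or groups already have.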

3. Generate an access token for Metaplane's service principal

Follow the instructions here to generate an access token for Metaplane's service principal. To keep Metaplane's connection to Databricks uninterrupted, set lifetime_seconds to null so that the token does not expire. Save this access token somewhere safe.
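The sketch below shows how lifetime_seconds set to null looks in the request body: Python's None is serialized as JSON null. It assumes the Token Management endpoint /api/2.0/token-management/on-behalf-of/tokens and a response field named token_value; the comment "Metaplane" and all other values are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_service_principal_token_request(
    workspace_url: str,
    admin_token: str,
    application_id: str,
    lifetime_seconds: Optional[int] = None,
    comment: str = "Metaplane",
) -> urllib.request.Request:
    """Build (but do not send) an on-behalf-of token request for a service principal.

    Assumption: the Token Management endpoint is
    /api/2.0/token-management/on-behalf-of/tokens.
    """
    body = {
        "application_id": application_id,
        # json.dumps turns None into null, requesting a token that does not expire.
        "lifetime_seconds": lifetime_seconds,
        "comment": comment,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_service_principal_token_request(
    "https://example.cloud.databricks.com", "<admin-token>", "<application_id>"
)
# with urllib.request.urlopen(req) as resp:
#     token = json.load(resp)["token_value"]  # returned only once; store it safely
```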

Generate a personal access token for your user

Follow the instructions here to generate a personal access token for your user. Save this access token somewhere safe.

2. Grant Metaplane's service principal access to data

Run the following commands for each catalog you want Metaplane to access, replacing <application_id> with the application ID you saved in Step 1.

Unity Catalog

Grant Metaplane access to the tables you want to monitor

Grant access to all existing and future tables within a catalog

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;
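Because these three statements must be repeated for every catalog, a small helper can generate them. This is a hypothetical Python sketch that only produces the SQL text shown above; the catalog names and application ID in the example are placeholders, and you would still run the output in a Databricks SQL editor or notebook.

```python
def catalog_grants(catalog: str, application_id: str) -> list[str]:
    """Return the catalog-wide grant statements for one catalog."""
    principal = f"`{application_id}`"  # backtick-quoted, as in the statements above
    return [
        f"GRANT USE_CATALOG ON CATALOG {catalog} TO {principal};",
        f"GRANT USE_SCHEMA ON CATALOG {catalog} TO {principal};",
        f"GRANT SELECT ON CATALOG {catalog} TO {principal};",
    ]


# Hypothetical catalog names and application ID.
for name in ["main", "analytics"]:
    for statement in catalog_grants(name, "<application_id>"):
        print(statement)
```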

Grant access to specific tables within a catalog

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<application_id>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<application_id>`;
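When you grant access table by table, each catalog and schema only needs its USE grant once. The hypothetical helper below takes fully qualified table names and emits one USE_CATALOG per catalog, one USE_SCHEMA per schema, and one SELECT per table. It assumes plain three-part names with no dots or backticks inside the individual parts; the table names in the example are placeholders.

```python
def table_grants(tables: list[str], application_id: str) -> list[str]:
    """Return grant statements for tables given as catalog.schema.table."""
    principal = f"`{application_id}`"
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_tables = list(dict.fromkeys(tables))
    for table in unique_tables:
        if len(table.split(".")) != 3:
            raise ValueError(f"expected catalog.schema.table, got {table!r}")
    catalogs = dict.fromkeys(t.split(".")[0] for t in unique_tables)
    schemas = dict.fromkeys(".".join(t.split(".")[:2]) for t in unique_tables)
    statements = [f"GRANT USE_CATALOG ON CATALOG {c} TO {principal};" for c in catalogs]
    statements += [f"GRANT USE_SCHEMA ON SCHEMA {s} TO {principal};" for s in schemas]
    statements += [f"GRANT SELECT ON TABLE {t} TO {principal};" for t in unique_tables]
    return statements


# Hypothetical tables; both live in the same catalog and schema.
for statement in table_grants(["main.sales.orders", "main.sales.customers"], "<application_id>"):
    print(statement)
```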

Grant Metaplane access to the query history and column lineage system tables for Data Insights

  1. Enable the query and access system schemas using the Databricks API. Follow the instructions here. The request that enables the query schema looks like this; send the same request with access in place of query to enable the access schema:
     curl -v -X PUT -H "Authorization: Bearer <token>" "<workspace url>/api/2.0/unity-catalog/metastores/<metastore id>/systemschemas/query"
  2. Grant access to the query history and column lineage system tables:
     GRANT USE_SCHEMA ON SCHEMA system.query TO `<application_id>`;
     GRANT SELECT ON TABLE `system`.`query`.`history` TO `<application_id>`;
     GRANT USE_SCHEMA ON SCHEMA system.access TO `<application_id>`;
     GRANT SELECT ON TABLE `system`.`access`.`column_lineage` TO `<application_id>`;
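The curl example enables only the query schema, but Data Insights needs both query and access. The sketch below builds one PUT request per schema against the same endpoint as the curl command. The workspace URL, token, and metastore ID are placeholders; the requests are built but not sent.

```python
import urllib.request


def build_enable_system_schema_requests(
    workspace_url: str, token: str, metastore_id: str, schemas=("query", "access")
) -> list[urllib.request.Request]:
    """Build (but do not send) one PUT request per system schema to enable."""
    base = workspace_url.rstrip("/")
    return [
        urllib.request.Request(
            url=f"{base}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/{schema}",
            headers={"Authorization": f"Bearer {token}"},
            method="PUT",
        )
        for schema in schemas
    ]


requests_to_send = build_enable_system_schema_requests(
    "https://example.cloud.databricks.com", "<token>", "<metastore_id>"
)
# for req in requests_to_send:
#     urllib.request.urlopen(req)
```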

Hive Metastore

GRANT READ_METADATA, USAGE, SELECT ON CATALOG <catalog_name> TO `<application_id>`;

3. Create a Databricks SQL Warehouse for Metaplane

  1. Follow the instructions here to create a SQL Warehouse for Metaplane to use. On the warehouse's 'Connection details' tab, note the Server hostname (Host), Port, and HTTP path; you will need them when creating the Databricks connection in Metaplane.
  2. Click the 'Permissions' button and give Metaplane's service principal the 'Can use' permission.
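The 'Can use' grant can also be scripted. The sketch below builds a Permissions API request for a SQL warehouse; the endpoint /api/2.0/permissions/warehouses/<warehouse_id> is an assumption to confirm in the Databricks API reference. The warehouse ID is the last segment of the warehouse's HTTP path, and all values shown are placeholders.

```python
import json
import urllib.request


def build_warehouse_permission_request(
    workspace_url: str, admin_token: str, warehouse_id: str, application_id: str
) -> urllib.request.Request:
    """Build (but do not send) a request granting CAN_USE on a SQL warehouse.

    Assumption: SQL warehouse permissions live under
    /api/2.0/permissions/warehouses/<warehouse_id>.
    """
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    # PATCH adds this entry without replacing the warehouse's existing permissions.
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/permissions/warehouses/{warehouse_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = build_warehouse_permission_request(
    "https://example.cloud.databricks.com", "<admin-token>", "abc123", "<application_id>"
)
```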

4. Add Databricks as a connection in Metaplane

On the Connections page, click the 'Add connection' button in the upper right corner and select the Databricks Unity Catalog or Databricks Hive Metastore icon under Warehouses. The Host, Port, and HTTP Path fields come from the SQL Warehouse created in Step 3, and the Access token comes from Step 1.
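Before saving the connection, you may want to confirm that the Host, HTTP Path, and access token work together. The sketch below builds a SELECT 1 request for the Databricks SQL Statement Execution API, assuming the endpoint /api/2.0/sql/statements/ and an HTTP path of the form /sql/1.0/warehouses/<warehouse-id>. The hostname, path, and token are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def warehouse_id_from_http_path(http_path: str) -> str:
    """Extract the warehouse ID from an HTTP path like /sql/1.0/warehouses/<id>."""
    prefix = "/sql/1.0/warehouses/"
    if not http_path.startswith(prefix):
        raise ValueError(f"unexpected HTTP path: {http_path!r}")
    return http_path[len(prefix):].strip("/")


def build_smoke_test_request(
    host: str, http_path: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a SELECT 1 request against the warehouse."""
    body = {
        "warehouse_id": warehouse_id_from_http_path(http_path),
        "statement": "SELECT 1",
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_smoke_test_request(
    "example.cloud.databricks.com", "/sql/1.0/warehouses/abc123", "<token>"
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["status"]["state"])  # SUCCEEDED suggests the values are valid
```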