Databricks
Databricks is a data lakehouse that unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics and AI use cases. It’s built on an open and reliable data foundation that efficiently handles all data types and applies one common security and governance approach across all of your data and cloud platforms.
1. Generate a Databricks access token
Metaplane needs an access token to connect to Databricks. We recommend creating a service principal for Metaplane and generating an access token for that service principal. Alternatively, you can use a personal access token for your own user.
Generate an access token for a service principal
1. Create a service principal for Metaplane
Follow the instructions here to create a service principal using the Databricks API. Note the service principal's application ID and save it somewhere safe.
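If you prefer to script this step, the sketch below builds the workspace-level SCIM request that creates a service principal. It is a minimal Python example that only uses the standard library and assumes you have a workspace admin token and a workspace URL such as https://<workspace>.cloud.databricks.com. The helper names and the "Metaplane" display name are illustrative, so check the Databricks API reference for your cloud before relying on it.

```python
import json
import urllib.request


# Hypothetical helper: builds (but does not send) a SCIM request that creates
# a workspace service principal. Assumes admin_token belongs to a workspace admin.
def build_create_service_principal_request(
    workspace_url: str, admin_token: str, display_name: str = "Metaplane"
) -> urllib.request.Request:
    body = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": display_name,
        "active": True,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/preview/scim/v2/ServicePrincipals",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical helper: sends a prepared request and decodes the JSON response.
def send(request: urllib.request.Request) -> dict:
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The response is expected to contain an applicationId field. That value is the application ID to save, and it is what the later GRANT statements refer to as <application_id>.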
2. Allow the service principal to use tokens in the workspace
Follow the instructions here to give Metaplane's service principal permissions to use access tokens.
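As a scripted alternative, the sketch below builds a Permissions API request that grants the service principal the CAN_USE permission on tokens. It assumes a workspace admin token and the application ID from the previous step. The helper name is hypothetical. PATCH is used so that the new entry is added to the existing token permissions instead of replacing them.

```python
import json
import urllib.request


# Hypothetical helper: builds a request that adds CAN_USE on tokens for one
# service principal. PATCH merges with existing permissions; PUT would replace them.
def build_token_usage_request(
    workspace_url: str, admin_token: str, application_id: str
) -> urllib.request.Request:
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/permissions/authorization/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```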
3. Generate an access token for Metaplane's service principal
Follow the instructions here to generate an access token for Metaplane's service principal. If you want Metaplane's connection to Databricks to be uninterrupted, set lifetime_seconds to null to prevent the token from expiring. Save this access token somewhere safe.
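The sketch below builds the Token Management request that creates a token on behalf of the service principal. Here, lifetime_seconds=None is serialized as JSON null, which matches the advice above. It assumes a workspace admin token and that the on-behalf-of tokens endpoint is available in your workspace, since availability can differ between clouds. The helper name and the "Metaplane" comment are illustrative.

```python
import json
import urllib.request
from typing import Optional


# Hypothetical helper: builds a request that creates a token on behalf of a
# service principal. lifetime_seconds=None is sent as JSON null (no expiry).
def build_obo_token_request(
    workspace_url: str,
    admin_token: str,
    application_id: str,
    lifetime_seconds: Optional[int] = None,
    comment: str = "Metaplane",
) -> urllib.request.Request:
    body = {
        "application_id": application_id,
        "comment": comment,
        "lifetime_seconds": lifetime_seconds,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The token itself should be returned in the token_value field of the response. It is only shown once, so store it right away.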
Generate a personal access token for your user
Follow the instructions here to generate a personal access token for your user. Save this access token somewhere safe.
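If you already have a way to authenticate to the API as your user, the sketch below builds the Token API request for a personal access token. The helper name is hypothetical. Leaving out lifetime_seconds is meant to request a token without an expiry, while passing a number of seconds makes it expire. Confirm your workspace's token policy, because admins can cap token lifetimes.

```python
import json
import urllib.request
from typing import Optional


# Hypothetical helper: builds a request that creates a personal access token
# for the user who owns user_token. lifetime_seconds is only sent when given.
def build_pat_request(
    workspace_url: str,
    user_token: str,
    lifetime_seconds: Optional[int] = None,
    comment: str = "Metaplane",
) -> urllib.request.Request:
    body = {"comment": comment}
    if lifetime_seconds is not None:
        body["lifetime_seconds"] = lifetime_seconds
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/token/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {user_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```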
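2. Grant Metaplane's service principal access to your data
Run these commands on each catalog you want Metaplane to access. Replace <catalog_name>, <schema_name> and <table_name> with your own names, and <application_id> with the service principal's application ID from Step 1.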
Unity Catalog
Grant Metaplane access to the tables you want to monitor
Grant access to all existing and future tables within the catalog
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<application_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<application_id>`;
Grant access to specific tables within the catalog
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<application_id>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<application_id>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<application_id>`;
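When you are granting access to many specific tables, it can help to generate the statements above instead of writing them by hand. The sketch below is a hypothetical helper. It emits one USE_CATALOG grant per catalog, one USE_SCHEMA grant per schema and one SELECT grant per table, and it backtick-quotes every identifier so that names containing hyphens or other special characters stay valid.

```python
# Hypothetical helpers for generating the Unity Catalog grants above.

def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def table_grants(application_id: str, tables: list[tuple[str, str, str]]) -> list[str]:
    """Return GRANT statements for (catalog, schema, table) triples.

    Catalog and schema grants are de-duplicated while keeping input order.
    """
    principal = quote_ident(application_id)
    catalogs = dict.fromkeys(catalog for catalog, _, _ in tables)
    schemas = dict.fromkeys((catalog, schema) for catalog, schema, _ in tables)
    statements = [
        f"GRANT USE_CATALOG ON CATALOG {quote_ident(c)} TO {principal};"
        for c in catalogs
    ]
    statements += [
        f"GRANT USE_SCHEMA ON SCHEMA {quote_ident(c)}.{quote_ident(s)} TO {principal};"
        for c, s in schemas
    ]
    statements += [
        f"GRANT SELECT ON TABLE {quote_ident(c)}.{quote_ident(s)}.{quote_ident(t)} TO {principal};"
        for c, s, t in tables
    ]
    return statements
```

For example, table_grants("abc-123", [("main", "sales", "orders"), ("main", "sales", "customers")]) yields four statements: one USE_CATALOG, one USE_SCHEMA and two SELECT grants. You can then paste them into a SQL editor.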
Grant Metaplane access to query history and column_lineage system tables for Data Insights
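- Enable the query and access system schemas using the Databricks API. Follow the instructions here. The API requests will look like:
curl -v -X PUT -H "Authorization: Bearer <token>" "<workspace url>/api/2.0/unity-catalog/metastores/<metastore id>/systemschemas/query"
curl -v -X PUT -H "Authorization: Bearer <token>" "<workspace url>/api/2.0/unity-catalog/metastores/<metastore id>/systemschemas/access"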
- Grant access to the query history system table and the column_lineage system table:
GRANT USE_SCHEMA ON SCHEMA system.query TO `<application_id>`;
GRANT SELECT ON TABLE `system`.`query`.`history` TO `<application_id>`;
GRANT USE_SCHEMA ON SCHEMA system.access TO `<application_id>`;
GRANT SELECT ON TABLE `system`.`access`.`column_lineage` TO `<application_id>`;
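The two steps above can be scripted together. The sketch below builds one PUT request per system schema, following the URL pattern of the curl command, and it also returns the four GRANT statements. It assumes an admin token and that you already know your metastore ID. The helper names are hypothetical, and the requests are only built here, not sent.

```python
import urllib.request

# Schemas Metaplane reads for Data Insights, per the steps above.
SYSTEM_SCHEMAS = ("query", "access")


# Hypothetical helper: one PUT request per system schema to enable.
def build_enable_system_schema_requests(
    workspace_url: str, admin_token: str, metastore_id: str, schemas=SYSTEM_SCHEMAS
) -> list[urllib.request.Request]:
    base = workspace_url.rstrip("/")
    return [
        urllib.request.Request(
            url=f"{base}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/{schema}",
            headers={"Authorization": f"Bearer {admin_token}"},
            method="PUT",
        )
        for schema in schemas
    ]


# Hypothetical helper: the GRANT statements for the two system tables.
def system_table_grants(application_id: str) -> list[str]:
    principal = f"`{application_id}`"
    return [
        f"GRANT USE_SCHEMA ON SCHEMA system.query TO {principal};",
        f"GRANT SELECT ON TABLE system.query.history TO {principal};",
        f"GRANT USE_SCHEMA ON SCHEMA system.access TO {principal};",
        f"GRANT SELECT ON TABLE system.access.column_lineage TO {principal};",
    ]
```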
Hive Metastore
GRANT READ_METADATA, USAGE, SELECT ON CATALOG <catalog_name> TO `<application_id>`;
3. Create a Databricks SQL Warehouse for Metaplane
- Follow the instructions here to create a SQL Warehouse for Metaplane to use. You will use the Host, Port and HTTP path from the 'Connection details' tab when creating the connection to Databricks in Metaplane.
- Click the 'Permissions' button and give the Metaplane service principal 'Can use' permissions.
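If you would rather create the warehouse through the API, the sketch below builds a SQL Warehouses API request and shows how the three connection fields are usually derived. The warehouse name, the 2X-Small size, the single cluster and the 10-minute auto-stop are illustrative assumptions, not Metaplane requirements. The /sql/1.0/warehouses/<id> HTTP path and port 443 reflect the usual layout, but always copy the values shown in the 'Connection details' tab.

```python
import json
import urllib.parse
import urllib.request


# Hypothetical helper: builds a request that creates a small SQL warehouse.
# Sizing values are illustrative; adjust them for your workload.
def build_create_warehouse_request(
    workspace_url: str, admin_token: str, name: str = "metaplane"
) -> urllib.request.Request:
    body = {
        "name": name,
        "cluster_size": "2X-Small",
        "max_num_clusters": 1,
        "auto_stop_mins": 10,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/sql/warehouses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical helper: derives Host, Port and HTTP path from the workspace URL
# and the warehouse ID returned in the "id" field of the create response.
def connection_details(workspace_url: str, warehouse_id: str) -> dict:
    return {
        "host": urllib.parse.urlparse(workspace_url).hostname,
        "port": 443,
        "http_path": f"/sql/1.0/warehouses/{warehouse_id}",
    }
```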
4. Add Databricks as a connection in Metaplane
On the Connections page, click the 'Add connection' button in the upper right corner and find the Databricks Unity Catalog or Databricks Hive Metastore icon under Warehouses. The Host, Port and HTTP Path fields come from the SQL Warehouse created in Step 3. The access token comes from Step 1.
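Before you save the connection, you may want to confirm that the Host, HTTP path and access token work together. The sketch below uses the databricks-sql-connector package (pip install databricks-sql-connector), which connects on port 443 by default, to run SELECT 1 against the warehouse. The helper names are hypothetical. The connector is imported inside the function so that the host-normalizing helper still works when the package is not installed. normalize_host strips the scheme and trailing slash, which is a common copy-paste slip.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Return a bare hostname, dropping any scheme, path or trailing slash."""
    parsed = urlparse(host if "://" in host else f"https://{host}")
    return parsed.hostname or host


def check_connection(host: str, http_path: str, access_token: str) -> bool:
    """Run SELECT 1 on the SQL warehouse and report whether it succeeded."""
    # Imported lazily: requires the databricks-sql-connector package.
    from databricks import sql

    with sql.connect(
        server_hostname=normalize_host(host),
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1
```

If this check fails with a permissions error, revisit the 'Can use' permission from Step 3 before troubleshooting in Metaplane.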