End-to-End Lineage
Metaplane automatically uses a combination of parsing queries from your warehouse and metadata available from 3rd party APIs of tools that you've connected in your account to generate a lineage diagram, similar to the one shown below. You're able to define the number of levels either upstream or downstream and which column you'd like to drill into through the dropdowns on the bottom left corner of your lineage graph view.
How Warehouse Lineage is Generated
Metaplane is able to provide intra warehouse lineage, e.g. table → table and column → column connections, via two mechanisms. You can have any combination of these mechanisms enabled
- Metaplane can automatically ingest lineage table level lineage when dbt core or cloud is connected.
- When Metaplane has been given access to your warehouse query logs, we can parse SQL statements to build a column level lineage.
Column Level Lineage (CLL)
To build CLL, Metaplane parses your query history looking for any table or view CREATE
statements.Metaplane can parse complex sql statements to figure out what columns are related to one another.
Let’s take a look at a simple query to understand how CLL works in practice:
create or replace view db.schema.customers as
SELECT
(
SELECT
count(distinct owner_id)
from
db.schema.deals
) as total_deals,
deal_flow.deal_id
FROM
deal_flow
WHERE
deal_flow.amount > 100
In this example, we are creating a new view called customers
that has the columns total_deals, deal_id
. Our parser looks at each top level column and recursively determines what columns contribute data. Here, we can see that the total_deals
column is derived from the deals.owner_id
column and the deal_id
column is derived from the column with the same name in the deal_flow
table. We think of these as “DIRECT” lineage, since the columns are directly contributing their own data to the top level column. Metaplane is also able to parse out “FILTER” lineage, which we think of as any column that isn’t contributing data, but is being used to filter down a result. In the example above, we can see that deal_flow.amount
will be picked up as a “FILTER” column for deal_id
. By default, our application only shows “DIRECT” lineage.
Stale lineage
Because we parse queries to determine lineage, there are cases where warehouse to warehouse lineage for columns can become stale over time. If 21 days pass with no update to the lineage, we will remove it the next time we see new lineage information for that column come in.
BI and Ingestion Tools
If you've connected any of the business intelligence tools to your Metaplane account, we'll automatically add the dashboards (or equivalent terminology) and relations to objects in your warehouse to your lineage graph, typically found downstream (right hand side of graph).
If you've connected Fivetran to your Metaplane account, you'll be able to see which connectors load data into which tables in your warehouse, located upstream (left hand side of graph).
Where to find Lineage
You have two options to see lineage generated for your data stack.
-
Use the Lineage indicator on the left hand sidebar: This will show you all of the integrated data warehouses, databases, and data lakes that you have integrated with Metaplane. You can then click into each object to browse lineage.
-
Use the Lineage tab along the top of your integrated warehouse(s): Find the table(s) and/or column(s) that you'd like to see lineage for, from the entity list page, and use the tab along the top.
How to navigate the Lineage page
All items for navigation can be found along the bottom of the page.
Switching to particular fields
Clicking the table name at the bottom of your screen will pull up a list of the columns within that table. Once you select your desired column, your chart will update to show where that field is referenced.
List view
In some cases, you'll have hundreds of downstream objects and tens of transformation models, which can be more easily sorted through a list. You'll be able to see both downstream objects and upstream objects in a list view, grouped by the number of "hops" (i.e. transformations, references in queries) they are away from your root object.
Updated about 1 year ago