End-to-End Lineage

Metaplane automatically uses a combination of parsing queries from your warehouse and metadata available from 3rd party APIs of tools that you've connected in your account to generate a lineage diagram, similar to the one shown below. You're able to define the number of levels either upstream or downstream and which column you'd like to drill into through the dropdowns on the bottom left corner of your lineage graph view.

How Warehouse Lineage is Generated

Metaplane is able to provide intra warehouse lineage, e.g. table → table and column → column connections, via two mechanisms. You can have any combination of these mechanisms enabled

  1. Metaplane can automatically ingest lineage table level lineage when dbt core or cloud is connected.
  2. When Metaplane has been given access to your warehouse query logs, we can parse SQL statements to build a column level lineage.

Column Level Lineage (CLL)

To build CLL, Metaplane parses your query history looking for any table or view CREATE statements.Metaplane can parse complex sql statements to figure out what columns are related to one another.

Let’s take a look at a simple query to understand how CLL works in practice:

create or replace view db.schema.customers as 
SELECT 
  (
    SELECT 
      count(distinct owner_id) 
    from 
      db.schema.deals
  ) as total_deals, 
  deal_flow.deal_id 
FROM 
  deal_flow 
WHERE 
  deal_flow.amount > 100

In this example, we are creating a new view called customers that has the columns total_deals, deal_id . Our parser looks at each top level column and recursively determines what columns contribute data. Here, we can see that the total_deals column is derived from the deals.owner_id column and the deal_id column is derived from the column with the same name in the deal_flow table. We think of these as “DIRECT” lineage, since the columns are directly contributing their own data to the top level column. Metaplane is also able to parse out “FILTER” lineage, which we think of as any column that isn’t contributing data, but is being used to filter down a result. In the example above, we can see that deal_flow.amount will be picked up as a “FILTER” column for deal_id . By default, our application only shows “DIRECT” lineage.

📘

Stale lineage

Because we parse queries to determine lineage, there are cases where warehouse to warehouse lineage for columns can become stale over time. If 21 days pass with no update to the lineage, we will remove it the next time we see new lineage information for that column come in.

BI and Ingestion Tools

If you've connected any of the business intelligence tools to your Metaplane account, we'll automatically add the ddashboards (or equivalent terminology) and relations to objects in your warehouse to your lineage graph, typically found downstream (right hand side of graph).

If you've connected Fivetran to your Metaplane account, you'll be able to see which connectors load data into which tables in your warehouse, located upstream (left hand side of graph).