Data CI/CD overview

Halt PRs with unintended effects and prevent data quality issues from hitting the warehouse.

When you connect the GitHub repository that hosts your dbt model code to Metaplane and install our GitHub app, you can see how changes to a model will affect other dbt models, objects in your warehouse, and dashboards in your business intelligence tool.

Teams use Metaplane's GitHub app to catch breaking changes before merging new code. The GitHub integration also shows when a pull request led to a data quality incident surfaced in Metaplane, which speeds up root cause analysis. The GitHub app runs two types of checks:

  • Data impact previews – Metaplane identifies which downstream warehouse table(s) or BI dashboard(s) may be affected by your pull request.
  • Data test previews – Metaplane runs a suite of tests to forecast how the values in your downstream models and tables will change.
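
To make the idea behind an impact preview concrete, here is a minimal sketch of a downstream lookup over a lineage graph. It is an illustration only, not Metaplane's implementation: the `LINEAGE` mapping, the model and dashboard names, and the `downstream_of` helper are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key maps to the objects that read from it directly.
LINEAGE = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customers"],
    "fct_orders": ["revenue_dashboard", "fct_customer_ltv"],
    "dim_customers": ["fct_customer_ltv"],
    "fct_customer_ltv": ["retention_dashboard"],
}


def downstream_of(changed, lineage):
    """Return every object reachable from the changed models, sorted by name.

    The changed models themselves are excluded from the result.
    """
    seen = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen - set(changed))


# A pull request that edits stg_orders touches two models and two dashboards.
print(downstream_of(["stg_orders"], LINEAGE))
# → ['fct_customer_ltv', 'fct_orders', 'retention_dashboard', 'revenue_dashboard']
```

A breadth-first walk is used here because it visits each object once even when several paths lead to it, as with `fct_customer_ltv` in this example.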

With Data CI/CD, you’ll also be able to:

  • Raise awareness of data quality starting in the pull request – Metaplane can run checks directly in your source control system so that your data team can merge changes with confidence.
  • Identify the root cause of data incidents – Metaplane keeps track of pull requests so that when a data incident occurs, we can provide context about recently merged pull requests related to the affected data.

Setup

To implement Data CI/CD, you’ll need to:

  1. Make sure that dbt is already set up before you start the Data CI/CD setup.
  2. Install the GitHub app.
  3. Identify where your dbt jobs are stored in your GitHub instance.
  4. Enable and configure data impact previews and/or data test previews.

Here’s what your experience will look like:

  1. Find Data CI/CD from the sidebar to get started.

  2. Add the GitHub app from your Metaplane account.

  3. Confirm permissions for your organization’s GitHub repository.

  4. Go back to Metaplane to configure your GitHub integration.

  5. Find the appropriate dbt job URL and GitHub paths for Metaplane to read from.

  6. (Optional) Further down the GitHub integration configuration panel, you’ll be able to toggle Impact Previews on to identify objects downstream of your pull request.

  7. (Optional) Underneath the Impact Previews dropdown, you’ll also be able to configure Test Previews.
    Mandatory fields include:

    • Target Warehouse – the data warehouse you’d like your regression tests to run against
    • CI Job URL – note that this will be different from the one you configured in step 5

    Optional fields include:

    • Checkboxes to ignore draft pull requests and to test downstream dbt models
    • Tag filters – restrict your tests to the specific Metaplane tags that you’ve applied
    • Guardrails for your CI checks – CI Job timeout and Query time
    • Change threshold – configure which breaches will be surfaced in your next CI check

  8. Congratulations! Save the changes you made in the configuration panel to complete the setup.


What’s Next

Get started by connecting your dbt Cloud repo in GitHub to Metaplane: