Airbyte

Airbyte is a data movement platform with an expansive catalog of connectors, allowing users to seamlessly sync data between systems.

Generate Airbyte API credentials

Pick an Airbyte user for the Metaplane connection

The credentials you give Metaplane must be associated with some Airbyte user account. You can use your own account, or you can create a dedicated Metaplane account in Airbyte, which allows you finer-grained control over what resources you allow Metaplane to access.

Airbyte Cloud

If you use Airbyte Cloud, you can use API token authentication to connect Metaplane.

Create an Application

You can follow the steps outlined in https://reference.airbyte.com/reference/authentication-20 to generate an Airbyte Application for Metaplane. In short: in Airbyte Cloud, go to Settings -> Applications, click Create an Application, and name it Metaplane.

Once you've created the application, note the Client ID and Client Secret. These are what you'll need to pass to Metaplane in the next step.
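
Before handing these values to Metaplane, it can help to confirm that they work. The sketch below is an illustration only: it assumes the token endpoint is the one described on the authentication page linked above (taken here to be https://api.airbyte.com/v1/applications/token, which you should confirm there), and the function names and the CLIENT_ID, CLIENT_SECRET, and AIRBYTE_TOKEN_URL variables are hypothetical.

```shell
# Build the JSON body for the token request.
# Values are inserted verbatim, so this assumes they contain no quotes or backslashes.
token_payload() {
  printf '{"client_id":"%s","client_secret":"%s"}' "$1" "$2"
}

# Request an access token.
# The default endpoint URL is an assumption; check it against the Airbyte docs.
request_token() {
  curl --silent --request POST \
    --url "${AIRBYTE_TOKEN_URL:-https://api.airbyte.com/v1/applications/token}" \
    --header 'Content-Type: application/json' \
    --data "$(token_payload "$1" "$2")"
}

# Usage (not run here): request_token "$CLIENT_ID" "$CLIENT_SECRET"
```

A JSON response containing an access token suggests the Client ID and Client Secret are valid; an error response suggests they were copied incorrectly.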

Airbyte OSS

If you run a self-hosted open-source deployment of Airbyte, you can use username-password authentication to connect Metaplane. This could be the username and password of your personal Airbyte user, or of the dedicated Metaplane account you (optionally) created above.

Whitelist Metaplane IP Addresses

Depending on your method of deployment, your Airbyte instance may be behind a firewall that restricts external access.

Metaplane will only access your Airbyte instance through the following IPs:

  • 44.197.96.121
  • 34.206.79.174
  • 107.22.42.246

As a comma-separated list: 44.197.96.121, 34.206.79.174, 107.22.42.246
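
Many firewall and security-group tools expect CIDR notation rather than bare addresses. As a small sketch, the loop below prints each of the addresses listed above as a single-host /32 block; the METAPLANE_IPS variable and the metaplane_cidrs function are illustrative names, and how you feed the output into your firewall depends on your deployment.

```shell
# Metaplane's IP addresses, as listed above.
METAPLANE_IPS="44.197.96.121 34.206.79.174 107.22.42.246"

# Print each address as a single-host CIDR block (/32).
metaplane_cidrs() {
  for ip in $METAPLANE_IPS; do
    printf '%s/32\n' "$ip"
  done
}

metaplane_cidrs
```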

Identify your base API URL

The base API URL usually has the format https://<airbyteInstanceUrl>/api/public/v1, but you can check with your Airbyte administrator to be sure. The airbyteInstanceUrl may or may not include a port; the default is usually :8000.
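
As a minimal sketch of the format above, the helper below assembles a candidate base URL from a host and an optional port. The function name airbyte_api_url and the host airbyte.example.com are placeholders, and the result is only a starting point to validate, since your deployment may use a different scheme or path.

```shell
# Build a candidate Public API base URL from a host and an optional port.
airbyte_api_url() {
  host="$1"
  port="${2:-}"
  if [ -n "$port" ]; then
    printf 'https://%s:%s/api/public/v1\n' "$host" "$port"
  else
    printf 'https://%s/api/public/v1\n' "$host"
  fi
}

airbyte_api_url airbyte.example.com 8000
# prints https://airbyte.example.com:8000/api/public/v1
```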

Validating URL and credentials

If the Airbyte Public API isn't something you use regularly, finding the correct URL can be difficult. To validate your URL and credentials from your local terminal, you can run the following, substituting your own values for AIRBYTE_USER, AIRBYTE_PASS, and AIRBYTE_API_URL:

AIRBYTE_USER=
AIRBYTE_PASS=
AIRBYTE_API_URL=
curl "$AIRBYTE_API_URL/health" --header "Authorization: Basic $(printf '%s' "${AIRBYTE_USER}:${AIRBYTE_PASS}" | base64)"

If your credentials and API URL are all correct, the response will be Successful operation.

If the response is an HTML block containing something like This deployment of Airbyte is protected by HTTP Basic Authentication, then your API URL is most likely correct, but the username and password were rejected.

If the request times out, then either your API URL is incorrect (more likely), or your local IP address needs to be whitelisted for access to your Airbyte instance (less likely, since you probably already have access to Airbyte).

If you get back a message like Object not found, your API URL points at an Airbyte instance, but likely not at the correct path. To validate further, you can try hitting the /workspaces endpoint to see whether it returns any more information:

curl "$AIRBYTE_API_URL/workspaces" --header "Authorization: Basic $(printf '%s' "${AIRBYTE_USER}:${AIRBYTE_PASS}" | base64)"
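
The outcomes described above can be folded into a small helper, sketched here under the assumption that the response bodies contain the phrases quoted in this section. The function name classify_airbyte_response and its output messages are illustrative; exit code 28 is what curl returns when an operation times out.

```shell
# Map a curl exit code and response body to one of the outcomes described above.
classify_airbyte_response() {
  exit_code="$1"
  body="$2"
  if [ "$exit_code" -eq 28 ]; then   # 28 is curl's exit code for a timed-out operation
    echo "timeout: check the URL, then IP allowlisting"
  else
    case "$body" in
      *"Successful operation"*)      echo "ok: URL and credentials accepted" ;;
      *"HTTP Basic Authentication"*) echo "auth: URL looks right, credentials rejected" ;;
      *"Object not found"*)          echo "path: Airbyte instance reached, path likely wrong" ;;
      *)                             echo "unknown: inspect the response manually" ;;
    esac
  fi
}

# Usage (not run here; --max-time keeps a bad URL from hanging indefinitely):
# body=$(curl --silent --max-time 10 "$AIRBYTE_API_URL/health" --user "$AIRBYTE_USER:$AIRBYTE_PASS")
# classify_airbyte_response "$?" "$body"
```

The usage lines pass credentials with curl's --user option, which sends the same Basic authentication header as the commands above.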

Create an Airbyte connection in Metaplane

Head over to Metaplane, add a new connection via Settings -> Data Stack -> Add connection, and select Airbyte.

Select your deployment type (Cloud if you use Airbyte Cloud, or Self hosted otherwise) and enter the corresponding credentials you generated in the previous step: client ID and secret for Cloud, or username and password for Self hosted.

For Self hosted Airbyte instances, you'll also need to enter the Airbyte Public API base URL that you identified.

What to expect

Metaplane will populate with all of the Airbyte Workspaces, Connections, and Streams that it has access to. You'll see both upstream and downstream warehouse lineage for Streams. You'll also see some useful metrics (which you can create monitors on) about the syncs that Airbyte ran for each Connection, including:

  • Time since an Airbyte Connection last successfully synced
  • Bytes + rows written per sync
  • Duration of each sync