Amazon S3

Set up monitoring of your raw files stored in AWS S3, including S3 Tables using Apache Iceberg

Reading S3 Files from Snowflake

Overview

Metaplane can monitor your S3 data by querying it through Snowflake's secure integration with AWS. This guide walks you through setting up the connection and querying your S3 data, using a storage integration to manage access credentials securely.

Prerequisites

Before beginning, ensure you have:

  • Administrator access to both Snowflake and AWS
  • The S3 bucket containing your data files
  • Appropriate permissions to create IAM roles in AWS
  • The necessary Snowflake privileges to create storage integrations and external tables

Connect Snowflake to AWS

Configure AWS IAM Role

First, create an IAM role that Snowflake can assume to access your S3 bucket. Full details can be found in Snowflake's documentation.

  1. Create a new IAM role in AWS with a trust relationship to Snowflake's account. You'll obtain Snowflake's AWS account ID and external ID after creating the storage integration in the next step. For now, create the role with the following S3 permissions policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::your-bucket-name"
        }
    ]
}

Create Snowflake Storage Integration

Create a storage integration object in Snowflake that delegates authentication to the IAM role, so no AWS credentials need to be stored in Snowflake:

CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/your-role-name'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/');

After creating the integration, retrieve the AWS IAM user and external ID:

DESC STORAGE INTEGRATION s3_integration;

Return to AWS and update your IAM role's trust relationship with these values (the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties in the DESC output).
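Once you have those values, the role's trust policy should look like the following sketch, with the placeholders replaced by the DESC output:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}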

Create External Stage

With the integration in place, create an external stage that points to your S3 location:

CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://your-bucket-name/path/'
  FILE_FORMAT = (TYPE = CSV);  -- Adjust format as needed
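
You can verify the stage is wired up correctly by listing the files it can see:

LIST @my_s3_stage;

If this returns your files, the integration and stage are working.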

Create External Table

Now you can create an external table that reads files directly from S3. First, define a file format so the raw data is queryable in Snowflake:

CREATE FILE FORMAT my_csv_format
    TYPE = CSV
    SKIP_HEADER = 1
    FIELD_DELIMITER = ','
    TRIM_SPACE = TRUE;

CREATE EXTERNAL TABLE my_s3_table
    LOCATION = @my_s3_stage/your_file.csv
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
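
External tables expose each row of the underlying file in a VARIANT column named VALUE; for CSV files the fields are keyed c1, c2, and so on. A quick sketch of querying the table above (the aliases and casts are illustrative):

SELECT
    value:c1::STRING AS first_column,
    value:c2::STRING AS second_column
FROM my_s3_table
LIMIT 10;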

Grant Metaplane access to query External Tables

Metaplane's standard permissions cover newly created External Tables, so no additional permissions need to be granted. The permissions controlling this are:

GRANT SELECT ON ALL EXTERNAL TABLES IN SCHEMA [your_schema] TO ROLE [your_role];
GRANT SELECT ON FUTURE EXTERNAL TABLES IN SCHEMA [your_schema] TO ROLE [your_role];

When Metaplane next syncs with Snowflake, you'll find your external tables in your entity tree. You can verify access by running a SELECT statement on the external table in Snowflake as the Metaplane role.
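
For example (METAPLANE_ROLE is a placeholder; use the role your Metaplane connection authenticates with):

USE ROLE METAPLANE_ROLE;  -- placeholder for your Metaplane role
SELECT * FROM my_s3_table LIMIT 10;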

Common Considerations

  1. Performance: When querying large files, consider using Snowflake's COPY command to load the data into a table first, rather than querying it directly (see the COPY sketch after this list).

  2. File Formats: While the examples use CSV, Snowflake supports multiple file formats including JSON, Parquet, and Avro. Adjust the file format specifications accordingly (a Parquet example follows this list).

  3. Error Handling: If your queries fail, verify:

    • The storage integration has correct permissions
    • The AWS IAM role trust relationship is properly configured
    • File paths and formats match your data exactly

  4. Security: Always follow the principle of least privilege when setting up IAM roles and Snowflake permissions. Grant access only to the specific S3 paths needed (see the final sketch below).
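
For the performance note above, a one-time load into a regular table might look like this sketch (my_loaded_table and its columns are illustrative; match them to your CSV):

-- Illustrative target table; define columns to match your file
CREATE TABLE my_loaded_table (col1 STRING, col2 STRING);

-- Load the staged CSV files once, then query the regular table
COPY INTO my_loaded_table
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  PATTERN = '.*[.]csv';

For other formats, often only the format type changes. A minimal Parquet sketch (my_parquet_format, my_parquet_table, and the stage path are illustrative):

CREATE FILE FORMAT my_parquet_format
  TYPE = PARQUET;

CREATE EXTERNAL TABLE my_parquet_table
  LOCATION = @my_s3_stage/parquet/
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');

And to keep the integration itself narrowly scoped, you can restrict its allowed locations to a specific prefix rather than the whole bucket (the path is illustrative):

ALTER STORAGE INTEGRATION s3_integration
  SET STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/monitored-data/');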