Amazon S3
Set up monitoring of your raw files stored in AWS S3, including S3 Tables using Apache Iceberg
Reading S3 Files from Snowflake
Overview
Metaplane can monitor your S3 data by querying it through Snowflake's secure integration with AWS. This guide walks you through setting up the connection and querying your S3 data, using a storage integration to manage access credentials securely.
Prerequisites
Before beginning, ensure you have:
- Administrator access to both Snowflake and AWS
- An S3 bucket containing your data files
- Appropriate permissions to create IAM roles in AWS
- The necessary Snowflake privileges to create storage integrations and external tables
Connect Snowflake to AWS
Configure AWS IAM Role
First, create an IAM role that Snowflake can assume to access your S3 bucket. Full details can be found in Snowflake's documentation.
- Create a new IAM role in AWS with a trust relationship to Snowflake's account. You'll obtain Snowflake's AWS account ID and external ID after creating the storage integration in the next step. For now, create the role with the following S3 permissions policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::your-bucket-name"
    }
  ]
}
Create Snowflake Storage Integration
Create a storage integration object in Snowflake that securely stores the AWS credentials:
CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/your-role-name'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/');
After creating the integration, retrieve the AWS IAM user ARN and external ID that Snowflake generated:
DESC STORAGE INTEGRATION s3_integration;
In the output, note the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values, then return to AWS and update your IAM role's trust relationship with them.
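For reference, the updated trust relationship should look roughly like this, with the two placeholder values replaced by the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the DESC output:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
        }
      }
    }
  ]
}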
Create External Stage
With the integration in place, create an external stage that points to your S3 location:
CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_integration
  URL = 's3://your-bucket-name/path/'
  FILE_FORMAT = (TYPE = CSV); -- Adjust format as needed
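To confirm the integration and stage are wired up correctly, you can list the files Snowflake sees at that location:
LIST @my_s3_stage;
If this returns your files, the IAM role, storage integration, and stage are all configured correctly.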
Create External Table
Now create a named file format and an external table so Snowflake can read and query the raw files directly from S3:
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_DELIMITER = ','
  TRIM_SPACE = TRUE;
CREATE EXTERNAL TABLE my_s3_table
  LOCATION = @my_s3_stage/
  PATTERN = '.*your_file[.]csv'
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
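Once created, the external table exposes each row as a VARIANT column named VALUE; for CSV files, the fields appear as c1, c2, and so on. A quick sanity check (the column positions and names here are assumptions about your file's layout):
SELECT value:c1::STRING AS first_column,
       value:c2::NUMBER AS second_column
FROM my_s3_table
LIMIT 10;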
Grant Metaplane access to query External Tables
Metaplane's standard permissions cover newly created External Tables, so no additional permissions need to be granted. The permissions controlling this are:
GRANT SELECT ON ALL EXTERNAL TABLES IN SCHEMA [your_schema] TO ROLE [your_role];
GRANT SELECT ON FUTURE EXTERNAL TABLES IN SCHEMA [your_schema] TO ROLE [your_role];
The next time Metaplane syncs with Snowflake, your external tables will appear in the entity tree. To verify access, run a SELECT against the external table in Snowflake while using the Metaplane role.
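For example, assuming your Metaplane role is named METAPLANE_ROLE (substitute your actual role name):
USE ROLE METAPLANE_ROLE;
SELECT COUNT(*) FROM [your_schema].my_s3_table;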
Common Considerations
- Performance: When querying large files, consider using Snowflake's COPY command to load the data into a native table first, rather than querying it directly (see the sketch after this list).
- File Formats: While the examples use CSV, Snowflake supports multiple file formats including JSON, Parquet, and Avro. Adjust the file format specifications accordingly.
- Error Handling: If your queries fail, verify that:
  - The storage integration has correct permissions
  - The AWS IAM role trust relationship is properly configured
  - File paths and formats match your data exactly
- Security: Always follow the principle of least privilege when setting up IAM roles and Snowflake permissions. Grant access only to the specific S3 paths needed.
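As a sketch of the COPY approach mentioned under Performance, assuming the two-column layout from the earlier examples (the table and column names here are illustrative):
-- Load the staged file into a native table once, then query the table
CREATE TABLE my_loaded_table (first_column STRING, second_column NUMBER);
COPY INTO my_loaded_table
  FROM @my_s3_stage/your_file.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
Querying the loaded table avoids re-scanning S3 on every query.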