AWS S3
AWS S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. It is read-only and supports below file formats:
The S3 Wrapper allows you to read data of below formats from S3 within your Postgres database.
- CSV - with or without header line
- JSON Lines
- Parquet
The S3 Wrapper also supports below compression algorithms:
- gzip
- bzip2
- xz
- zlib
Note for CSV and JSONL files: currently all columns in S3 files must be defined in the foreign table and their types must be text
type.
Note for Parquet files: the whole Parquet file will be loaded into local memory if it is compressed, so keep the file size as small as possible.
Preparation
Before you can query S3, you need to enable the Wrappers extension and store your credentials in Postgres.
Enable Wrappers
Make sure the wrappers
extension is installed on your database:
_10create extension if not exists wrappers with schema extensions;
Enable the S3 Wrapper
Enable the s3_wrapper
FDW:
_10create foreign data wrapper s3_wrapper_10 handler s3_fdw_handler_10 validator s3_fdw_validator;
Store your credentials (optional)
By default, Postgres stores FDW credentials inside pg_catalog.pg_foreign_server
in plain text. Anyone with access to this table will be able to view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.
_14-- Save your AWS credential in Vault and retrieve the `key_id`_14insert into vault.secrets (name, secret)_14values (_14 'vault_access_key_id',_14 '<access key id>'_14)_14returning key_id;_14_14insert into vault.secrets (name, secret)_14values (_14 'vault_secret_access_key',_14 '<secret access key>'_14)_14returning key_id;
Connecting to S3
We need to provide Postgres with the credentials to connect to S3, and any additional options. We can do this using the create server
command:
_10create server s3_server_10 foreign data wrapper s3_wrapper_10 options (_10 vault_access_key_id '<your vault_access_key_id from above>',_10 vault_secret_access_key '<your vault_secret_access_key from above>',_10 aws_region 'us-east-1'_10 );
The full list of options are below:
aws_access_key_id
(required) - Your access keyaws_secret_access_key
(required) - Your secret keyaws_region
(required) - The region of your bucket (if providing an endpoint URL with a region in it, make sure that they are the same)endpoint_url
(optional) - An optional URL to allow connection to S3-compliant providers (i.e. Wasabi, Cloudflare R2, Backblaze B2, DigitalOcean Spaces)path_style_url
(optional) - Whether to use path-style URL access. This is required by some S3-compliant providers.true
orfalse
, default isfalse
.
Required S3 permissions
Below S3 permissions are needed:
- s3:GetObject
- s3:GetObjectAttributes
If the bucket is versioned, we also need:
- s3:GetObjectVersion
- s3:GetObjectVersionAttributes
Connecting to S3-compliant Providers - Supabase Storage
_10create server s3_server_10 foreign data wrapper s3_wrapper_10 options (_10 aws_access_key_id '<supabase_storage_access_key>',_10 aws_secret_access_key '<supabase_storage_secret_key>',_10 aws_region 'eu-central-1',_10 endpoint_url 'https://<project_ref>.supabase.co/storage/v1/s3',_10 path_style_url 'true'_10 );
Connecting to S3-compliant Providers - Wasabi
_10create server s3_server_10 foreign data wrapper s3_wrapper_10 options (_10 aws_access_key_id '<your_wasabi_access_key>',_10 aws_secret_access_key '<your_wasabi_secret_key>',_10 aws_region 'eu-central-1',_10 endpoint_url 'https://s3.eu-central-1.wasabisys.com'_10 );
Create a schema
We recommend creating a schema to hold all the foreign tables:
_10create schema if not exists s3;
Options
The following options are available when creating S3 foreign tables:
uri
- S3 URI, required. For example,s3://bucket/s3_table.csv
format
- File format, required.csv
,jsonl
, orparquet
has_header
- If the CSV file has header, optional.true
orfalse
, default isfalse
compress
- Compression algorithm, optional. One ofgzip
,bzip2
,xz
,zlib
, default is no compression
Entities
CSV Files
This is an object representing CSV files in S3.
Ref: AWS S3 docs
Operations
Object | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
CSV | ✅ | ❌ | ❌ | ❌ | ❌ |
Usage
_13create foreign table s3.table_csv (_13 name text,_13 sex text,_13 age text,_13 height text,_13 weight text_13)_13 server s3_server_13 options (_13 uri 's3://bucket/s3_table.csv',_13 format 'csv',_13 has_header 'true'_13 );
Notes
- All columns must be defined in the foreign table
- All column types must be
text
- Optional header support via
has_header
option - Supports compression (gzip, bzip2, xz, zlib)
JSON Lines Files
This is an object representing JSONL files in S3.
Ref: JSONL docs
Operations
Object | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
JSONL | ✅ | ❌ | ❌ | ❌ | ❌ |
Usage
_12create foreign table s3.table_jsonl (_12 name text,_12 sex text,_12 age text,_12 height text,_12 weight text_12)_12 server s3_server_12 options (_12 uri 's3://bucket/s3_table.jsonl',_12 format 'jsonl'_12 );
Notes
- All columns must be defined in the foreign table
- All column types must be
text
- Each line must be a valid JSON object
- Supports compression (gzip, bzip2, xz, zlib)
Parquet Files
This is an object representing Parquet files in S3.
Ref: Parquet docs
Operations
Object | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
Parquet | ✅ | ❌ | ❌ | ❌ | ❌ |
Usage
_13create foreign table s3.table_parquet (_13 id integer,_13 bool_col boolean,_13 bigint_col bigint,_13 float_col real,_13 date_string_col text,_13 timestamp_col timestamp_13)_13 server s3_server_13 options (_13 uri 's3://bucket/s3_table.parquet',_13 format 'parquet'_13 );
Notes
- Not all columns need to be defined in foreign table
- Column names must match between Parquet file and foreign table
- Supports various Postgres data types (see Data Types section)
- Compressed files are loaded entirely into memory
- Supports compression (gzip, bzip2, xz, zlib)
Query Pushdown Support
This FDW doesn't support query pushdown.
Supported Data Types For Parquet File
The S3 Wrapper uses Parquet file data types from arrow_array::types, below are their mappings to Postgres data types.
Postgres Type | Parquet Type |
---|---|
boolean | BooleanType |
char | Int8Type |
smallint | Int16Type |
real | Float32Type |
integer | Int32Type |
double precision | Float64Type |
bigint | Int64Type |
numeric | Float64Type |
text | ByteArrayType |
date | Date64Type |
timestamp | TimestampNanosecondType |
timestamptz | TimestampNanosecondType |
Limitations
This section describes important limitations and considerations when using this FDW:
- Large result sets experience slower performance due to full data transfer requirement
- Compressed files are loaded entirely into memory and have additional processing overhead
- CSV and JSONL columns must be defined as
text
type only - All columns must be defined in foreign tables for CSV and JSONL
- Column names must match exactly for Parquet files
- No support for S3 Select or other S3-side filtering
- Materialized views using these foreign tables may fail during logical backups
Examples
Basic Example
This will create some "foreign table" inside your Postgres database can read data from S3:
_75-- CSV file, no compression_75create foreign table s3.table_csv (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.csv',_75 format 'csv',_75 has_header 'true'_75 );_75_75-- JSON line file, no compression_75create foreign table s3.table_jsonl (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.jsonl',_75 format 'jsonl'_75 );_75_75-- GZIP compressed CSV file_75create foreign table s3.table_csv_gzip (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.csv.gz',_75 format 'csv',_75 has_header 'true',_75 compress 'gzip'_75 );_75_75-- Parquet file, no compression_75create foreign table s3.table_parquet (_75 id integer,_75 bool_col boolean,_75 bigint_col bigint,_75 float_col real,_75 date_string_col text,_75 timestamp_col timestamp_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.parquet',_75 format 'parquet'_75 );_75_75-- GZIP compressed Parquet file_75create foreign table s3.table_parquet_gz (_75 id integer,_75 bool_col boolean,_75 bigint_col bigint,_75 float_col real,_75 date_string_col text,_75 timestamp_col timestamp_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.parquet.gz',_75 format 'parquet',_75 compress 'gzip'_75 );
Read from Supabase Storage
This example will read a CSV file stored on Supabase Storage. The access information can be found on Supabase Storage settings page.
_20create server s3_server_20 foreign data wrapper s3_wrapper_20 options (_20 aws_access_key_id '<access key ID>',_20 aws_secret_access_key '<secret access key>',_20 aws_region '<region>',_20 endpoint_url 'https://<project_ref>.supabase.co/storage/v1/s3',_20 path_style_url 'true'_20 );_20_20create foreign table s3.supabase_table_csv (_20 id text,_20 name text_20)_20 server s3_server_20 options (_20 uri 's3://mybucket/myfile.csv',_20 format 'csv',_20 has_header 'true'_20 );