S3 destination for batch exports

With batch exports, data can be exported to an S3 bucket.

Creating the batch export

  1. Subscribe to the data pipelines add-on in your billing settings if you haven't already.
  2. Click on data pipeline in the sidebar and go to the exports tab in your PostHog instance.
  3. Click "Create export workflow".
  4. Select S3 as the batch export type.
  5. Fill in the necessary configuration details.
  6. Finalize the creation by clicking on "Create".
  7. Done! The batch export will schedule its first run at the start of the next period.

S3 configuration

Configuring a batch export targeting S3 requires the following S3-specific configuration values:

  • Bucket name: The name of the S3 bucket where the data is to be exported.
  • Region: The AWS region where the bucket is located.
  • Key prefix: A key prefix to use for each S3 object created. This prefix can include template variables.
  • Compression: Select a compression method (like gzip) to use for exported files or no compression.
  • Encryption: Select a server-side encryption method (AES256 or aws:kms) for AWS to encrypt data at rest.
  • AWS Access Key ID: An AWS access key ID with access to the S3 bucket.
  • AWS Secret Access Key: An AWS secret access key with access to the S3 bucket.
  • AWS KMS Key ID: The AWS KMS Key ID to use for server-side encryption. Only required when selecting aws:kms encryption.
  • Events to exclude: A list of events to omit from the exported data.
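The relationship between these values can be sketched in Python. This is a minimal illustration, not PostHog's implementation: the `S3ExportConfig` class, its field names, and the `validate` helper are all hypothetical, and treating compression and encryption as optional is an assumption. It mainly shows the one conditional rule stated above, that a KMS key ID is only needed with aws:kms encryption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class S3ExportConfig:
    """Hypothetical container mirroring the configuration values listed above."""
    bucket_name: str
    region: str
    key_prefix: str
    aws_access_key_id: str
    aws_secret_access_key: str
    compression: Optional[str] = None   # e.g. "gzip"; None means no compression
    encryption: Optional[str] = None    # "AES256" or "aws:kms"; None is an assumption
    kms_key_id: Optional[str] = None    # only needed with "aws:kms"
    exclude_events: List[str] = field(default_factory=list)


def validate(config: S3ExportConfig) -> List[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    if not config.bucket_name:
        problems.append("bucket name is required")
    if not config.region:
        problems.append("region is required")
    if config.encryption not in (None, "AES256", "aws:kms"):
        problems.append("encryption must be AES256 or aws:kms")
    if config.encryption == "aws:kms" and not config.kms_key_id:
        problems.append("a KMS key ID is required with aws:kms encryption")
    return problems
```

Calling `validate` on a configuration that selects aws:kms without a key ID returns a single problem, and supplying `kms_key_id` clears it.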

S3 key prefix template variables

The key prefix provided for data exporting can include template variables, which are formatted at runtime. All template variables are written between curly brackets (for example, {day}). This allows you to partition files in your S3 bucket, such as by date.

Template variables include:

  • Date and time variables:
    • year.
    • month.
    • day.
    • hour.
    • minute.
    • second.
  • Name of the table exported (for now, only "events"):
    • table.
  • Batch export data bounds:
    • data_interval_start.
    • data_interval_end.

For example, setting {year}-{month}-{day}_{table}/ as the key prefix will produce files prefixed with keys like 2023-07-28_events/.
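To make the substitution concrete, here is a rough Python sketch of how such a prefix could be rendered. It is not PostHog's code: the `render_key_prefix` function is hypothetical, and two details are assumptions, namely that the date and time variables are taken from the end of the data interval and that the interval bounds are rendered as ISO 8601 strings.

```python
from datetime import datetime, timezone


def render_key_prefix(template: str, table: str,
                      interval_start: datetime, interval_end: datetime) -> str:
    """Fill in the template variables of a key prefix (hypothetical helper)."""
    moment = interval_end  # assumption: date/time variables follow the interval end
    return template.format(
        year=f"{moment:%Y}",
        month=f"{moment:%m}",    # zero-padded, e.g. "07"
        day=f"{moment:%d}",
        hour=f"{moment:%H}",
        minute=f"{moment:%M}",
        second=f"{moment:%S}",
        table=table,
        data_interval_start=interval_start.isoformat(),  # assumed format
        data_interval_end=interval_end.isoformat(),      # assumed format
    )


start = datetime(2023, 7, 28, 0, 0, tzinfo=timezone.utc)
end = datetime(2023, 7, 28, 1, 0, tzinfo=timezone.utc)
prefix = render_key_prefix("{year}-{month}-{day}_{table}/", "events", start, end)
print(prefix)  # 2023-07-28_events/
```

Variables the template does not mention are simply ignored, so the same helper works for prefixes that use only some of them.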

S3 file formats

Batch exports to S3 write data in JSON Lines format, with one JSON object per line. We intend to add support for other common formats like CSV and Parquet. You can follow the roadmap for updates on this and other features.
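Since JSON lines files hold one JSON object per line, an exported file can be read line by line once it has been downloaded. The sketch below uses only the Python standard library and assumes gzip whenever compression was selected. The `iter_json_lines` helper and the `event` field in the sample data are illustrative only, not a description of the exported schema.

```python
import gzip
import json
from typing import Iterator


def iter_json_lines(raw: bytes, compressed: bool = False) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON lines payload."""
    if compressed:
        raw = gzip.decompress(raw)  # assumes gzip was the selected compression
    for line in raw.decode("utf-8").splitlines():
        if line.strip():
            yield json.loads(line)


# Made-up sample payload standing in for the contents of a downloaded file.
sample = b'{"event": "pageview"}\n{"event": "click"}\n'
events = [record["event"] for record in iter_json_lines(sample)]
```

The same function handles a gzip-compressed object when called with `compressed=True`.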
