AWS Datalake

Availability: Core Standard Plus Pro Enterprise Flex Self-Managed Enterprise PyAirbyte
Support Level: Marketplace
Connector Version: 0.1.58 (last updated a year ago)
CDK Version: 0.84.0
Sync Success Rate: Medium
Usage Rate: High
Definition ID: 99878c90-0fbd-46d3-9d98-ffde879d17fc

This page contains the setup guide and reference information for the AWS Datalake destination connector.

The AWS Datalake destination connector syncs data to AWS. It writes data as JSON files in S3 and exposes it through a Lake Formation governed table in the Glue Data Catalog, so the data can be used from other AWS services such as Athena, Glue jobs, EMR, and Redshift.

Prerequisites

To use this destination connector, you will need:

An AWS account
An S3 bucket where the data will be written
An AWS Lake Formation database where tables will be created (one per stream)
AWS credentials, either an Access Key ID / Secret Access Key pair or a role, with the following permissions:
- Writing objects to the S3 bucket
- Updating the Lake Formation database

Please check the Setup guide below if you need guidance creating those.

Setup guide

You should now have all the requirements needed to configure AWS Datalake as a destination in the UI. You'll need the following information to configure the destination:

Aws Account Id : The account ID of your AWS account. You will find the instructions to set up a new AWS account here.
Aws Region : The region in which your resources are deployed.
Authentication mode : The AWS Datalake connector lets you authenticate with either a user or a role. In both cases, you must make sure that appropriate policies are in place. Select "ROLE" if you are using a role, or "USER" if you are using a user with an Access Key ID / Secret Access Key.
Target Role Arn : The ARN of the role, if "Authentication mode" is "ROLE". You will find the instructions to create a new role here.
Access Key Id : The Access Key ID of the user, if "Authentication mode" is "USER". You will find the instructions to create a new user here. Make sure to select "Programmatic Access" so that you get a secret access key.
Secret Access Key : The Secret Access Key of the user, if "Authentication mode" is "USER".
S3 Bucket Name : The bucket in which the data will be written. You will find the instructions to create a new S3 bucket here.
Target S3 Bucket Prefix : A prefix to prepend to the file name when writing to the bucket.
Database : The database in which the tables will be created. You will find the instructions to create a new Lake Formation database here.
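
If you configure the destination outside the UI, the fields above can be assembled into a configuration object. The following is a minimal sketch, not the connector's own code. The top-level keys are the property names listed in the config fields reference on this page. The keys inside `credentials` (`credentials_title`, `role_arn`, `aws_access_key_id`, `aws_secret_access_key`) and the helper name `build_config` are assumptions made for illustration and should be checked against the connector specification.

```python
from typing import Optional


def build_config(
    aws_account_id: str,
    region: str,
    bucket_name: str,
    database: str,
    role_arn: Optional[str] = None,
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    bucket_prefix: Optional[str] = None,
) -> dict:
    """Assemble a destination config dict (hypothetical helper)."""
    if role_arn:
        # "ROLE" authentication mode. Inner key names are assumptions.
        credentials = {"credentials_title": "IAM Role", "role_arn": role_arn}
    elif access_key_id and secret_access_key:
        # "USER" authentication mode. Inner key names are assumptions.
        credentials = {
            "credentials_title": "IAM User",
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
        }
    else:
        raise ValueError("Provide either role_arn or an access key pair")

    config = {
        "aws_account_id": aws_account_id,
        "region": region,
        "bucket_name": bucket_name,
        "lakeformation_database_name": database,
        "credentials": credentials,
    }
    if bucket_prefix:
        # The prefix is optional, so only include it when set.
        config["bucket_prefix"] = bucket_prefix
    return config
```

Failing early when neither a role nor a key pair is supplied mirrors the either/or choice of "Authentication mode" described above.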

Assigning proper permissions

The policy used by the user or the role must have access to the following services:

AWS Lake Formation
AWS Glue
AWS S3

You can use the AWS policy generator to help you generate an appropriate policy.

Please also make sure that the role or user you will use has appropriate permissions on the database in AWS Lake Formation. You will find more information about Lake Formation permissions in the AWS Lake Formation Developer Guide.
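
As a starting point for such a policy, the sketch below builds an IAM policy document covering the three services listed above. The specific actions chosen here are an assumption for illustration, not a verified minimal set for this connector, and `build_policy` is a hypothetical helper. Review and tighten the result before attaching it to a user or role.

```python
import json


def build_policy(bucket_name: str) -> dict:
    """Return an illustrative IAM policy document (actions are assumptions)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access inside the target bucket.
                "Sid": "BucketObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Listing applies to the bucket itself, not its objects.
                "Sid": "BucketList",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Catalog operations; scope the resource down where possible.
                "Sid": "GlueCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:CreateTable",
                    "glue:UpdateTable",
                ],
                "Resource": "*",
            },
            {
                "Sid": "LakeFormationAccess",
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(build_policy("my-example-bucket"), indent=2))
```

An IAM policy alone does not grant Lake Formation permissions on the database. Those are managed separately in Lake Formation, as noted above.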

Supported sync modes

Sync mode	Supported?
Full Refresh - Overwrite	Yes
Full Refresh - Append	Yes
Full Refresh - Overwrite + Deduped	No
Incremental Sync - Append	Yes
Incremental Sync - Append + Deduped	No
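
The table above can be encoded as a lookup for a pre-flight check before a connection is created. This is only a sketch: the names `SUPPORTED_SYNC_MODES` and `is_supported` are hypothetical, and the mode labels are copied from the table rather than from any API.

```python
# Support matrix copied from the table above.
SUPPORTED_SYNC_MODES = {
    "Full Refresh - Overwrite": True,
    "Full Refresh - Append": True,
    "Full Refresh - Overwrite + Deduped": False,
    "Incremental Sync - Append": True,
    "Incremental Sync - Append + Deduped": False,
}


def is_supported(sync_mode: str) -> bool:
    """Return True only for modes the table lists as supported."""
    # Unknown labels are treated as unsupported.
    return SUPPORTED_SYNC_MODES.get(sync_mode, False)
```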

Data type map

The Glue tables are created with the schema information provided by the source: you will find the same columns and types in the destination table as in the source, except for the following types, which are translated for compatibility with the Glue Data Catalog:

Type in the source	Type in the destination
number	float
integer	int
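
The mapping above amounts to a lookup with a pass-through default. The sketch below illustrates that rule only. It is not the connector's implementation, and the names `GLUE_TYPE_OVERRIDES` and `to_glue_type` are hypothetical.

```python
# Source types that are renamed for the Glue Data Catalog (from the table above).
GLUE_TYPE_OVERRIDES = {
    "number": "float",
    "integer": "int",
}


def to_glue_type(source_type: str) -> str:
    """Translate a source type; types not in the table are kept unchanged."""
    return GLUE_TYPE_OVERRIDES.get(source_type, source_type)
```

For example, `to_glue_type("number")` returns `"float"`, while `to_glue_type("string")` returns `"string"` unchanged.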

Namespace support

This destination supports namespaces.

Reference

Config fields reference

Property name	Type
bucket_name	string
credentials	object
lakeformation_database_name	string
region	string
aws_account_id	string
bucket_prefix	string
format	object
glue_catalog_float_as_decimal	boolean
lakeformation_database_default_tag_key	string
lakeformation_database_default_tag_values	string
lakeformation_governed_tables	boolean
partitioning	string

Changelog

Version	Date	Pull Request	Subject
0.1.58	2025-05-24	59824	Update dependencies
0.1.57	2025-05-03	59366	Update dependencies
0.1.56	2025-04-26	58711	Update dependencies
0.1.55	2025-04-19	58281	Update dependencies
0.1.54	2025-04-12	57665	Update dependencies
0.1.53	2025-04-05	57136	Update dependencies
0.1.52	2025-03-29	56623	Update dependencies
0.1.51	2025-03-22	56157	Update dependencies
0.1.50	2025-03-08	55353	Update dependencies
0.1.49	2025-03-01	54848	Update dependencies
0.1.48	2025-02-22	54231	Update dependencies
0.1.47	2025-02-15	53910	Update dependencies
0.1.46	2025-02-08	53436	Update dependencies
0.1.45	2025-02-01	52881	Update dependencies
0.1.44	2025-01-25	51770	Update dependencies
0.1.43	2025-01-11	51289	Update dependencies
0.1.42	2025-01-04	50914	Update dependencies
0.1.41	2024-12-28	50458	Update dependencies
0.1.40	2024-12-21	50220	Update dependencies
0.1.39	2024-12-14	48945	Update dependencies
0.1.38	2024-11-25	48671	Update dependencies
0.1.37	2024-11-04	48243	Update dependencies
0.1.36	2024-10-29	47878	Update dependencies
0.1.35	2024-10-28	47590	Update dependencies
0.1.34	2024-10-22	47091	Update dependencies
0.1.33	2024-10-12	46790	Update dependencies
0.1.32	2024-10-05	46400	Update dependencies
0.1.31	2024-09-28	46126	Update dependencies
0.1.30	2024-09-21	45821	Update dependencies
0.1.29	2024-09-14	45533	Update dependencies
0.1.28	2024-09-07	45328	Update dependencies
0.1.27	2024-08-31	45032	Update dependencies
0.1.26	2024-08-24	44677	Update dependencies
0.1.25	2024-08-22	44530	Update test dependencies
0.1.24	2024-08-17	44341	Update dependencies
0.1.23	2024-08-12	43822	Update dependencies
0.1.22	2024-08-10	43497	Update dependencies
0.1.21	2024-08-03	43139	Update dependencies
0.1.20	2024-07-27	42821	Update dependencies
0.1.19	2024-07-20	42174	Update dependencies
0.1.18	2024-07-13	41819	Update dependencies
0.1.17	2024-07-10	41590	Update dependencies
0.1.16	2024-07-09	41083	Update dependencies
0.1.15	2024-07-06	40907	Update dependencies
0.1.14	2024-06-29	40631	Update dependencies
0.1.13	2024-06-27	40215	Replaced deprecated AirbyteLogger with logging.Logger
0.1.12	2024-06-26	40535	Update dependencies
0.1.11	2024-06-25	40458	Update dependencies
0.1.10	2024-06-22	39958	Update dependencies
0.1.9	2024-06-04	39033	[autopull] Upgrade base image to v1.2.1
0.1.8	2024-05-20	38413	[autopull] base image + poetry + up_to_date
0.1.7	2024-04-29	33853	Enable STS Role Credential Refresh for Long Sync
0.1.6	2024-03-22	36386	Support new state message protocol
0.1.5	2024-01-03	33924	Add new ap-southeast-3 AWS region
0.1.4	2023-10-25	29221	Upgrade AWSWrangler
0.1.3	2023-03-28	24642	Prefer airbyte type for complex types when available
0.1.2	2022-09-26	17193	Fix schema keyerror and add parquet support
0.1.1	2022-04-20	11811	Fix name of required param in specification
0.1.0	2022-03-29	10760	Initial release