DataGen
The DataGen source connector generates synthetic data for testing and development purposes. This connector is designed for end-to-end testing of data destinations and for testing Airbyte configurations in speed mode without requiring access to an external data source.
Prerequisites
No prerequisites are required to use this connector. DataGen generates data locally and does not connect to any external systems.
Setup guide
- Log in to your Airbyte Cloud or Airbyte Open Source account.
- Click Sources and then click + New source.
- On the Set up the source page, select DataGen from the Source type dropdown.
- Enter a name for your DataGen source.
- Configure the data generation settings:
  - Data Generation Type: Choose Incremental, All Types, or Wide.
  - Max Record: Specify the total number of records to generate (minimum 1, maximum 100 billion). Default is 100.
  - Column Count (Wide mode only): Set the number of columns to generate, including the id column, from 1 to 1000. Default is 50.
  - Max Concurrency (optional): Set the maximum number of concurrent data generators. Leave empty to let Airbyte optimize performance automatically.
- Click Set up source.
Supported sync modes
The DataGen source connector supports the following sync modes:
| Feature | Supported? |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | No |
Supported data generation types
The connector supports three data generation patterns:
Incremental
Generates a stream named increment with a single column named id that contains monotonically increasing integers. This mode is useful for testing incremental data loading and verifying that data arrives in the expected order.
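As a rough illustration of what this stream looks like to a consumer, the sketch below produces records with a single monotonically increasing id and checks their order. The function names and the starting value of 1 are assumptions made for this example; they are not taken from the connector's implementation.

```python
from typing import Iterable, Iterator


def increment_records(max_record: int = 100) -> Iterator[dict]:
    """Yield records shaped like the increment stream: one integer id column.

    Starting at 1 is an assumption; the text only states that ids increase
    monotonically.
    """
    if max_record < 1:
        raise ValueError("max_record must be at least 1")
    for value in range(1, max_record + 1):
        yield {"id": value}


def arrives_in_order(records: Iterable[dict]) -> bool:
    """Return True if every id is strictly greater than the one before it."""
    previous = None
    for record in records:
        if previous is not None and record["id"] <= previous:
            return False
        previous = record["id"]
    return True
```

A check like arrives_in_order can be run against rows read back from a destination to confirm that ordering survived the sync.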
All types
Generates a stream named all types with columns for various Airbyte data types, including id, string, boolean, number, big integer, big decimal, date, time (with and without time zones), timestamp (with and without time zones), and JSON. This mode is useful for testing type handling and schema compatibility across different destinations.
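To make the type coverage concrete, the following sketch builds one record with a plausible Python value for each listed type. The key names and sample values are hypothetical and chosen only for illustration; the connector's actual column names and generated values may differ.

```python
from datetime import datetime, timezone
from decimal import Decimal


def sample_all_types_record(record_id: int) -> dict:
    """Build one illustrative record covering the types listed above.

    Key names and values are assumptions for this sketch, not the
    connector's real output.
    """
    moment = datetime(2025, 1, 1, 12, 30, 0, tzinfo=timezone.utc)
    return {
        "id": record_id,
        "string": f"string-{record_id}",
        "boolean": record_id % 2 == 0,
        "number": record_id * 1.5,
        "big integer": 10**30 + record_id,
        "big decimal": Decimal("12345678901234567890.123456789"),
        "date": moment.date(),
        "time with time zone": moment.timetz(),       # keeps tzinfo
        "time without time zone": moment.time(),      # naive time
        "timestamp with time zone": moment,
        "timestamp without time zone": moment.replace(tzinfo=None),
        "json": {"nested": record_id},
    }
```

Comparing a record like this with what a destination stores is one way to spot precision loss in big decimals or dropped time zone information.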
Wide
Generates a stream named wide with a configurable number of columns (1–1000, default 50). Column 0 is always id (integer primary key). The remaining columns cycle through all 12 Airbyte data types: integer, string, boolean, number, big integer, big decimal, date, time with timezone, time without timezone, timestamp with timezone, timestamp without timezone, and JSON. Column names follow the pattern col_1_integer, col_2_string, etc. This mode is useful for testing wide schema handling and destination performance with many columns.
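The naming scheme above can be sketched as follows. This is a minimal illustration, assuming the type order given in the text and that the column count includes the id column; the snake_case spellings for multi-word types (for example big_integer) are an assumption, since the text only shows col_1_integer and col_2_string.

```python
from itertools import cycle, islice

# The 12 types in the order listed in the text. Spellings of multi-word
# types are assumed for this sketch.
TYPE_CYCLE = [
    "integer", "string", "boolean", "number",
    "big_integer", "big_decimal", "date",
    "time_with_timezone", "time_without_timezone",
    "timestamp_with_timezone", "timestamp_without_timezone",
    "json",
]


def wide_column_names(column_count: int = 50) -> list[str]:
    """Return column names for a wide stream; column 0 is always id."""
    if not 1 <= column_count <= 1000:
        raise ValueError("column_count must be between 1 and 1000")
    names = ["id"]
    # The remaining column_count - 1 columns cycle through the 12 types.
    typed = islice(cycle(TYPE_CYCLE), column_count - 1)
    for index, type_name in enumerate(typed, start=1):
        names.append(f"col_{index}_{type_name}")
    return names
```

Under these assumptions the cycle wraps after twelve typed columns, so column 13 is an integer column again.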
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| Data Generation Type | enum | Incremental | The data generation pattern to use. Choose Incremental, All Types, or Wide. |
| Max Record | integer | 100 | The total number of records to generate. Minimum 1, maximum 100 billion. |
| Max Concurrency | integer | (auto) | Maximum number of concurrent data generators. Leave empty to let Airbyte optimize performance automatically. |
| Column Count | integer | 50 | Wide mode only. The number of columns to generate, including the id column. Minimum 1, maximum 1000. |
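The limits in the table can be expressed as a small validation sketch. The class and attribute names below are hypothetical and do not reflect the connector's real configuration keys; the lower bound of 1 for Max Concurrency is also an assumption, since the table does not state one.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = ("Incremental", "All Types", "Wide")
MAX_RECORD_LIMIT = 100_000_000_000  # 100 billion
MAX_COLUMN_COUNT = 1000


@dataclass
class DataGenSettings:
    """Hypothetical container mirroring the parameters in the table."""
    data_generation_type: str = "Incremental"
    max_record: int = 100
    max_concurrency: Optional[int] = None  # None means "let Airbyte decide"
    column_count: int = 50                 # only used in Wide mode

    def validate(self) -> None:
        if self.data_generation_type not in VALID_TYPES:
            raise ValueError(f"unknown type: {self.data_generation_type}")
        if not 1 <= self.max_record <= MAX_RECORD_LIMIT:
            raise ValueError("max_record must be between 1 and 100 billion")
        # Assumed lower bound; the table gives no range for this field.
        if self.max_concurrency is not None and self.max_concurrency < 1:
            raise ValueError("max_concurrency must be at least 1 when set")
        if self.data_generation_type == "Wide":
            if not 1 <= self.column_count <= MAX_COLUMN_COUNT:
                raise ValueError("column_count must be between 1 and 1000")
```

Calling DataGenSettings().validate() with the defaults passes, while a Wide configuration with column_count=0 raises a ValueError.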
Changelog
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 0.2.0 | 2026-04-28 | 75542 | Add wide schema flavor with configurable column count; fix null safety in partition reader; cache codec references; bump CDK to 1.1.6 |
| 0.1.6 | 2025-10-23 | 68611 | Update cdk version |
| 0.1.5 | 2025-10-21 | 68581 | Update dataChannel version |
| 0.1.4 | 2025-10-16 | 68131 | Increment naming fix |
| 0.1.3 | 2025-10-16 | 68129 | Increment encoding fix |
| 0.1.2 | 2025-10-14 | 67720 | Removal of Array type |
| 0.1.1 | 2025-10-13 | 67110 | Addition of proto types |
| 0.1.0 | 2025-09-29 | 66331 | Creation of initial DataGen Source |