DataGen

The DataGen source connector generates synthetic data for testing and development purposes. This connector is designed for end-to-end testing of data destinations and for testing Airbyte configurations in speed mode without requiring access to an external data source.

Prerequisites

No prerequisites are required to use this connector. DataGen generates data locally and does not connect to any external systems.

Setup guide

  1. Log in to your Airbyte Cloud or Airbyte Open Source account.
  2. Click Sources and then click + New source.
  3. On the Set up the source page, select DataGen from the Source type dropdown.
  4. Enter a name for your DataGen source.
  5. Configure the data generation settings:
    • Data Generation Type: Choose Incremental, All Types, or Wide.
    • Max Record: Specify the total number of records to generate (minimum 1, maximum 100 billion). Default is 100.
    • Column Count (Wide mode only): Set the number of columns to generate, including the id column, from 1 to 1000. Default is 50.
    • Max Concurrency (optional): Set the maximum number of concurrent data generators. Leave empty to let Airbyte optimize performance automatically.
  6. Click Set up source.

Supported sync modes

The DataGen source connector supports the following sync mode:

| Feature | Supported? |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | No |

Supported data generation types

The connector supports three data generation patterns:

Incremental

Generates a stream named increment with a single column named id that contains monotonically increasing integers. This mode is useful for testing incremental data loading and verifying that data arrives in the expected order.
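
As a minimal sketch of the ordering check this mode enables, the following assumes the records have already been read back from a destination as dictionaries with an integer `id` key, and that ids never repeat. The helper name `is_strictly_increasing` is hypothetical and not part of the connector.

```python
from typing import Iterable, Mapping


def is_strictly_increasing(records: Iterable[Mapping[str, int]]) -> bool:
    """Return True if every record's id is greater than the one before it."""
    previous = None
    for record in records:
        current = record["id"]
        # Assumption: ids are unique, so an equal id also counts as out of order.
        if previous is not None and current <= previous:
            return False
        previous = current
    return True
```

An empty or single-record input is treated as correctly ordered.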

All types

Generates a stream named all types with columns for various Airbyte data types, including id, string, boolean, number, big integer, big decimal, date, time (with and without time zones), timestamp (with and without time zones), and JSON. This mode is useful for testing type handling and schema compatibility across different destinations.

Wide

Generates a stream named wide with a configurable number of columns (1–1000, default 50). Column 0 is always id (integer primary key). The remaining columns cycle through all 12 Airbyte data types: integer, string, boolean, number, big integer, big decimal, date, time with timezone, time without timezone, timestamp with timezone, timestamp without timezone, and JSON. Column names follow the pattern col_1_integer, col_2_string, etc. This mode is useful for testing wide schema handling and destination performance with many columns.
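
The naming and type-cycling rule described above can be sketched as follows. This is an illustration, not the connector's implementation: the function name `wide_column_names` is hypothetical, and the underscore spellings used for multi-word types (for example `big_integer`) are assumptions, since only `col_1_integer` and `col_2_string` are documented.

```python
# The 12 types in the order listed above. The underscore slugs for
# multi-word types are an assumption, not confirmed connector output.
TYPE_SLUGS = [
    "integer", "string", "boolean", "number", "big_integer", "big_decimal",
    "date", "time_with_timezone", "time_without_timezone",
    "timestamp_with_timezone", "timestamp_without_timezone", "json",
]


def wide_column_names(column_count: int) -> list[str]:
    """Return column names for a wide stream; column_count includes id."""
    if not 1 <= column_count <= 1000:
        raise ValueError("column_count must be between 1 and 1000")
    names = ["id"]  # column 0 is always the id column
    for index in range(1, column_count):
        # Columns 1..N cycle through the 12 types in order.
        slug = TYPE_SLUGS[(index - 1) % len(TYPE_SLUGS)]
        names.append(f"col_{index}_{slug}")
    return names
```

Under these assumptions, a column count of 3 yields `id`, `col_1_integer`, and `col_2_string`, and the cycle restarts at `integer` for column 13.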

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| Data Generation Type | enum | Incremental | The data generation pattern to use. Choose Incremental, All Types, or Wide. |
| Max Record | integer | 100 | The total number of records to generate. Minimum 1, maximum 100 billion. |
| Max Concurrency | integer | (auto) | Maximum number of concurrent data generators. Leave empty to let Airbyte optimize performance automatically. |
| Column Count | integer | 50 | Wide mode only. The number of columns to generate, including the id column. Minimum 1, maximum 1000. |
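
The documented limits can be checked before creating a source with a small helper like the one below. This is a sketch under stated assumptions: the function `validate_settings` and its argument names are hypothetical, and the requirement that Max Concurrency be at least 1 when set is an assumption, since no bounds are documented for it.

```python
from typing import Optional

MAX_RECORDS_LIMIT = 100_000_000_000  # 100 billion, the documented maximum
MAX_COLUMN_COUNT = 1000


def validate_settings(
    max_records: int = 100,
    column_count: int = 50,
    max_concurrency: Optional[int] = None,
) -> list[str]:
    """Return a list of problems; an empty list means the values are in range."""
    errors = []
    if not 1 <= max_records <= MAX_RECORDS_LIMIT:
        errors.append("Max Record must be between 1 and 100 billion")
    if not 1 <= column_count <= MAX_COLUMN_COUNT:
        errors.append("Column Count must be between 1 and 1000")
    # Assumption: a concurrency value, when given, must be positive.
    if max_concurrency is not None and max_concurrency < 1:
        errors.append("Max Concurrency must be at least 1 when set")
    return errors
```

Calling `validate_settings()` with no arguments checks the documented defaults, which are all in range.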

Reference

Config fields reference

| Property name | Type |
|---|---|
| flavor | object |
| max_records | integer |
| concurrency | integer |

Changelog

| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 0.2.0 | 2026-04-28 | 75542 | Add wide schema flavor with configurable column count; fix null safety in partition reader; cache codec references; bump CDK to 1.1.6 |
| 0.1.6 | 2025-10-23 | 68611 | Update cdk version |
| 0.1.5 | 2025-10-21 | 68581 | Update dataChannel version |
| 0.1.4 | 2025-10-16 | 68131 | Increment naming fix |
| 0.1.3 | 2025-10-16 | 68129 | Increment encoding fix |
| 0.1.2 | 2025-10-14 | 67720 | Removal of Array type |
| 0.1.1 | 2025-10-13 | 67110 | Addition of proto types |
| 0.1.0 | 2025-09-29 | 66331 | Creation of initial DataGen Source |