Sync overview

The Sample Data (Faker) source generates sample data using the Python mimesis package.

Output schema

This source will generate an "e-commerce-like" dataset with users, products, and purchases. Here's what is produced at a Postgres destination connected to this source:

CREATE TABLE "public"."users" (
    "address" jsonb,
    "occupation" text,
    "gender" text,
    "academic_degree" text,
    "weight" int8,
    "created_at" timestamptz,
    "language" text,
    "telephone" text,
    "title" text,
    "updated_at" timestamptz,
    "nationality" text,
    "blood_type" text,
    "name" text,
    "id" float8,
    "age" int8,
    "email" text,
    "height" text
    -- "_airbyte_ab_id" varchar,
    -- "_airbyte_emitted_at" timestamptz,
    -- "_airbyte_normalized_at" timestamptz,
    -- "_airbyte_users_hashid" text
);

CREATE TABLE "public"."users_address" (
    "_airbyte_users_hashid" text,
    "country_code" text,
    "province" text,
    "city" text,
    "street_number" text,
    "state" text,
    "postal_code" text,
    "street_name" text
    -- "_airbyte_ab_id" varchar,
    -- "_airbyte_emitted_at" timestamptz,
    -- "_airbyte_normalized_at" timestamptz,
    -- "_airbyte_address_hashid" text
);

CREATE TABLE "public"."products" (
    "id" float8,
    "make" text,
    "year" float8,
    "model" text,
    "price" float8,
    "created_at" timestamptz
    -- "_airbyte_ab_id" varchar,
    -- "_airbyte_emitted_at" timestamptz,
    -- "_airbyte_normalized_at" timestamptz,
    -- "_airbyte_dev_products_hashid" text
);

CREATE TABLE "public"."purchases" (
    "id" float8,
    "user_id" float8,
    "product_id" float8,
    "purchased_at" timestamptz,
    "added_to_cart_at" timestamptz,
    "returned_at" timestamptz
    -- "_airbyte_ab_id" varchar,
    -- "_airbyte_emitted_at" timestamptz,
    -- "_airbyte_normalized_at" timestamptz,
    -- "_airbyte_dev_purchases_hashid" text
);
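
The column names suggest that each purchase points at a user and a product through user_id and product_id. The sketch below shows one way to stitch the three streams together in Python under that assumption. The rows are made up for illustration and are not actual connector output, and the helper name join_purchases is hypothetical. Treating an empty purchased_at as an uncompleted purchase is also an assumption.

```python
# Illustrative rows only: these values are invented, not real connector output.
users = [
    {"id": 1, "name": "Ada Example", "email": "ada@example.com"},
    {"id": 2, "name": "Bo Sample", "email": "bo@example.com"},
]
products = [
    {"id": 10, "make": "Acme", "model": "Roadster", "year": 2020, "price": 19999.0},
]
purchases = [
    {"id": 100, "user_id": 1, "product_id": 10,
     "purchased_at": "2023-01-02T03:04:05+00:00", "returned_at": None},
    {"id": 101, "user_id": 2, "product_id": 10,
     "purchased_at": None, "returned_at": None},
]


def join_purchases(users, products, purchases):
    """Attach the matching user and product to each purchase by id."""
    users_by_id = {u["id"]: u for u in users}
    products_by_id = {p["id"]: p for p in products}
    joined = []
    for purchase in purchases:
        joined.append({
            "purchase_id": purchase["id"],
            "user_name": users_by_id[purchase["user_id"]]["name"],
            "product_model": products_by_id[purchase["product_id"]]["model"],
            # Assumption: a purchase without purchased_at was never completed.
            "completed": purchase["purchased_at"] is not None,
        })
    return joined
```

Indexing users and products by id first keeps the join to a single pass over the purchases.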


Features

| Feature | Supported |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |

Note that if you choose Incremental Sync, state is maintained between syncs. Once the total number of generated records reaches the configured count, later syncs add no new records.
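
The interaction between saved state and the record count can be sketched roughly as follows. This is a simplified illustration, not the connector's implementation: the sync function and the "emitted" state key are hypothetical, and the sketch tracks a plain counter, whereas the changelog notes that the real stream cursor is updated_at.

```python
from typing import Dict, List, Optional, Tuple


def sync(count: int, state: Optional[Dict[str, int]] = None) -> Tuple[List[dict], Dict[str, int]]:
    """Emit only the records not covered by the saved state, up to `count` in total.

    Hypothetical sketch: `state` remembers how many records earlier syncs emitted.
    """
    already_emitted = (state or {}).get("emitted", 0)
    # Records are numbered 1..count; skip everything a previous sync produced.
    records = [{"id": i} for i in range(already_emitted + 1, count + 1)]
    new_state = {"emitted": max(already_emitted, count)}
    return records, new_state


# First sync emits all records; a second sync with the saved state emits none.
first_batch, saved_state = sync(count=5)
second_batch, saved_state = sync(count=5, state=saved_state)
```

In this model, raising count between syncs is the only way to get more records out of an incremental sync.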

You can set a specific seed (an integer) as an option for this connector, which guarantees that the same fake records are generated each time. Without a seed, different random data is created on each sync.
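
The effect of a seed can be illustrated with Python's standard random module. This is a stand-in for the idea only: the connector generates its data with mimesis, and the fake_ages helper and its value range are hypothetical.

```python
import random
from typing import List, Optional


def fake_ages(n: int, seed: Optional[int] = None) -> List[int]:
    """Return n pseudo-random ages; a fixed seed makes the output repeatable."""
    # With seed=None, Python seeds the generator from a system entropy source,
    # so each call produces a different sequence.
    rng = random.Random(seed)
    return [rng.randint(18, 80) for _ in range(n)]


# Same seed, same records; no seed, output varies from run to run.
repeatable_a = fake_ages(5, seed=42)
repeatable_b = fake_ages(5, seed=42)
varies = fake_ages(5)
```

Using a dedicated random.Random instance rather than the module-level functions keeps the seeded sequence isolated from other code that draws random numbers.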




Changelog

| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 5.0.0 | 2023-08-08 | 29213 | Change all *id fields and products.year to be integer |
| 4.0.0 | 2023-07-19 | 28485 | Bump to test publication |
| 3.0.2 | 2023-07-07 | 27807 | Bump to test publication |
| 3.0.1 | 2023-06-28 | 27807 | Fix bug with purchase stream updated_at |
| 3.0.0 | 2023-06-23 | 27684 | Stream cursor is now updated_at & remove records_per_sync option |
| 2.1.0 | 2023-05-08 | 25903 | Add user.address (object) |
| 2.0.3 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.2 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.1 | 2023-01-30 | 22117 | source-faker goes beta |
| 2.0.0 | 2022-12-14 | 20492 and 20741 | Decouple stream states for better parallelism |
| 1.0.0 | 2022-11-28 | 19490 | Faker uses the CDK; rename streams to be lower-case (breaking), add determinism to random purchases, and rename |
| 0.2.1 | 2022-10-14 | 19197 | Emit AirbyteEstimateTraceMessage |
| 0.2.0 | 2022-10-14 | 18021 | Move to mimesis for speed! |
| 0.1.8 | 2022-10-12 | 17889 | Bump to test publish command (2) |
| 0.1.7 | 2022-10-11 | 17848 | Bump to test publish command |
| 0.1.6 | 2022-09-07 | 16418 | Log start of each stream |
| 0.1.5 | 2022-06-10 | 13695 | Emit timestamps in the proper ISO format |
| 0.1.4 | 2022-05-27 | 13298 | Test publication flow |
| 0.1.3 | 2022-05-27 | 13248 | Add options for records_per_sync and page_size |
| 0.1.2 | 2022-05-26 | 13248 | Test publication flow |
| 0.1.1 | 2022-05-26 | 13235 | Publish for AMD and ARM (M1 Macs) & remove User.birthdate |
| 0.1.0 | 2022-04-12 | 11738 | The Faker Source is created |