
Faker

Sync overview

The Sample Data (Faker) source generates sample data using the Python mimesis package.

Output schema

This source will generate an "e-commerce-like" dataset with users, products, and purchases. Here's what is produced at a Postgres destination connected to this source:

CREATE TABLE "public"."users" (
"address" jsonb,
"occupation" text,
"gender" text,
"academic_degree" text,
"weight" int8,
"created_at" timestamptz,
"language" text,
"telephone" text,
"title" text,
"updated_at" timestamptz,
"nationality" text,
"blood_type" text,
"name" text,
"id" float8,
"age" int8,
"email" text,
"height" text,
-- "_airbyte_ab_id" varchar,
-- "_airbyte_emitted_at" timestamptz,
-- "_airbyte_normalized_at" timestamptz,
-- "_airbyte_users_hashid" text
);

CREATE TABLE "public"."users_address" (
"_airbyte_users_hashid" text,
"country_code" text,
"province" text,
"city" text,
"street_number" text,
"state" text,
"postal_code" text,
"street_name" text,
-- "_airbyte_ab_id" varchar,
-- "_airbyte_emitted_at" timestamptz,
-- "_airbyte_normalized_at" timestamptz,
-- "_airbyte_address_hashid" text
);

CREATE TABLE "public"."products" (
"id" float8,
"make" text,
"year" float8,
"model" text,
"price" float8,
"created_at" timestamptz,
-- "_airbyte_ab_id" varchar,
-- "_airbyte_emitted_at" timestamptz,
-- "_airbyte_normalized_at" timestamptz,
-- "_airbyte_dev_products_hashid" text,
);

CREATE TABLE "public"."purchases" (
"id" float8,
"user_id" float8,
"product_id" float8,
"purchased_at" timestamptz,
"added_to_cart_at" timestamptz,
"returned_at" timestamptz,
-- "_airbyte_ab_id" varchar,
-- "_airbyte_emitted_at" timestamptz,
-- "_airbyte_normalized_at" timestamptz,
-- "_airbyte_dev_purchases_hashid" text,
);
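To show how the tables relate, here is a minimal sketch using Python's built-in sqlite3 module instead of Postgres. It assumes that purchases.user_id and purchases.product_id refer to users.id and products.id, which the column names suggest but this page does not state. Only a few columns are kept, and the rows are hand-written placeholders, not real connector output.

```python
import sqlite3

# In-memory database with a cut-down version of the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE products (id INTEGER, make TEXT, model TEXT, price REAL);
    CREATE TABLE purchases (
        id INTEGER, user_id INTEGER, product_id INTEGER, returned_at TEXT
    );
    -- Placeholder rows written by hand for this example.
    INSERT INTO users VALUES (1, 'User A'), (2, 'User B');
    INSERT INTO products VALUES
        (10, 'Make X', 'Model 1', 100.0),
        (11, 'Make Y', 'Model 2', 250.0);
    INSERT INTO purchases VALUES
        (100, 1, 10, NULL),
        (101, 1, 11, '2024-01-02T00:00:00Z'),
        (102, 2, 11, NULL);
""")

# Total spend per user, counting only purchases that were not returned.
rows = conn.execute("""
    SELECT u.name, COUNT(*) AS kept, SUM(p.price) AS total
    FROM purchases AS pu
    JOIN users AS u ON u.id = pu.user_id
    JOIN products AS p ON p.id = pu.product_id
    WHERE pu.returned_at IS NULL
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()

for name, kept, total in rows:
    print(name, kept, total)
```

The same join should carry over to the Postgres destination with the table names qualified by schema, for example "public"."purchases".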

Features

| Feature | Supported? (Yes/No) | Notes |
| --- | --- | --- |
| Full Refresh Sync | Yes | |
| Incremental Sync | Yes | |
| Namespaces | No | |

If you choose Incremental Sync, state is maintained between syncs. Once the number of generated records reaches the configured count, no new records are added.
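The effect of keeping state between syncs can be sketched in a few lines of Python. This is an illustration only, not the connector's code: the function name run_sync, the shape of the state dictionary, and the use of the record id as the cursor are all assumptions made for the example.

```python
from typing import Dict, List


def run_sync(count: int, state: Dict[str, int]) -> List[Dict[str, int]]:
    """Emit the records not yet produced, up to `count` in total.

    `state` is a hypothetical cursor holding the last id emitted
    by earlier syncs; it is updated in place.
    """
    last_id = state.get("id", 0)
    records = []
    for record_id in range(last_id + 1, count + 1):
        records.append({"id": record_id})
        state["id"] = record_id  # advance the cursor after each record
    return records


state: Dict[str, int] = {}
first = run_sync(count=5, state=state)   # emits ids 1 through 5
second = run_sync(count=5, state=state)  # count already reached: emits nothing

print(len(first), len(second), state)
```

Because the second call starts from the saved cursor, it finds no ids left below the count and returns an empty list.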

You can set a specific seed (an integer) as an option for this connector, which guarantees that the same fake records are generated on every sync. Without a seed, each sync produces different random data.
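The idea behind seeding can be shown with Python's standard random module. This sketch does not use mimesis or the connector's own generators; the fake_users function and its fields are hypothetical and only demonstrate that a fixed seed makes the output repeatable.

```python
import random
from typing import Dict, List, Optional


def fake_users(n: int, seed: Optional[int] = None) -> List[Dict[str, int]]:
    """Build `n` toy user records from a random generator.

    With an integer seed the sequence is reproducible; with seed=None
    the generator is seeded from the operating system, so the values
    will generally differ between runs.
    """
    rng = random.Random(seed)
    return [{"id": i + 1, "age": rng.randint(18, 90)} for i in range(n)]


seeded_a = fake_users(3, seed=42)
seeded_b = fake_users(3, seed=42)
unseeded = fake_users(3)

print(seeded_a == seeded_b)  # the two seeded runs are identical
```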

Requirements

None!

Reference

Config fields reference

| Property name | Type |
| --- | --- |
| count | integer |
| seed | integer |
| records_per_slice | integer |
| always_updated | boolean |
| parallelism | integer |
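As a rough illustration, a configuration using these properties could look like the dictionary below, together with a small type check. The values are made up for the example and are not the connector's defaults, and the check_config helper is hypothetical rather than part of Airbyte.

```python
from typing import Any, Dict, List

# Python type expected for each property in the reference above.
FIELD_TYPES = {
    "count": int,
    "seed": int,
    "records_per_slice": int,
    "always_updated": bool,
    "parallelism": int,
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of type problems; an empty list means none were found."""
    problems = []
    for name, value in config.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            problems.append(f"{name}: unknown property")
        elif expected is int and isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"{name}: expected int, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Illustrative values only.
example_config = {
    "count": 1000,
    "seed": 42,
    "records_per_slice": 100,
    "always_updated": True,
    "parallelism": 4,
}

print(check_config(example_config))
```

The helper checks types only. It does not decide which properties are required, since this reference lists types and names but not which fields are mandatory.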

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 6.0.3 | 2024-03-15 | 36167 | Make 'count' an optional config parameter. |
| 6.0.2 | 2024-02-12 | 35174 | Manage dependencies with Poetry. |
| 6.0.1 | 2024-02-12 | 35172 | Base image migration: remove Dockerfile and use the python-connector-base image |
| 6.0.0 | 2024-01-30 | 34644 | Declare 'id' columns as primary keys. |
| 5.0.2 | 2024-01-17 | 34344 | Ensure unique state messages |
| 5.0.1 | 2023-01-08 | 34033 | Add standard entrypoints for usage with AirbyteLib |
| 5.0.0 | 2023-08-08 | 29213 | Change all *id fields and products.year to be integer |
| 4.0.0 | 2023-07-19 | 28485 | Bump to test publication |
| 3.0.2 | 2023-07-07 | 27807 | Bump to test publication |
| 3.0.1 | 2023-06-28 | 27807 | Fix bug with purchase stream updated_at |
| 3.0.0 | 2023-06-23 | 27684 | Stream cursor is now updated_at & remove records_per_sync option |
| 2.1.0 | 2023-05-08 | 25903 | Add user.address (object) |
| 2.0.3 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.2 | 2023-02-20 | 23259 | bump to test publication |
| 2.0.1 | 2023-01-30 | 22117 | source-faker goes beta |
| 2.0.0 | 2022-12-14 | 20492 and 20741 | Decouple stream states for better parallelism |
| 1.0.0 | 2022-11-28 | 19490 | Faker uses the CDK; rename streams to be lower-case (breaking), add determinism to random purchases, and rename |
| 0.2.1 | 2022-10-14 | 19197 | Emit AirbyteEstimateTraceMessage |
| 0.2.0 | 2022-10-14 | 18021 | Move to mimesis for speed! |
| 0.1.8 | 2022-10-12 | 17889 | Bump to test publish command (2) |
| 0.1.7 | 2022-10-11 | 17848 | Bump to test publish command |
| 0.1.6 | 2022-09-07 | 16418 | Log start of each stream |
| 0.1.5 | 2022-06-10 | 13695 | Emit timestamps in the proper ISO format |
| 0.1.4 | 2022-05-27 | 13298 | Test publication flow |
| 0.1.3 | 2022-05-27 | 13248 | Add options for records_per_sync and page_size |
| 0.1.2 | 2022-05-26 | 13248 | Test publication flow |
| 0.1.1 | 2022-05-26 | 13235 | Publish for AMD and ARM (M1 Macs) & remove User.birthdate |
| 0.1.0 | 2022-04-12 | 11738 | The Faker Source is created |