
MotherDuck

Overview

DuckDB is an in-process SQL OLAP database management system. This destination is intended for local use when you have several smaller sources, such as GitHub repos, social media data, and local CSVs or other files, that you want to run analytics workloads on. It writes data either to the MotherDuck service or to a file on the local filesystem of the host running Airbyte.

For local DuckDB files, data is written to /tmp/airbyte_local by default. To change this location, set the LOCAL_ROOT environment variable for Airbyte.
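A minimal sketch of how a destination path could resolve to a host path under LOCAL_ROOT. It assumes a hypothetical convention in which local destination paths start with `/local/` and are re-rooted under LOCAL_ROOT (default `/tmp/airbyte_local`). The helper `resolve_local_path` and the `/local/` prefix are illustrative assumptions, not documented connector behavior.

```python
import os
from pathlib import PurePosixPath

DEFAULT_LOCAL_ROOT = "/tmp/airbyte_local"  # default location stated in these docs


def resolve_local_path(destination_path: str, env=None) -> str:
    """Hypothetical helper: map an assumed '/local/...' path onto LOCAL_ROOT.

    Paths without the assumed '/local' prefix (for example 'md:my_db')
    are returned unchanged.
    """
    env = os.environ if env is None else env
    root = env.get("LOCAL_ROOT", DEFAULT_LOCAL_ROOT)
    path = PurePosixPath(destination_path)
    prefix = PurePosixPath("/local")
    if path == prefix or prefix in path.parents:
        # Strip the assumed '/local' prefix and re-root the remainder.
        return str(PurePosixPath(root) / path.relative_to(prefix))
    return destination_path
```

For example, `resolve_local_path("/local/destination.duckdb", env={})` yields `/tmp/airbyte_local/destination.duckdb`, and setting `LOCAL_ROOT` in `env` moves the file under that directory instead.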

Destinations V2

This destination implements Destinations V2, which provides improved final table structures. It is a new version of the existing DuckDB destination and works with both DuckDB and MotherDuck.

Learn more about what's new in Destinations V2 here.

Use with MotherDuck

This DuckDB destination is compatible with MotherDuck.

Specifying a MotherDuck Database

To use a MotherDuck-hosted database as your destination, provide its URI with the standard md: prefix (for example, md:my_db) in the destination_path configuration option.
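The shape of an md: destination path can be illustrated with a small parser. This is a sketch only: `parse_motherduck_path` and `MotherDuckTarget` are hypothetical names, and the parser simply splits the string into a database name and optional `?key=value` parameters; it does not validate against MotherDuck itself.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs


@dataclass
class MotherDuckTarget:
    database: str  # empty string if no database name follows "md:"
    params: dict = field(default_factory=dict)


def parse_motherduck_path(destination_path: str) -> MotherDuckTarget:
    """Hypothetical helper: split 'md:<database>?<params>' into its parts."""
    if not destination_path.startswith("md:"):
        raise ValueError(f"not a MotherDuck path: {destination_path!r}")
    rest = destination_path[len("md:"):]
    database, _, query = rest.partition("?")
    # parse_qs returns lists; keep the last value given for each key.
    params = {key: values[-1] for key, values in parse_qs(query).items()}
    return MotherDuckTarget(database=database, params=params)
```

A path such as `md:my_db` parses to database `my_db` with no parameters, while a local file path raises `ValueError`, which is one way to tell the two destination types apart.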

Caution

We do not recommend including your API token in the md: connection string, because the token may then be printed to execution logs. Use the MotherDuck API Key setting instead.
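If you have your own tooling that logs connection strings, masking the token before logging reduces the risk described above. The sketch below assumes the token appears as the `motherduck_token` query parameter; the helper name `redact_connection_string` is hypothetical and is not part of the connector.

```python
from urllib.parse import parse_qsl, urlencode

# Assumption: the query-parameter name(s) that carry a MotherDuck token.
SENSITIVE_KEYS = {"motherduck_token"}


def redact_connection_string(path: str) -> str:
    """Hypothetical helper: replace token values in 'md:<db>?...' before logging."""
    base, sep, query = path.partition("?")
    if not sep:
        return path  # no query string, nothing to redact
    pairs = [
        (key, "REDACTED" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in parse_qsl(query, keep_blank_values=True)
    ]
    return f"{base}?{urlencode(pairs)}"
```

Redaction is a fallback; keeping the token out of the connection string entirely, as recommended above, avoids the problem at the source.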

Authenticating to MotherDuck

To authenticate, use your MotherDuck access token. Supply it through the MotherDuck API Key setting rather than in the destination_path connection string.
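A sketch of assembling a destination configuration that keeps the access token separate from destination_path. Both the config key `motherduck_api_key` and the environment variable name `MOTHERDUCK_TOKEN` are assumptions for illustration; check the connector's configuration form for the actual field names.

```python
import os


def build_destination_config(database: str, env=None) -> dict:
    """Hypothetical helper: token goes in its own field, never in the md: URI."""
    env = os.environ if env is None else env
    token = env.get("MOTHERDUCK_TOKEN")  # assumed environment variable name
    if not token:
        raise RuntimeError("set MOTHERDUCK_TOKEN to your MotherDuck access token")
    return {
        "destination_path": f"md:{database}",  # connection string without a token
        "motherduck_api_key": token,            # assumed field for the API Key setting
    }
```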

Sync Overview

Output schema

Each table will contain at least the following columns:

  • _airbyte_raw_id: a UUID assigned by Airbyte to each event that is processed.
  • _airbyte_extracted_at: a timestamp representing when the event was pulled from the data source.
  • _airbyte_meta: a JSON blob storing metadata about the record.

In addition, columns specified in the JSON schema will also be created.
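The resulting row shape can be sketched as follows, assuming records arrive as Python dicts. The helper `to_output_row` is hypothetical, and since the exact contents of `_airbyte_meta` are not specified here, it defaults to an empty JSON object.

```python
import json
import uuid
from datetime import datetime, timezone

AIRBYTE_COLUMNS = ("_airbyte_raw_id", "_airbyte_extracted_at", "_airbyte_meta")


def to_output_row(record: dict, extracted_at=None, meta=None) -> dict:
    """Hypothetical helper: combine a record's data columns with the
    three Airbyte columns listed above."""
    clash = set(AIRBYTE_COLUMNS) & set(record)
    if clash:
        raise ValueError(f"record uses reserved column names: {sorted(clash)}")
    return {
        "_airbyte_raw_id": str(uuid.uuid4()),  # unique id per processed record
        "_airbyte_extracted_at": extracted_at or datetime.now(timezone.utc),
        "_airbyte_meta": json.dumps(meta or {}),  # metadata serialized as JSON
        **record,  # data columns derived from the stream's JSON schema
    }
```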

Features

| Feature | Supported |
| --- | --- |
| Full Refresh Sync | Yes |
| Incremental - Append Sync | Yes |
| Incremental - Append + Deduped | Yes |
| Typing and Deduplication | Yes |
| Namespaces | No |
| Data Generations | No |
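To illustrate the result that Incremental - Append + Deduped aims for in the final table, here is a minimal in-memory sketch of "latest record per primary key" deduplication. It is not the connector's implementation; it assumes a single cursor field where a larger value means a newer record, and lets the later-arriving row win on ties.

```python
from typing import Iterable


def dedupe_latest(rows: Iterable[dict], primary_key: tuple, cursor: str) -> list:
    """Keep one row per primary key: the one with the greatest cursor value.

    On equal cursor values, the row seen later replaces the earlier one.
    """
    latest = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key)
        current = latest.get(key)
        if current is None or row[cursor] >= current[cursor]:
            latest[key] = row
    return list(latest.values())
```

For example, with rows for `id` 1 at cursor values 1, 3, and 2, only the row with cursor 3 survives, while every append is still kept in full under plain Incremental - Append.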

Performance consideration

This integration will be constrained by the speed at which your filesystem accepts writes.

Working with local DuckDB files

This connector is primarily designed to work with MotherDuck and local DuckDB files for Destinations V2. If you would like to work only with local DuckDB files, you may want to consider using the DuckDB destination.

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 0.1.15 | 2024-11-07 | 48405 | Updated docs and hovertext for schema, api key, and database name. |
| 0.1.14 | 2024-10-30 | 48006 | Fix bug in _flush_buffer, explicitly register dataframe before inserting |
| 0.1.13 | 2024-10-30 | 47969 | Preserve Platform-generated id in state messages. |
| 0.1.12 | 2024-10-30 | 47987 | Disable PyPI publish. |
| 0.1.11 | 2024-10-30 | 47979 | Rename package. |
| 0.1.10 | 2024-10-29 | 47958 | Add state counts and other fixes. |
| 0.1.9 | 2024-10-29 | 47950 | Fix bug: add double quotes to column names that are reserved keywords. |
| 0.1.8 | 2024-10-29 | 47952 | Fix: Add max batch size for loads. |
| 0.1.7 | 2024-10-29 | 47706 | Fix bug: incorrect column names were used to create new stream table when using multiple streams. |
| 0.1.6 | 2024-10-29 | 47821 | Update dependencies |
| 0.1.5 | 2024-10-28 | 47694 | Resolve write failures, move processor classes into the connector. |
| 0.1.4 | 2024-10-28 | 47688 | Use new destination table name format, explicitly insert PyArrow table columns by name and add debug info for column mismatches. |
| 0.1.3 | 2024-10-23 | 47315 | Fix bug causing MotherDuck API key to not be correctly passed to the engine. |
| 0.1.2 | 2024-10-23 | 47315 | Use saas_only mode during connection check to reduce ram usage. |
| 0.1.1 | 2024-10-23 | 47312 | Fix: generate new unique destination ID |
| 0.1.0 | 2024-10-23 | 46904 | New MotherDuck destination |