Configuring Airbyte
This section covers how to configure Airbyte and the configuration options Airbyte accepts.
Configuration is currently done via environment variables. See the sections below for how to modify these variables.

Docker Deployments

The recommended way to run an Airbyte Docker deployment is via the Airbyte repo's docker-compose.yaml and .env file.
To configure the default Airbyte Docker deployment, modify the bundled .env file. The docker-compose.yaml file injects appropriate variables into the containers.
If you want to manage your own docker files, please refer to Airbyte's docker file to ensure applications get the correct variables.
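As a rough illustration of what editing the .env file amounts to, the sketch below parses simple KEY=VALUE lines and overrides a single value. The parsing rules (blank lines and # comments skipped, no quoting or interpolation) are simplifying assumptions rather than Docker Compose's full .env semantics, and the helper names and sample values are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def set_env_value(text: str, key: str, value: str) -> str:
    """Return the file text with `key` set to `value`, keeping other lines as-is."""
    lines = text.splitlines()
    for i, raw in enumerate(lines):
        stripped = raw.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.partition("=")[0].strip() == key:
            lines[i] = f"{key}={value}"  # overwrite the existing assignment
            break
    else:
        lines.append(f"{key}={value}")  # key not present yet: append it
    return "\n".join(lines) + "\n"


# Illustrative content only; the real .env ships with the Airbyte repo.
sample = "# illustrative values only\nLOG_LEVEL=INFO\nMAX_SYNC_WORKERS=5\n"
updated = set_env_value(sample, "LOG_LEVEL", "DEBUG")
print(parse_env(updated))
```

After changing the file, the containers need to be recreated for docker-compose.yaml to inject the new values.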

Kubernetes Deployments

The recommended way to run an Airbyte Kubernetes deployment is via the Kustomize overlays.
We recommend using the overlays in the stable directory as these have preset resource limits.
To configure the default Airbyte Kubernetes deployment, modify the .env in the respective directory. Each application will consume the appropriate env var from a generated configmap.
If you want to manage your own Kube manifests, please refer to the various Kustomize overlays for examples.

Reference

The following are the possible configuration options organised by deployment type and services.
Internal-only variables have been omitted for clarity. See Configs.java for a full list.
Be careful using variables marked as alpha as they aren't meant for public consumption.

Shared

The following variables are relevant to both Docker and Kubernetes.

Core

  1. AIRBYTE_VERSION - Defines the Airbyte deployment version.
  2. SPEC_CACHE_BUCKET - Defines the bucket for caching specs. This immensely speeds up spec operations. This is updated when new versions are published.
  3. WORKER_ENVIRONMENT - Defines if the deployment is Docker or Kubernetes. Airbyte behaves accordingly.
  4. CONFIG_ROOT - Defines the configs directory. Applies only to Docker, and is present in Kubernetes for backward compatibility.
  5. WORKSPACE_ROOT - Defines the Airbyte workspace directory. Applies only to Docker, and is present in Kubernetes for backward compatibility.

Secrets

  1. SECRET_STORE_GCP_PROJECT_ID - Defines the GCP Project to store secrets in. Alpha support.
  2. SECRET_STORE_GCP_CREDENTIALS - Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. These credentials must have Secret Manager Read/Write access. Alpha support.
  3. SECRET_PERSISTENCE_TYPE - Defines the Secret Persistence type. Defaults to NONE. Set to GOOGLE_SECRET_MANAGER to use Google Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. Alpha support. Undefined behavior will result if this is turned on and then off.

Database

  1. DATABASE_USER - Define the Jobs Database user.
  2. DATABASE_PASSWORD - Define the Jobs Database password.
  3. DATABASE_URL - Define the Jobs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Do not include username or password.
  4. JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS - Define the total time to wait for the Jobs Database to be initialized. This includes migrations.
  5. CONFIG_DATABASE_USER - Define the Configs Database user. Defaults to the Jobs Database user if empty.
  6. CONFIG_DATABASE_PASSWORD - Define the Configs Database password. Defaults to the Jobs Database password if empty.
  7. CONFIG_DATABASE_URL - Define the Configs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Defaults to the Jobs Database url if empty.
  8. CONFIG_DATABASE_INITIALIZATION_TIMEOUT_MS - Define the total time to wait for the Configs Database to be initialized. This includes migrations.
  9. RUN_DATABASE_MIGRATION_ON_STARTUP - Define if the Bootloader should run migrations on start up.
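The url format and the fallback from the Configs Database settings to the Jobs Database settings can be sketched as follows. This is only an illustration of the rules described above, not Airbyte's own code; the function names and the host, port, and database values are hypothetical.

```python
def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a url in the documented form, without username or password."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def resolve_config_db(env: dict[str, str]) -> dict[str, str]:
    """Apply the documented fallback: empty CONFIG_DATABASE_* values
    default to the matching Jobs Database (DATABASE_*) values."""
    return {
        "user": env.get("CONFIG_DATABASE_USER") or env.get("DATABASE_USER", ""),
        "password": env.get("CONFIG_DATABASE_PASSWORD") or env.get("DATABASE_PASSWORD", ""),
        "url": env.get("CONFIG_DATABASE_URL") or env.get("DATABASE_URL", ""),
    }


# Hypothetical values for illustration only.
env = {
    "DATABASE_USER": "example_user",
    "DATABASE_PASSWORD": "example_password",
    "DATABASE_URL": jdbc_url("db", 5432, "airbyte"),
    "CONFIG_DATABASE_USER": "",  # empty, so the Jobs Database user is used
}
print(resolve_config_db(env))
```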

Airbyte Services

  1. TEMPORAL_HOST - Define the url where Temporal is hosted. Please include the port. Airbyte services use this information.
  2. INTERNAL_API_HOST - Define the url where the Airbyte Server is hosted. Please include the port. Airbyte services use this information.
  3. WEBAPP_URL - Define the url where the Airbyte Webapp is hosted. Please include the port. Airbyte services use this information.
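Since all three values are expected to include a port, a quick pre-flight check along these lines may help catch a missing one. This is a hedged sketch using Python's standard urllib.parse; the function name and the example values are assumptions, not defaults taken from Airbyte.

```python
from urllib.parse import urlsplit


def has_explicit_port(value: str) -> bool:
    """Return True if a host or url string carries an explicit, valid port."""
    # Values without a scheme (e.g. "host:7233") are prefixed with "//" so
    # urlsplit treats them as a network location rather than a path.
    candidate = value if "://" in value else "//" + value
    try:
        return urlsplit(candidate).port is not None
    except ValueError:
        # Raised for non-numeric or out-of-range ports.
        return False


# Hypothetical example values.
for name, value in {
    "TEMPORAL_HOST": "temporal.example.internal:7233",
    "INTERNAL_API_HOST": "server.example.internal",
    "WEBAPP_URL": "http://localhost:8000/",
}.items():
    print(name, "ok" if has_explicit_port(value) else "missing port")
```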

Jobs

  1. SYNC_JOB_MAX_ATTEMPTS - Define the number of times a sync is attempted before failing.
  2. SYNC_JOB_MAX_TIMEOUT_DAYS - Define the number of days a sync job will execute for before timing out.
  3. JOB_MAIN_CONTAINER_CPU_REQUEST - Define the job container's minimum CPU usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none.
  4. JOB_MAIN_CONTAINER_CPU_LIMIT - Define the job container's maximum CPU usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none.
  5. JOB_MAIN_CONTAINER_MEMORY_REQUEST - Define the job container's minimum RAM usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none.
  6. JOB_MAIN_CONTAINER_MEMORY_LIMIT - Define the job container's maximum RAM usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none.

Logging

  1. LOG_LEVEL - Define log levels. Defaults to INFO. This value is expected to be one of the various Log4J log levels.

Worker

  1. MAX_SPEC_WORKERS - Define the maximum number of Spec workers each Airbyte Worker container can support. Defaults to 5.
  2. MAX_CHECK_WORKERS - Define the maximum number of Check workers each Airbyte Worker container can support. Defaults to 5.
  3. MAX_SYNC_WORKERS - Define the maximum number of Sync workers each Airbyte Worker container can support. Defaults to 5.
  4. MAX_DISCOVER_WORKERS - Define the maximum number of Discover workers each Airbyte Worker container can support. Defaults to 5.
  5. SENTRY_DSN - Define the DSN of the Sentry instance to report to. Defaults to empty. Integration with Sentry is explained here.

Scheduler

  1. SUBMITTER_NUM_THREADS - Define the maximum number of concurrent jobs the Scheduler schedules. Defaults to 5.
  2. MINIMUM_WORKSPACE_RETENTION_DAYS - Defines the minimum configuration file age for sweeping. The Scheduler will do its best to not sweep files younger than this. Defaults to 1 day.
  3. MAXIMUM_WORKSPACE_RETENTION_DAYS - Defines the oldest un-swept configuration file age. Files older than this will definitely be swept. Defaults to 60 days.
  4. MAXIMUM_WORKSPACE_SIZE_MB - Defines the target workspace size. Sweeping continues until the workspace is reduced to this size. Defaults to 5GB.
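One way to read how the three retention settings interact is sketched below. This is an interpretation of the descriptions above and not Airbyte's actual sweeping code: the WorkspaceFile type, the function name, the oldest-first ordering, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceFile:
    name: str
    age_days: float
    size_mb: float


def select_files_to_sweep(files, *, min_days, max_days, max_size_mb):
    """Pick files to sweep under one reading of the retention settings."""
    # Files older than the maximum retention are always swept.
    swept = [f for f in files if f.age_days > max_days]
    remaining = [f for f in files if f.age_days <= max_days]
    total_mb = sum(f.size_mb for f in remaining)
    # Then sweep oldest-first until the workspace fits the size target,
    # leaving files younger than the minimum retention alone.
    for f in sorted(remaining, key=lambda f: f.age_days, reverse=True):
        if total_mb <= max_size_mb or f.age_days < min_days:
            break
        swept.append(f)
        total_mb -= f.size_mb
    return swept


files = [
    WorkspaceFile("a", age_days=90, size_mb=10),
    WorkspaceFile("b", age_days=30, size_mb=4000),
    WorkspaceFile("c", age_days=10, size_mb=3000),
    WorkspaceFile("d", age_days=0.5, size_mb=2000),
]
to_sweep = select_files_to_sweep(files, min_days=1, max_days=60, max_size_mb=5000)
print([f.name for f in to_sweep])
```

In this sample, file a exceeds the maximum retention and file b is removed to bring the workspace down to the size target, while c and d are kept.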

Docker-Only

  1. WORKSPACE_DOCKER_MOUNT - Defines the name of the Airbyte docker volume.
  2. DOCKER_NETWORK - Defines the docker network the new Scheduler launches jobs on.
  3. LOCAL_DOCKER_MOUNT - Defines the name of the docker mount that is used for local file handling. On Docker, this allows connector pods to interact with a volume for "local file" operations.

Kubernetes-Only

Jobs

  1. JOB_KUBE_TOLERATIONS - Define one or more Job pod tolerations. Tolerations are separated by ';'. Each toleration contains k=v pairs mentioning some/all of key, effect, operator and value, separated by ','.
  2. JOB_KUBE_NODE_SELECTORS - Define one or more Job pod node selectors. Each kv-pair is separated by a ','.
  3. JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_POLICY - Define the Job pod connector image pull policy.
  4. JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET - Define the Job pod connector image pull secret. Useful when hosting private images.
  5. JOB_KUBE_SOCAT_IMAGE - Define the Job pod socat image.
  6. JOB_KUBE_BUSYBOX_IMAGE - Define the Job pod busybox image.
  7. JOB_KUBE_CURL_IMAGE - Define the Job pod curl image.
  8. JOB_KUBE_NAMESPACE - Define the Kubernetes namespace Job pods are created in.
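The delimiter rules for JOB_KUBE_TOLERATIONS and JOB_KUBE_NODE_SELECTORS can be illustrated with a small parser. This is a sketch of the string format described above, not the parser Airbyte uses; the function names and the example toleration and selector values are hypothetical.

```python
def parse_kv_pairs(text: str) -> dict[str, str]:
    """Parse comma-separated key=value pairs, e.g. 'disktype=ssd,tier=jobs'."""
    pairs: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def parse_tolerations(text: str) -> list[dict[str, str]]:
    """Split tolerations on ';', then parse each one as key=value pairs."""
    return [parse_kv_pairs(chunk) for chunk in text.split(";") if chunk.strip()]


# Hypothetical example values.
tolerations = parse_tolerations(
    "key=example-taint,operator=Exists,effect=NoSchedule;"
    "key=example-taint,operator=Equal,value=true,effect=NoSchedule"
)
selectors = parse_kv_pairs("disktype=ssd,tier=jobs")
print(tolerations)
print(selectors)
```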

Worker

  1. TEMPORAL_WORKER_PORTS - Define the local ports the Airbyte Worker pod uses to connect to the various Job pods. Ports 9001-9040 are exposed by default in the Kustomize deployments.

Logging

Note that Airbyte does not support logging to separate Cloud Storage providers.
Please see here for more information on configuring Kubernetes logging.

  1. GCS_LOG_BUCKET - Define the GCS bucket to store logs.
  2. S3_BUCKET - Define the S3 bucket to store logs.
  3. S3_RREGION - Define the S3 region the S3 log bucket is in.
  4. S3_AWS_KEY - Define the key used to access the S3 log bucket.
  5. S3_AWS_SECRET - Define the secret used to access the S3 log bucket.
  6. S3_MINIO_ENDPOINT - Define the url Minio is hosted at so Airbyte can use Minio to store logs.
  7. S3_PATH_STYLE_ACCESS - Set to true if using Minio to store logs. Empty otherwise.