Deploy a data plane with Airbox in Enterprise Flex
Airbox is Airbyte's command line tool for managing Airbyte data planes on Kubernetes. It's the ideal way to deploy and manage data planes for teams that have limited Kubernetes expertise or don't want to deploy with Helm.
At the end of this guide, you'll have an Airbyte workspace that runs connections using a self-managed data plane.
Prerequisites
Before you begin, ensure you satisfy all of these requirements.
Subscription and permission requirements
- An active subscription to Airbyte Enterprise Flex
- You must be an organization Admin to manage data planes
Infrastructure requirements
- A single node on which to deploy your data plane. This can be a virtual machine from a cloud provider, a bare metal server, or even your local computer.
  - Minimum specs: 8 CPUs and 16 GB of RAM
  - Recommended specs: 8 CPUs and 24 GB of RAM
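If you're not sure what your host provides, you can check from a shell. This is a quick sketch using standard utilities; the first two commands are for Linux, the last two for macOS.

# Linux: logical CPU count and total memory
nproc
free -h
# macOS: logical CPU count and total memory (in bytes)
sysctl -n hw.ncpu
sysctl -n hw.memsize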
Software requirements
- Docker Desktop or Docker Engine (installation is described below)
- Command line tools for managing and monitoring your data plane after installation, such as kubectl. These aren't strictly necessary, but Part 8 of this guide uses kubectl to verify your deployment.
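For example, once Docker and kubectl are installed, you can confirm they're available from your terminal:

docker --version
kubectl version --client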
Security considerations
- Self-managed data planes require egress from your network to Airbyte's managed control plane.
- Self-managed data planes only send requests to the control plane. The control plane sends responses back to the data plane, but never initiates requests to it.
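Because the data plane initiates all requests, a simple way to confirm your network allows the required egress is to check that the host can reach the API endpoint Airbox authenticates against later in this guide. This assumes curl is installed; your deployment may require access to additional Airbyte endpoints as well.

# Expect an HTTP response rather than a timeout or DNS error
curl -sSI https://api.airbyte.com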
Workspaces
You should already have decided which regions and workspaces you need to satisfy your compliance and data sovereignty needs.
Part 1. Install Airbox
You can install Airbox as a binary. Downloads are available for Windows, Mac, and Linux.
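The exact installation steps depend on your platform. On Linux or macOS, the general pattern for a downloaded CLI binary looks like this; the file name is illustrative, so adjust it to match your download.

# Make the downloaded binary executable and move it onto your PATH
chmod +x ./airbox
sudo mv ./airbox /usr/local/bin/airbox
# Confirm your shell can find it
which airbox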
Part 2: Install Docker Desktop
Install Docker Desktop on the machine that will host your data plane. Follow the steps for your operating system in Docker's online help, linked below.
- Mac
- Windows
- Linux - If you're installing on a headless Linux virtual machine, it's easier to use Docker Engine instead of Docker Desktop.
You don't need to interact with Docker directly, but it does need to run in the background. Once it's open, minimize it and proceed to Part 3.
Airbyte runs on Kubernetes. When you deploy your data plane, Airbyte uses Docker to create a Kubernetes cluster on the computer hosting the data plane.
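Before moving on, you can confirm the Docker daemon is actually running; if it isn't, this command returns an error instead of printing daemon details.

docker info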
Part 3: Set credentials
You need an Airbyte application so Airbox can access your control plane. If you already have one, you can reuse its existing credentials. If not, follow these steps to create one and note the Client ID and Client Secret.
- In Airbyte's UI, click your user name > User Settings > Applications > Create an application.
- Enter a descriptive application name. For example, "Data plane deployment." Airbyte creates your application. Note the Client ID and Client Secret.
- In your terminal, set the application credentials you created as environment variables.
export AIRBYTE_CLIENT_ID="<CLIENT_ID>"
export AIRBYTE_CLIENT_SECRET="<CLIENT_SECRET>"
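These exports only apply to your current shell session; if you open a new terminal later, set them again before running Airbox commands. To confirm they're set (note that this prints the secret to your terminal):

echo "$AIRBYTE_CLIENT_ID"
echo "$AIRBYTE_CLIENT_SECRET"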
Part 4: Configure Airbox
After you set your client ID and client secret, configure Airbox to access your Cloud control plane.
- Configure Airbox to interact with your Airbyte control plane.
airbox config init
- Select Enterprise Flex and press Enter.
Part 5: Authenticate with Airbyte
After configuring Airbox, but before you can manage data planes, you must authenticate with Airbyte. You can also log out and, if you work in multiple organizations, switch between them.
Log in
Run the following command so Airbox can use the client ID and client secret you set earlier to authenticate with your Airbyte environment.
airbox auth login
You see the following result.
Authenticating with Airbyte
Connecting to: https://api.airbyte.com
Successfully authenticated!
Continue to Part 6.
Log out
If you need to clear the authentication token Airbox uses to access your control plane, log out.
airbox auth logout
This doesn't remove the client ID and client secret from Airbyte. If you need to rotate credentials, you must also delete your application.
Switch organizations
- If you use multiple Airbyte organizations, you can switch between them with the following command.
airbox auth switch-organization
If you belong to multiple organizations, Airbox shows you that list. If not, Airbox automatically sets you to your single organization again.
- Choose the new organization you want to connect to and press Enter.
Part 6: Deploy a data plane
After you authenticate with Airbyte, run the install command. This begins a process that creates a new kind cluster in Docker, registers the data plane with Airbyte's managed control plane, and deploys the data plane for use.
- Install your data plane.
airbox install dataplane
- Follow the prompts in the terminal.
- Choose whether you want to create a new region or use an existing one (if you have some).
  Tip: To avoid confusion later, your regions in Airbyte should reflect the actual regions your data planes run in. For example, if you're installing this data plane in the AWS us-west-1 region, you may wish to call it us-west-1 or something similar.
- Name your data plane.
The process looks similar to this.
$ airbox install dataplane
Starting interactive dataplane installation
Select region option:
Use existing region
> Create new region
Enter new region name:
> us-west-1
Enter dataplane name:
> us-west-1-dataplane-1
Dataplane Credentials:
DataplaneID: <dataplane_ID>
ClientID: <client_ID>
ClientSecret: <client_secret>
Dataplane 'us-west-1-dataplane-1' installed successfully!
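Optionally, if you have the kind and kubectl command line tools installed, you can confirm the new cluster exists before moving on. The cluster name depends on your installation, and this assumes your kubeconfig currently points at the cluster Airbox created.

# List the local kind clusters running in Docker
kind get clusters
# Confirm the data plane pods are running
kubectl get pods -A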
Part 7: Assign a workspace to your data plane
If this data plane is in a new region, or you want a workspace to start using this region now, follow these steps in Airbyte's UI.
- Click Workspace settings > General.
- Under Region, select the region you created that contains your data plane.
Part 8: Verify your data plane is running correctly
Once you assign your workspace to your data plane, verify that the data plane runs syncs and creates pods correctly.
- Create a connection.
  - Add the Sample Data source, which generates non-sensitive sample data.
  - Add the End-to-End Testing (/dev/null) destination if you don't need to see the data. If you want to see the data in the destination, Google Sheets is also a good option that's easy to set up.
  - Create a connection between that source and destination.
- In your terminal, run watch kubectl get po or kubectl get po -w. This allows you to watch pods progress in your Kubernetes cluster.
- In Airbyte's UI, start the sync.
- Watch pods start, make progress, and complete. You should see something similar to this.
NAME READY STATUS RESTARTS AGE
us-west-1-airbyte-data-plane-c8858dd77-t55wn 1/1 Running 0 41m
replication-job-49346750-attempt-0 0/3 Completed 0 20m
source-faker-discover-49350414-0-cxrhx 0/2 Pending 0 0s
source-faker-discover-49350414-0-cxrhx 0/2 Pending 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Init:0/1 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Init:0/1 0 2s
source-faker-discover-49350414-0-cxrhx 0/2 PodInitializing 0 9s
source-faker-discover-49350414-0-cxrhx 2/2 Running 0 10s
source-faker-discover-49350414-0-cxrhx 1/2 NotReady 0 13s
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 19s
replication-job-49350414-attempt-0 0/3 Pending 0 0s
replication-job-49350414-attempt-0 0/3 Pending 0 0s
replication-job-49350414-attempt-0 0/3 Init:0/1 0 0s
replication-job-49350414-attempt-0 0/3 Init:0/1 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 20s
replication-job-49350414-attempt-0 0/3 PodInitializing 0 17s
replication-job-49350414-attempt-0 3/3 Running 0 18s
replication-job-49350414-attempt-0 2/3 NotReady 0 31s
replication-job-49346750-attempt-0 0/3 Completed 0 29m
replication-job-49346750-attempt-0 0/3 Completed 0 29m
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 12m
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 12m
- In Airbyte's UI, ensure the sync completes and populates the expected number of records, based on your settings for the Sample Data source.
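If a sync doesn't progress as expected, kubectl can also show you what's happening inside a job pod. These are generic kubectl commands; substitute a pod name from your own output for the example name shown here.

# Describe a pod to see scheduling events and container states
kubectl describe pod replication-job-49350414-attempt-0
# Stream logs from all containers in that pod
kubectl logs replication-job-49350414-attempt-0 --all-containers --follow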
Where Airbox stores configuration files
Airbox stores configuration data in ~/.airbyte/airbox/config.yaml. This includes:
- Authentication credentials
- Context settings
- Organization and workspace IDs
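To see what Airbox stored, you can print the file. Treat its contents as sensitive, since it includes authentication credentials.

cat ~/.airbyte/airbox/config.yaml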
Restart a data plane
As long as Docker Desktop is running in the background, your data plane remains available. If you quit Docker Desktop or restart your virtual machine and want to restore your data plane, start Docker Desktop again. Once your containers are running, your data plane can resume work.
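If you use Docker Engine on a Linux server rather than Docker Desktop, a rough sketch of bringing the data plane back after a reboot looks like this. The node container name is whatever kind created for your cluster, so list stopped containers first and substitute the name you see.

# Start the Docker daemon if it isn't running (systemd-based Linux)
sudo systemctl start docker
# Find the stopped kind node container, then start it
docker ps -a
docker start <kind_node_container_name>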
Values.yaml not currently supported
Airbox doesn't currently support deployment customization with values.yaml files.