Skip to main content

Record selector

The record selector is responsible for translating an HTTP response into a list of Airbyte records by extracting records from the response and optionally filtering and shaping records based on a heuristic. Schema:

  HttpSelector:
type: object
anyOf:
- "$ref": "#/definitions/RecordSelector"
RecordSelector:
type: object
required:
- extractor
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
extractor:
"$ref": "#/definitions/RecordExtractor"
record_filter:
"$ref": "#/definitions/RecordFilter"

The current record extraction implementation uses dpath to select records from the json-decoded HTTP response. For nested structures * can be used to iterate over array elements. Schema:

  DpathExtractor:
type: object
additionalProperties: true
required:
- field_path
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
field_path:
type: array
items:
type: string

Common recipes:

Here are some common patterns:

Selecting the whole response

If the root of the response is an array containing the records, the records can be extracted using the following definition:

selector:
extractor:
field_path: [ ]

If the root of the response is a json object representing a single record, the record can be extracted and wrapped in an array. For example, given a response body of the form

{
"id": 1
}

and a selector

selector:
extractor:
field_path: [ ]

The selected records will be

[
{
"id": 1
}
]

Selecting a field

Given a response body of the form

{
"data": [{"id": 0}, {"id": 1}],
"metadata": {"api-version": "1.0.0"}
}

and a selector

selector:
extractor:
field_path: [ "data" ]

The selected records will be

[
{
"id": 0
},
{
"id": 1
}
]

Selecting an inner field

Given a response body of the form

{
"data": {
"records": [
{
"id": 1
},
{
"id": 2
}
]
}
}

and a selector

selector:
extractor:
field_path: [ "data", "records" ]

The selected records will be

[
{
"id": 1
},
{
"id": 2
}
]

Selecting fields nested in arrays

Given a response body of the form


{
"data": [
{
"record": {
"id": "1"
}
},
{
"record": {
"id": "2"
}
}
]
}

and a selector

selector:
extractor:
field_path: [ "data", "*", "record" ]

The selected records will be

[
{
"id": 1
},
{
"id": 2
}
]

Filtering records

Records can be filtered by adding a record_filter to the selector. The expression in the filter will be evaluated to a boolean returning true if the record should be included.

In this example, all records with a created_at field greater than the stream slice's start_time will be filtered out:

selector:
extractor:
field_path: [ ]
record_filter:
condition: "{{ record['created_at'] < stream_slice['start_time'] }}"

Transformations

Fields can be added or removed from records by adding Transformations to a stream's definition.

Schema:

  RecordTransformation:
type: object
anyOf:
- "$ref": "#/definitions/AddFields"
- "$ref": "#/definitions/RemoveFields"

Adding fields

Fields can be added with the AddFields transformation. This example adds a top-level field "field1" with a value "static_value"

Schema:

  AddFields:
type: object
required:
- fields
additionalProperties: true
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
fields:
type: array
items:
"$ref": "#/definitions/AddedFieldDefinition"
AddedFieldDefinition:
type: object
required:
- path
- value
additionalProperties: true
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
path:
"$ref": "#/definitions/FieldPointer"
value:
type: string
FieldPointer:
type: array
items:
type: string

Example:

stream:
<...>
transformations:
- type: AddFields
fields:
- path: [ "field1" ]
value: "static_value"

This example adds a top-level field "start_date", whose value is evaluated from the stream slice:

stream:
<...>
transformations:
- type: AddFields
fields:
- path: [ "start_date" ]
value: { { stream_slice[ 'start_date' ] } }

Fields can also be added in a nested object by writing the fields' path as a list.

Given a record of the following shape:

{
"id": 0,
"data":
{
"field0": "some_data"
}
}

this definition will add a field in the "data" nested object:

stream:
<...>
transformations:
- type: AddFields
fields:
- path: [ "data", "field1" ]
value: "static_value"

resulting in the following record:

{
"id": 0,
"data":
{
"field0": "some_data",
"field1": "static_value"
}
}

Removing fields

Fields can be removed from records with the RemoveFields transformation.

Schema:

  RemoveFields:
type: object
required:
- field_pointers
additionalProperties: true
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
field_pointers:
type: array
items:
"$ref": "#/definitions/FieldPointer"

Given a record of the following shape:

{
"path":
{
"to":
{
"field1": "data_to_remove",
"field2": "data_to_keep"
}
},
"path2": "data_to_remove",
"path3": "data_to_keep"
}

this definition will remove the 2 instances of "data_to_remove" which are found in "path2" and "path.to.field1":

the_stream:
<...>
transformations:
- type: RemoveFields
field_pointers:
- [ "path", "to", "field1" ]
- [ "path2" ]

resulting in the following record:

{
"path":
{
"to":
{
"field2": "data_to_keep"
}
},
"path3": "data_to_keep"
}