
Rejected records

Available on: Cloud, Self-Managed Community, Self-Managed Enterprise

When syncing data to a data activation destination, you may encounter rejected records. Rejected records are records Airbyte was unable to sync to your destination, even though the sync itself was otherwise successful.

Why records get rejected

Records are rejected when they don't conform to the destination's schema. The underlying causes vary, but common ones include:

  • The destination requires a field, but that field is empty in the source.
  • The destination requires a field to be in a certain format, but that field is in an incompatible format in the source.
  • The destination requires a field to be unique, but that field isn't unique in the source.
  • A transformation error has corrupted a record at an earlier stage of your data pipeline.
  • Other mismatches between the source data and the destination's requirements.

Look at the following example.

ID    First Name          Last Name    Phone Number    Address
123   Alphonso            Mariyam      123-456-7890    123 Fake Street
456   Emerald             Sanja        234-567-8901    456 Fake Street
789   Sebastian Argyos    (empty)      345-678-9012    789 Fake Street

Imagine you want to move this data into your CRM, Salesforce. However, your Salesforce object requires that everyone has a first and last name. In this case, Sebastian Argyos' last name has been combined with his first name. From Salesforce's perspective, he doesn't have a last name. As a result, it rejects this record.
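To make the example concrete, here is a minimal sketch of a pre-sync check that flags records missing a required field. This is not part of Airbyte. The field names (`first_name`, `last_name`), the `REQUIRED_FIELDS` tuple, and the `find_missing_required` helper are all hypothetical and stand in for whatever your destination object actually requires.

```python
# Hypothetical required fields; replace with the fields your destination enforces.
REQUIRED_FIELDS = ("first_name", "last_name")


def find_missing_required(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field in required
        if not str(record.get(field) or "").strip()
    ]


# The rows from the example table above, as dictionaries.
records = [
    {"id": 123, "first_name": "Alphonso", "last_name": "Mariyam"},
    {"id": 456, "first_name": "Emerald", "last_name": "Sanja"},
    {"id": 789, "first_name": "Sebastian Argyos", "last_name": ""},
]

# Map each problematic record ID to the fields it is missing.
would_reject = {}
for record in records:
    missing = find_missing_required(record)
    if missing:
        would_reject[record["id"]] = missing

print(would_reject)
```

Running a check like this in your warehouse or pipeline before a sync can surface records the destination is likely to reject, although the destination remains the final authority on what it accepts.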

Where rejected records go

Rejected records go into an S3 bucket, if you've configured one. You configure this bucket when you set up your destination. If you haven't configured one yet, you can add it later, and Airbyte starts writing rejected records to it on subsequent syncs.

You should decide on a strategy for managing these records at scale. You might want to send all of them to a single bucket for ease of observability, or you may want different destinations to use different buckets.

Find out if the destination rejected records after a sync

Airbyte shows you rejected records on the connection's Timeline page and in the sync summary in the log for each sync.

If you've configured a storage bucket for rejected records, Airbyte links to it on the Timeline.

[Screenshot: rejected records in the connection Timeline]

You can also check the sync summary in the logs, which reports the number of rejected records overall and for each stream.

snowflake_salesforce_logs_12345_txt.txt
Sync summary: {
  // ...
  "totalStats" : {
    // ...
    "recordsRejected" : 1000
  },
  "streamStats" : [ {
    "streamName" : "USERS",
    "streamNamespace" : "DATA_PRODUCT",
    "stats" : {
      // ...
      "recordsRejected" : 1000
    }
  } ],
  "performanceMetrics" : {
    "mappers" : {
      "field-renaming" : 0
    }
  }
}
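If you want to monitor logs automatically, one option is to scan the log text for the `recordsRejected` counters. The sketch below assumes the sync summary looks like the excerpt above. The `rejected_counts` helper and the sample text are illustrative only, and the real log format may differ between Airbyte versions.

```python
import re

# Matches entries such as: "recordsRejected" : 1000
REJECTED_PATTERN = re.compile(r'"recordsRejected"\s*:\s*(\d+)')


def rejected_counts(log_text):
    """Return every recordsRejected value found in the log text, in order."""
    return [int(value) for value in REJECTED_PATTERN.findall(log_text)]


# Abbreviated sample modeled on the sync summary excerpt above.
sample_log = '''
Sync summary: {
  "totalStats" : {
    "recordsRejected" : 1000
  },
  "streamStats" : [ {
    "streamName" : "USERS",
    "stats" : {
      "recordsRejected" : 1000
    }
  } ]
}
'''

counts = rejected_counts(sample_log)
has_rejections = any(count > 0 for count in counts)
print(counts, has_rejections)
```

A regular expression is used here instead of a JSON parser because the summary is embedded in a larger log file and may not be valid JSON on its own.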

When Airbyte can't display rejected record statistics

Airbyte can only display rejected record statistics and a link to your storage bucket if the source connector sends state messages back to Airbyte correctly.

Regardless of whether the connector reports statistics back to Airbyte, Airbyte still writes rejected records to your storage bucket if you configured one in the destination connector.

Fixing rejected records so Airbyte can sync them

In most cases, it's important to repair rejected records if you can. They may contain valuable data that you want to sync, and in large numbers, rejected records can erode the effectiveness of your data activation initiative.

You can repair rejected records in your source data warehouse or the upstream source that syncs to your data warehouse. Once you repair them, Airbyte can process them again during your next sync.
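As an illustration of a repair, the sketch below fixes the combined-name problem from the earlier example by splitting the first name on its last space when the last name is empty. The `repair_name` helper, the field names, and the splitting rule itself are assumptions for demonstration. Real repairs depend on your data and usually belong in your warehouse or upstream source.

```python
def repair_name(record):
    """Return a copy of the record with a combined first name split in two.

    Assumed rule: if last_name is blank and first_name contains a space,
    treat the final word as the last name.
    """
    fixed = dict(record)
    first = (fixed.get("first_name") or "").strip()
    last = (fixed.get("last_name") or "").strip()
    if not last and " " in first:
        head, _, tail = first.rpartition(" ")
        fixed["first_name"] = head.strip()
        fixed["last_name"] = tail
    return fixed


rejected = {"id": 789, "first_name": "Sebastian Argyos", "last_name": ""}
repaired = repair_name(rejected)
print(repaired)
```

A rule this simple mishandles multi-word first or last names, so review repaired records before relying on them in the next sync.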