Watermarks and Data Warehousing

Watermarks

The term “watermark” in data engineering borrows from the markings on a water-level measuring stick, which show how high the water has risen.

In data engineering, a watermark marks how high the data has risen in a data store (the target). Here, the watermark is the number of distinct records that have been loaded into that data store.

The data in that data store originated somewhere else: the data source.

Once the watermark is established, any incremental additions of data are added on top of the established watermark.
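To make the idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular tool does it: the function name `incremental_load` and the list-based `source` and `target` are made up for this example, and it assumes the source is append-only, keeps its order, and contains no duplicates. In real pipelines the watermark is often stored as the highest timestamp or ID loaded so far rather than a record count, but the principle is the same.

```python
def incremental_load(source, target):
    """Append to `target` only the source records above its current watermark.

    Assumption for this sketch: `source` is append-only and ordered, so the
    first len(target) source records are exactly the ones already loaded.
    """
    watermark = len(target)            # how high the data currently sits in the target
    new_records = source[watermark:]   # everything in the source above that mark
    target.extend(new_records)         # add the new records on top of the watermark
    return len(target)                 # the new watermark


source = ["r1", "r2", "r3"]   # hypothetical records in the data source
target = []                   # the data store being loaded

watermark = incremental_load(source, target)   # initial load establishes the watermark

source += ["r4", "r5"]                         # new data arrives at the source
watermark = incremental_load(source, target)   # only r4 and r5 get added
```

After the second call, the target holds all five records and the watermark has moved up to match, without reloading the first three.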

Note: this is a veeeeery colloquial explanation of watermarks, but it’s simple and I think it conceptually gets the point across.

References