Watermarks #
The term “watermark” in data engineering borrows from the markings on a water-level gauge, which show how high the water has risen.
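In the context of data engineering, a watermark refers to how high the data gets in a data store - the target. Loosely, you can think of it as how much data has been loaded into that data store so far. In practice it is usually tracked as a marker value, such as the highest timestamp or ID loaded, rather than as a literal count of records.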
The data in that data store would have originated from somewhere else - the data source.
Once the watermark is established, each incremental load only adds data that sits above it, and the watermark then rises to cover what was just loaded.
Note: this is a veeeeery colloquial explanation of watermarks, but it’s simple and I think it conceptually gets the point across.
References #
- Watermark in Data Warehousing by Vincent Rainardi