dbt models #
dbt models are specifications of the data model.
Model files #
Generally, model files are sql scripts with Jinja included.
with <CTE_name> as (
SELECT ...
FROM ...
WHERE ...
)
SELECT *
FROM <CTE_name>
Best Practices #
Every model must have a unique model. A common way to ensure uniqueness is to prepend the model name with the directory.
models/
models/<data directory>/
modes<data directory>/<data_directory>__<model_name>.sql
Incremental Models #
Rebuild only the most recently updated data sources, rather than having to take the slow and costly apporach of rebuilding the entirety of the data sources.
This way, only new records are updated - limiting the amount of data that needs to be transformed.
An example of a snippet of something to run incrementally, possibly located at models/<model>.sql
:
{{
config(
materialized='incremental'
)
}}
<some sort of query to run if incremenetal records found>
{% if is_incremeneta() %} --Jinja statement
<some query filter to identify incremental records>
-- e.g., where event_time > (select (max(event_time) from {{this}})
-- where {{this}} is the table under consideration
{% endif %}
This ^ above could be structed as a common table expression (CTE).
is_incremental() macro #
The is_incremental()
macro returns True
if certain conditions are met:
-
Destination table exists
-
dbt is not run in
full refresh
model -
The model in question is configured with
materialized='incremental
.
incremental strategy #
full refresh #
Sometimes, we need to bypass incremental builds by running a full model refresh.
dbt run -m <model> --full-refresh
Intermediate Models #
Intermediate models are precursor models leading into marts.
Intermediate models effectively offload complexity and processing to a precursor to the marts.
The flow might look something like:
Sources -> Staging -> Intermediate -> Marts
Sources -> Staging -^
Implementation #
Intermediate models are usually found in models/intermediate/<domain>
, where domain
refers to something like customers
or sales
. Intermediate models are prepended with int_
.
models/intermediate
models/intermediate/<domain>
models/intermediate/<domain>/int_<domain>_models.yml
models/intermediate/<domain>/int_<descriptor>.sql
Ephemeral #
Intermediate models can be materialized in an ephemeral manner, such that they effectively disappear when they’re done being used in the overall data flow.
Think of these like CTEs.
It helps keep things clean, but might make debugging tougher since there’s no artificact to examine.
To make a dbt model ephemeral, modify dbt_project.yml
such that it contains:
models:
dbt_training:
...
intermedaite:
+materialized: ephemeral