dbt Models

dbt models #

dbt models are specifications of the data model.

Model files #

Generally, model files are sql scripts with Jinja included.

with <CTE_name> as (
    SELECT ...
    FROM ...
    WHERE ...
)

SELECT *
FROM <CTE_name>

Best Practices #

Every model must have a unique model. A common way to ensure uniqueness is to prepend the model name with the directory.

models/
models/<data directory>/
modes<data directory>/<data_directory>__<model_name>.sql

Incremental Models #

Rebuild only the most recently updated data sources, rather than having to take the slow and costly apporach of rebuilding the entirety of the data sources.

This way, only new records are updated - limiting the amount of data that needs to be transformed.

An example of a snippet of something to run incrementally, possibly located at models/<model>.sql:

{{

    config(
        materialized='incremental'
    )
}}

    <some sort of query to run if incremenetal records found>

{% if is_incremeneta() %} --Jinja statement

    <some query filter to identify incremental records>
    -- e.g., where event_time > (select (max(event_time) from {{this}})
    -- where {{this}} is the table under consideration

{% endif %}

This ^ above could be structed as a common table expression (CTE).

is_incremental() macro #

The is_incremental() macro returns True if certain conditions are met:

  • Destination table exists

  • dbt is not run in full refresh model

  • The model in question is configured with materialized='incremental.

incremental strategy #

full refresh #

Sometimes, we need to bypass incremental builds by running a full model refresh.

dbt run -m <model> --full-refresh

Intermediate Models #

Intermediate models are precursor models leading into marts.

Intermediate models effectively offload complexity and processing to a precursor to the marts.

The flow might look something like:

Sources -> Staging -> Intermediate -> Marts
Sources -> Staging -^

Implementation #

Intermediate models are usually found in models/intermediate/<domain>, where domain refers to something like customers or sales. Intermediate models are prepended with int_.

models/intermediate
models/intermediate/<domain>
models/intermediate/<domain>/int_<domain>_models.yml
models/intermediate/<domain>/int_<descriptor>.sql

Ephemeral #

Intermediate models can be materialized in an ephemeral manner, such that they effectively disappear when they’re done being used in the overall data flow.

Think of these like CTEs.

It helps keep things clean, but might make debugging tougher since there’s no artificact to examine.

To make a dbt model ephemeral, modify dbt_project.yml such that it contains:

models:
  dbt_training:
    ...
    intermedaite:
      +materialized: ephemeral