Dbt Primer

dbt Primer #

A bite sized chunk of knowledge for using dbt.

Installation #

First up, install dbt tools from the command line.

pip install dbt-core

We’ll need to also specific adapters for the type of data platform (BigQuery, Snowflake, Spark, etc.):

Check the documentation for the types of supported data platforms.

Let’s say we’re using BigQuery:

pip install dbt-bigquery

Check version with:

dbt --version

Profile #

The profile defines how to connect a data platform.

This exists in the home directory’s .dbt/ directory in file profiles.yml. The top line should match the project name (from above).

(As always, check out the documentation for more details).

For a BigQuery implementation, the profile might look something like:

<project name>:
    target: test

    outputs:
        test:
            type: bigquery
            method: service-account
            keyfile: <path>.json
            project: <project id>
            dataset: dbt_<name>
            threads: 1
            timeout_seconds: 300
            location: US
            priority: interactive # or interactive

        dev:
            ...

        prod:
            ...

Where ^ project name would relate to the GCP project name you’ve set up, dataset is the BigQuery dataset name (within the database).

To check that the connection works, run dbt debug. If everything works, the tests should pass.

Projects #

To create a dbt project:

dbt init <project name>

You’ll be asked to select from a list of choices the data platform you want to use (based on the adapters that you previously installed.)

Once it gets created, there will be a skeleton full of files and directories.

dbt_project.yml - every project needs one. This file dictates how the project works.

This ^ will include a chunk that handles models (see below). The + is recursive downstream.

models:
    <project>:
        <models sub-directory>:
            +materialized: view # or table
            <models sub-sub-directory, assuming there's no + in the parent dir specification>:
                materialized: table

Models #

Basically a bunch of SQL files. Each file is a model (table).

models folder in the project contains all the SQL files. This is truly the heart of dbt.

Common formats int he SQL files are:

with <something> as (
    select ...
)

select *
from <something>

Models can reference other models. For instance, if there’s something called model_one.sql, another sql file could include something like:

select * 
from {{ ref(model_one) }}
where id = <some unique identifier>

Staging #

Run #

dbt run commmand line run executes models in the models directory.

For a full refresh:

dbt run --full-refresh

For select updates of particular models:

dbt run --select  <model name(s)>

Materialization #

Specification of how something gets created, such as as a table.

Jinja #

Macros galore.

Testing #

Validate that the models are working correctly.

schema.ymlversion: 2 at the top

Beneath that,

models:

- name: <model/table name>
    columns:
        - name: <name>
            tests:
                - unique
                - not_null

Run tests with:

dbt test

Documentation #

To build dbt documentation:

dbt docs generate

Launch it locally with:

dbt docs serve

There’s a handy graph to show the relationship between the models.