dbt Sources

dbt Sources #

Sources are the raw data tables that dbt builds upon.

Definition #

Sources get defined in the dbt_project.yml file.

source-paths: ["models"]

^ The path to source data can be changed.

The path structure might look like:

./models/<source directory>/schema.yml

Where <source directory> might be something like Finance, Ops, Sales, Customers, etc. Anything really.

The schema.yml in the source directory:

  • Describes the

Example of a schema.yml file:

version: 2 #why '2'?

sources:
    - name: <source name, often matches database name> # essential; everything else is optional
      description: '<text>'
      database: <database name, in BigQuery, Snowflake, etc.>
      schema: <schema name>
      tables:
            - name: <table name> # essential
              description: '<text>'
                columns: ...
                

Next to the schema.sql is a model, like <data model>.sql, such as ./models/<source directory>/<data model>.sql

References #

Sources can be referenced later on in data models.

select * from {{ source(<source_name>, <table_name>)}} 

Where source_name and table_name are named in the schema.yml discussed above. Be sure that these are truly unique.

The whole thing might look like:


with <cte_alias> (
  select * from {{ source(<source_name>, <table_name>)}} 
),

final as (
  select * from <cte_alias>
)

select * from final

Resources: #