dbt version 0.4.1
dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.
0. tl;dr
- slightly modified dbt command structure
- unique_key setting for incremental models
- connect to your db over ssh
- no more model-defaults
- multithreaded schema tests
If you encounter an SSL/cryptography error while upgrading to this version of dbt, check that your version of pip is up-to-date:
pip install -U pip
pip install -U dbt
1. new dbt command structure #109
# To run models
dbt run # same as before
# to dry-run models
dbt run --dry # previously dbt test
# to run schema tests
dbt test # previously dbt test --validate
2. Incremental model improvements #101
Previously, dbt calculated "new" incremental records to insert by querying for rows which matched some sql_where condition defined in the model configuration. This works really well for atomic datasets like a clickstream event log: once inserted, these records never change. Other datasets, like a sessions table composed of many pageviews for many users, can change over time. Consider the following scenario:
User 1 Session 1 Event 1 @ 12:00
User 1 Session 1 Event 2 @ 12:01
-- dbt run --
User 1 Session 1 Event 3 @ 12:02
In this scenario, there are two possible outcomes depending on the sql_where chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!
With this release, you can now add a unique_key expression to an incremental model config. Records matching the unique_key will be deleted from the incremental table, then inserted as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
The unique_key can be any expression which uniquely defines the row, e.g.:
sessions:
  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"
  unique_key: user_id || session_index
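The delete-then-insert behaviour described above can be sketched against the User 1 / Session 1 scenario. This is only an illustration using Python's built-in sqlite3 module, not the SQL dbt actually generates; the events and sessions tables, the sessions__tmp staging table, and the incremental_run helper are all hypothetical names chosen for the example.

```python
import sqlite3


def incremental_run(conn, unique_key):
    """One incremental run: stage new rows, delete matches by unique_key, insert."""
    cur = conn.cursor()
    # Stage the rows that pass the sql_where-style filter (hypothetical staging table).
    cur.execute("drop table if exists sessions__tmp")
    cur.execute("""
        create temp table sessions__tmp as
        select * from (
            select user_id,
                   session_index,
                   max(tstamp) as session_end_tstamp,
                   count(*)    as event_count
            from events
            group by user_id, session_index
        )
        where session_end_tstamp >
              (select coalesce(max(session_end_tstamp), '') from sessions)
    """)
    # Delete existing rows whose unique_key matches a staged row...
    cur.execute(
        f"delete from sessions where ({unique_key}) in "
        f"(select ({unique_key}) from sessions__tmp)"
    )
    # ...then insert the staged rows as usual.
    cur.execute("insert into sessions select * from sessions__tmp")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (user_id, session_index, event_index, tstamp)")
    conn.execute(
        "create table sessions (user_id, session_index, session_end_tstamp, event_count)"
    )
    # Events 1 and 2 arrive before the first run.
    conn.executemany(
        "insert into events values (?, ?, ?, ?)",
        [(1, 1, 1, "12:00"), (1, 1, 2, "12:01")],
    )
    incremental_run(conn, "user_id || session_index")
    # Event 3 arrives afterwards and extends the same session.
    conn.execute("insert into events values (1, 1, 3, '12:02')")
    incremental_run(conn, "user_id || session_index")
    return conn.execute("select * from sessions").fetchall()


if __name__ == "__main__":
    print(demo())
```

After the second run the sessions table holds a single row for User 1 / Session 1 covering all three events, so the session is neither stale nor duplicated.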
3. Run schema validations concurrently #100
The threads run-target config now applies to schema validations too. Try it with dbt test.
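To illustrate what running validations concurrently means, here is a minimal sketch using Python's concurrent.futures. It is not dbt's implementation: the not_null and unique checks run over in-memory rows rather than a warehouse, and run_schema_tests and its threads argument are hypothetical names standing in for the threads setting.

```python
from concurrent.futures import ThreadPoolExecutor


def not_null(rows, column):
    """Pass when no row has a null in the given column."""
    return all(row[column] is not None for row in rows)


def unique(rows, column):
    """Pass when every value in the given column is distinct."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def run_schema_tests(tests, threads):
    """Run each (check, rows, column) test on a pool of `threads` workers."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {
            name: pool.submit(check, rows, column)
            for name, (check, rows, column) in tests.items()
        }
        # Collect results by test name once every check has finished.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    users = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
    tests = {
        "unique_users_id": (unique, users, "id"),
        "not_null_users_email": (not_null, users, "email"),
    }
    print(run_schema_tests(tests, threads=8))
```

Against a real warehouse each check is a query that mostly waits on the database, which is why a thread pool is a natural fit for this kind of work.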
4. Connect to database over ssh #93
Add an ssh-host parameter to a run-target to connect to a database over ssh. The ssh-host parameter should be the name of a Host in your ~/.ssh/config file.
warehouse:
  outputs:
    dev:
      type: redshift
      host: my-redshift.amazonaws.com
      port: 5439
      user: my-user
      pass: my-pass
      dbname: my-db
      schema: dbt_dbanin
      threads: 8
      ssh-host: ssh-host-name # <------ Add this line
  run-target: dev
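For readers unfamiliar with Host entries, the sketch below shows what an entry named ssh-host-name might look like in ~/.ssh/config and how its options can be looked up by name. It does not reflect how dbt itself resolves ssh-host; the hostname, user, and key path are made-up values, lookup_ssh_host is a hypothetical helper, and the parser deliberately ignores wildcards, Match blocks, and the Key=value syntax.

```python
# Hypothetical ~/.ssh/config contents; all values are placeholders.
SAMPLE_CONFIG = """
Host ssh-host-name
    HostName bastion.example.com
    User my-ssh-user
    IdentityFile ~/.ssh/id_rsa

Host other
    HostName other.example.com
"""


def lookup_ssh_host(config_text, host):
    """Return the options declared under `Host <host>` (keys lowercased)."""
    options = {}
    in_block = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            # A Host line may list several names; match exact names only.
            in_block = host in value.split()
        elif in_block:
            options[key] = value
    return options


if __name__ == "__main__":
    print(lookup_ssh_host(SAMPLE_CONFIG, "ssh-host-name"))
```

The value given for ssh-host in the run-target is the name on the Host line (ssh-host-name here), not the underlying HostName.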
5. Remove the model-defaults config #111
The model-defaults config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
models:
  My_Package:
    enabled: true
    materialized: table
    snowplow:
      ...