-
Notifications
You must be signed in to change notification settings - Fork 0
How to use dbt with Databend Cloud
This is a primer tutorial about how to use dbt-databend-cloud. Here we use the official dbt tutorial as an example to introduce the use of Databend in dbt together. The software used in this example and its version requirements:
- dbt 1.01 or upper
- dbt-databend-cloud 0.0.1 Installation Installing dbt and dbt-databend-cloud requires only one command because dbt is installed as a dependency when we install dbt-databend-cloud. pip3 install dbt-databend-cloud dbt can also be installed by yourself. Please refer to the official installation guide. Creating project: jaffle shop The jaffle_shop is a project provided by dbt-lab to demonstrate dbt functionality. You can get it directly from GitHub. $ git clone https://github.com/dbt-labs/jaffle_shop $ cd jaffle_shop All files in the jaffle_shop project directory are as follows.
- dbt_project.yml is the dbt project configuration file, which holds the project name, database configuration file path information, etc.
- models directory contains the SQL model and table schema for the project. Note that this section was written by the data analyst himself.
- seed directory stores CSV files. Such CSV files can be dumped from database export tools. In the jaffle shop project, these CSV files are used as raw data to be processed. I'll go into more detail about them later when I use one of the files or directories above. Configuring project
- Global Configuration dbt has a default global profile: ~/.dbt/profiles.yml, which we first set up in the user directory and configured the connection information for the Databend database. If you do not have ~/.dbt/profiles.yml please create it first. jaffle_shop_databend: target: dev outputs: dev: type: databend host: tnc7yee14--query-perf.ch.datafusecloud.com port: 443 schema: default user: cloudapp pass: ******** You can find your connect host, user, pass information by reading Connecting to a Warehouse | Databend Cloud .
- Project configuration In the jaffle_shop project directory, the project configuration file dbt_project.yml is available. Change the profile configuration item to jaffle_shop_databend, which is the project name in profiles.yml. Then, the project will query the database connection configuration in the ~/.dbt/profiles.yml file. cat dbt_project.yml name: 'jaffle_shop_databend'
config-version: 2 version: '0.1'
profile: 'jaffle_shop_databend'
model-paths: ["models"] seed-paths: ["seeds"] test-paths: ["tests"] analysis-paths: ["analysis"] macro-paths: ["macros"]
target-path: "target" clean-targets: - "target" - "dbt_modules" - "logs"
require-dbt-version: [">=1.0.0", "<2.0.0"]
models:
jaffle_shop:
materialized: table
staging:
materialized: view
3. Verify the configuration
You can run dbt debug
to check whether the database and project configuration is correct.
[图片]
Loading CSV
Load the CSV data and materialize the CSV as a table in the target database.
Note: In general, dbt projects do not need this step because the data for your pending projects is in the database.
dbt seed
03:04:20 Running with dbt=1.3.0
03:04:20 Unable to do partial parsing because profile has changed
03:04:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
03:04:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics 03:04:20 03:04:29 Concurrency: 1 threads (target='dev') 03:04:29 03:04:29 1 of 3 START seed file sjh_dbt.raw_customers ................................... [RUN] sync ec1e4a32-9eb6-4f8c-88d7-35c40cfa89d1.csv duration:9s 03:04:42 1 of 3 OK loaded seed file sjh_dbt.raw_customers ............................... [INSERT 100 in 12.85s] 03:04:42 2 of 3 START seed file sjh_dbt.raw_orders ...................................... [RUN] sync 3fd97eae-d20b-4d92-9dd7-9926e3ae4c86.csv duration:9s 03:04:55 2 of 3 OK loaded seed file sjh_dbt.raw_orders .................................. [INSERT 99 in 12.54s] 03:04:55 3 of 3 START seed file sjh_dbt.raw_payments .................................... [RUN] sync 8da2b4b5-78c7-4230-b8c6-c1f030bdcb8f.csv duration:9s 03:05:07 3 of 3 OK loaded seed file sjh_dbt.raw_payments ................................ [INSERT 113 in 12.77s] 03:05:07 03:05:07 Finished running 3 seeds in 0 hours 0 minutes and 47.17 seconds (47.17s). 03:05:07 03:05:07 Completed successfully 03:05:07 03:05:07 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3 In the above result, it is clear that three tasks have been performed and three tables ( sjh_dbt.raw_customers, sjh_dbt.raw_orders, sjh_dbt.raw_payments) were loaded. Next, go to the Databend Cloud database to see what happens. We found an extra sjh_dbt database, which is the engineering database that dbt created for us. [图片]
There are three tables in sjh_dbt database, corresponding to the above three task results respectively. [图片] model Before skipping to the next step, it's important to understand what role the model plays in dbt. In dbt, models are used to describe the structure of a set of tables or views, and there are two main types of files: SQL and YML. Also note that in the jaffle_shop project, table structures are held under the models/ directory and view structures are held under the models/staging/ directory according to the materialization configuration. For example, models/orders.sql is an SQL query statement with jinja syntax. We will create a table by this query statement. $ cat models/orders.sql {% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
with orders as (
select * from {{ ref('stg_orders') }}
),
payments as (
select * from {{ ref('stg_payments') }}
),
order_payments as (
select
order_id,
{% for payment_method in payment_methods -%}
sum(case when payment_method = '{{ payment_method }}' then amount else 0 end) as {{ payment_method }}_amount,
{% endfor -%}
sum(amount) as total_amount
from payments
group by order_id
),
final as (
select
orders.order_id,
orders.customer_id,
orders.order_date,
orders.status,
{% for payment_method in payment_methods -%}
order_payments.{{ payment_method }}_amount,
{% endfor -%}
order_payments.total_amount as amount
from orders
left join order_payments
on orders.order_id = order_payments.order_id
)
select * from final Also, the constraint information that goes with this SQL is in the models/schema.yml file. The schema.yml is a registry of all models in the current directory. All models are organized into a tree structure that describes the description and attributes of each field. The tests items represent constraints for this field, which can be tested using the dbt test command. See the official documentation for more information. Running The result shows three views (sjh_dbt.stg_customers, sjh_dbt.stg_orders, sjh_dbt.stg_payments) and two tables (sjh_dbt.customers, sjh_dbt.orders). dbt seed 03:04:20 Running with dbt=1.3.0 03:04:20 Unable to do partial parsing because profile has changed 03:04:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources. There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
03:04:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics 03:04:20 03:04:29 Concurrency: 1 threads (target='dev') 03:04:29 03:04:29 1 of 3 START seed file sjh_dbt.raw_customers ................................... [RUN] sync ec1e4a32-9eb6-4f8c-88d7-35c40cfa89d1.csv duration:9s 03:04:42 1 of 3 OK loaded seed file sjh_dbt.raw_customers ............................... [INSERT 100 in 12.85s] 03:04:42 2 of 3 START seed file sjh_dbt.raw_orders ...................................... [RUN] sync 3fd97eae-d20b-4d92-9dd7-9926e3ae4c86.csv duration:9s 03:04:55 2 of 3 OK loaded seed file sjh_dbt.raw_orders .................................. [INSERT 99 in 12.54s] 03:04:55 3 of 3 START seed file sjh_dbt.raw_payments .................................... [RUN] sync 8da2b4b5-78c7-4230-b8c6-c1f030bdcb8f.csv duration:9s 03:05:07 3 of 3 OK loaded seed file sjh_dbt.raw_payments ................................ [INSERT 113 in 12.77s] 03:05:07 03:05:07 Finished running 3 seeds in 0 hours 0 minutes and 47.17 seconds (47.17s). 03:05:07 03:05:07 Completed successfully 03:05:07 03:05:07 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3 ❯ dbt run 03:29:28 Running with dbt=1.3.0 03:29:28 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources. There are 2 unused configuration paths:
- models.jaffle_shop.staging
- models.jaffle_shop
03:29:28 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics 03:29:28 03:29:38 Concurrency: 1 threads (target='dev') 03:29:38 03:29:38 1 of 5 START sql view model sjh_dbt.stg_customers .............................. [RUN] 03:29:52 1 of 5 OK created sql view model sjh_dbt.stg_customers ......................... [OK in 13.76s] 03:29:52 2 of 5 START sql view model sjh_dbt.stg_orders ................................. [RUN] 03:30:08 2 of 5 OK created sql view model sjh_dbt.stg_orders ............................ [OK in 16.28s] 03:30:08 3 of 5 START sql view model sjh_dbt.stg_payments ............................... [RUN] 03:30:22 3 of 5 OK created sql view model sjh_dbt.stg_payments .......................... [OK in 13.62s] 03:30:22 4 of 5 START sql view model sjh_dbt.customers .................................. [RUN] 03:30:35 4 of 5 OK created sql view model sjh_dbt.customers ............................. [OK in 13.29s] 03:30:35 5 of 5 START sql view model sjh_dbt.orders ..................................... [RUN] 03:30:48 5 of 5 OK created sql view model sjh_dbt.orders ................................ [OK in 13.32s] 03:30:48 03:30:48 Finished running 5 view models in 0 hours 1 minutes and 19.96 seconds (79.96s). 03:30:48 03:30:48 Completed successfully 03:30:48 03:30:48 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5 Go to the Databend database to verify that the creation is successful. The result illustrates that five more tables or views, such as customers, have been added, and the data in the tables or views have been transformed. [图片] Generating doc dbt also supports the generation of visual documents, using the following command.
- Generate dbt docs generate 04:01:20 Running with dbt=1.3.0 04:01:20 Unable to do partial parsing because profile has changed 04:01:20 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources. There are 2 unused configuration paths:
- models.jaffle_shop
- models.jaffle_shop.staging
04:01:20 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 282 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
04:01:20
04:01:26 Concurrency: 1 threads (target='dev')
04:01:26
04:01:26 Done.
04:01:26 Building catalog
04:01:33 Catalog written to /Users/hanshanjie/git-works/jaffle_shop/target/catalog.json
2. Start server
Run dbt docs serve
.
The document, which contains the overall structure of the jaffle_shop project and a description of all the tables and views, can be viewed in a browser.
[图片]
Conclusion
The use of Databend Cloud in dbt includes the following steps:
- Install dbt and dbt-databend-cloud
- Configuration
- Write SQL and YML files
- Running