docs/integrations/data-ingestion/google-dataflow/templates.md (+1 -1)
@@ -22,7 +22,7 @@ Google Dataflow templates provide a convenient way to execute prebuilt, ready-to

 ## How to Run Dataflow Templates {#how-to-run-dataflow-templates}

-As of today, the ClickHouse official template is available via the Google Cloud CLI or Dataflow REST API.
+As of today, the ClickHouse official template is available via the Google Cloud Console, CLI or Dataflow REST API.

 For detailed step-by-step instructions, refer to the [Google Dataflow Run Pipeline From a Template Guide](https://cloud.google.com/dataflow/docs/templates/provided-templates).
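For illustration, a minimal sketch of launching the template from the gcloud CLI. The GCS template path is an assumption (check the guide linked above for the current location), and all parameter values are placeholders; the parameter names follow the template's documented parameters:

```bash
# Minimal sketch: run the BigQuery-to-ClickHouse Flex Template from the CLI.
# The template GCS path below is an assumption; look up the current path in
# the guide linked above. All values are placeholders.
gcloud dataflow flex-template run "bigquery-to-clickhouse-example" \
  --region "us-central1" \
  --template-file-gcs-location "gs://dataflow-templates-us-central1/latest/flex/BigQuery_to_ClickHouse" \
  --parameters jdbcUrl="jdbc:clickhouse://<host>:8443/default?ssl=true&sslmode=NONE" \
  --parameters clickHouseUsername="default" \
  --parameters clickHousePassword="<password>" \
  --parameters clickHouseTable="<target_table>"
  # (BigQuery source parameters omitted here; see the template's parameter list)
```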
docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md (+49 -12)
@@ -9,19 +9,25 @@ title: 'Dataflow BigQuery to ClickHouse template'
 import TOCInline from '@theme/TOCInline';
 import Image from '@theme/IdealImage';
 import dataflow_inqueue_job from '@site/static/images/integrations/data-ingestion/google-dataflow/dataflow-inqueue-job.png'
+import dataflow_create_job_from_template_button from '@site/static/images/integrations/data-ingestion/google-dataflow/create_job_from_template_button.png'
+import dataflow_template_clickhouse_search from '@site/static/images/integrations/data-ingestion/google-dataflow/template_clickhouse_search.png'
+import dataflow_template_initial_form from '@site/static/images/integrations/data-ingestion/google-dataflow/template_initial_form.png'
+import dataflow_extended_template_form from '@site/static/images/integrations/data-ingestion/google-dataflow/extended_template_form.png'
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';

 # Dataflow BigQuery to ClickHouse template

-The BigQuery to ClickHouse template is a batch pipeline that ingests data from BigQuery table into ClickHouse table.
-The template can either read the entire table or read specific records using a provided query.
+The BigQuery to ClickHouse template is a batch pipeline that ingests data from a BigQuery table into a ClickHouse table.
+The template can read the entire table or filter specific records using a provided SQL query.
-* The ClickHouse host Must be accessible from the Dataflow worker machines.
+* The ClickHouse host must be accessible from the Dataflow worker machines.

 ## Template Parameters {#template-parameters}
@@ -33,7 +39,7 @@ The template can either read the entire table or read specific records using a p
 |`jdbcUrl`| The ClickHouse JDBC URL in the format `jdbc:clickhouse://<host>:<port>/<schema>`. | ✅ | Don't add the username and password as JDBC options. Any other JDBC option could be added at the end of the JDBC URL. For ClickHouse Cloud users, add `ssl=true&sslmode=NONE` to the `jdbcUrl`. |
 |`clickHouseUsername`| The ClickHouse username to authenticate with. | ✅ ||
 |`clickHousePassword`| The ClickHouse password to authenticate with. | ✅ ||
-|`clickHouseTable`| The target ClickHouse table name to insert the data to. | ✅ ||
+|`clickHouseTable`| The target ClickHouse table into which data will be inserted. | ✅ ||
 |`maxInsertBlockSize`| The maximum block size for insertion, if we control the creation of blocks for insertion (ClickHouseIO option). || A `ClickHouseIO` option. |
 |`insertDistributedSync`| If setting is enabled, insert query into distributed waits until data will be sent to all nodes in cluster. (ClickHouseIO option). || A `ClickHouseIO` option. |
 |`insertQuorum`| For INSERT queries in the replicated table, wait writing for the specified number of replicas and linearize the addition of the data. 0 - disabled. || A `ClickHouseIO` option. This setting is disabled in default server settings. |
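For illustration, a `jdbcUrl` for a ClickHouse Cloud service might look like the following (the hostname is hypothetical; 8443 is the usual ClickHouse Cloud HTTPS port):

```bash
# Hypothetical ClickHouse Cloud endpoint. Note ssl=true&sslmode=NONE as the
# parameter table requires, and no credentials embedded in the URL itself.
JDBC_URL="jdbc:clickhouse://abc123.us-east-1.aws.clickhouse.cloud:8443/default?ssl=true&sslmode=NONE"
```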
@@ -49,16 +55,15 @@ The template can either read the entire table or read specific records using a p
 :::note
-All `ClickHouseIO` parameters default values could be found in [`ClickHouseIO` Apache Beam Connector](/integrations/apache-beam#clickhouseiowrite-parameters)
+Default values for all `ClickHouseIO` parameters can be found in the [`ClickHouseIO` Apache Beam Connector](/integrations/apache-beam#clickhouseiowrite-parameters)
 :::

 ## Source and Target Tables Schema {#source-and-target-tables-schema}

-In order to effectively load the BigQuery dataset to ClickHouse, and a column infestation process is conducted with the
-following phases:
+To effectively load the BigQuery dataset into ClickHouse, the pipeline performs a column inference process with the following phases:

 1. The templates build a schema object based on the target ClickHouse table.
-2. The templates iterate over the BigQuery dataset, and tried to match between column based on their names.
+2. The templates iterate over the BigQuery dataset, and attempt to match columns based on their names.

 <br/>
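As a sketch of what the name-based matching in step 2 implies, suppose the BigQuery source exposes columns `user_id`, `event_ts`, and `payload` (hypothetical names); the target ClickHouse table would then declare the same column names, for example:

```bash
# Hypothetical target table whose column names mirror the BigQuery source
# columns, so the template's name-based matching can pair them up.
clickhouse-client --query "
  CREATE TABLE default.events
  (
      user_id  UInt64,
      event_ts DateTime64(3),
      payload  String
  )
  ENGINE = MergeTree
  ORDER BY (user_id, event_ts)"
```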
@@ -92,6 +97,36 @@ requirements and prerequisites.
 - If not already installed, install the [`gcloud` CLI](https://cloud.google.com/sdk/docs/install).
@@ -134,6 +169,9 @@ job:
 startTime: '2025-01-26T14:34:04.608442Z'
 ```

+</TabItem>
+</Tabs>
+
 ### Monitor the Job {#monitor-the-job}

 Navigate to the [Dataflow Jobs tab](https://console.cloud.google.com/dataflow/jobs) in your Google Cloud Console to
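Besides the Console, the job can also be checked from the CLI; a sketch, with region and job ID as placeholders:

```bash
# Check the job from the CLI instead of the Console; region is a placeholder.
gcloud dataflow jobs list --region "us-central1" --status active
gcloud dataflow jobs describe JOB_ID --region "us-central1"
```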
@@ -147,9 +185,8 @@ monitor the status of the job. You'll find the job details, including progress a
 This error occurs when ClickHouse runs out of memory while processing large batches of data. To resolve this issue:

-* Increase the instance resources: Upgrade your ClickHouse server to a larger instance with more memory to handle the data processing load.
-* Decrease the batch size: Adjust the batch size in your Dataflow job configuration to send smaller chunks of data to ClickHouse, reducing memory consumption per batch.
-These changes might help balance resource usage during data ingestion.
+* Increase the instance resources: Upgrade your ClickHouse server to a larger instance with more memory to handle the data processing load.
+* Decrease the batch size: Adjust the batch size in your Dataflow job configuration to send smaller chunks of data to ClickHouse, reducing memory consumption per batch. These changes can help balance resource usage during data ingestion.
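As a sketch of the second remedy, the `maxInsertBlockSize` option from the parameter table above can be lowered when relaunching the job (the value shown is an illustrative assumption, not a recommendation):

```bash
# Relaunch with a smaller insert block size to reduce the memory each batch
# needs on the ClickHouse server. 100000 is an illustrative value; tune it
# against your instance's available memory. TEMPLATE_GCS_PATH and the
# connection variables are placeholders defined elsewhere.
gcloud dataflow flex-template run "bigquery-to-clickhouse-retry" \
  --region "us-central1" \
  --template-file-gcs-location "$TEMPLATE_GCS_PATH" \
  --parameters maxInsertBlockSize=100000 \
  --parameters jdbcUrl="$JDBC_URL" \
  --parameters clickHouseUsername="$CH_USER" \
  --parameters clickHousePassword="$CH_PASSWORD" \
  --parameters clickHouseTable="$CH_TABLE"
```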