diff --git a/contribute/style-guide.md b/contribute/style-guide.md index 3d8fc8be8dc..37ccb071d90 100644 --- a/contribute/style-guide.md +++ b/contribute/style-guide.md @@ -340,7 +340,7 @@ When using URL parameters to control which version of documentation is displayed there are conventions to follow for reliable functionality. Here's how the `?v=v08` parameter relates to the snippet selection: -#### How It Works +#### How it works The URL parameter acts as a selector that matches against the `version` property in your component configuration. For example: diff --git a/docs/_snippets/_GCS_authentication_and_bucket.md b/docs/_snippets/_GCS_authentication_and_bucket.md index 6e3a45cd436..546666a8049 100644 --- a/docs/_snippets/_GCS_authentication_and_bucket.md +++ b/docs/_snippets/_GCS_authentication_and_bucket.md @@ -19,7 +19,7 @@ import Image from '@theme/IdealImage'; Creating a GCS bucket in US East 4 -### Generate an Access key {#generate-an-access-key} +### Generate an access key {#generate-an-access-key} ### Create a service account HMAC key and secret {#create-a-service-account-hmac-key-and-secret} diff --git a/docs/_snippets/_add_superset_detail.md b/docs/_snippets/_add_superset_detail.md index 9df8b5c920a..88e64ec9be1 100644 --- a/docs/_snippets/_add_superset_detail.md +++ b/docs/_snippets/_add_superset_detail.md @@ -13,7 +13,7 @@ There are a few tasks to be done before running `docker compose`: The commands below are to be run from the top level of the GitHub repo, `superset`. ::: -## Official ClickHouse Connect driver {#official-clickhouse-connect-driver} +## Official ClickHouse connect driver {#official-clickhouse-connect-driver} To make the ClickHouse Connect driver available in the Superset deployment add it to the local requirements file: diff --git a/docs/_snippets/_users-and-roles-common.md b/docs/_snippets/_users-and-roles-common.md index 0a78e88d7aa..29726229be9 100644 --- a/docs/_snippets/_users-and-roles-common.md +++ b/docs/_snippets/_users-and-roles-common.md @@ -269,7 +269,7 @@ Roles are used to define groups of users for certain privileges instead of manag Verify that only the above two rows are returned, rows with the value `B` in `column1` should be excluded. ::: -## Modifying Users and Roles {#modifying-users-and-roles} +## Modifying users and roles {#modifying-users-and-roles} Users can be assigned multiple roles for a combination of privileges needed. When using multiple roles, the system will combine the roles to determine privileges, the net effect will be that the role permissions will be cumulative. diff --git a/docs/about-us/beta-and-experimental-features.md b/docs/about-us/beta-and-experimental-features.md index 88ac43934e8..74361046da7 100644 --- a/docs/about-us/beta-and-experimental-features.md +++ b/docs/about-us/beta-and-experimental-features.md @@ -14,7 +14,7 @@ Due to the uncertainty of when features are classified as generally available, w The sections below explicitly describe the properties of **Beta** and **Experimental** features: -## Beta Features {#beta-features} +## Beta features {#beta-features} - Under active development to make them generally available (GA) - Main known issues can be tracked on GitHub @@ -26,7 +26,7 @@ The following features are considered Beta in ClickHouse Cloud and are available Note: please be sure to be using a current version of the ClickHouse [compatibility](/operations/settings/settings#compatibility) setting to be using a recently introduced feature. 
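For illustration only (an editor-added sketch, not part of the diffed page), pinning the `compatibility` setting for a session looks like this; the version string is an arbitrary example:

```sql
-- Sketch: keep the compatibility profile current so recently introduced
-- behaviour is available; '24.8' is an illustrative value, not a recommendation.
SET compatibility = '24.8';
```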
-## Experimental Features {#experimental-features} +## Experimental features {#experimental-features} - May never become GA - May be removed @@ -44,4 +44,110 @@ Please note: no additional experimental features are allowed to be enabled in Cl --> +## Beta settings {#beta-settings} + +| Name | Default | +|------|--------| +| [geotoh3_argument_order](/operations/settings/settings#geotoh3_argument_order) | `lat_lon` | +| [allow_experimental_parallel_reading_from_replicas](/operations/settings/settings#allow_experimental_parallel_reading_from_replicas) | `0` | +| [parallel_replicas_mode](/operations/settings/settings#parallel_replicas_mode) | `read_tasks` | +| [parallel_replicas_count](/operations/settings/settings#parallel_replicas_count) | `0` | +| [parallel_replica_offset](/operations/settings/settings#parallel_replica_offset) | `0` | +| [parallel_replicas_custom_key](/operations/settings/settings#parallel_replicas_custom_key) | `` | +| [parallel_replicas_custom_key_range_lower](/operations/settings/settings#parallel_replicas_custom_key_range_lower) | `0` | +| [parallel_replicas_custom_key_range_upper](/operations/settings/settings#parallel_replicas_custom_key_range_upper) | `0` | +| [cluster_for_parallel_replicas](/operations/settings/settings#cluster_for_parallel_replicas) | `` | +| [parallel_replicas_allow_in_with_subquery](/operations/settings/settings#parallel_replicas_allow_in_with_subquery) | `1` | +| [parallel_replicas_for_non_replicated_merge_tree](/operations/settings/settings#parallel_replicas_for_non_replicated_merge_tree) | `0` | +| [parallel_replicas_min_number_of_rows_per_replica](/operations/settings/settings#parallel_replicas_min_number_of_rows_per_replica) | `0` | +| [parallel_replicas_prefer_local_join](/operations/settings/settings#parallel_replicas_prefer_local_join) | `1` | +| [parallel_replicas_mark_segment_size](/operations/settings/settings#parallel_replicas_mark_segment_size) | `0` | +| [parallel_replicas_local_plan](/operations/settings/settings#parallel_replicas_local_plan) | `1` | +| [parallel_replicas_index_analysis_only_on_coordinator](/operations/settings/settings#parallel_replicas_index_analysis_only_on_coordinator) | `1` | +| [parallel_replicas_only_with_analyzer](/operations/settings/settings#parallel_replicas_only_with_analyzer) | `1` | +| [parallel_replicas_insert_select_local_pipeline](/operations/settings/settings#parallel_replicas_insert_select_local_pipeline) | `1` | +| [parallel_replicas_connect_timeout_ms](/operations/settings/settings#parallel_replicas_connect_timeout_ms) | `300` | +| [session_timezone](/operations/settings/settings#session_timezone) | `` | +| [low_priority_query_wait_time_ms](/operations/settings/settings#low_priority_query_wait_time_ms) | `1000` | +| [max_limit_for_vector_search_queries](/operations/settings/settings#max_limit_for_vector_search_queries) | `1000` | +| [hnsw_candidate_list_size_for_search](/operations/settings/settings#hnsw_candidate_list_size_for_search) | `256` | +| [vector_search_filter_strategy](/operations/settings/settings#vector_search_filter_strategy) | `auto` | +| [vector_search_postfilter_multiplier](/operations/settings/settings#vector_search_postfilter_multiplier) | `1` | +| [allow_experimental_delta_kernel_rs](/operations/settings/settings#allow_experimental_delta_kernel_rs) | `1` | +| [allow_remote_fs_zero_copy_replication](/operations/settings/merge-tree-settings#allow_remote_fs_zero_copy_replication) | `0` | + + +## Experimental settings {#experimental-settings} + +| Name | Default | 
+|------|--------| +| [allow_experimental_kafka_offsets_storage_in_keeper](/operations/settings/settings#allow_experimental_kafka_offsets_storage_in_keeper) | `0` | +| [allow_experimental_correlated_subqueries](/operations/settings/settings#allow_experimental_correlated_subqueries) | `0` | +| [allow_experimental_materialized_postgresql_table](/operations/settings/settings#allow_experimental_materialized_postgresql_table) | `0` | +| [allow_experimental_funnel_functions](/operations/settings/settings#allow_experimental_funnel_functions) | `0` | +| [allow_experimental_nlp_functions](/operations/settings/settings#allow_experimental_nlp_functions) | `0` | +| [allow_experimental_hash_functions](/operations/settings/settings#allow_experimental_hash_functions) | `0` | +| [allow_experimental_object_type](/operations/settings/settings#allow_experimental_object_type) | `0` | +| [allow_experimental_time_series_table](/operations/settings/settings#allow_experimental_time_series_table) | `0` | +| [allow_experimental_vector_similarity_index](/operations/settings/settings#allow_experimental_vector_similarity_index) | `0` | +| [allow_experimental_codecs](/operations/settings/settings#allow_experimental_codecs) | `0` | +| [throw_on_unsupported_query_inside_transaction](/operations/settings/settings#throw_on_unsupported_query_inside_transaction) | `1` | +| [wait_changes_become_visible_after_commit_mode](/operations/settings/settings#wait_changes_become_visible_after_commit_mode) | `wait_unknown` | +| [implicit_transaction](/operations/settings/settings#implicit_transaction) | `0` | +| [grace_hash_join_initial_buckets](/operations/settings/settings#grace_hash_join_initial_buckets) | `1` | +| [grace_hash_join_max_buckets](/operations/settings/settings#grace_hash_join_max_buckets) | `1024` | +| [join_to_sort_minimum_perkey_rows](/operations/settings/settings#join_to_sort_minimum_perkey_rows) | `40` | +| [join_to_sort_maximum_table_rows](/operations/settings/settings#join_to_sort_maximum_table_rows) | `10000` | +| [allow_experimental_join_right_table_sorting](/operations/settings/settings#allow_experimental_join_right_table_sorting) | `0` | +| [allow_statistics_optimize](/operations/settings/settings#allow_statistics_optimize) | `0` | +| [allow_experimental_statistics](/operations/settings/settings#allow_experimental_statistics) | `0` | +| [allow_experimental_inverted_index](/operations/settings/settings#allow_experimental_inverted_index) | `0` | +| [allow_experimental_full_text_index](/operations/settings/settings#allow_experimental_full_text_index) | `0` | +| [allow_experimental_lightweight_update](/operations/settings/settings#allow_experimental_lightweight_update) | `0` | +| [allow_experimental_join_condition](/operations/settings/settings#allow_experimental_join_condition) | `0` | +| [allow_experimental_live_view](/operations/settings/settings#allow_experimental_live_view) | `0` | +| [live_view_heartbeat_interval](/operations/settings/settings#live_view_heartbeat_interval) | `15` | +| [max_live_view_insert_blocks_before_refresh](/operations/settings/settings#max_live_view_insert_blocks_before_refresh) | `64` | +| [allow_experimental_window_view](/operations/settings/settings#allow_experimental_window_view) | `0` | +| [window_view_clean_interval](/operations/settings/settings#window_view_clean_interval) | `60` | +| [window_view_heartbeat_interval](/operations/settings/settings#window_view_heartbeat_interval) | `15` | +| 
[wait_for_window_view_fire_signal_timeout](/operations/settings/settings#wait_for_window_view_fire_signal_timeout) | `10` | +| [stop_refreshable_materialized_views_on_startup](/operations/settings/settings#stop_refreshable_materialized_views_on_startup) | `0` | +| [allow_experimental_database_materialized_postgresql](/operations/settings/settings#allow_experimental_database_materialized_postgresql) | `0` | +| [allow_experimental_query_deduplication](/operations/settings/settings#allow_experimental_query_deduplication) | `0` | +| [allow_experimental_database_iceberg](/operations/settings/settings#allow_experimental_database_iceberg) | `0` | +| [allow_experimental_database_unity_catalog](/operations/settings/settings#allow_experimental_database_unity_catalog) | `0` | +| [allow_experimental_database_glue_catalog](/operations/settings/settings#allow_experimental_database_glue_catalog) | `0` | +| [allow_experimental_database_hms_catalog](/operations/settings/settings#allow_experimental_database_hms_catalog) | `0` | +| [allow_experimental_kusto_dialect](/operations/settings/settings#allow_experimental_kusto_dialect) | `0` | +| [allow_experimental_prql_dialect](/operations/settings/settings#allow_experimental_prql_dialect) | `0` | +| [enable_adaptive_memory_spill_scheduler](/operations/settings/settings#enable_adaptive_memory_spill_scheduler) | `0` | +| [make_distributed_plan](/operations/settings/settings#make_distributed_plan) | `0` | +| [distributed_plan_execute_locally](/operations/settings/settings#distributed_plan_execute_locally) | `0` | +| [distributed_plan_default_shuffle_join_bucket_count](/operations/settings/settings#distributed_plan_default_shuffle_join_bucket_count) | `8` | +| [distributed_plan_default_reader_bucket_count](/operations/settings/settings#distributed_plan_default_reader_bucket_count) | `8` | +| [distributed_plan_force_exchange_kind](/operations/settings/settings#distributed_plan_force_exchange_kind) | `` | +| [allow_experimental_ts_to_grid_aggregate_function](/operations/settings/settings#allow_experimental_ts_to_grid_aggregate_function) | `0` | +| [allow_experimental_replacing_merge_with_cleanup](/operations/settings/merge-tree-settings#allow_experimental_replacing_merge_with_cleanup) | `0` | +| [allow_experimental_reverse_key](/operations/settings/merge-tree-settings#allow_experimental_reverse_key) | `0` | +| [enable_replacing_merge_with_cleanup_for_min_age_to_force_merge](/operations/settings/merge-tree-settings#enable_replacing_merge_with_cleanup_for_min_age_to_force_merge) | `0` | +| [force_read_through_cache_for_merges](/operations/settings/merge-tree-settings#force_read_through_cache_for_merges) | `0` | +| [merge_selector_algorithm](/operations/settings/merge-tree-settings#merge_selector_algorithm) | `Simple` | +| [notify_newest_block_number](/operations/settings/merge-tree-settings#notify_newest_block_number) | `0` | +| [part_moves_between_shards_delay_seconds](/operations/settings/merge-tree-settings#part_moves_between_shards_delay_seconds) | `30` | +| [part_moves_between_shards_enable](/operations/settings/merge-tree-settings#part_moves_between_shards_enable) | `0` | +| [remote_fs_zero_copy_path_compatible_mode](/operations/settings/merge-tree-settings#remote_fs_zero_copy_path_compatible_mode) | `0` | +| [remote_fs_zero_copy_zookeeper_path](/operations/settings/merge-tree-settings#remote_fs_zero_copy_zookeeper_path) | `/clickhouse/zero_copy` | +| 
[remove_rolled_back_parts_immediately](/operations/settings/merge-tree-settings#remove_rolled_back_parts_immediately) | `1` | +| [shared_merge_tree_enable_coordinated_merges](/operations/settings/merge-tree-settings#shared_merge_tree_enable_coordinated_merges) | `0` | +| [shared_merge_tree_enable_keeper_parts_extra_data](/operations/settings/merge-tree-settings#shared_merge_tree_enable_keeper_parts_extra_data) | `0` | +| [shared_merge_tree_merge_coordinator_election_check_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_election_check_period_ms) | `30000` | +| [shared_merge_tree_merge_coordinator_factor](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_factor) | `2` | +| [shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms) | `10000` | +| [shared_merge_tree_merge_coordinator_max_merge_request_size](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_merge_request_size) | `20` | +| [shared_merge_tree_merge_coordinator_max_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_period_ms) | `10000` | +| [shared_merge_tree_merge_coordinator_merges_prepare_count](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_merges_prepare_count) | `100` | +| [shared_merge_tree_merge_coordinator_min_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_min_period_ms) | `1` | +| [shared_merge_tree_merge_worker_fast_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_fast_timeout_ms) | `100` | +| [shared_merge_tree_merge_worker_regular_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_regular_timeout_ms) | `10000` | diff --git a/docs/about-us/cloud.md b/docs/about-us/cloud.md index f495f79b8bb..9452ce55f74 100644 --- a/docs/about-us/cloud.md +++ b/docs/about-us/cloud.md @@ -11,7 +11,7 @@ title: 'ClickHouse Cloud' ClickHouse Cloud is the cloud offering created by the original creators of the popular open-source OLAP database ClickHouse. You can experience ClickHouse Cloud by [starting a free trial](https://console.clickhouse.cloud/signUp). -### ClickHouse Cloud benefits: {#clickhouse-cloud-benefits} +### ClickHouse Cloud benefits {#clickhouse-cloud-benefits} Some of the benefits of using ClickHouse Cloud are described below: diff --git a/docs/about-us/distinctive-features.md b/docs/about-us/distinctive-features.md index 77b13be8702..75e4413cb47 100644 --- a/docs/about-us/distinctive-features.md +++ b/docs/about-us/distinctive-features.md @@ -7,9 +7,9 @@ title: 'Distinctive Features of ClickHouse' keywords: ['compression', 'secondary-indexes','column-oriented'] --- -# Distinctive Features of ClickHouse +# Distinctive features of ClickHouse -## True Column-Oriented Database Management System {#true-column-oriented-database-management-system} +## True column-oriented database management system {#true-column-oriented-database-management-system} In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed, or this strongly affects the CPU use. 
It is essential to store data compactly (without any "garbage") even when uncompressed since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data. @@ -17,29 +17,29 @@ This is in contrast to systems that can store values of different columns separa Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server. -## Data Compression {#data-compression} +## Data compression {#data-compression} Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance. In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones. -## Disk Storage of Data {#disk-storage-of-data} +## Disk storage of data {#disk-storage-of-data} Keeping data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds. Some column-oriented DBMSs, such as SAP HANA and Google PowerDrill, can only work in RAM. This approach requires allocation of a larger hardware budget than necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available. -## Parallel Processing on Multiple Cores {#parallel-processing-on-multiple-cores} +## Parallel processing on multiple cores {#parallel-processing-on-multiple-cores} Large queries are parallelized naturally, taking all the necessary resources available on the current server. -## Distributed Processing on Multiple Servers {#distributed-processing-on-multiple-servers} +## Distributed processing on multiple servers {#distributed-processing-on-multiple-servers} Almost none of the columnar DBMSs mentioned above have support for distributed query processing. In ClickHouse, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user. -## SQL Support {#sql-support} +## SQL support {#sql-support} ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard. @@ -47,29 +47,29 @@ Supported queries include [GROUP BY](../sql-reference/statements/select/group-by Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future. -## Vector Computation Engine {#vector-engine} +## Vector computation engine {#vector-engine} Data is not only stored by columns but is processed by vectors (parts of columns), which allows achieving high CPU efficiency. -## Real-Time Data Inserts {#real-time-data-updates} +## Real-time data inserts {#real-time-data-updates} ClickHouse supports tables with a primary key. To quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. No locks are taken when new data is ingested. 
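As a brief, editor-added sketch of the insert behaviour described above (the table and column names are invented for the example):

```sql
-- Data is kept sorted by the primary key, so new rows are appended and
-- merged in the background; readers are not locked during ingestion.
CREATE TABLE example_events
(
    user_id    UInt64,
    event_time DateTime,
    url        String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

INSERT INTO example_events VALUES (42, now(), 'https://example.com/');
```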
-## Primary Indexes {#primary-index} +## Primary indexes {#primary-index} Having data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds. -## Secondary Indexes {#secondary-indexes} +## Secondary indexes {#secondary-indexes} Unlike other database management systems, secondary indexes in ClickHouse do not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts would not match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes). -## Suitable for Online Queries {#suitable-for-online-queries} +## Suitable for online queries {#suitable-for-online-queries} Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later"). In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online. -## Support for Approximated Calculations {#support-for-approximated-calculations} +## Support for approximated calculations {#support-for-approximated-calculations} ClickHouse provides various ways to trade accuracy for performance: @@ -77,11 +77,11 @@ ClickHouse provides various ways to trade accuracy for performance: 2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk. 3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources. -## Adaptive Join Algorithm {#adaptive-join-algorithm} +## Adaptive join algorithm {#adaptive-join-algorithm} ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table. -## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support} +## Data replication and data integrity support {#data-replication-and-data-integrity-support} ClickHouse uses asynchronous multi-master replication. After being written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases. @@ -91,7 +91,7 @@ For more information, see the section [Data replication](../engines/table-engine ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems. 
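A minimal sketch of the SQL-driven access control mentioned above (the role, user, password, and database names are assumptions made for this example):

```sql
-- Group privileges in a role, then grant the role to a user.
CREATE ROLE analyst;
GRANT SELECT ON analytics.* TO analyst;

CREATE USER jane IDENTIFIED WITH sha256_password BY 'change_me';
GRANT analyst TO jane;
```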
-## Features that Can Be Considered Disadvantages {#clickhouse-features-that-can-be-considered-disadvantages} +## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages} 1. No full-fledged transactions. 2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu). diff --git a/docs/about-us/history.md b/docs/about-us/history.md index 3546af768e7..9a888930af6 100644 --- a/docs/about-us/history.md +++ b/docs/about-us/history.md @@ -7,7 +7,7 @@ keywords: ['history','development','Metrica'] title: 'ClickHouse History' --- -# ClickHouse History {#clickhouse-history} +# ClickHouse history {#clickhouse-history} ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development. @@ -15,7 +15,7 @@ Yandex.Metrica builds customized reports on the fly based on hits and sessions, As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events needed to be stored, in order to build custom reports. A single query may have required scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds. -## Usage in Yandex.Metrica and Other Yandex Services {#usage-in-yandex-metrica-and-other-yandex-services} +## Usage in Yandex.Metrica and other Yandex services {#usage-in-yandex-metrica-and-other-yandex-services} ClickHouse serves multiple purposes in Yandex.Metrica. Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data is about 2 PB, without accounting for duplicates and replicas. The volume of uncompressed data (in TSV format) would be approximately 17 PB. @@ -30,7 +30,7 @@ ClickHouse also plays a key role in the following processes: Nowadays, there are a multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others. -## Aggregated and Non-aggregated Data {#aggregated-and-non-aggregated-data} +## Aggregated and non-aggregated data {#aggregated-and-non-aggregated-data} There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data. diff --git a/docs/about-us/index.md b/docs/about-us/index.md index 6577c1464ab..d61d835efd8 100644 --- a/docs/about-us/index.md +++ b/docs/about-us/index.md @@ -9,11 +9,11 @@ description: 'Landing page for About ClickHouse' In this section of the docs you'll find information about ClickHouse. Refer to the table of contents below for a list of pages in this section of the docs. 
-| Page | Description | -|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [What is ClickHouse](/about-clickhouse) | Introduces ClickHouse's core features, architecture, and uses, providing a concise overview for new users. | -| [Adopters](/about-us/adopters) | A list of companies using ClickHouse and their success stories, assembled from public sources | -| [Support](/about-us/support) | An introduction to ClickHouse Cloud Support Services and their mission. | -| [Beta Features and Experimental](/beta-and-experimental-features) | Learn about how ClickHouse uses "Beta" and "Experimental" labels to distinguish between officially supported and early-stage, unsupported features due to varied development speeds from community contributions. | -| [Cloud Service](/about-us/cloud) | Discover ClickHouse Cloud - a fully managed service that allows users to spin up open-source ClickHouse databases and offers benefits like fast time to value, seamless scaling, and serverless operations. | -| [ClickHouse History](/about-us/history) | Learn more about the history of ClickHouse. | +| Page | Description | +|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [What is ClickHouse](/about-clickhouse) | Introduces ClickHouse's core features, architecture, and uses, providing a concise overview for new users. | +| [Adopters](/about-us/adopters) | A list of companies using ClickHouse and their success stories, assembled from public sources | +| [Support](/about-us/support) | An introduction to ClickHouse Cloud support services and their mission. | +| [Beta features and experimental features](/beta-and-experimental-features) | Learn about how ClickHouse uses "Beta" and "Experimental" labels to distinguish between officially supported and early-stage, unsupported features due to varied development speeds from community contributions. | +| [Cloud service](/about-us/cloud) | Discover ClickHouse Cloud - a fully managed service that allows users to spin up open-source ClickHouse databases and offers benefits like fast time to value, seamless scaling, and serverless operations. | +| [ClickHouse history](/about-us/history) | Learn more about the history of ClickHouse. | diff --git a/docs/about-us/support.md b/docs/about-us/support.md index 2b2b4225e1f..fb02f17832c 100644 --- a/docs/about-us/support.md +++ b/docs/about-us/support.md @@ -1,12 +1,12 @@ --- slug: /about-us/support sidebar_label: 'Support' -title: 'ClickHouse Cloud Support Services' +title: 'ClickHouse Cloud support services' sidebar_position: 30 description: 'Information on ClickHouse Cloud support services' --- -# ClickHouse Cloud Support Services +# ClickHouse Cloud support services ClickHouse provides Support Services for our ClickHouse Cloud users and customers. Our objective is a Support Services team that represents the ClickHouse product – unparalleled performance, ease of use, and exceptionally fast, high-quality results. For details, [visit our ClickHouse Support Program](https://clickhouse.com/support/program/) page. 
diff --git a/docs/architecture/cluster-deployment.md b/docs/architecture/cluster-deployment.md index af00312b8ac..4b7a2f81068 100644 --- a/docs/architecture/cluster-deployment.md +++ b/docs/architecture/cluster-deployment.md @@ -1,8 +1,8 @@ --- slug: /architecture/cluster-deployment -sidebar_label: 'Cluster Deployment' +sidebar_label: 'Cluster deployment' sidebar_position: 100 -title: 'Cluster Deployment' +title: 'Cluster deployment' description: 'By going through this tutorial, you will learn how to set up a simple ClickHouse cluster.' --- @@ -10,7 +10,7 @@ This tutorial assumes you've already set up a [local ClickHouse server](../getti By going through this tutorial, you'll learn how to set up a simple ClickHouse cluster. It'll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries. -## Cluster Deployment {#cluster-deployment} +## Cluster deployment {#cluster-deployment} This ClickHouse cluster will be a homogeneous cluster. Here are the steps: diff --git a/docs/best-practices/_snippets/_async_inserts.md b/docs/best-practices/_snippets/_async_inserts.md index 02e12fb39ee..186e699c890 100644 --- a/docs/best-practices/_snippets/_async_inserts.md +++ b/docs/best-practices/_snippets/_async_inserts.md @@ -21,7 +21,7 @@ When enabled (1), inserts are buffered and only written to disk once one of the This batching process is invisible to clients and helps ClickHouse efficiently merge insert traffic from multiple sources. However, until a flush occurs, the data cannot be queried. Importantly, there are multiple buffers per insert shape and settings combination, and in clusters, buffers are maintained per node - enabling fine-grained control across multi-tenant environments. Insert mechanics are otherwise identical to those described for [synchronous inserts](/best-practices/selecting-an-insert-strategy#synchronous-inserts-by-default). -### Choosing a Return Mode {#choosing-a-return-mode} +### Choosing a return mode {#choosing-a-return-mode} The behavior of asynchronous inserts is further refined using the [`wait_for_async_insert`](/operations/settings/settings#wait_for_async_insert) setting. diff --git a/docs/best-practices/_snippets/_avoid_optimize_final.md b/docs/best-practices/_snippets/_avoid_optimize_final.md index 262f1f8f7d5..9cc119a9bd9 100644 --- a/docs/best-practices/_snippets/_avoid_optimize_final.md +++ b/docs/best-practices/_snippets/_avoid_optimize_final.md @@ -18,7 +18,7 @@ OPTIMIZE TABLE FINAL; **you should avoid this operation in most cases** as it initiates resource intensive operations which may impact cluster performance. -## Why Avoid? {#why-avoid} +## Why avoid? 
{#why-avoid}
 
 ### It's expensive {#its-expensive}
 
diff --git a/docs/best-practices/index.md b/docs/best-practices/index.md
index 4f672250e63..5a3ae78ab5f 100644
--- a/docs/best-practices/index.md
+++ b/docs/best-practices/index.md
@@ -1,6 +1,6 @@
 ---
 slug: /best-practices
-keywords: ['Cloud', 'Primary key', 'Ordering key', 'Materialized Views', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid Nullable Columns', 'Avoid Optimize Final', 'Partitioning Key']
+keywords: ['Cloud', 'Primary key', 'Ordering key', 'Materialized Views', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid nullable columns', 'Avoid Optimize Final', 'Partitioning Key']
 title: 'Overview'
 hide_title: true
 description: 'Landing page for Best Practices section in ClickHouse'
diff --git a/docs/best-practices/select_data_type.md b/docs/best-practices/select_data_type.md
index d9191294453..fa55d0a8a45 100644
--- a/docs/best-practices/select_data_type.md
+++ b/docs/best-practices/select_data_type.md
@@ -18,7 +18,7 @@ Some straightforward guidelines can significantly enhance the schema:
 
 * **Use Strict Types:** Always select the correct data type for columns. Numeric and date fields should use appropriate numeric and date types rather than general-purpose String types. This ensures correct semantics for filtering and aggregations.
 
-* **Avoid Nullable Columns:** Nullable columns introduce additional overhead by maintaining separate columns for tracking null values. Only use Nullable if explicitly required to distinguish between empty and null states. Otherwise, default or zero-equivalent values typically suffice. For further information on why this type should be avoided unless needed, see [Avoid Nullable Columns](/best-practices/select-data-types#avoid-nullable-columns).
+* **Avoid nullable columns:** Nullable columns introduce additional overhead by maintaining separate columns for tracking null values. Only use Nullable if explicitly required to distinguish between empty and null states. Otherwise, default or zero-equivalent values typically suffice. For further information on why this type should be avoided unless needed, see [Avoid nullable columns](/best-practices/select-data-types#avoid-nullable-columns).
 
 * **Minimize Numeric Precision:** Select numeric types with minimal bit-width that still accommodate the expected data range. For instance, prefer [UInt16 over Int32](/sql-reference/data-types/int-uint) if negative values aren't needed, and the range fits within 0–65535.
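A minimal sketch of the guidelines in the bullets above (the table and columns are hypothetical, invented for this example):

```sql
-- Strict, narrow, non-Nullable types keep storage compact and filtering cheap.
CREATE TABLE page_visits
(
    visit_date  Date,             -- a real date type instead of String
    status_code UInt16,           -- 0-65535 is enough, so no need for Int32
    duration_ms UInt32 DEFAULT 0  -- zero default instead of Nullable(UInt32)
)
ENGINE = MergeTree
ORDER BY (visit_date, status_code);
```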
@@ -136,6 +136,6 @@ ENGINE = MergeTree ORDER BY tuple()
 ```
 
-## Avoid Nullable columns {#avoid-nullable-columns}
+## Avoid nullable columns {#avoid-nullable-columns}
 
diff --git a/docs/best-practices/selecting_an_insert_strategy.md b/docs/best-practices/selecting_an_insert_strategy.md
index e4dfebf8ae7..f9ffca9edf5 100644
--- a/docs/best-practices/selecting_an_insert_strategy.md
+++ b/docs/best-practices/selecting_an_insert_strategy.md
@@ -130,7 +130,7 @@ When data arrives pre-sorted, ClickHouse can skip or simplify the internal sorti
 
-## Choose an interface - HTTP or Native {#choose-an-interface}
+## Choose an interface - HTTP or native {#choose-an-interface}
 
 ### Native {#choose-an-interface-native}
 
diff --git a/docs/best-practices/sizing-and-hardware-recommendations.md b/docs/best-practices/sizing-and-hardware-recommendations.md
index 6ef842bb49e..6b42578a159 100644
--- a/docs/best-practices/sizing-and-hardware-recommendations.md
+++ b/docs/best-practices/sizing-and-hardware-recommendations.md
@@ -1,12 +1,12 @@
 ---
 slug: /guides/sizing-and-hardware-recommendations
-sidebar_label: 'Sizing and Hardware Recommendations'
+sidebar_label: 'Sizing and hardware recommendations'
 sidebar_position: 4
-title: 'Sizing and Hardware Recommendations'
+title: 'Sizing and hardware recommendations'
 description: 'This guide discusses our general recommendations regarding hardware, compute, memory, and disk configurations for open-source users.'
 ---
 
-# Sizing and Hardware Recommendations
+# Sizing and hardware recommendations
 
 This guide discusses our general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. If you would like to simplify your setup, we recommend using [ClickHouse Cloud](https://clickhouse.com/cloud) as it automatically scales and adapts to your workloads while minimizing costs pertaining to infrastructure management.
 
diff --git a/docs/best-practices/use_materialized_views.md b/docs/best-practices/use_materialized_views.md
index 18737240b4c..ed49b160a94 100644
--- a/docs/best-practices/use_materialized_views.md
+++ b/docs/best-practices/use_materialized_views.md
@@ -28,7 +28,7 @@ ClickHouse supports two types of materialized views: [**incremental**](/material
 
 The choice between incremental and refreshable materialized views depends largely on the nature of the query, how frequently data changes, and whether updates to the view must reflect every row as it is inserted, or if a periodic refresh is acceptable. Understanding these trade-offs is key to designing performant, scalable materialized views in ClickHouse.
 
-## When to Use Incremental Materialized Views {#when-to-use-incremental-materialized-views}
+## When to use incremental materialized views {#when-to-use-incremental-materialized-views}
 
 Incremental materialized views are generally preferred, as they update automatically in real-time whenever the source tables receive new data. They support all aggregation functions and are particularly effective for aggregations over a single table. By computing results incrementally at insert-time, queries run against significantly smaller data subsets, allowing these views to scale effortlessly even to petabytes of data. In most cases they will have no appreciable impact on overall cluster performance.
 
@@ -40,7 +40,7 @@ Use incremental materialized views when:
 
 For examples of incremental materialized views see [here](/materialized-view/incremental-materialized-view).
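For orientation, a minimal editor-added sketch of an incremental materialized view follows; the table names and aggregation are invented, and the linked page remains the canonical reference:

```sql
-- Source table receiving raw inserts.
CREATE TABLE hits
(
    event_time DateTime,
    url        String
)
ENGINE = MergeTree
ORDER BY event_time;

-- Incremental materialized view: hourly counts are maintained at insert time,
-- so queries read a much smaller, pre-aggregated table.
CREATE MATERIALIZED VIEW hits_per_hour
ENGINE = SummingMergeTree
ORDER BY (url, hour)
AS SELECT
    url,
    toStartOfHour(event_time) AS hour,
    count() AS views
FROM hits
GROUP BY url, hour;

-- When querying, aggregate again (sum(views)) to combine parts that have not merged yet.
```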
-## When to Use Refreshable Materialized Views {#when-to-use-refreshable-materialized-views} +## When to use refreshable materialized views {#when-to-use-refreshable-materialized-views} Refreshable materialized views execute their queries periodically rather than incrementally, storing the query result set for rapid retrieval. @@ -60,7 +60,7 @@ In summary, use refreshable materialized views when: For examples of refreshable materialized views see [here](/materialized-view/refreshable-materialized-view). -### APPEND vs REPLACE Mode {#append-vs-replace-mode} +### APPEND vs REPLACE mode {#append-vs-replace-mode} Refreshable materialized views support two modes for writing data to the target table: `APPEND` and `REPLACE`. These modes define how the result of the view's query is written when the view is refreshed. diff --git a/docs/chdb/install/bun.md b/docs/chdb/install/bun.md index 6f421e7fec7..f8e9e134621 100644 --- a/docs/chdb/install/bun.md +++ b/docs/chdb/install/bun.md @@ -36,7 +36,9 @@ var result = query("SELECT version()", "CSV"); console.log(result); // 23.10.1.1 ``` + ### Session.Query(query, *format) {#sessionqueryquery-format} + ```javascript import { Session } from 'chdb-bun'; diff --git a/docs/chdb/install/c.md b/docs/chdb/install/c.md index cc0a019b24a..9ff218aab17 100644 --- a/docs/chdb/install/c.md +++ b/docs/chdb/install/c.md @@ -6,7 +6,9 @@ description: 'How to install chDB for C and C++' keywords: ['chdb', 'embedded', 'clickhouse-lite', 'install'] --- + # Installing chDB for C and C++ + ## Requirements {#requirements} diff --git a/docs/chdb/install/python.md b/docs/chdb/install/python.md index a62b2cdb9d1..446251f7c2c 100644 --- a/docs/chdb/install/python.md +++ b/docs/chdb/install/python.md @@ -73,7 +73,7 @@ print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, elapsed {res. chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe') ``` -### Query On Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes) {#query-on-table-pandas-dataframe-parquet-filebytes-arrow-bytes} +### Query on table (Pandas DataFrame, Parquet file/bytes, Arrow bytes) {#query-on-table-pandas-dataframe-parquet-filebytes-arrow-bytes} **Query On Pandas DataFrame** @@ -90,7 +90,7 @@ print(ret_tbl) print(ret_tbl.query('select b, sum(a) from __table__ group by b')) ``` -### Query with Stateful Session {#query-with-stateful-session} +### Query with stateful session {#query-with-stateful-session} Sessions will keep the state of query. All DDL and DML state will be kept in a directory. Directory path can be passed in as an argument. If it is not passed, a temporary directory will be created. @@ -169,7 +169,7 @@ Some notes on the chDB Python UDF (User Defined Function) decorator. see also: [test_udf.py](https://github.com/chdb-io/chdb/blob/main/tests/test_udf.py). 
-### Python Table Engine {#python-table-engine} +### Python table engine {#python-table-engine} ### Query on Pandas DataFrame {#query-on-pandas-dataframe} diff --git a/docs/cloud/bestpractices/index.md b/docs/cloud/bestpractices/index.md index c1d00a2b712..965e290d0d3 100644 --- a/docs/cloud/bestpractices/index.md +++ b/docs/cloud/bestpractices/index.md @@ -1,6 +1,6 @@ --- slug: /cloud/bestpractices -keywords: ['Cloud', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid Mutations', 'Avoid Nullable Columns', 'Avoid Optimize Final', 'Low Cardinality Partitioning Key', 'Multi Tenancy', 'Usage Limits'] +keywords: ['Cloud', 'Best Practices', 'Bulk Inserts', 'Asynchronous Inserts', 'Avoid mutations', 'Avoid nullable columns', 'Avoid Optimize Final', 'Low Cardinality Partitioning Key', 'Multi Tenancy', 'Usage Limits'] title: 'Overview' hide_title: true description: 'Landing page for Best Practices section in ClickHouse Cloud' diff --git a/docs/cloud/bestpractices/multitenancy.md b/docs/cloud/bestpractices/multitenancy.md index c0a82c725c4..9f47dc580d1 100644 --- a/docs/cloud/bestpractices/multitenancy.md +++ b/docs/cloud/bestpractices/multitenancy.md @@ -305,7 +305,7 @@ User management is similar to the approaches described previously, since all ser Note the number of child services in a warehouse is limited to a small number. See [Warehouse limitations](/cloud/reference/warehouses#limitations). -## Separate Cloud service {#separate-service} +## Separate cloud service {#separate-service} The most radical approach is to use a different ClickHouse service per tenant. diff --git a/docs/cloud/changelogs/changelog-24-10.md b/docs/cloud/changelogs/changelog-24-10.md index e75eec40655..4cf3c0549b9 100644 --- a/docs/cloud/changelogs/changelog-24-10.md +++ b/docs/cloud/changelogs/changelog-24-10.md @@ -8,7 +8,7 @@ sidebar_label: 'v24.10' Relevant changes for ClickHouse Cloud services based on the v24.10 release. -## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} - Allow to write `SETTINGS` before `FORMAT` in a chain of queries with `UNION` when subqueries are inside parentheses. This closes [#39712](https://github.com/ClickHouse/ClickHouse/issues/39712). Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. [#60197](https://github.com/ClickHouse/ClickHouse/pull/60197)[#68614](https://github.com/ClickHouse/ClickHouse/pull/68614) ([Alexey Milovidov](https://github.com/alexey-milovidov)). - Reimplement Dynamic type. Now when the limit of dynamic data types is reached new types are not cast to String but stored in a special data structure in binary format with binary encoded data type. Now any type ever inserted into Dynamic column can be read from it as subcolumn. [#68132](https://github.com/ClickHouse/ClickHouse/pull/68132) ([Pavel Kruglov](https://github.com/Avogar)). - Expressions like `a[b].c` are supported for named tuples, as well as named subscripts from arbitrary expressions, e.g., `expr().name`. This is useful for processing JSON. This closes [#54965](https://github.com/ClickHouse/ClickHouse/issues/54965). 
In previous versions, an expression of form `expr().name` was parsed as `tupleElement(expr(), name)`, and the query analyzer was searching for a column `name` rather than for the corresponding tuple element; while in the new version, it is changed to `tupleElement(expr(), 'name')`. In most cases, the previous version was not working, but it is possible to imagine a very unusual scenario when this change could lead to incompatibility: if you stored names of tuple elements in a column or an alias, that was named differently than the tuple element's name: `SELECT 'b' AS a, CAST([tuple(123)] AS 'Array(Tuple(b UInt8))') AS t, t[1].a`. It is very unlikely that you used such queries, but we still have to mark this change as potentially backward incompatible. [#68435](https://github.com/ClickHouse/ClickHouse/pull/68435) ([Alexey Milovidov](https://github.com/alexey-milovidov)). @@ -17,7 +17,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.10 release. - Fix `optimize_functions_to_subcolumns` optimization (previously could lead to `Invalid column type for ColumnUnique::insertRangeFrom. Expected String, got LowCardinality(String)` error), by preserving `LowCardinality` type in `mapKeys`/`mapValues`. [#70716](https://github.com/ClickHouse/ClickHouse/pull/70716) ([Azat Khuzhin](https://github.com/azat)). -## New Feature {#new-feature} +## New feature {#new-feature} - Refreshable materialized views are production ready. [#70550](https://github.com/ClickHouse/ClickHouse/pull/70550) ([Michael Kolupaev](https://github.com/al13n321)). Refreshable materialized views are now supported in Replicated databases. [#60669](https://github.com/ClickHouse/ClickHouse/pull/60669) ([Michael Kolupaev](https://github.com/al13n321)). - Function `toStartOfInterval()` now has a new overload which emulates TimescaleDB's `time_bucket()` function, respectively PostgreSQL's `date_bin()` function. ([#55619](https://github.com/ClickHouse/ClickHouse/issues/55619)). It allows to align date or timestamp values to multiples of a given interval from an *arbitrary* origin (instead of 0000-01-01 00:00:00.000 as *fixed* origin). For example, `SELECT toStartOfInterval(toDateTime('2023-01-01 14:45:00'), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'));` returns `2023-01-01 14:44:30` which is a multiple of 1 minute intervals, starting from origin `2023-01-01 14:35:30`. [#56738](https://github.com/ClickHouse/ClickHouse/pull/56738) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). - MongoDB integration refactored: migration to new driver mongocxx from deprecated Poco::MongoDB, remove support for deprecated old protocol, support for connection by URI, support for all MongoDB types, support for WHERE and ORDER BY statements on MongoDB side, restriction for expression unsupported by MongoDB. [#63279](https://github.com/ClickHouse/ClickHouse/pull/63279) ([Kirill Nikiforov](https://github.com/allmazz)). diff --git a/docs/cloud/changelogs/changelog-24-12.md b/docs/cloud/changelogs/changelog-24-12.md index feb1b2fe93a..c9e9e18bec8 100644 --- a/docs/cloud/changelogs/changelog-24-12.md +++ b/docs/cloud/changelogs/changelog-24-12.md @@ -8,7 +8,7 @@ sidebar_label: 'v24.12' Relevant changes for ClickHouse Cloud services based on the v24.12 release. -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} - Functions `greatest` and `least` now ignore NULL input values, whereas they previously returned NULL if one of the arguments was NULL. 
For example, `SELECT greatest(1, 2, NULL)` now returns 2. This makes the behavior compatible with PostgreSQL. [#65519](https://github.com/ClickHouse/ClickHouse/pull/65519) ([kevinyhzou](https://github.com/KevinyhZou)). - Don't allow Variant/Dynamic types in ORDER BY/GROUP BY/PARTITION BY/PRIMARY KEY by default because it may lead to unexpected results. [#69731](https://github.com/ClickHouse/ClickHouse/pull/69731) ([Pavel Kruglov](https://github.com/Avogar)). @@ -23,7 +23,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.12 release. - Remove support for `Enum` as well as `UInt128` and `UInt256` arguments in `deltaSumTimestamp`. Remove support for `Int8`, `UInt8`, `Int16`, and `UInt16` of the second ("timestamp") argument of `deltaSumTimestamp`. [#71790](https://github.com/ClickHouse/ClickHouse/pull/71790) ([Alexey Milovidov](https://github.com/alexey-milovidov)). - Added source query validation when ClickHouse is used as a source for a dictionary. [#72548](https://github.com/ClickHouse/ClickHouse/pull/72548) ([Alexey Katsman](https://github.com/alexkats)). -## New Features {#new-features} +## New features {#new-features} - Implement SYSTEM LOAD PRIMARY KEY command to load primary indexes for all parts of a specified table or for all tables if no table is specified. This will be useful for benchmarks and to prevent extra latency during query execution. [#66252](https://github.com/ClickHouse/ClickHouse/pull/66252) ([ZAWA_ll](https://github.com/Zawa-ll)). - Added statement `SYSTEM LOAD PRIMARY KEY` for loading the primary indexes of all parts in a specified table or for all tables if no table is specified. This can be useful for benchmarking and to prevent extra latency during query execution. [#67733](https://github.com/ClickHouse/ClickHouse/pull/67733) ([ZAWA_ll](https://github.com/Zawa-ll)). diff --git a/docs/cloud/changelogs/changelog-24-5.md b/docs/cloud/changelogs/changelog-24-5.md index 256c6e4c3be..a2b2158f702 100644 --- a/docs/cloud/changelogs/changelog-24-5.md +++ b/docs/cloud/changelogs/changelog-24-5.md @@ -6,11 +6,11 @@ keywords: ['changelog', 'cloud'] sidebar_label: 'v24.5' --- -# v24.5 Changelog for Cloud +# V24.5 changelog for Cloud Relevant changes for ClickHouse Cloud services based on the v24.5 release. -## Breaking Changes {#breaking-changes} +## Breaking changes {#breaking-changes} * Change the column name from duration_ms to duration_microseconds in the system.zookeeper table to reflect the reality that the duration is in the microsecond resolution. [#60774](https://github.com/ClickHouse/ClickHouse/pull/60774) (Duc Canh Le). @@ -21,7 +21,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.5 release. * Usage of functions neighbor, runningAccumulate, runningDifferenceStartingWithFirstValue, runningDifference deprecated (because it is error-prone). Proper window functions should be used instead. To enable them back, set allow_deprecated_error_prone_window_functions=1. [#63132](https://github.com/ClickHouse/ClickHouse/pull/63132) (Nikita Taranov). -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} * In the new ClickHouse version, the functions geoDistance, greatCircleDistance, and greatCircleAngle will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. In previous versions, the function always used Float32. 
You can switch to the old behavior by setting geo_distance_returns_float64_on_float64_arguments to false or setting compatibility to 24.2 or earlier. [#61848](https://github.com/ClickHouse/ClickHouse/pull/61848) (Alexey Milovidov). @@ -29,7 +29,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.5 release. * Fix crash in largestTriangleThreeBuckets. This changes the behaviour of this function and makes it to ignore NaNs in the series provided. Thus the resultset might differ from previous versions. [#62646](https://github.com/ClickHouse/ClickHouse/pull/62646) (Raúl Marín). -## New Features {#new-features} +## New features {#new-features} * The new analyzer is enabled by default on new services. diff --git a/docs/cloud/changelogs/changelog-24-6.md b/docs/cloud/changelogs/changelog-24-6.md index 3dc8d747ea5..e15aad39748 100644 --- a/docs/cloud/changelogs/changelog-24-6.md +++ b/docs/cloud/changelogs/changelog-24-6.md @@ -6,15 +6,15 @@ keywords: ['changelog', 'cloud'] sidebar_label: 'v24.6' --- -# v24.6 Changelog for Cloud +# V24.6 changelog for Cloud Relevant changes for ClickHouse Cloud services based on the v24.6 release. -## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} * Rework parallel processing in `Ordered` mode of storage `S3Queue`. This PR is backward incompatible for Ordered mode if you used settings `s3queue_processing_threads_num` or `s3queue_total_shards_num`. Setting `s3queue_total_shards_num` is deleted, previously it was allowed to use only under `s3queue_allow_experimental_sharded_mode`, which is now deprecated. A new setting is added - `s3queue_buckets`. [#64349](https://github.com/ClickHouse/ClickHouse/pull/64349) ([Kseniia Sumarokova](https://github.com/kssenii)). * New functions `snowflakeIDToDateTime`, `snowflakeIDToDateTime64`, `dateTimeToSnowflakeID`, and `dateTime64ToSnowflakeID` were added. Unlike the existing functions `snowflakeToDateTime`, `snowflakeToDateTime64`, `dateTimeToSnowflake`, and `dateTime64ToSnowflake`, the new functions are compatible with function `generateSnowflakeID`, i.e. they accept the snowflake IDs generated by `generateSnowflakeID` and produce snowflake IDs of the same type as `generateSnowflakeID` (i.e. `UInt64`). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like `generateSnowflakeID`. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting `allow_deprecated_snowflake_conversion_functions`. [#64948](https://github.com/ClickHouse/ClickHouse/pull/64948) ([Robert Schulze](https://github.com/rschu1ze)). -## New Feature {#new-feature} +## New feature {#new-feature} * Support empty tuples. [#55061](https://github.com/ClickHouse/ClickHouse/pull/55061) ([Amos Bird](https://github.com/amosbird)). * Add Hilbert Curve encode and decode functions. [#60156](https://github.com/ClickHouse/ClickHouse/pull/60156) ([Artem Mustafin](https://github.com/Artemmm91)). diff --git a/docs/cloud/changelogs/changelog-24-8.md b/docs/cloud/changelogs/changelog-24-8.md index 29cabc28e51..cdfe14f7e73 100644 --- a/docs/cloud/changelogs/changelog-24-8.md +++ b/docs/cloud/changelogs/changelog-24-8.md @@ -8,7 +8,7 @@ sidebar_label: 'v24.8' Relevant changes for ClickHouse Cloud services based on the v24.8 release. 
-## Backward Incompatible Change {#backward-incompatible-change} +## Backward incompatible change {#backward-incompatible-change} - Change binary serialization of Variant data type: add compact mode to avoid writing the same discriminator multiple times for granules with single variant or with only NULL values. Add MergeTree setting use_compact_variant_discriminators_serialization that is enabled by default. Note that Variant type is still experimental and backward-incompatible change in serialization should not impact you unless you have been working with support to get this feature enabled earlier. [#62774](https://github.com/ClickHouse/ClickHouse/pull/62774) (Kruglov Pavel). @@ -29,7 +29,7 @@ Relevant changes for ClickHouse Cloud services based on the v24.8 release. - Fix REPLACE modifier formatting (forbid omitting brackets). [#67774](https://github.com/ClickHouse/ClickHouse/pull/67774) (Azat Khuzhin). -## New Feature {#new-feature} +## New feature {#new-feature} - Extend function tuple to construct named tuples in query. Introduce function tupleNames to extract names from tuples. [#54881](https://github.com/ClickHouse/ClickHouse/pull/54881) (Amos Bird). diff --git a/docs/cloud/changelogs/changelog-25_1-25_4.md b/docs/cloud/changelogs/changelog-25_1-25_4.md index 3671f1980b1..991c394b536 100644 --- a/docs/cloud/changelogs/changelog-25_1-25_4.md +++ b/docs/cloud/changelogs/changelog-25_1-25_4.md @@ -6,7 +6,7 @@ keywords: ['changelog', 'cloud'] sidebar_label: 'v25.4' --- -## Backward Incompatible Changes {#backward-incompatible-changes} +## Backward incompatible changes {#backward-incompatible-changes} * Parquet output format converts Date and DateTime columns to date/time types supported by Parquet, instead of writing them as raw numbers. DateTime becomes DateTime64(3) (was: UInt32); setting `output_format_parquet_datetime_as_uint32` brings back the old behavior. Date becomes Date32 (was: UInt16). [#70950](https://github.com/ClickHouse/ClickHouse/pull/70950) ([Michael Kolupaev](https://github.com/al13n321)). * Don't allow comparable types (like JSON/Object/AggregateFunction) in ORDER BY and comparison functions `less/greater/equal/etc` by default. [#73276](https://github.com/ClickHouse/ClickHouse/pull/73276) ([Pavel Kruglov](https://github.com/Avogar)). @@ -25,7 +25,7 @@ sidebar_label: 'v25.4' * The legacy MongoDB integration has been removed. Server setting `use_legacy_mongodb_integration` became obsolete and now does nothing. [#77895](https://github.com/ClickHouse/ClickHouse/pull/77895) ([Robert Schulze](https://github.com/rschu1ze)). * Enhance SummingMergeTree validation to skip aggregation for columns used in partition or sort keys. [#78022](https://github.com/ClickHouse/ClickHouse/pull/78022) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)). -## New Features {#new-features} +## New features {#new-features} * Added an in-memory cache for deserialized skipping index granules. This should make repeated queries that use skipping indexes faster. The size of the new cache is controlled by server settings `skipping_index_cache_size` and `skipping_index_cache_max_entries`. The original motivation for the cache were vector similarity indexes which became a lot faster now. [#70102](https://github.com/ClickHouse/ClickHouse/pull/70102) ([Robert Schulze](https://github.com/rschu1ze)). * A new implementation of the Userspace Page Cache, which allows caching data in the in-process memory instead of relying on the OS page cache. 
It is useful when the data is stored on a remote virtual filesystem without backing with the local filesystem cache. [#70509](https://github.com/ClickHouse/ClickHouse/pull/70509) ([Michael Kolupaev](https://github.com/al13n321)). @@ -611,7 +611,7 @@ sidebar_label: 'v25.4' * Fix crash in REFRESHABLE MV in case of ALTER after incorrect shutdown. [#78858](https://github.com/ClickHouse/ClickHouse/pull/78858) ([Azat Khuzhin](https://github.com/azat)). * Fix parsing of bad DateTime values in CSV format. [#78919](https://github.com/ClickHouse/ClickHouse/pull/78919) ([Pavel Kruglov](https://github.com/Avogar)). -## Build/Testing/Packaging Improvement {#build-testing-packaging-improvement} +## Build/testing/packaging improvement {#build-testing-packaging-improvement} * The internal dependency LLVM is bumped from 16 to 18. [#66053](https://github.com/ClickHouse/ClickHouse/pull/66053) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Restore deleted nats integration tests and fix errors. - fixed some race conditions in nats engine - fixed data loss when streaming data to nats in case of connection loss - fixed freeze of receiving the last chunk of data when streaming from nats ended - nats_max_reconnect is deprecated and has no effect, reconnect is performed permanently with nats_reconnect_wait timeout. [#69772](https://github.com/ClickHouse/ClickHouse/pull/69772) ([Dmitry Novikov](https://github.com/dmitry-sles-novikov)). diff --git a/docs/cloud/changelogs/fast-release-24-2.md b/docs/cloud/changelogs/fast-release-24-2.md index 714f8a8f575..3ab322c9844 100644 --- a/docs/cloud/changelogs/fast-release-24-2.md +++ b/docs/cloud/changelogs/fast-release-24-2.md @@ -8,7 +8,7 @@ sidebar_label: 'v24.2' ### ClickHouse release tag: 24.2.2.15987 {#clickhouse-release-tag-242215987} -#### Backward Incompatible Change {#backward-incompatible-change} +#### Backward incompatible change {#backward-incompatible-change} * Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. [#59385](https://github.com/ClickHouse/ClickHouse/pull/59385) ([Kruglov Pavel](https://github.com/Avogar)). * The sort clause `ORDER BY ALL` (introduced with v23.12) is replaced by `ORDER BY *`. The previous syntax was too error-prone for tables with a column `all`. [#59450](https://github.com/ClickHouse/ClickHouse/pull/59450) ([Robert Schulze](https://github.com/rschu1ze)). * Add sanity check for number of threads and block sizes. [#60138](https://github.com/ClickHouse/ClickHouse/pull/60138) ([Raúl Marín](https://github.com/Algunenano)). @@ -20,9 +20,9 @@ sidebar_label: 'v24.2' * ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, that's why we set `zstd` by default. 
This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). [#61817](https://github.com/ClickHouse/ClickHouse/pull/61817) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views [https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security](/sql-reference/statements/create/view#sql_security). [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)) -#### New Feature {#new-feature} +#### New feature {#new-feature} * Topk/topkweighed support mode, which return count of values and it's error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)). -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit)). * Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty `convert_to_replicated` file in table's data directory (`/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`) and that table will be converted automatically on next server start. [#57798](https://github.com/ClickHouse/ClickHouse/pull/57798) ([Kirill](https://github.com/kirillgarbar)). * Added table function `mergeTreeIndex`. It represents the contents of index and marks files of `MergeTree` tables. It can be used for introspection. Syntax: `mergeTreeIndex(database, table, [with_marks = true])` where `database.table` is an existing table with `MergeTree` engine. [#58140](https://github.com/ClickHouse/ClickHouse/pull/58140) ([Anton Popov](https://github.com/CurtizJ)). * Try to detect file format automatically during schema inference if it's unknown in `file/s3/hdfs/url/azureBlobStorage` engines. Closes [#50576](https://github.com/ClickHouse/ClickHouse/issues/50576). [#59092](https://github.com/ClickHouse/ClickHouse/pull/59092) ([Kruglov Pavel](https://github.com/Avogar)). @@ -37,7 +37,7 @@ sidebar_label: 'v24.2' * Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60281](https://github.com/ClickHouse/ClickHouse/pull/60281) ([Shaun Struwig](https://github.com/Blargian)). * Support single-argument version for the merge table function, as `merge(['db_name', ] 'tables_regexp')`. [#60372](https://github.com/ClickHouse/ClickHouse/pull/60372) ([豪肥肥](https://github.com/HowePa)). * Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. 
[#60420](https://github.com/ClickHouse/ClickHouse/pull/60420) ([豪肥肥](https://github.com/HowePa)). -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. This allows to execute selects/inserts from views without explicit grants for underlying tables. [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). * Add four properties to the `StorageMemory` (memory-engine) `min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keep` and `max_rows_to_keep` - Add tests to reflect new changes - Update `memory.md` documentation - Add table `context` property to `MemorySink` to enable access to table parameter bounds. [#60612](https://github.com/ClickHouse/ClickHouse/pull/60612) ([Jake Bamrah](https://github.com/JakeBamrah)). * Added function `toMillisecond` which returns the millisecond component for values of type`DateTime` or `DateTime64`. [#60649](https://github.com/ClickHouse/ClickHouse/pull/60649) ([Robert Schulze](https://github.com/rschu1ze)). * Separate limits on number of waiting and executing queries. Added new server setting `max_waiting_queries` that limits the number of queries waiting due to `async_load_databases`. Existing limits on number of executing queries no longer count waiting queries. [#61053](https://github.com/ClickHouse/ClickHouse/pull/61053) ([Sergei Trifonov](https://github.com/serxa)). diff --git a/docs/cloud/get-started/query-endpoints.md b/docs/cloud/get-started/query-endpoints.md index d5cfce82541..355200be90e 100644 --- a/docs/cloud/get-started/query-endpoints.md +++ b/docs/cloud/get-started/query-endpoints.md @@ -14,11 +14,11 @@ import endpoints_completed from '@site/static/images/cloud/sqlconsole/endpoints- import endpoints_curltest from '@site/static/images/cloud/sqlconsole/endpoints-curltest.png'; import endpoints_monitoring from '@site/static/images/cloud/sqlconsole/endpoints-monitoring.png'; -# Query API Endpoints +# Query API endpoints The **Query API Endpoints** feature allows you to create an API endpoint directly from any saved SQL query in the ClickHouse Cloud console. You'll be able to access API endpoints via HTTP to execute your saved queries without needing to connect to your ClickHouse Cloud service via a native driver. -## Quick-start Guide {#quick-start-guide} +## Quick-start guide {#quick-start-guide} Before proceeding, ensure you have an API key and an Admin Console Role. You can follow this guide to [create an API key](/cloud/manage/openapi). @@ -55,7 +55,7 @@ Next step, we'll go ahead and save the query: More documentation around saved queries can be found [here](/cloud/get-started/sql-console#saving-a-query). -### Configuring the Query API Endpoint {#configuring-the-query-api-endpoint} +### Configuring the query API endpoint {#configuring-the-query-api-endpoint} Query API endpoints can be configured directly from query view by clicking the **Share** button and selecting `API Endpoint`. 
You'll be prompted to specify which API key(s) should be able to access the endpoint: @@ -79,7 +79,7 @@ After you've sent your first request, a new button should appear immediately to -## Implementation Details {#implementation-details} +## Implementation details {#implementation-details} ### Description {#description} @@ -91,11 +91,11 @@ This route runs a query on a specified query endpoint. It supports different ver - **Method**: Basic Auth via OpenAPI Key/Secret - **Permissions**: Appropriate permissions for the query endpoint. -### URL Parameters {#url-parameters} +### URL parameters {#url-parameters} - `queryEndpointId` (required): The unique identifier of the query endpoint to run. -### Query Parameters {#query-parameters} +### Query parameters {#query-parameters} #### V1 {#v1} @@ -112,7 +112,7 @@ None - `x-clickhouse-endpoint-version` (optional): The version of the query endpoint. Supported versions are `1` and `2`. If not provided, the default version is last saved for the endpoint. - `x-clickhouse-endpoint-upgrade` (optional): Set this header to upgrade the endpoint version. This works in conjunction with the `x-clickhouse-endpoint-version` header. -### Request Body {#request-body} +### Request body {#request-body} - `queryVariables` (optional): An object containing variables to be used in the query. - `format` (optional): The format of the response. If Query API Endpoint is version 2 any ClickHouse supported format is possible. Supported formats for v1 are: @@ -132,19 +132,19 @@ None - **401 Unauthorized**: The request was made without authentication or with insufficient permissions. - **404 Not Found**: The specified query endpoint was not found. -### Error Handling {#error-handling} +### Error handling {#error-handling} - Ensure that the request includes valid authentication credentials. - Validate the `queryEndpointId` and `queryVariables` to ensure they are correct. - Handle any server errors gracefully, returning appropriate error messages. -### Upgrading the Endpoint Version {#upgrading-the-endpoint-version} +### Upgrading the endpoint version {#upgrading-the-endpoint-version} To upgrade the endpoint version from `v1` to `v2`, include the `x-clickhouse-endpoint-upgrade` header in the request and set it to `1`. This will trigger the upgrade process and allow you to use the features and improvements available in `v2`. 
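
Putting the pieces above together, a minimal sketch of a versioned request might look like the following. The endpoint URL is a placeholder (copy the real one from the **Share** > `API Endpoint` dialog), the key ID/secret are your OpenAPI credentials, and the `database` query variable is illustrative:

```bash
# Minimal sketch: call a saved-query endpoint as version 2 and ask for JSONEachRow output.
# The endpoint URL below is a placeholder; use the URL shown in the console for your endpoint.
curl -s \
  --user '<api-key-id>:<api-key-secret>' \
  -H 'Content-Type: application/json' \
  -H 'x-clickhouse-endpoint-version: 2' \
  '<your-query-endpoint-url>' \
  -d '{"queryVariables": {"database": "system"}, "format": "JSONEachRow"}'
```

The full request/response walkthroughs in the examples that follow show the same flow with concrete queries.
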
## Examples {#examples} -### Basic Request {#basic-request} +### Basic request {#basic-request} **Query API Endpoint SQL:** @@ -246,7 +246,7 @@ fetch( {"database":"INFORMATION_SCHEMA","num_tables":"REFERENTIAL_CONSTRAINTS"} ``` -### Request with Query Variables and Version 2 on JSONCompactEachRow Format {#request-with-query-variables-and-version-2-on-jsoncompacteachrow-format} +### Request with query variables and version 2 on JSONCompactEachRow format {#request-with-query-variables-and-version-2-on-jsoncompacteachrow-format} **Query API Endpoint SQL:** @@ -297,7 +297,7 @@ fetch( ["query_views_log", "system"] ``` -### Request with Array in the query variables that inserts data into a table {#request-with-array-in-the-query-variables-that-inserts-data-into-a-table} +### Request with array in the query variables that inserts data into a table {#request-with-array-in-the-query-variables-that-inserts-data-into-a-table} **Table SQL:** diff --git a/docs/cloud/get-started/query-insights.md b/docs/cloud/get-started/query-insights.md index 0f8047dfb41..5dbddedccb2 100644 --- a/docs/cloud/get-started/query-insights.md +++ b/docs/cloud/get-started/query-insights.md @@ -17,7 +17,7 @@ import insights_query_info from '@site/static/images/cloud/sqlconsole/insights_q The **Query Insights** feature makes ClickHouse's built-in query log easier to use through various visualizations and tables. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. -## Query Overview {#query-overview} +## Query overview {#query-overview} After selecting a service, the **Monitoring** navigation item in the left sidebar should expand to reveal a new **Query insights** sub-item. Clicking on this option opens the new Query insights page: diff --git a/docs/cloud/get-started/sql-console.md b/docs/cloud/get-started/sql-console.md index 5a8955e651d..7232a74be24 100644 --- a/docs/cloud/get-started/sql-console.md +++ b/docs/cloud/get-started/sql-console.md @@ -51,9 +51,9 @@ SQL console is the fastest and easiest way to explore and query your databases i - Execute queries and visualize result data in just a few clicks - Share queries with team members and collaborate more effectively. -### Exploring Tables {#exploring-tables} +### Exploring tables {#exploring-tables} -### Viewing Table List and Schema Info {#viewing-table-list-and-schema-info} +### Viewing table list and schema info {#viewing-table-list-and-schema-info} An overview of tables contained in your ClickHouse instance can be found in the left sidebar area. Use the database selector at the top of the left bar to view the tables in a specific database @@ -62,19 +62,19 @@ Tables in the list can also be expanded to view columns and types -### Exploring Table Data {#exploring-table-data} +### Exploring table data {#exploring-table-data} Click on a table in the list to open it in a new tab. In the Table View, data can be easily viewed, selected, and copied. Note that structure and formatting are preserved when copy-pasting to spreadsheet applications such as Microsoft Excel and Google Sheets. You can flip between pages of table data (paginated in 30-row increments) using the navigation in the footer. -### Inspecting Cell Data {#inspecting-cell-data} +### Inspecting cell data {#inspecting-cell-data} The Cell Inspector tool can be used to view large amounts of data contained within a single cell. To open it, right-click on a cell and select 'Inspect Cell'. 
The contents of the cell inspector can be copied by clicking the copy icon in the top right corner of the inspector contents. -## Filtering and Sorting Tables {#filtering-and-sorting-tables} +## Filtering and sorting tables {#filtering-and-sorting-tables} ### Sorting a table {#sorting-a-table} @@ -118,9 +118,9 @@ Filters and sorts are not mandatory when using the 'Create Query' feature. You can learn more about querying in the SQL console by reading the (link) query documentation. -## Creating and Running a Query {#creating-and-running-a-query} +## Creating and running a query {#creating-and-running-a-query} -### Creating a Query {#creating-a-query} +### Creating a query {#creating-a-query} There are two ways to create a new query in the SQL console. @@ -129,7 +129,7 @@ There are two ways to create a new query in the SQL console. -### Running a Query {#running-a-query} +### Running a query {#running-a-query} To run a query, type your SQL command(s) into the SQL Editor and click the 'Run' button or use the shortcut `cmd / ctrl + enter`. To write and run multiple commands sequentially, make sure to add a semicolon after each command. @@ -157,13 +157,13 @@ Running the command at the current cursor position can be achieved in two ways: The command present at the cursor position will flash yellow on execution. ::: -### Canceling a Query {#canceling-a-query} +### Canceling a query {#canceling-a-query} While a query is running, the 'Run' button in the Query Editor toolbar will be replaced with a 'Cancel' button. Simply click this button or press `Esc` to cancel the query. Note: Any results that have already been returned will persist after cancellation. -### Saving a Query {#saving-a-query} +### Saving a query {#saving-a-query} Saving queries allows you to easily find them later and share them with your teammates. The SQL console also allows you to organize your queries into folders. @@ -179,7 +179,7 @@ Alternatively, you can simultaneously name and save a query by clicking on "Unti -### Query Sharing {#query-sharing} +### Query sharing {#query-sharing} The SQL console allows you to easily share queries with your team members. The SQL console supports four levels of access that can be adjusted both globally and on a per-user basis: @@ -206,7 +206,7 @@ After selecting a team member, a new line item should appear with an access leve -### Accessing Shared Queries {#accessing-shared-queries} +### Accessing shared queries {#accessing-shared-queries} If a query has been shared with you, it will be displayed in the "Queries" tab of the SQL console left sidebar: @@ -218,7 +218,7 @@ Saved queries are also permalinked, meaning that you can send and receive links Values for any parameters that may exist in a query are automatically added to the saved query URL as query parameters. For example, if a query contains `{start_date: Date}` and `{end_date: Date}` parameters, the permalink can look like: `https://console.clickhouse.cloud/services/:serviceId/console/query/:queryId?param_start_date=2015-01-01¶m_end_date=2016-01-01`. -## Advanced Querying Features {#advanced-querying-features} +## Advanced querying features {#advanced-querying-features} ### Searching query results {#searching-query-results} @@ -246,7 +246,7 @@ Query result sets can be easily exported to CSV format directly from the SQL con -## Visualizing Query Data {#visualizing-query-data} +## Visualizing query data {#visualizing-query-data} Some data can be more easily interpreted in chart form. 
You can quickly create visualizations from query result data directly from the SQL console in just a few clicks. As an example, we'll use a query that calculates weekly statistics for NYC taxi trips: diff --git a/docs/cloud/manage/account-close.md b/docs/cloud/manage/account-close.md index fee12eb6cc0..b434ab66f75 100644 --- a/docs/cloud/manage/account-close.md +++ b/docs/cloud/manage/account-close.md @@ -5,13 +5,13 @@ title: 'Account Close & Deletion' description: 'We know there are circumstances that sometimes necessitate account closure. This guide will help you through the process.' --- -## Account Close & Deletion {#account-close--deletion} +## Account close & deletion {#account-close--deletion} Our goal is to help you be successful in your project. If you have questions that are not answered on this site or need help evaluating a unique use case, please contact us at [support@clickhouse.com](mailto:support@clickhouse.com). We know there are circumstances that sometimes necessitate account closure. This guide will help you through the process. -## Close vs Delete {#close-vs-delete} +## Close vs delete {#close-vs-delete} Customers may log back into closed accounts to view usage, billing and account-level activity logs. This enables you to easily access details that are useful for a variety of purposes, from documenting use cases to downloading invoices at the end of the year for tax purposes. You will also continue receiving product updates so that you know if a feature you may have been waiting for is now available. Additionally, @@ -23,7 +23,7 @@ be available. You will not receive product updates and may not reopen the accoun Newsletter subscribers can unsubscribe at any time by using the unsubscribe link at the bottom of the newsletter email without closing their account or deleting their information. -## Preparing for Closure {#preparing-for-closure} +## Preparing for closure {#preparing-for-closure} Before requesting account closure, please take the following steps to prepare the account. 1. Export any data from your service that you need to keep. @@ -31,7 +31,7 @@ Before requesting account closure, please take the following steps to prepare th 3. Remove all users except the admin that will request closure. This will help you ensure no new services are created while the process completes. 4. Review the 'Usage' and 'Billing' tabs in the control panel to verify all charges have been paid. We are not able to close accounts with unpaid balances. -## Request Account Closure {#request-account-closure} +## Request account closure {#request-account-closure} We are required to authenticate requests for both closure and deletion. To ensure your request can be processed quickly, please follow the steps outlined below. @@ -51,7 +51,7 @@ Description: We would appreciate it if you would share a brief note about why yo 6. We will close your account and send a confirmation email to let you know when it is complete. -## Request Personal Data Deletion {#request-personal-data-deletion} +## Request personal data deletion {#request-personal-data-deletion} Please note, only account administrators may request personal data deletion from ClickHouse. If you are not an account administrator, please contact your ClickHouse account administrator to request to be removed from the account. 
diff --git a/docs/cloud/manage/api/api-overview.md b/docs/cloud/manage/api/api-overview.md index de31f91f9d8..0d006650519 100644 --- a/docs/cloud/manage/api/api-overview.md +++ b/docs/cloud/manage/api/api-overview.md @@ -25,14 +25,14 @@ consume the ClickHouse Cloud API docs, we offer a JSON-based Swagger endpoint via https://api.clickhouse.cloud/v1. You can also find the API docs via the [Swagger UI](https://clickhouse.com/docs/cloud/manage/api/swagger). -## Rate Limits {#rate-limits} +## Rate limits {#rate-limits} Developers are limited to 100 API keys per organization. Each API key has a limit of 10 requests over a 10-second window. If you'd like to increase the number of API keys or requests per 10-second window for your organization, please contact support@clickhouse.com -## Terraform Provider {#terraform-provider} +## Terraform provider {#terraform-provider} The official ClickHouse Terraform Provider lets you use [Infrastructure as Code](https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac) to create predictable, version-controlled configurations to make deployments much diff --git a/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md b/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md index 9288fd4b4a6..320ec111a01 100644 --- a/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md +++ b/docs/cloud/manage/backups/export-backups-to-own-cloud-account.md @@ -68,9 +68,9 @@ You will need the following details to export/restore backups to your own CSP st 2. The Backup / Restore commands need to be run from the database command line. For restore to a new service, you will first need to create the service and then run the command. ::: -## Backup / Restore to AWS S3 Bucket {#backup--restore-to-aws-s3-bucket} +## Backup / Restore to AWS S3 bucket {#backup--restore-to-aws-s3-bucket} -### Take a DB Backup {#take-a-db-backup} +### Take a DB backup {#take-a-db-backup} **Full Backup** @@ -106,7 +106,7 @@ See: [Configuring BACKUP/RESTORE to use an S3 Endpoint](/operations/backup#confi ## Backup / Restore to Azure Blob Storage {#backup--restore-to-azure-blob-storage} -### Take a DB Backup {#take-a-db-backup-1} +### Take a DB backup {#take-a-db-backup-1} **Full Backup** @@ -137,7 +137,7 @@ See: [Configuring BACKUP/RESTORE to use an S3 Endpoint](/operations/backup#confi ## Backup / Restore to Google Cloud Storage (GCS) {#backup--restore-to-google-cloud-storage-gcs} -### Take a DB Backup {#take-a-db-backup-2} +### Take a DB backup {#take-a-db-backup-2} **Full Backup** diff --git a/docs/cloud/manage/backups/overview.md b/docs/cloud/manage/backups/overview.md index b2098ce5e7f..1e8f85b0c22 100644 --- a/docs/cloud/manage/backups/overview.md +++ b/docs/cloud/manage/backups/overview.md @@ -34,7 +34,7 @@ On Day 1, a full backup is taken to start the backup chain. On Day 2, an increme ## Default backup policy {#default-backup-policy} -In the Basic, Scale, and Enterprise tiers, backups are metered and billed separately from storage. All services will default to one backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud Console. +In the Basic, Scale, and Enterprise tiers, backups are metered and billed separately from storage. All services will default to one backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud console. 
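
When exporting backups to your own bucket as described in the guide above, a full backup issued from the database command line might look like this minimal sketch (the hostname, bucket URL, and access keys are placeholders):

```bash
# Minimal sketch: run a full backup of one table to your own S3 bucket from the database command line.
# The service hostname, bucket URL, and HMAC access keys below are placeholders.
clickhouse client --host <service-hostname> --secure --password '<password>' --query "
  BACKUP TABLE default.my_table
  TO S3('https://my-backup-bucket.s3.amazonaws.com/backups/my_table/', '<access-key-id>', '<secret-access-key>')
"
```
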
## Backup status list {#backup-status-list} @@ -171,7 +171,7 @@ SYNC SETTINGS max_table_size_to_drop=2097152 -- increases the limit to 2TB ``` ::: -## Configurable Backups {#configurable-backups} +## Configurable backups {#configurable-backups} If you want to set up a backups schedule different from the default backup schedule, take a look at [Configurable Backups](./configurable-backups.md). diff --git a/docs/cloud/manage/billing.md b/docs/cloud/manage/billing.md index 3df0e975cc8..d1a80abc00e 100644 --- a/docs/cloud/manage/billing.md +++ b/docs/cloud/manage/billing.md @@ -175,7 +175,7 @@ Best for: large scale, mission critical deployments that have stringent security
-## Frequently Asked Questions {#faqs} +## Frequently asked questions {#faqs} ### How is compute metered? {#how-is-compute-metered} @@ -191,7 +191,7 @@ Storage costs are the same across tiers and vary by region and cloud service pro Storage and backups are counted towards storage costs and billed separately. All services will default to one backup, retained for a day. -Users who need additional backups can do so by configuring additional [backups](backups/overview.md) under the settings tab of the Cloud Console. +Users who need additional backups can do so by configuring additional [backups](backups/overview.md) under the settings tab of the Cloud console. ### How do I estimate compression? {#how-do-i-estimate-compression} diff --git a/docs/cloud/manage/billing/payment-thresholds.md b/docs/cloud/manage/billing/payment-thresholds.md index 97049d81c3a..0c2b6948d0e 100644 --- a/docs/cloud/manage/billing/payment-thresholds.md +++ b/docs/cloud/manage/billing/payment-thresholds.md @@ -6,7 +6,7 @@ description: 'Payment thresholds and automatic invoicing for ClickHouse Cloud.' keywords: ['billing', 'payment thresholds', 'automatic invoicing', 'invoice'] --- -# Payment Thresholds +# Payment thresholds When your amount due in a billing period for ClickHouse Cloud reaches $10,000 USD or the equivalent value, your payment method will be automatically charged. A failed charge will result in the suspension or termination of your services after a grace period. diff --git a/docs/cloud/manage/cloud-tiers.md b/docs/cloud/manage/cloud-tiers.md index fc56abe89a5..1cd784431ec 100644 --- a/docs/cloud/manage/cloud-tiers.md +++ b/docs/cloud/manage/cloud-tiers.md @@ -5,7 +5,7 @@ title: 'ClickHouse Cloud Tiers' description: 'Cloud tiers available in ClickHouse Cloud' --- -# ClickHouse Cloud Tiers +# ClickHouse Cloud tiers There are several tiers available in ClickHouse Cloud. Tiers are assigned at any organizational level. Services within an organization therefore belong to the same tier. diff --git a/docs/cloud/manage/dashboards.md b/docs/cloud/manage/dashboards.md index 1e320c5fd78..761388007b7 100644 --- a/docs/cloud/manage/dashboards.md +++ b/docs/cloud/manage/dashboards.md @@ -24,9 +24,9 @@ import dashboards_11 from '@site/static/images/cloud/dashboards/11_dashboards.pn The SQL Console's dashboards feature allows you to collect and share visualizations from saved queries. Get started by saving and visualizing queries, adding query visualizations to a dashboard, and making the dashboard interactive using query parameters. -## Core Concepts {#core-concepts} +## Core concepts {#core-concepts} -### Query Sharing {#query-sharing} +### Query sharing {#query-sharing} In order to share your dashboard with colleagues, please be sure to share the underlying saved query. To view a visualization, users must have, at a minimum, read-only access to the underlying saved query. @@ -36,11 +36,11 @@ Use [query parameters](/sql-reference/syntax#defining-and-using-query-parameters You can toggle the query parameter input via the **Global** filters side pane by selecting a “filter” type in the visualization settings. You can also toggle the query parameter input by linking to another object (like a table) on the dashboard. Please see the “[configure a filter](/cloud/manage/dashboards#configure-a-filter)” section of the quick start guide below. 
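
For reference, a saved query only needs a `{name:Type}` placeholder to become filterable. A minimal sketch (the parameter name and value are illustrative), run here with `clickhouse client`, which passes parameter values with `--param_<name>`:

```bash
# Minimal sketch: a parameterized query of the kind you might save for a dashboard filter.
# --param_<name> supplies the value for the {name:Type} placeholder; names here are illustrative.
clickhouse client --host <service-hostname> --secure --password '<password>' \
  --param_min_duration_ms=1000 \
  --query "
    SELECT type, count() AS queries
    FROM system.query_log
    WHERE query_duration_ms >= {min_duration_ms:UInt64}
    GROUP BY type
    ORDER BY queries DESC
  "
```
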
-## Quick Start {#quick-start} +## Quick start {#quick-start} Let's create a dashboard to monitor our ClickHouse service using the [query\_log](/operations/system-tables/query_log) system table. -## Quick Start {#quick-start-1} +## Quick start {#quick-start-1} ### Create a saved query {#create-a-saved-query} diff --git a/docs/cloud/manage/integrations.md b/docs/cloud/manage/integrations.md index fb2c23f8453..67e562aa23d 100644 --- a/docs/cloud/manage/integrations.md +++ b/docs/cloud/manage/integrations.md @@ -7,7 +7,7 @@ description: 'Integrations for ClickHouse' To see a full list of integrations for ClickHouse, please see [this page](/integrations). -## Proprietary Integrations for ClickHouse Cloud {#proprietary-integrations-for-clickhouse-cloud} +## Proprietary integrations for ClickHouse Cloud {#proprietary-integrations-for-clickhouse-cloud} Besides the dozens of integrations available for ClickHouse, there are also some proprietary integrations only available for ClickHouse Cloud: @@ -24,9 +24,9 @@ Looker Studio can be connected to ClickHouse Cloud by enabling the [MySQL interf ### MySQL Interface {#mysql-interface} -Some applications currently do not support the ClickHouse wire protocol. To use ClickHouse Cloud with these applications, you can enable the MySQL wire protocol through the Cloud Console. Please see [this page](/interfaces/mysql#enabling-the-mysql-interface-on-clickhouse-cloud) for details on how to enable the MySQL wire protocol through the Cloud Console. +Some applications currently do not support the ClickHouse wire protocol. To use ClickHouse Cloud with these applications, you can enable the MySQL wire protocol through the Cloud console. Please see [this page](/interfaces/mysql#enabling-the-mysql-interface-on-clickhouse-cloud) for details on how to enable the MySQL wire protocol through the Cloud console. -## Unsupported Integrations {#unsupported-integrations} +## Unsupported integrations {#unsupported-integrations} The following features for integrations are not currently available for ClickHouse Cloud as they are experimental features. If you need to support these features in your application, please contact support@clickhouse.com. diff --git a/docs/cloud/manage/jan2025_faq/backup.md b/docs/cloud/manage/jan2025_faq/backup.md index 706435db827..579788f8dec 100644 --- a/docs/cloud/manage/jan2025_faq/backup.md +++ b/docs/cloud/manage/jan2025_faq/backup.md @@ -7,7 +7,7 @@ description: 'Backup policy in new tiers' ## What is the backup policy? {#what-is-the-backup-policy} In Basic, Scale, and Enterprise tiers backups are metered and billed separately from storage. -All services will default to one daily backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud Console. Each backup will be retained for at least 24 hours. +All services will default to one daily backup with the ability to configure more, starting with the Scale tier, via the Settings tab of the Cloud console. Each backup will be retained for at least 24 hours. ## What happens to current configurations that users have set up separate from default backups? 
{#what-happens-to-current-configurations-that-users-have-set-up-separate-from-default-backups} diff --git a/docs/cloud/manage/jan2025_faq/new_tiers.md index 704e3e442f4..3e87d497bb6 100644 --- a/docs/cloud/manage/jan2025_faq/new_tiers.md +++ b/docs/cloud/manage/jan2025_faq/new_tiers.md @@ -15,7 +15,7 @@ description: 'Description of new tiers and features' - **Single Sign On (SSO):** This feature is offered in Enterprise tier and requires a support ticket to be enabled for an Organization. Users who have multiple Organizations should ensure all of their organizations are on the Enterprise tier to use SSO for each organization. -## Basic Tier {#basic-tier} +## Basic tier {#basic-tier} ### What are the considerations for the Basic tier? {#what-are-the-considerations-for-the-basic-tier} @@ -37,7 +37,7 @@ Yes, single replica services are supported on all three tiers. Users can scale o No, services on this tier are meant to support workloads that are small and fixed size (single replica `1x8GiB` or `1x12GiB`). If users need to scale up/down or add replicas, they will be prompted to upgrade to Scale or Enterprise tiers. -## Scale Tier {#scale-tier} +## Scale tier {#scale-tier} ### Which tiers on the new plans (Basic/Scale/Enterprise) support compute-compute separation? {#which-tiers-on-the-new-plans-basicscaleenterprise-support-compute-compute-separation} @@ -47,7 +47,7 @@ Only Scale and Enterprise tiers support compute-compute separation. Please also Compute-compute separation is not supported on existing Development and Production services, except for users who already participated in the Private Preview and Beta. If you have additional questions, please contact [support](https://clickhouse.com/support/program). -## Enterprise Tier {#enterprise-tier} +## Enterprise tier {#enterprise-tier} ### What different hardware profiles are supported for the Enterprise tier? {#what-different-hardware-profiles-are-supported-for-the-enterprise-tier} diff --git a/docs/cloud/manage/jan2025_faq/plan_migrations.md index f69e941127d..cce2fbe0fb9 100644 --- a/docs/cloud/manage/jan2025_faq/plan_migrations.md +++ b/docs/cloud/manage/jan2025_faq/plan_migrations.md @@ -28,19 +28,21 @@ Yes, see below for guidance on self-serve migrations: Users can upgrade during the trial and continue to use the trial credits to evaluate the new service tiers and the features it supports. However, if they choose to continue using the same Development and Production services, they can do so and upgrade to PAYG. They will still have to migrate before July 23, 2025. -### Can users upgrade their tiers, i.e. Basic → Scale, Scale → Enterprise, etc? {#can-users-upgrade-their-tiers-ie-basic--scale-scale--enterprise-etc} +### Can users upgrade their tiers {#can-users-upgrade-their-tiers-ie-basic--scale-scale--enterprise-etc} -Yes, users can upgrade self-serve and the pricing will reflect the tier selection after upgrade. +For example, can users upgrade Basic → Scale or Scale → Enterprise? +Yes, users can upgrade self-serve, and the pricing will reflect the tier selection after upgrade. -### Can users move from a higher to a lower-cost tier, e.g., Enterprise → Scale, Scale → Basic, Enterprise → Basic self-serve?
{#can-users-move-from-a-higher-to-a-lower-cost-tier-eg-enterprise--scale-scale--basic-enterprise--basic-self-serve} +### Can users move from a higher to a lower-cost tier {#can-users-move-from-a-higher-to-a-lower-cost-tier-eg-enterprise--scale-scale--basic-enterprise--basic-self-serve} +For example, Enterprise → Scale, Scale → Basic, Enterprise → Basic self-serve? No, we do not permit downgrading tiers. -### Can users with only Development services in the organization migrate to the Basic tier? {#can-users-with-only-development-services-in-the-organization-migrate-to-the-basic-tier} +### Can users with only development services in the organization migrate to the Basic tier? {#can-users-with-only-development-services-in-the-organization-migrate-to-the-basic-tier} Yes, this would be permitted. Users will be given a recommendation based on their past use and can select Basic `1x8GiB` or `1x12GiB`. -### Can users with a Development and Production service in the same organization move to the Basic Tier? {#can-users-with-a-development-and-production-service-in-the-same-organization-move-to-the-basic-tier} +### Can users with a development and production service in the same organization move to the basic tier? {#can-users-with-a-development-and-production-service-in-the-same-organization-move-to-the-basic-tier} No, if a user has both Development and Production services in the same organization, they can self-serve and migrate only to the Scale or Enterprise tier. If they want to migrate to Basic, they should delete all existing Production services. diff --git a/docs/cloud/manage/notifications.md b/docs/cloud/manage/notifications.md index abc9fde3064..708c41b2274 100644 --- a/docs/cloud/manage/notifications.md +++ b/docs/cloud/manage/notifications.md @@ -17,7 +17,7 @@ ClickHouse Cloud sends notifications about critical events related to your servi 2. **Notification severity**: Notification severity can be `info`, `warning`, or `critical` depending on how important a notification is. This is not configurable. 3. **Notification channel**: Channel refers to the mode by which the notification is received such as UI, email, Slack etc. This is configurable for most notifications. -## Receiving Notifications {#receiving-notifications} +## Receiving notifications {#receiving-notifications} Notifications can be received via various channels. For now, ClickHouse Cloud supports receiving notifications through email, ClickHouse Cloud UI, and Slack. You can click on the bell icon in the top left menu to view current notifications, which opens a flyout. Clicking the button **View All** the bottom of the flyout will take you to a page that shows an activity log of all notifications. @@ -27,7 +27,7 @@ Notifications can be received via various channels. For now, ClickHouse Cloud su ClickHouse Cloud notifications activity log -## Customizing Notifications {#customizing-notifications} +## Customizing notifications {#customizing-notifications} For each notification, you can customize how you receive the notification. You can access the settings screen from the notifications flyout or from the second tab on the notifications activity log. @@ -43,6 +43,6 @@ To configure delivery for a specific notification, click on the pencil icon to m Certain **required** notifications such as **Payment failed** are not configurable. 
::: -## Supported Notifications {#supported-notifications} +## Supported notifications {#supported-notifications} Currently, we send out notifications related to billing (payment failure, usage exceeded ascertain threshold, etc.) as well as notifications related to scaling events (scaling completed, scaling blocked etc.). diff --git a/docs/cloud/manage/openapi.md b/docs/cloud/manage/openapi.md index 4e1410bb53f..919cb38cc48 100644 --- a/docs/cloud/manage/openapi.md +++ b/docs/cloud/manage/openapi.md @@ -12,7 +12,7 @@ import image_04 from '@site/static/images/cloud/manage/openapi4.png'; import image_05 from '@site/static/images/cloud/manage/openapi5.png'; import Image from '@theme/IdealImage'; -# Managing API Keys +# Managing API keys ClickHouse Cloud provides an API utilizing OpenAPI that allows you to programmatically manage your account and aspects of your services. diff --git a/docs/cloud/manage/postman.md b/docs/cloud/manage/postman.md index 51655f3c536..d1917939568 100644 --- a/docs/cloud/manage/postman.md +++ b/docs/cloud/manage/postman.md @@ -28,16 +28,19 @@ This guide will help you test the ClickHouse Cloud API using [Postman](https://w The Postman Application is available for use within a web browser or can be downloaded to a desktop. ### Create an account {#create-an-account} + * Free accounts are available at [https://www.postman.com](https://www.postman.com). Postman site -### Create a Workspace {#create-a-workspace} +### Create a workspace {#create-a-workspace} + * Name your workspace and set the visibility level. Create workspace -### Create a Collection {#create-a-collection} +### Create a collection {#create-a-collection} + * Below "Explore" on the top left Menu click "Import": Explore > Import @@ -63,7 +66,7 @@ The Postman Application is available for use within a web browser or can be down Import complete -### Set Authorization {#set-authorization} +### Set authorization {#set-authorization} * Toggle the dropdown menu to select "Basic Auth": Basic auth @@ -72,9 +75,12 @@ The Postman Application is available for use within a web browser or can be down credentials -### Enable Variables {#enable-variables} +### Enable variables {#enable-variables} + * [Variables](https://learning.postman.com/docs/sending-requests/variables/) enable the storage and reuse of values in Postman allowing for easier API testing. 
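
Before wiring the requests up in Postman, you can optionally sanity-check your key from the command line; a minimal curl sketch of the same "list organizations" call the collection exercises (the key ID and secret are placeholders):

```bash
# Minimal sketch: list organizations via the Cloud API using Basic Auth.
# Replace the key ID and secret with your own OpenAPI credentials.
curl --user '<key-id>:<key-secret>' https://api.clickhouse.cloud/v1/organizations
```
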
-#### Set the Organization ID and Service ID {#set-the-organization-id-and-service-id} + +#### Set the organization ID and Service ID {#set-the-organization-id-and-service-id} + * Within the "Collection", click the "Variable" tab in the middle pane (The Base URL will have been set by the earlier API import): * Below `baseURL` click the open field "Add new value", and Substitute your organization ID and service ID: @@ -82,7 +88,9 @@ The Postman Application is available for use within a web browser or can be down ## Test the ClickHouse Cloud API functionalities {#test-the-clickhouse-cloud-api-functionalities} + ### Test "GET list of available organizations" {#test-get-list-of-available-organizations} + * Under the "OpenAPI spec for ClickHouse Cloud", expand the folder > V1 > organizations * Click "GET list of available organizations" and press the blue "Send" button on the right: @@ -93,6 +101,7 @@ The Postman Application is available for use within a web browser or can be down Status ### Test "GET organizational details" {#test-get-organizational-details} + * Under the `organizationid` folder, navigate to "GET organizational details": * In the middle frame menu under Params an `organizationid` is required. @@ -109,6 +118,7 @@ The Postman Application is available for use within a web browser or can be down * The returned results should deliver your organization details with "status": 200. (If you receive a "status" 400 with no organization information your configuration is not correct). ### Test "GET service details" {#test-get-service-details} + * Click "GET service details" * Edit the Values for `organizationid` and `serviceid` with `{{orgid}}` and `{{serviceid}}` respectively. * Press "Save" and then the blue "Send" button on the right. diff --git a/docs/cloud/manage/replica-aware-routing.md b/docs/cloud/manage/replica-aware-routing.md index a32f5de78c5..c220fba21bb 100644 --- a/docs/cloud/manage/replica-aware-routing.md +++ b/docs/cloud/manage/replica-aware-routing.md @@ -5,7 +5,7 @@ description: 'How to use Replica-aware routing to increase cache re-use' keywords: ['cloud', 'sticky endpoints', 'sticky', 'endpoints', 'sticky routing', 'routing', 'replica aware routing'] --- -# Replica-aware routing (Private Preview) +# Replica-aware routing (private preview) Replica-aware routing (also known as sticky sessions, sticky routing, or session affinity) utilizes [Envoy proxy's ring hash load balancing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash). The main purpose of replica-aware routing is to increase the chance of cache reuse. It does not guarantee isolation. @@ -31,6 +31,6 @@ Any disruption to the service, e.g. server pod restarts (due to any reason like Customers need to manually add a DNS entry to make name resolution work for the new hostname pattern. It is possible that this can cause imbalance in the server load if customers use it incorrectly. -## Configuring Replica-aware Routing {#configuring-replica-aware-routing} +## Configuring replica-aware routing {#configuring-replica-aware-routing} To enable Replica-aware routing, please contact [our support team](https://clickhouse.com/support). 
diff --git a/docs/cloud/manage/scaling.md b/docs/cloud/manage/scaling.md index 429bbc848d7..6da2eb2570f 100644 --- a/docs/cloud/manage/scaling.md +++ b/docs/cloud/manage/scaling.md @@ -15,7 +15,7 @@ import scaling_configure from '@site/static/images/cloud/manage/scaling-configur import scaling_memory_allocation from '@site/static/images/cloud/manage/scaling-memory-allocation.png'; import ScalePlanFeatureBadge from '@theme/badges/ScalePlanFeatureBadge' -# Automatic Scaling +# Automatic scaling Scaling is the ability to adjust available resources to meet client demands. Scale and Enterprise (with standard 1:4 profile) tier services can be scaled horizontally by calling an API programmatically, or changing settings on the UI to adjust system resources. Alternatively, these services can be **autoscaled** vertically to meet application demands. @@ -110,7 +110,7 @@ Once the service has scaled, the metrics dashboard in the cloud console should s Scaling memory allocation -## Automatic Idling {#automatic-idling} +## Automatic idling {#automatic-idling} In the **Settings** page, you can also choose whether or not to allow automatic idling of your service when it is inactive as shown in the image above (i.e. when the service is not executing any user-submitted queries). Automatic idling reduces the cost of your service, as you are not billed for compute resources when the service is paused. :::note @@ -123,7 +123,8 @@ The service may enter an idle state where it suspends refreshes of [refreshable Use automatic idling only if your use case can handle a delay before responding to queries, because when a service is paused, connections to the service will time out. Automatic idling is ideal for services that are used infrequently and where a delay can be tolerated. It is not recommended for services that power customer-facing features that are used frequently. ::: -## Handling bursty workloads {#handling-bursty-workloads} +## Handling spikes in workload {#handling-bursty-workloads} + If you have an upcoming expected spike in your workload, you can use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to preemptively scale up your service to handle the spike and scale it down once diff --git a/docs/cloud/manage/service-uptime.md b/docs/cloud/manage/service-uptime.md index cae47a221e3..3a31e459eaf 100644 --- a/docs/cloud/manage/service-uptime.md +++ b/docs/cloud/manage/service-uptime.md @@ -5,7 +5,7 @@ title: 'Service Uptime' description: 'Users can now see regional uptimes on the status page and subscribe to alerts on service disruptions.' --- -## Uptime Alerts {#uptime-alerts} +## Uptime alerts {#uptime-alerts} Users can now see regional uptimes on the [status page](https://status.clickhouse.com/) and subscribe to alerts on service disruptions. diff --git a/docs/cloud/manage/settings.md b/docs/cloud/manage/settings.md index 0ce24c99d60..a766ef59c13 100644 --- a/docs/cloud/manage/settings.md +++ b/docs/cloud/manage/settings.md @@ -8,7 +8,7 @@ description: 'How to configure settings for your ClickHouse Cloud service for a import Image from '@theme/IdealImage'; import cloud_settings_sidebar from '@site/static/images/cloud/manage/cloud-settings-sidebar.png'; -# Configuring Settings +# Configuring settings To specify settings for your ClickHouse Cloud service for a specific [user](/operations/access-rights#user-account-management) or [role](/operations/access-rights#role-management), you must use [SQL-driven Settings Profiles](/operations/access-rights#settings-profiles-management). 
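
A minimal sketch of such a profile (the profile name, setting values, and user name are illustrative) might look like:

```bash
# Minimal sketch: create a settings profile and assign it to a user in one statement.
# The profile name, setting values, and user name are illustrative.
clickhouse client --host <service-hostname> --secure --password '<password>' --query "
  CREATE SETTINGS PROFILE IF NOT EXISTS analyst_profile
  SETTINGS max_memory_usage = 10000000000, max_execution_time = 60
  TO analyst_user
"
```
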
Applying Settings Profiles ensures that the settings you configure persist, even when your services stop, idle, and upgrade. To learn more about Settings Profiles, please see [this page](/operations/settings/settings-profiles.md). diff --git a/docs/cloud/reference/architecture.md b/docs/cloud/reference/architecture.md index a84e0eef521..9c3d7cf5f56 100644 --- a/docs/cloud/reference/architecture.md +++ b/docs/cloud/reference/architecture.md @@ -7,7 +7,7 @@ description: 'This page describes the architecture of ClickHouse Cloud' import Architecture from '@site/static/images/cloud/reference/architecture.svg'; -# ClickHouse Cloud Architecture +# ClickHouse Cloud architecture @@ -43,10 +43,10 @@ For AWS, access to storage is controlled via AWS IAM, and each IAM role is uniqu For GCP and Azure, services have object storage isolation (all services have their own buckets or storage container). -## Compute-Compute Separation {#compute-compute-separation} +## Compute-compute separation {#compute-compute-separation} [Compute-compute separation](/cloud/reference/warehouses) lets users create multiple compute node groups, each with their own service URL, that all use the same shared object storage. This allows for compute isolation of different use cases such as reads from writes, that share the same data. It also leads to more efficient resource utilization by allowing for independent scaling of the compute groups as needed. -## Concurrency Limits {#concurrency-limits} +## Concurrency limits {#concurrency-limits} There is no limit to the number of queries per second (QPS) in your ClickHouse Cloud service. There is, however, a limit of 1000 concurrent queries per replica. QPS is ultimately a function of your average query execution time and the number of replicas in your service. diff --git a/docs/cloud/reference/byoc.md b/docs/cloud/reference/byoc.md index b35b6984d55..6de7eca0699 100644 --- a/docs/cloud/reference/byoc.md +++ b/docs/cloud/reference/byoc.md @@ -48,23 +48,23 @@ Metrics and logs are stored within the customer's BYOC VPC. Logs are currently s
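
As a rough rule of thumb (assuming queries are spread evenly across replicas and each replica stays under the 1000-concurrent-query limit), the throughput ceiling follows from Little's law:

```latex
\text{max sustainable QPS} \;\approx\; \frac{\text{replicas} \times 1000}{\text{average query execution time in seconds}}
\qquad \text{e.g. } \frac{3 \times 1000}{0.5} = 6000 \text{ QPS}
```
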
-## Onboarding Process {#onboarding-process} +## Onboarding process {#onboarding-process} Customers can initiate the onboarding process by reaching out to [us](https://clickhouse.com/cloud/bring-your-own-cloud). Customers need to have a dedicated AWS account and know the region they will use. At this time, we are allowing users to launch BYOC services only in the regions that we support for ClickHouse Cloud. -### Prepare an AWS Account {#prepare-an-aws-account} +### Prepare an AWS account {#prepare-an-aws-account} Customers are recommended to prepare a dedicated AWS account for hosting the ClickHouse BYOC deployment to ensure better isolation. However, using a shared account and an existing VPC is also possible. See the details in *Setup BYOC Infrastructure* below. With this account and the initial organization admin email, you can contact ClickHouse support. -### Apply CloudFormation Template {#apply-cloudformation-template} +### Apply CloudFormation template {#apply-cloudformation-template} BYOC setup is initialized via a [CloudFormation stack](https://s3.us-east-2.amazonaws.com/clickhouse-public-resources.clickhouse.cloud/cf-templates/byoc.yaml), which creates only a role allowing BYOC controllers from ClickHouse Cloud to manage infrastructure. The S3, VPC, and compute resources for running ClickHouse are not included in this stack. -### Setup BYOC Infrastructure {#setup-byoc-infrastructure} +### Set up BYOC infrastructure {#setup-byoc-infrastructure} After creating the CloudFormation stack, you will be prompted to set up the infrastructure, including S3, VPC, and the EKS cluster, from the cloud console. Certain configurations must be determined at this stage, as they cannot be changed later. Specifically: @@ -106,7 +106,7 @@ Create a support ticket with the following information: To create or delete VPC peering for ClickHouse BYOC, follow the steps: -#### Step 1 Enable Private Load Balancer for ClickHouse BYOC {#step-1-enable-private-load-balancer-for-clickhouse-byoc} +#### Step 1: Enable private load balancer for ClickHouse BYOC {#step-1-enable-private-load-balancer-for-clickhouse-byoc} Contact ClickHouse Support to enable Private Load Balancer. #### Step 2 Create a peering connection {#step-2-create-a-peering-connection} @@ -162,7 +162,7 @@ In the peering AWS account,
-#### Step 6 Edit Security Group to allow Peered VPC access {#step-6-edit-security-group-to-allow-peered-vpc-access} +#### Step 6: Edit security group to allow peered VPC access {#step-6-edit-security-group-to-allow-peered-vpc-access} In the ClickHouse BYOC account, you need to update the Security Group settings to allow traffic from your peered VPC. Please contact ClickHouse Support to request the addition of inbound rules that include the CIDR ranges of your peered VPC. --- @@ -174,7 +174,7 @@ To access ClickHouse privately, a private load balancer and endpoint are provisi Optional, after verifying that peering is working, you can request the removal of the public load balancer for ClickHouse BYOC. -## Upgrade Process {#upgrade-process} +## Upgrade process {#upgrade-process} We regularly upgrade the software, including ClickHouse database version upgrades, ClickHouse Operator, EKS, and other components. @@ -184,7 +184,7 @@ While we aim for seamless upgrades (e.g., rolling upgrades and restarts), some, Maintenance windows do not apply to security and vulnerability fixes. These are handled as off-cycle upgrades, with timely communication to coordinate a suitable time and minimize operational impact. ::: -## CloudFormation IAM Roles {#cloudformation-iam-roles} +## CloudFormation IAM roles {#cloudformation-iam-roles} ### Bootstrap IAM role {#bootstrap-iam-role} @@ -218,7 +218,7 @@ These roles are assumed by applications running within the customer's EKS cluste Lastly, **`data-plane-mgmt`** allows a ClickHouse Cloud Control Plane component to reconcile necessary custom resources, such as `ClickHouseCluster` and the Istio Virtual Service/Gateway. -## Network Boundaries {#network-boundaries} +## Network boundaries {#network-boundaries} This section covers different network traffic to and from the customer BYOC VPC: @@ -309,7 +309,7 @@ Besides Clickhouse instances (ClickHouse servers and ClickHouse Keeper), we run Currently we have 3 m5.xlarge nodes (one for each AZ) in a dedicated node group to run those workloads. -### Network and Security {#network-and-security} +### Network and security {#network-and-security} #### Can we revoke permissions set up during installation after setup is complete? {#can-we-revoke-permissions-set-up-during-installation-after-setup-is-complete} @@ -331,9 +331,9 @@ Contact support to schedule maintenance windows. Please expect a minimum of a we ## Observability {#observability} -### Built-in Monitoring Tools {#built-in-monitoring-tools} +### Built-in monitoring tools {#built-in-monitoring-tools} -#### Observability Dashboard {#observability-dashboard} +#### Observability dashboard {#observability-dashboard} ClickHouse Cloud includes an advanced observability dashboard that displays metrics such as memory usage, query rates, and I/O. This can be accessed in the **Monitoring** section of ClickHouse Cloud web console interface. @@ -343,7 +343,7 @@ ClickHouse Cloud includes an advanced observability dashboard that displays metr
-#### Advanced Dashboard {#advanced-dashboard} +#### Advanced dashboard {#advanced-dashboard} You can customize a dashboard using metrics from system tables like `system.metrics`, `system.events`, and `system.asynchronous_metrics` and more to monitor server performance and resource utilization in detail. diff --git a/docs/cloud/reference/changelog.md b/docs/cloud/reference/changelog.md index e49bc042789..4172e93632a 100644 --- a/docs/cloud/reference/changelog.md +++ b/docs/cloud/reference/changelog.md @@ -232,7 +232,7 @@ Users can schedule upgrades for their services. This feature is supported for En [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1), [Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11), and [NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1) clients added support for Dynamic, Variant, and JSON types. -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBT now [supports Refreshable Materialized Views](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7) in the `1.8.7` release. @@ -275,15 +275,15 @@ Org Admins can now add more email addresses to a specific notification as additi ## December 6, 2024 {#december-6-2024} -### BYOC (Beta) {#byoc-beta} +### BYOC (beta) {#byoc-beta} Bring Your Own Cloud for AWS is now available in Beta. This deployment model allows you to deploy and run ClickHouse Cloud in your own AWS account. We support deployments in 11+ AWS regions, with more coming soon. Please [contact support](https://clickhouse.com/support/program) for access. Note that this deployment is reserved for large-scale deployments. -### Postgres Change-Data-Capture (CDC) Connector in ClickPipes {#postgres-change-data-capture-cdc-connector-in-clickpipes} +### Postgres Change Data Capture (CDC) connector in ClickPipes {#postgres-change-data-capture-cdc-connector-in-clickpipes} This turnkey integration enables customers to replicate their Postgres databases to ClickHouse Cloud in just a few clicks and leverage ClickHouse for blazing-fast analytics. You can use this connector for both continuous replication and one-time migrations from Postgres. -### Dashboards (Beta) {#dashboards-beta} +### Dashboards (beta) {#dashboards-beta} This week, we're excited to announce the Beta launch of Dashboards in ClickHouse Cloud. With Dashboards, users can turn saved queries into visualizations, organize visualizations onto dashboards, and interact with dashboards using query parameters. To get started, follow the [dashboards documentation](/cloud/manage/dashboards). @@ -309,7 +309,7 @@ To get started, follow the [Query API Endpoints documentation](/cloud/get-starte We are launching Beta for our native JSON support in ClickHouse Cloud. To get started, please get in touch with support[ to enable your cloud service](/cloud/support). -### Vector search using vector similarity indexes (Early Access) {#vector-search-using-vector-similarity-indexes-early-access} +### Vector search using vector similarity indexes (early access) {#vector-search-using-vector-similarity-indexes-early-access} We are announcing vector similarity indexes for approximate vector search in early access! 
@@ -317,7 +317,7 @@ ClickHouse already offers robust support for vector-based use cases, with a wide To get started, [please sign up for the early access waitlist](https://clickhouse.com/cloud/vector-search-index-waitlist). -### ClickHouse-Connect (Python) and ClickHouse-Kafka-Connect Users {#clickhouse-connect-python-and-clickhouse-kafka-connect-users} +### ClickHouse-connect (Python) and ClickHouse Kafka Connect users {#clickhouse-connect-python-and-clickhouse-kafka-connect-users} Notification emails went out to customers who had experienced issues where the clients could encounter a `MEMORY_LIMIT_EXCEEDED` exception. @@ -355,7 +355,7 @@ We've improved autocomplete significantly, allowing you to get in-line SQL compl Animation showing the AI Copilot providing SQL autocompletion suggestions as a user types -### New "Billing" role {#new-billing-role} +### New "billing" role {#new-billing-role} You can now assign users in your organization to a new **Billing** role that allows them to view and manage billing information without giving them the ability to configure or manage services. Simply invite a new user or edit an existing user's role to assign the **Billing** role. @@ -381,7 +381,7 @@ Customers looking for increased security for protected health information (PHI) Services are available in GCP `us-central-1` to customers with the **Dedicated** service type and require a Business Associate Agreement (BAA). Contact [sales](mailto:sales@clickhouse.com) or [support](https://clickhouse.com/support/program) to request access to this feature or join the wait list for additional GCP, AWS, and Azure regions. -### Compute-Compute separation is now in Private Preview for GCP and Azure {#compute-compute-separation-is-now-in-private-preview-for-gcp-and-azure} +### Compute-compute separation is now in private preview for GCP and Azure {#compute-compute-separation-is-now-in-private-preview-for-gcp-and-azure} We recently announced the Private Preview for Compute-Compute Separation for AWS. We're happy to announce that it is now available for GCP and Azure. @@ -391,7 +391,7 @@ Compute-compute separation allows you to designate specific services as read-wri Customers using multi-factor authentication can now obtain recovery codes that can be used in the event of a lost phone or accidentally deleted token. Customers enrolling in MFA for the first time will be provided the code on set up. Customers with existing MFA can obtain a recovery code by removing their existing MFA token and adding a new one. -### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more! {#clickpipes-update-custom-certificates-latency-insights-and-more} We're excited to share the latest updates for ClickPipes, the easiest way to ingest data into your ClickHouse service! These new features are designed to enhance your control over data ingestion and provide greater visibility into performance metrics. @@ -445,11 +445,11 @@ ClickPipes is the easiest way to ingest data into ClickHouse Cloud. 
We're happy ## July 18, 2024 {#july-18-2024} -### Prometheus Endpoint for Metrics is now Generally Available {#prometheus-endpoint-for-metrics-is-now-generally-available} +### Prometheus endpoint for metrics is now generally available {#prometheus-endpoint-for-metrics-is-now-generally-available} In our last cloud changelog, we announced the Private Preview for exporting [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud. This feature allows you to use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to get your metrics into tools like [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. We're happy to announce that this feature is now **Generally Available**. Please see [our docs](/integrations/prometheus) to learn more about this feature. -### Table Inspector in Cloud Console {#table-inspector-in-cloud-console} +### Table inspector in Cloud console {#table-inspector-in-cloud-console} ClickHouse has commands like [`DESCRIBE`](/sql-reference/statements/describe-table) that allow you to introspect your table to examine schema. These commands output to the console, but they are often not convenient to use as you need to combine several queries to retrieve all pertinent data about your tables and columns. @@ -469,7 +469,7 @@ Stay tuned for more improvements to the analyzer as we have many more optimizati ## June 28, 2024 {#june-28-2024} -### ClickHouse Cloud for Microsoft Azure is now Generally Available! {#clickhouse-cloud-for-microsoft-azure-is-now-generally-available} +### ClickHouse Cloud for Microsoft Azure is now generally available! {#clickhouse-cloud-for-microsoft-azure-is-now-generally-available} We first announced Microsoft Azure support in Beta [this past May](https://clickhouse.com/blog/clickhouse-cloud-is-now-on-azure-in-public-beta). In this latest cloud release, we're happy to announce that our Azure support is transitioning from Beta to Generally Available. ClickHouse Cloud is now available on all the three major cloud platforms: AWS, Google Cloud Platform, and now Microsoft Azure. @@ -480,19 +480,19 @@ This release also includes support for subscriptions via the [Microsoft Azure Ma If you'd like any specific region to be supported, please [contact us](https://clickhouse.com/support/program). -### Query Log Insights {#query-log-insights} +### Query log insights {#query-log-insights} -Our new Query Insights UI in the Cloud Console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated! +Our new Query Insights UI in the Cloud console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. 
This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated! ClickHouse Cloud Query Insights UI showing query performance metrics and analysis -### Prometheus Endpoint for Metrics (Private Preview) {#prometheus-endpoint-for-metrics-private-preview} +### Prometheus endpoint for metrics (private preview) {#prometheus-endpoint-for-metrics-private-preview} Perhaps one of our most requested features: you can now export [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud to [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. Prometheus provides an open-source solution to monitor ClickHouse and set up custom alerts. Access to Prometheus metrics for your ClickHouse Cloud service is available via the [ClickHouse Cloud API](/integrations/prometheus). This feature is currently in Private Preview. Please reach out to the [support team](https://clickhouse.com/support/program) to enable this feature for your organization. Grafana dashboard showing Prometheus metrics from ClickHouse Cloud -### Other features: {#other-features} +### Other features {#other-features} - [Configurable backups](/cloud/manage/backups/configurable-backups) to configure custom backup policies like frequency, retention, and schedule are now Generally Available. ## June 13, 2024 {#june-13-2024} @@ -540,7 +540,7 @@ We expect to have ClickHouse Cloud for Azure ready for General Availability in t Note: **Development** services for Azure are not supported at this time. -### Set up Private Link via the Cloud Console {#set-up-private-link-via-the-cloud-console} +### Set up Private Link via the Cloud console {#set-up-private-link-via-the-cloud-console} Our Private Link feature allows you to connect your ClickHouse Cloud services with internal services in your cloud provider account without having to direct traffic to the public internet, saving costs and enhancing security. Previously, this was difficult to set up and required using the ClickHouse Cloud API. @@ -550,7 +550,7 @@ You can now configure private endpoints in just a few clicks directly from the C ## May 17, 2024 {#may-17-2024} -### Ingest data from Amazon Kinesis using ClickPipes (Beta) {#ingest-data-from-amazon-kinesis-using-clickpipes-beta} +### Ingest data from Amazon Kinesis using ClickPipes (beta) {#ingest-data-from-amazon-kinesis-using-clickpipes-beta} ClickPipes is an exclusive service provided by ClickHouse Cloud to ingest data without code. Amazon Kinesis is AWS's fully managed streaming service to ingest and store data streams for processing. We are thrilled to launch the ClickPipes beta for Amazon Kinesis, one of our most requested integrations. We're looking to add more integrations to ClickPipes, so please let us know which data source you'd like us to support! Read more about this feature [here](https://clickhouse.com/blog/clickpipes-amazon-kinesis). 
@@ -558,7 +558,7 @@ You can try the new Amazon Kinesis integration for ClickPipes in the cloud conso ClickPipes interface showing Amazon Kinesis integration configuration options -### Configurable Backups (Private Preview) {#configurable-backups-private-preview} +### Configurable backups (private preview) {#configurable-backups-private-preview} Backups are important for every database (no matter how reliable), and we've taken backups very seriously since day 1 of ClickHouse Cloud. This week, we launched Configurable Backups, which allows for much more flexibility for your service's backups. You can now control start time, retention, and frequency. This feature is available for **Production** and **Dedicated** services and is not available for **Development** services. As this feature is in private preview, please contact support@clickhouse.com to enable this for your service. Read more about configurable backups [here](https://clickhouse.com/blog/configurable-backups-in-clickhouse-cloud). @@ -622,7 +622,7 @@ Other changes: ## April 4, 2024 {#april-4-2024} -### Introducing the new ClickHouse Cloud Console {#introducing-the-new-clickhouse-cloud-console} +### Introducing the new ClickHouse Cloud console {#introducing-the-new-clickhouse-cloud-console} This release introduces a private preview for the new cloud console. @@ -632,7 +632,7 @@ Thousands of ClickHouse Cloud users execute billions of queries on our SQL conso Select customers will receive a preview of our new cloud console experience – a unified and immersive way to explore and manage your data in ClickHouse. Please reach out to us at support@clickhouse.com if you'd like priority access. -Animation showing the new ClickHouse Cloud Console interface with integrated SQL editor and management features +Animation showing the new ClickHouse Cloud console interface with integrated SQL editor and management features ## March 28, 2024 {#march-28-2024} @@ -666,10 +666,10 @@ This release introduces support for Microsoft Azure, Horizontal Scaling via API, ## March 14, 2024 {#march-14-2024} -This release makes available in early access the new Cloud Console experience, ClickPipes for bulk loading from S3 and GCS, and support for Avro format in ClickPipes for Kafka. It also upgrades the ClickHouse database version to 24.1, bringing support for new functions as well as performance and resource usage optimizations. +This release makes available in early access the new Cloud console experience, ClickPipes for bulk loading from S3 and GCS, and support for Avro format in ClickPipes for Kafka. It also upgrades the ClickHouse database version to 24.1, bringing support for new functions as well as performance and resource usage optimizations. ### Console changes {#console-changes-2} -- New Cloud Console experience is available in early access (please contact support if you're interested in participating). +- New Cloud console experience is available in early access (please contact support if you're interested in participating). - ClickPipes for bulk loading from S3 and GCS are available in early access (please contact support if you're interested in participating). - Support for Avro format in ClickPipes for Kafka is available in early access (please contact support if you're interested in participating). 
@@ -1025,14 +1025,14 @@ This release brings the public release of the ClickHouse Cloud Programmatic API ## May 11, 2023 {#may-11-2023} -This release brings the ~~public beta~~ (now GA, see June 20th entry above) of ClickHouse Cloud on GCP (see [blog](https://clickhouse.com/blog/clickhouse-cloud-on-gcp-available-in-public-beta) for details), extends administrators rights to grant terminate query permissions, and adds more visibility into the status of MFA users in the Cloud console. +This release brings the public beta (now GA, see June 20th entry above) of ClickHouse Cloud on GCP (see [blog](https://clickhouse.com/blog/clickhouse-cloud-on-gcp-available-in-public-beta) for details), extends administrators' rights to grant terminate query permissions, and adds more visibility into the status of MFA users in the Cloud console. -### ClickHouse Cloud on GCP ~~(Public Beta)~~ (now GA, see June 20th entry above) {#clickhouse-cloud-on-gcp-public-beta-now-ga-see-june-20th-entry-above} +### ClickHouse Cloud on GCP is now available in public beta (now GA, see June 20th entry above) {#clickhouse-cloud-on-gcp-is-now-available-in-public-beta-now-ga-see-june-20th-entry-above} - Launches a fully-managed separated storage and compute ClickHouse offering, running on top of Google Compute and Google Cloud Storage - Available in Iowa (us-central1), Netherlands (europe-west4), and Singapore (asia-southeast1) regions - Supports both Development and Production services in all three initial regions - Provides strong security by default: End-to-end encryption in transit, data-at-rest encryption, IP Allow Lists ### Integrations changes {#integrations-changes-18} - Golang client: Added proxy environment variables support - Grafana: Added the ability to specify ClickHouse custom settings and proxy environment variables in Grafana datasource setup @@ -1284,7 +1284,7 @@ This release introduces seamless logins for administrators to SQL console, impro ### Integrations changes {#integrations-changes-26} - The [Metabase plugin](/integrations/data-visualization/metabase-and-clickhouse.md) got a long-awaited v0.9.1 major update. Now it is compatible with the latest Metabase version and has been thoroughly tested against ClickHouse Cloud. -## December 6, 2022 - General Availability {#december-6-2022---general-availability} +## December 6, 2022 - General availability {#december-6-2022---general-availability} ClickHouse Cloud is now production-ready with SOC2 Type II compliance, uptime SLAs for production workloads, and public status page. This release includes major new capabilities like AWS Marketplace integration, SQL console - a data exploration workbench for ClickHouse users, and ClickHouse Academy - self-paced learning in ClickHouse Cloud. Learn more in this [blog](https://clickhouse.com/blog/clickhouse-cloud-generally-available). diff --git a/docs/cloud/reference/cloud-compatibility.md b/docs/cloud/reference/cloud-compatibility.md index 8250ca62d7f..62cdfcb8710 100644 --- a/docs/cloud/reference/cloud-compatibility.md +++ b/docs/cloud/reference/cloud-compatibility.md @@ -5,11 +5,11 @@ title: 'Cloud Compatibility' description: 'This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud.' --- -# ClickHouse Cloud — Compatibility Guide +# ClickHouse Cloud compatibility guide This guide provides an overview of what to expect functionally and operationally in ClickHouse Cloud.
While ClickHouse Cloud is based on the open-source ClickHouse distribution, there may be some differences in architecture and implementation. You may find this blog on [how we built ClickHouse Cloud](https://clickhouse.com/blog/building-clickhouse-cloud-from-scratch-in-a-year) interesting and relevant to read as background. -## ClickHouse Cloud Architecture {#clickhouse-cloud-architecture} +## ClickHouse Cloud architecture {#clickhouse-cloud-architecture} ClickHouse Cloud significantly simplifies operational overhead and reduces the costs of running ClickHouse at scale. There is no need to size your deployment upfront, set up replication for high availability, manually shard your data, scale up your servers when your workload increases, or scale them down when you are not using them — we handle this for you. These benefits come as a result of architectural choices underlying ClickHouse Cloud: @@ -99,7 +99,7 @@ The [Kafka Table Engine](/integrations/data-ingestion/kafka/index.md) is not gen [Named collections](/operations/named-collections) are not currently supported in ClickHouse Cloud. -## Operational Defaults and Considerations {#operational-defaults-and-considerations} +## Operational defaults and considerations {#operational-defaults-and-considerations} The following are default settings for ClickHouse Cloud services. In some cases, these settings are fixed to ensure the correct operation of the service, and in others, they can be adjusted. ### Operational limits {#operational-limits} diff --git a/docs/cloud/reference/index.md b/docs/cloud/reference/index.md index a56fdbffe54..a4e19f99a7c 100644 --- a/docs/cloud/reference/index.md +++ b/docs/cloud/reference/index.md @@ -1,20 +1,20 @@ --- slug: /cloud/reference -keywords: ['Cloud', 'reference', 'architecture', 'SharedMergeTree', 'Compute-Compute Separation', 'Bring Your Own Cloud', 'Changelogs', 'Supported Cloud Regions', 'Cloud Compatibility'] +keywords: ['Cloud', 'reference', 'architecture', 'SharedMergeTree', 'Compute-compute Separation', 'Bring Your Own Cloud', 'Changelogs', 'Supported Cloud Regions', 'Cloud Compatibility'] title: 'Overview' hide_title: true description: 'Landing page for the Cloud reference section' --- -# Cloud Reference +# Cloud reference This section acts as a reference guide for some of the more technical details of ClickHouse Cloud and contains the following pages: | Page | Description | |-----------------------------------|-----------------------------------------------------------------------------------------------------------| | [Architecture](/cloud/reference/architecture) | Discusses the architecture of ClickHouse Cloud, including storage, compute, administration, and security. | -| [SharedMergeTree](/cloud/reference/shared-merge-tree) | Explainer on SharedMergeTree, the cloud-native replacement for the ReplicatedMergeTree and analogues. | -| [Warehouses](/cloud/reference/warehouses) | Explainer on what Warehouses and Compute-Compute separation are in ClickHouse Cloud. | +| [SharedMergeTree](/cloud/reference/shared-merge-tree) | Explainer on SharedMergeTree, the cloud-native replacement for the ReplicatedMergeTree and analogues. | +| [Warehouses](/cloud/reference/warehouses) | Explainer on what Warehouses and compute-compute separation are in ClickHouse Cloud. | | [BYOC (Bring Your Own Cloud)](/cloud/reference/byoc)| Explainer on the Bring Your Own Cloud (BYOC) service available with ClickHouse Cloud. | | [Changelogs](/cloud/reference/changelogs) | Cloud Changelogs and Release Notes. 
| | [Cloud Compatibility](/whats-new/cloud-compatibility) | A guide to what to expect functionally and operationally in ClickHouse Cloud. | diff --git a/docs/cloud/reference/shared-catalog.md b/docs/cloud/reference/shared-catalog.md index b70b82d6019..fa474c41b74 100644 --- a/docs/cloud/reference/shared-catalog.md +++ b/docs/cloud/reference/shared-catalog.md @@ -6,7 +6,7 @@ keywords: ['SharedCatalog', 'SharedDatabaseEngine'] description: 'Describes the Shared Catalog component and the Shared database engine in ClickHouse Cloud' --- -# Shared Catalog and Shared Database Engine {#shared-catalog-and-shared-database-engine} +# Shared catalog and shared database engine {#shared-catalog-and-shared-database-engine} **Available exclusively in ClickHouse Cloud (and first party partner cloud services)** @@ -21,7 +21,7 @@ It supports replication of the following database engines: - MySQL - DataLakeCatalog -## Architecture and Metadata Storage {#architecture-and-metadata-storage} +## Architecture and metadata storage {#architecture-and-metadata-storage} All metadata and DDL query history in Shared Catalog is stored centrally in ZooKeeper. Nothing is persisted on local disk. This architecture ensures: @@ -29,7 +29,7 @@ All metadata and DDL query history in Shared Catalog is stored centrally in ZooK - Statelessness of compute nodes - Fast, reliable replica bootstrapping -## Shared Database Engine {#shared-database-engine} +## Shared database engine {#shared-database-engine} The **Shared database engine** works in conjunction with Shared Catalog to manage databases whose tables use **stateless table engines** such as `SharedMergeTree`. These table engines do not write persistent state to disk and are compatible with dynamic compute environments. diff --git a/docs/cloud/reference/shared-merge-tree.md b/docs/cloud/reference/shared-merge-tree.md index 621dbabd92c..a170650af7d 100644 --- a/docs/cloud/reference/shared-merge-tree.md +++ b/docs/cloud/reference/shared-merge-tree.md @@ -11,7 +11,7 @@ import shared_merge_tree_2 from '@site/static/images/cloud/reference/shared-merg import Image from '@theme/IdealImage'; -# SharedMergeTree Table Engine +# SharedMergeTree table engine *\* Available exclusively in ClickHouse Cloud (and first party partner cloud services)* diff --git a/docs/cloud/reference/supported-regions.md b/docs/cloud/reference/supported-regions.md index bfbdd281214..e543fc0596c 100644 --- a/docs/cloud/reference/supported-regions.md +++ b/docs/cloud/reference/supported-regions.md @@ -8,9 +8,9 @@ slug: /cloud/reference/supported-regions import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' -# Supported Cloud Regions +# Supported cloud regions -## AWS Regions {#aws-regions} +## AWS regions {#aws-regions} - ap-northeast-1 (Tokyo) - ap-south-1 (Mumbai) @@ -75,7 +75,7 @@ Key considerations for private regions: Additional requirements may apply for HIPAA compliance (including signing a BAA). 
Note that HIPAA is currently available only for Enterprise tier services -## HIPAA Compliant Regions {#hipaa-compliant-regions} +## HIPAA compliant regions {#hipaa-compliant-regions} @@ -88,7 +88,7 @@ Customers must sign a Business Associate Agreement (BAA) and request onboarding - GCP us-central1 (Iowa) - GCP us-east1 (South Carolina) -## PCI Compliant Regions {#pci-compliant-regions} +## PCI compliant regions {#pci-compliant-regions} diff --git a/docs/cloud/reference/warehouses.md b/docs/cloud/reference/warehouses.md index b9838d93931..e7e88b72ddf 100644 --- a/docs/cloud/reference/warehouses.md +++ b/docs/cloud/reference/warehouses.md @@ -16,7 +16,7 @@ import Image from '@theme/IdealImage'; # Warehouses -## What is Compute-Compute Separation? {#what-is-compute-compute-separation} +## What is compute-compute separation? {#what-is-compute-compute-separation} Compute-compute separation is available for Scale and Enterprise tiers. @@ -47,7 +47,7 @@ _Fig. 2 - compute separation in ClickHouse Cloud_ It is possible to create extra services that share the same data with your existing services, or create a completely new setup with multiple services sharing the same data. -## What is a Warehouse? {#what-is-a-warehouse} +## What is a warehouse? {#what-is-a-warehouse} In ClickHouse Cloud, a _warehouse_ is a set of services that share the same data. Each warehouse has a primary service (this service was created first) and secondary service(s). For example, in the screenshot below you can see a warehouse "DWH Prod" with two services: @@ -153,9 +153,9 @@ Compute prices are the same for all services in a warehouse (primary and seconda - As all services in a single warehouse share the same storage, backups are made only on the primary (initial) service. By this, the data for all services in a warehouse is backed up. - If you restore a backup from a primary service of a warehouse, it will be restored to a completely new service, not connected to the existing warehouse. You can then add more services to the new service immediately after the restore is finished. -## Using Warehouses {#using-warehouses} +## Using warehouses {#using-warehouses} -### Creating a Warehouse {#creating-a-warehouse} +### Creating a warehouse {#creating-a-warehouse} To create a warehouse, you need to create a second service that will share the data with an existing service. This can be done by clicking the plus sign on any of the existing services: @@ -167,7 +167,7 @@ _Fig. 7 - Click the plus sign to create a new service in a warehouse_ On the service creation screen, the original service will be selected in the dropdown as the source for the data of the new service. Once created, these two services will form a warehouse. -### Renaming a Warehouse {#renaming-a-warehouse} +### Renaming a warehouse {#renaming-a-warehouse} There are two ways to rename a warehouse: diff --git a/docs/cloud/security/accessing-s3-data-securely.md b/docs/cloud/security/accessing-s3-data-securely.md index f20673e1d08..7503cdccf9c 100644 --- a/docs/cloud/security/accessing-s3-data-securely.md +++ b/docs/cloud/security/accessing-s3-data-securely.md @@ -22,7 +22,7 @@ This approach allows customers to manage all access to their S3 buckets in a sin ## Setup {#setup} -### Obtaining the ClickHouse service IAM role Arn {#obtaining-the-clickhouse-service-iam-role-arn} +### Obtaining the ClickHouse service IAM role ARN {#obtaining-the-clickhouse-service-iam-role-arn} 1 - Login to your ClickHouse cloud account. 
@@ -71,7 +71,7 @@ This approach allows customers to manage all access to their S3 buckets in a sin CloudFormation stack output showing IAM Role ARN -#### Option 2: Manually create IAM role. {#option-2-manually-create-iam-role} +#### Option 2: Manually create IAM role {#option-2-manually-create-iam-role} 1 - Login to your AWS Account in the web browser with an IAM user that has permission to create & manage IAM role. @@ -128,7 +128,7 @@ IAM policy (Please replace `{BUCKET_NAME}` with your bucket name): 4 - Copy the new **IAM Role Arn** after creation. This is what needed to access your S3 bucket. -## Access your S3 bucket with the ClickHouseAccess Role {#access-your-s3-bucket-with-the-clickhouseaccess-role} +## Access your S3 bucket with the ClickHouseAccess role {#access-your-s3-bucket-with-the-clickhouseaccess-role} ClickHouse Cloud has a new feature that allows you to specify `extra_credentials` as part of the S3 table function. Below is an example of how to run a query using the newly created role copied from above. diff --git a/docs/cloud/security/aws-privatelink.md b/docs/cloud/security/aws-privatelink.md index d61dfa89984..2ea42361bfb 100644 --- a/docs/cloud/security/aws-privatelink.md +++ b/docs/cloud/security/aws-privatelink.md @@ -39,12 +39,12 @@ ClickHouse Cloud currently supports [cross-region PrivateLink](https://aws.amazo Find Terraform examples [here](https://github.com/ClickHouse/terraform-provider-clickhouse/tree/main/examples/). -## Attention {#attention} +## Attention to the following {#attention} ClickHouse attempts to group your services to reuse the same published [service endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html#endpoint-service-overview) within the AWS region. However, this grouping is not guaranteed, especially if you spread your services across multiple ClickHouse organizations. -If you already have PrivateLink configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: [Add ClickHouse "Endpoint ID" to ClickHouse service allow list](#add-endpoint-id-to-services-allow-list). +If you already have PrivateLink configured for other services in your ClickHouse organization, you can often skip most of the steps because of that grouping and proceed directly to the final step: [Add ClickHouse "Endpoint ID" to ClickHouse service allow list](#add-endpoint-id-to-services-allow-list). -## Prerequisites {#prerequisites} +## Prerequisites for this process {#prerequisites} Before you get started you will need: @@ -55,7 +55,7 @@ Before you get started you will need: Follow these steps to connect your ClickHouse Cloud services via AWS PrivateLink. -### Obtain Endpoint "Service name" {#obtain-endpoint-service-info} +### Obtain endpoint "Service name" {#obtain-endpoint-service-info} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console} @@ -105,7 +105,7 @@ This command should return something like: Make a note of the `endpointServiceId` and `privateDnsHostname` [move onto next step](#create-aws-endpoint). -### Create AWS Endpoint {#create-aws-endpoint} +### Create AWS endpoint {#create-aws-endpoint} :::important This section covers ClickHouse-specific details for configuring ClickHouse via AWS PrivateLink. AWS-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the AWS cloud provider. Please consider AWS configuration based on your specific use case.
@@ -187,7 +187,7 @@ resource "aws_vpc_endpoint" "this" { After creating the VPC Endpoint, make a note of the `Endpoint ID` value; you'll need it for an upcoming step. -#### Set Private DNS Name for Endpoint {#set-private-dns-name-for-endpoint} +#### Set private DNS name for endpoint {#set-private-dns-name-for-endpoint} :::note There are various ways to configure DNS. Please set up DNS according to your specific use case. @@ -269,7 +269,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" \ Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname`API or `DNS Name`console taken from [Obtain Endpoint "Service name"](#obtain-endpoint-service-info). -#### Getting Private DNS Hostname {#getting-private-dns-hostname} +#### Getting private DNS hostname {#getting-private-dns-hostname} ##### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -328,7 +328,7 @@ Please refer [here](#attention) - Most likely Endpoint ID was not added to service allow list, please visit [step](#add-endpoint-id-to-services-allow-list) -### Checking Endpoint filters {#checking-endpoint-filters} +### Checking endpoint filters {#checking-endpoint-filters} Set the following environment variables before running any commands: diff --git a/docs/cloud/security/azure-privatelink.md b/docs/cloud/security/azure-privatelink.md index e3d99975381..f213496c7a7 100644 --- a/docs/cloud/security/azure-privatelink.md +++ b/docs/cloud/security/azure-privatelink.md @@ -99,7 +99,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" "https://api.clickhouse.cloud Make a note of the `endpointServiceId`. You'll use it in the next step. -## Create a Private Endpoint in Azure {#create-private-endpoint-in-azure} +## Create a private endpoint in Azure {#create-private-endpoint-in-azure} :::important This section covers ClickHouse-specific details for configuring ClickHouse via Azure Private Link. Azure-specific steps are provided as a reference to guide you on where to look, but they may change over time without notice from the Azure cloud provider. Please consider Azure configuration based on your specific use case. @@ -111,7 +111,7 @@ For any issues related to Azure configuration tasks, contact Azure Support direc In this section, we're going to create a Private Endpoint in Azure. You can use either the Azure Portal or Terraform. -### Option 1: Using Azure Portal to create a Private Endpoint in Azure {#option-1-using-azure-portal-to-create-a-private-endpoint-in-azure} +### Option 1: Using Azure Portal to create a private endpoint in Azure {#option-1-using-azure-portal-to-create-a-private-endpoint-in-azure} In the Azure Portal, open **Private Link Center → Private Endpoints**. 
@@ -180,7 +180,7 @@ Open the network interface associated with Private Endpoint and copy the **Priva Private Endpoint IP Address -### Option 2: Using Terraform to create a Private Endpoint in Azure {#option-2-using-terraform-to-create-a-private-endpoint-in-azure} +### Option 2: Using Terraform to create a private endpoint in Azure {#option-2-using-terraform-to-create-a-private-endpoint-in-azure} Use the template below to use Terraform to create a Private Endpoint: @@ -199,7 +199,7 @@ resource "azurerm_private_endpoint" "example_clickhouse_cloud" { } ``` -### Obtaining the Private Endpoint `resourceGuid` {#obtaining-private-endpoint-resourceguid} +### Obtaining the private endpoint `resourceGuid` {#obtaining-private-endpoint-resourceguid} In order to use Private Link, you need to add the Private Endpoint connection GUID to your service allow list. @@ -427,7 +427,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X PATCH -H "Content-Type: ap Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname`API or `DNS name`console taken from [Obtain Azure connection alias for Private Link](#obtain-azure-connection-alias-for-private-link). -### Obtaining the Private DNS Hostname {#obtaining-the-private-dns-hostname} +### Obtaining the private DNS hostname {#obtaining-the-private-dns-hostname} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -488,7 +488,7 @@ Address: 10.0.0.4 Most likely, the Private Endpoint GUID was not added to the service allow-list. Revisit the [_Add Private Endpoint GUID to your services allow-list_ step](#add-private-endpoint-guid-to-services-allow-list). -### Private Endpoint is in Pending state {#private-endpoint-is-in-pending-state} +### Private Endpoint is in pending state {#private-endpoint-is-in-pending-state} Most likely, the Private Endpoint GUID was not added to the service allow-list. Revisit the [_Add Private Endpoint GUID to your services allow-list_ step](#add-private-endpoint-guid-to-services-allow-list). @@ -523,7 +523,7 @@ Early data was not sent Verify return code: 0 (ok) ``` -### Checking Private Endpoint filters {#checking-private-endpoint-filters} +### Checking private endpoint filters {#checking-private-endpoint-filters} Set the following environment variables before running any commands: diff --git a/docs/cloud/security/cloud-access-management/cloud-authentication.md b/docs/cloud/security/cloud-access-management/cloud-authentication.md index 3c4b5b4f8d0..c0138a1d18e 100644 --- a/docs/cloud/security/cloud-access-management/cloud-authentication.md +++ b/docs/cloud/security/cloud-access-management/cloud-authentication.md @@ -8,11 +8,11 @@ description: 'This guide explains some good practices for configuring your authe import ScalePlanFeatureBadge from '@theme/badges/ScalePlanFeatureBadge' import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' -# Cloud Authentication +# Cloud authentication ClickHouse Cloud provides a number of ways to authenticate. This guide explains some good practices for configuring your authentication. Always check with your security team when selecting authentication methods. 
-## Password Settings {#password-settings} +## Password settings {#password-settings} Minimum password settings for our console and services (databases) currently comply with [NIST 800-63B](https://pages.nist.gov/800-63-3/sp800-63b.html#sec4) Authenticator Assurance Level 1: - Minimum 12 characters @@ -22,15 +22,15 @@ Minimum password settings for our console and services (databases) currently com - 1 number - 1 special character -## Email + Password {#email--password} +## Email and password {#email--password} ClickHouse Cloud allows you to authenticate with an email address and password. When using this method the best way to protect your ClickHouse account use a strong password. There are many online resources to help you devise a password you can remember. Alternatively, you can use a random password generator and store your password in a password manager for increased security. -## SSO Using Google or Microsoft Social Authentication {#sso-using-google-or-microsoft-social-authentication} +## SSO using Google or Microsoft social authentication {#sso-using-google-or-microsoft-social-authentication} If your company uses Google Workspace or Microsoft 365, you can leverage your current single sign-on setup within ClickHouse Cloud. To do this, simply sign up using your company email address and invite other users using their company email. The effect is that your users must login using your company's login flows, whether via your identity provider or directly through Google or Microsoft authentication, before they can authenticate into ClickHouse Cloud. -## Multi-Factor Authentication {#multi-factor-authentication} +## Multi-factor authentication {#multi-factor-authentication} Users with email + password or social authentication can further secure their account using multi-factor authentication (MFA). To set up MFA: 1. Log into console.clickhouse.cloud @@ -118,7 +118,7 @@ Users with email + password or social authentication can further secure their ac ClickHouse Cloud also supports security assertion markup language (SAML) single sign on (SSO). For more information, see [SAML SSO Setup](/cloud/security/saml-setup). -## Database User ID + Password {#database-user-id--password} +## Database user ID and password {#database-user-id--password} Use the SHA256_hash method when [creating user accounts](/sql-reference/statements/create/user.md) to secure passwords. 
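As a minimal sketch of that approach (the user name and digest below are placeholders, not values from this repository): hash the password locally, for example with `echo -n '<password>' | sha256sum`, and pass only the digest to `CREATE USER`.

```sql
-- Hypothetical user; replace the hex string with the SHA-256 digest of the real password.
-- Supplying a digest (sha256_hash) instead of sha256_password keeps the plaintext
-- out of the statement and out of query logs.
CREATE USER app_reader IDENTIFIED WITH sha256_hash BY '65E84BE33532FB784C48129675F9EFF3A682B27168C0EA744B2CF58EE02337C5';
```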
diff --git a/docs/cloud/security/cloud-access-management/index.md b/docs/cloud/security/cloud-access-management/index.md index 0aa6b9e3c4c..a61339a7258 100644 --- a/docs/cloud/security/cloud-access-management/index.md +++ b/docs/cloud/security/cloud-access-management/index.md @@ -1,6 +1,6 @@ --- slug: /cloud/security/cloud-access-management -title: 'Cloud Access Management' +title: 'Cloud access management' description: 'Cloud Access Management Table Of Contents' --- diff --git a/docs/cloud/security/cmek.md b/docs/cloud/security/cmek.md index a62982207da..f5be60ec17f 100644 --- a/docs/cloud/security/cmek.md +++ b/docs/cloud/security/cmek.md @@ -9,7 +9,7 @@ import Image from '@theme/IdealImage'; import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' import cmek_performance from '@site/static/images/_snippets/cmek-performance.png'; -# ClickHouse Enhanced Encryption +# ClickHouse enhanced encryption @@ -89,15 +89,15 @@ Once a service is encrypted with TDE, customers may update the key to enable CME -## Key Rotation {#key-rotation} +## Key rotation {#key-rotation} Once you set up CMEK, rotate the key by following the procedures above for creating a new KMS key and granting permissions. Return to the service settings to paste the new ARN (AWS) or Key Resource Path (GCP) and save the settings. The service will restart to apply the new key. -## Backup and Restore {#backup-and-restore} +## Backup and restore {#backup-and-restore} Backups are encrypted using the same key as the associated service. When you restore an encrypted backup, it creates an encrypted instance that uses the same KMS key as the original instance. If needed, you can rotate the KMS key after restoration; see [Key Rotation](#key-rotation) for more details. -## KMS Key Poller {#kms-key-poller} +## KMS key poller {#kms-key-poller} When using CMEK, the validity of the provided KMS key is checked every 10 minutes. If access to the KMS key is invalid, the ClickHouse service will stop. To resume service, restore access to the KMS key by following the steps in this guide, and then restart the service. diff --git a/docs/cloud/security/common-access-management-queries.md b/docs/cloud/security/common-access-management-queries.md index ddaf0581275..24b98073491 100644 --- a/docs/cloud/security/common-access-management-queries.md +++ b/docs/cloud/security/common-access-management-queries.md @@ -7,7 +7,7 @@ description: 'This article shows the basics of defining SQL users and roles and import CommonUserRolesContent from '@site/docs/_snippets/_users-and-roles-common.md'; -# Common Access Management Queries +# Common access management queries :::tip Self-managed If you are working with self-managed ClickHouse please see [SQL users and roles](/guides/sre/user-management/index.md). diff --git a/docs/cloud/security/compliance-overview.md b/docs/cloud/security/compliance-overview.md index a2043f7624d..268f8993a68 100644 --- a/docs/cloud/security/compliance-overview.md +++ b/docs/cloud/security/compliance-overview.md @@ -8,10 +8,10 @@ description: 'This page describes the security and compliance measures implement import BetaBadge from '@theme/badges/BetaBadge'; import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge'; -# Security and Compliance Reports +# Security and compliance reports ClickHouse Cloud evaluates the security and compliance needs of our customers and is continuously expanding the program as additional reports are requested. 
For additional information or to download the reports visit our [Trust Center](https://trust.clickhouse.com). -### SOC 2 Type II (Since 2022) {#soc-2-type-ii-since-2022} +### SOC 2 Type II (since 2022) {#soc-2-type-ii-since-2022} System and Organization Controls (SOC) 2 is a report focusing on security, availability, confidentiality, processing integrity and privacy criteria contained in the Trust Services Criteria (TSC) as applied to an organization's systems and is designed to provide assurance about these controls to relying parties (our customers). ClickHouse works with independent external auditors to undergo an audit at least once per year addressing security, availability and processing integrity of our systems and confidentiality and privacy of the data processed by our systems. The report addresses both our ClickHouse Cloud and Bring Your Own Cloud (BYOC) offerings. @@ -19,11 +19,11 @@ System and Organization Controls (SOC) 2 is a report focusing on security, avail International Standards Organization (ISO) 27001 is an international standard for information security. It requires companies to implement an Information Security Management System (ISMS) that includes processes for managing risks, creating and communicating policies, implementing security controls, and monitoring to ensure components remain relevant and effective. ClickHouse conducts internal audits and works with independent external auditors to undergo audits and interim inspections for the 2 years between certificate issuance. -### U.S. DPF (Since 2024) {#us-dpf-since-2024} +### U.S. DPF (since 2024) {#us-dpf-since-2024} The U.S. Data Privacy Framework was developed to provide U.S. organizations with reliable mechanisms for personal data transfers to the United States from the European Union/ European Economic Area, the United Kingdom, and Switzerland that are consistent with EU, UK and Swiss law (https://dataprivacyframework.gov/Program-Overview). ClickHouse self-certified to the framework and is listed on the [Data Privacy Framework List](https://dataprivacyframework.gov/list). -### HIPAA (Since 2024) {#hipaa-since-2024} +### HIPAA (since 2024) {#hipaa-since-2024} @@ -31,7 +31,7 @@ Customers must complete a Business Associate Agreement (BAA) and contact sales o The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is a United States based privacy law focused on management of protected health information (PHI). HIPAA has several requirements, including the [Security Rule](https://www.hhs.gov/hipaa/for-professionals/security/index.html), which is focused on protecting electronic personal health information (ePHI). ClickHouse has implemented administrative, physical and technical safeguards to ensure the confidentiality, integrity and security of ePHI stored in designated services. These activities are incorporated in our SOC 2 Type II report available for download in our [Trust Center](https://trust.clickhouse.com). -### PCI Service Provider (Since 2025) {#pci-service-provider-since-2025} +### PCI service provider (since 2025) {#pci-service-provider-since-2025} @@ -39,27 +39,27 @@ Customers must contact sales or support to onboard services to PCI compliant reg The [Payment Card Industry Data Security Standard (PCI DSS)](https://www.pcisecuritystandards.org/standards/pci-dss/) is a set of rules created by the PCI Security Standards Council to protect credit card payment data. 
ClickHouse has undergone an external audit with a Qualified Security Assessor (QSA) that resulted in a passing Report on Compliance (ROC) against PCI criteria relevant to storing credit card data. To download a copy of our Attestation on Compliance (AOC) and PCI responsibility overview, please visit our [Trust Center](https://trust.clickhouse.com). -# Privacy Compliance +# Privacy compliance In addition to the items above, ClickHouse maintains internal compliance programs addressing the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA) and other relevant privacy frameworks. Details on personal data that ClickHouse collects, how it is used, how it is protected and other privacy related information can be found in the following locations. -### Legal Documents {#legal-documents} +### Legal documents {#legal-documents} - [Privacy Policy](https://clickhouse.com/legal/privacy-policy) - [Cookie Policy](https://clickhouse.com/legal/cookie-policy) - [Data Privacy Framework Notification](https://clickhouse.com/legal/data-privacy-framework) - [Data Processing Addendum (DPA)](https://clickhouse.com/legal/agreements/data-processing-addendum) -### Processing Locations {#processing-locations} +### Processing locations {#processing-locations} - [Sub-Processors and Affiliates](https://clickhouse.com/legal/agreements/subprocessors) - [Data Processing Locations](https://trust.clickhouse.com) -### Additional Procedures {#additional-procedures} +### Additional procedures {#additional-procedures} - [Personal Data Access](/cloud/security/personal-data-access) - [Delete Account](/cloud/manage/close_account) -# Payment Compliance +# Payment compliance ClickHouse provides a secure method to pay by credit card that is compliant with [PCI SAQ A v4.0](https://www.pcisecuritystandards.org/document_library/). diff --git a/docs/cloud/security/gcp-private-service-connect.md b/docs/cloud/security/gcp-private-service-connect.md index 45135092a10..1113b3346c9 100644 --- a/docs/cloud/security/gcp-private-service-connect.md +++ b/docs/cloud/security/gcp-private-service-connect.md @@ -23,7 +23,7 @@ import gcp_privatelink_pe_dns from '@site/static/images/cloud/security/gcp-priva -Private Service Connect(PSC) is a Google Cloud networking feature that allows consumers to access managed services privately inside their virtual private cloud (VPC) network. Similarly, it allows managed service producers to host these services in their own separate VPC networks and offer a private connection to their consumers. +Private Service Connect (PSC) is a Google Cloud networking feature that allows consumers to access managed services privately inside their virtual private cloud (VPC) network. Similarly, it allows managed service producers to host these services in their own separate VPC networks and offer a private connection to their consumers. Service producers publish their applications to consumers by creating Private Service Connect services. Service consumers access those Private Service Connect services directly through one of these Private Service Connect types. @@ -65,7 +65,7 @@ Code examples are provided below to show how to set up Private Service Connect w - GCP VPC in customer GCP project: `default` ::: -You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud Console or the ClickHouse API. 
If you are going to use the ClickHouse API, please set the following environment variables before proceeding: +You'll need to retrieve information about your ClickHouse Cloud service. You can do this either via the ClickHouse Cloud console or the ClickHouse API. If you are going to use the ClickHouse API, please set the following environment variables before proceeding: ```shell REGION= @@ -129,7 +129,7 @@ For any issues related to GCP configuration tasks, contact GCP Support directly. In this section, we're going to create a service endpoint. -### Adding a Private Service Connection {#adding-a-private-service-connection} +### Adding a private service connection {#adding-a-private-service-connection} First up, we're going to create a Private Service Connection. @@ -137,7 +137,7 @@ First up, we're going to create a Private Service Connection. In the Google Cloud console, navigate to **Network services -> Private Service Connect**. -Open Private Service Connect in Google Cloud Console +Open Private Service Connect in Google Cloud console Open the Private Service Connect creation dialog by clicking on the **Connect Endpoint** button. @@ -209,7 +209,7 @@ output "psc_connection_id" { use `endpointServiceId`API or `Service name`console from [Obtain GCP service attachment for Private Service Connect](#obtain-gcp-service-attachment-and-dns-name-for-private-service-connect) step ::: -## Set Private DNS Name for Endpoint {#setting-up-dns} +## Set private DNS name for endpoint {#set-private-dns-name-for-endpoint} :::note There are various ways to configure DNS. Please set up DNS according to your specific use case. @@ -336,7 +336,7 @@ curl --silent --user "${KEY_ID:?}:${KEY_SECRET:?}" -X PATCH -H "Content-Type: ap Each service with Private Link enabled has a public and private endpoint. In order to connect using Private Link, you need to use a private endpoint which will be `privateDnsHostname` taken from [Obtain GCP service attachment for Private Service Connect](#obtain-gcp-service-attachment-and-dns-name-for-private-service-connect). 
-### Getting Private DNS Hostname {#getting-private-dns-hostname} +### Getting private DNS hostname {#getting-private-dns-hostname} #### Option 1: ClickHouse Cloud console {#option-1-clickhouse-cloud-console-3} @@ -412,7 +412,7 @@ Early data was not sent Verify return code: 0 (ok) ``` -### Checking Endpoint filters {#checking-endpoint-filters} +### Checking endpoint filters {#checking-endpoint-filters} #### REST API {#rest-api} diff --git a/docs/cloud/security/index.md b/docs/cloud/security/index.md index 481708bf471..b6a2d56ab1b 100644 --- a/docs/cloud/security/index.md +++ b/docs/cloud/security/index.md @@ -6,7 +6,7 @@ hide_title: true description: 'Landing page for ClickHouse Cloud Security' --- -# ClickHouse Cloud Security +# ClickHouse Cloud security This section delves into security in ClickHouse Cloud and contains the following pages: diff --git a/docs/cloud/security/inviting-new-users.md b/docs/cloud/security/inviting-new-users.md index 8af59db9301..38dc099a6cb 100644 --- a/docs/cloud/security/inviting-new-users.md +++ b/docs/cloud/security/inviting-new-users.md @@ -1,7 +1,7 @@ --- sidebar_label: 'Inviting new users' slug: /cloud/security/inviting-new-users -title: 'Inviting New Users' +title: 'Inviting new users' description: 'This page describes how administrators can invite new users to their organisation and assign roles to them' --- diff --git a/docs/cloud/security/personal-data-access.md b/docs/cloud/security/personal-data-access.md index 8682c52eb60..bcf4514b301 100644 --- a/docs/cloud/security/personal-data-access.md +++ b/docs/cloud/security/personal-data-access.md @@ -20,7 +20,7 @@ Depending on where you are located, applicable law may also provide you addition Please review ClickHouse's Privacy Policy for details on personal data that ClickHouse collects and how it may be used. -## Self Service {#self-service} +## Self service {#self-service} By default, ClickHouse empowers users to view their personal data directly from the ClickHouse console. @@ -38,7 +38,7 @@ Below is a summary of the data ClickHouse collects during account setup and serv Note: URLs with `OrgID` need to be updated to reflect the `OrgID` for your specific account. -### Current Customers {#current-customers} +### Current customers {#current-customers} If you have an account with us and the self-service option has not resolved your personal data issue, you can submit a Data Subject Access Request under the Privacy Policy. To do so, log into your ClickHouse account and open a [support case](https://console.clickhouse.cloud/support). This helps us verify your identity and streamline the process to address your request. @@ -51,11 +51,11 @@ Please be sure to include the following details in your support case: Support Case Form in ClickHouse Cloud -### Individuals Without an Account {#individuals-without-an-account} +### Individuals without an account {#individuals-without-an-account} If you do not have an account with us and the self-service option above has not resolved your personal-data issue, and you wish to make a Data Subject Access Request pursuant to the Privacy Policy, you may submit these requests by email to [privacy@clickhouse.com](mailto:privacy@clickhouse.com). -## Identity Verification {#identity-verification} +## Identity verification {#identity-verification} Should you submit a Data Subject Access Request through email, we may request specific information from you to help us confirm your identity and process your request. Applicable law may require or permit us to decline your request. 
If we decline your request, we will tell you why, subject to legal restrictions. diff --git a/docs/cloud/security/privacy-compliance-overview.md b/docs/cloud/security/privacy-compliance-overview.md index e81afc20fd2..e47d422c0a8 100644 --- a/docs/cloud/security/privacy-compliance-overview.md +++ b/docs/cloud/security/privacy-compliance-overview.md @@ -5,7 +5,7 @@ title: 'Privacy and Compliance' description: 'Landing page for privacy and compliance' --- -# Privacy and Compliance +# Privacy and compliance This section contains the following pages: diff --git a/docs/cloud/security/private-link-overview.md b/docs/cloud/security/private-link-overview.md index 9e3ada28a27..183362a8e58 100644 --- a/docs/cloud/security/private-link-overview.md +++ b/docs/cloud/security/private-link-overview.md @@ -1,14 +1,14 @@ --- -sidebar_label: 'Private Link Overview' +sidebar_label: 'Private link overview' slug: /cloud/security/private-link-overview -title: 'Private Link Overview' -description: 'Landing page for Private Link' +title: 'Private link overview' +description: 'Landing page for private link' --- -# Private Link Overview +# Private link overview ClickHouse Cloud provides the ability to connect your services to your cloud virtual network. Refer to the guides below for your provider: -- [AWS Private Link](/cloud/security/aws-privatelink.md) -- [GCP Private Service Connect](/cloud/security/gcp-private-service-connect.md) -- [Azure Private Link](/cloud/security/azure-privatelink.md) +- [AWS PrivateLink](/cloud/security/aws-privatelink.md) +- [GCP Private Service Connect](/cloud/security/gcp-private-service-connect.md) +- [Azure Private Link](/cloud/security/azure-privatelink.md) diff --git a/docs/cloud/security/saml-sso-setup.md b/docs/cloud/security/saml-sso-setup.md index 4d269aa8d40..89ad1b146fd 100644 --- a/docs/cloud/security/saml-sso-setup.md +++ b/docs/cloud/security/saml-sso-setup.md @@ -13,13 +13,13 @@ import samlAzureApp from '@site/static/images/cloud/security/saml-azure-app.png' import samlAzureClaims from '@site/static/images/cloud/security/saml-azure-claims.png'; import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge' -# SAML SSO Setup +# SAML SSO setup ClickHouse Cloud supports single-sign on (SSO) via security assertion markup language (SAML). This enables you to sign in securely to your ClickHouse Cloud organization by authenticating with your identity provider (IdP). -We currently support service provider initiated SSO, multiple organizations using separate connections, and just-in-time provisioning. We do not yet support a system for cross-domain identity management (SCIM) or attribute mapping. +We currently support service provider-initiated SSO, multiple organizations using separate connections, and just-in-time provisioning. We do not yet support a system for cross-domain identity management (SCIM) or attribute mapping. ## Before you begin {#before-you-begin} @@ -27,7 +27,7 @@ You will need Admin permissions in your IdP and the **Admin** role in your Click We recommend setting up a **direct link to your organization** in addition to your SAML connection to simplify the login process. Each IdP handles this differently. Read on for how to do this for your IdP. -## How to Configure Your IdP {#how-to-configure-your-idp} +## How to configure your IdP {#how-to-configure-your-idp} ### Steps {#steps} @@ -51,7 +51,7 @@
Configure your SAML integration - ClickHouse uses service provider initiated SAML connections. This means you can log in via https://console.clickhouse.cloud or via a direct link. We do not currently support identity provider initiated connections. Basic SAML configurations include the following: + ClickHouse uses service provider-initiated SAML connections. This means you can log in via https://console.clickhouse.cloud or via a direct link. We do not currently support identity provider-initiated connections. Basic SAML configurations include the following: - SSO URL or ACS URL: `https://auth.clickhouse.cloud/login/callback?connection={organizationid}` @@ -332,33 +332,33 @@ Azure (Microsoft) SAML may also be referred to as Azure Active Directory (AD) or
-## How It Works {#how-it-works} +## How it works {#how-it-works} -### Service Provider Initiated SSO {#service-provider-initiated-sso} +### Service provider-initiated SSO {#service-provider-initiated-sso} -We only utilize service provider initiated SSO. This means users go to `https://console.clickhouse.cloud` and enter their email address to be redirected to the IdP for authentication. Users already authenticated via your IdP can use the direct link to automatically log in to your organization without entering their email address at the login page. +We only utilize service provider-initiated SSO. This means users go to `https://console.clickhouse.cloud` and enter their email address to be redirected to the IdP for authentication. Users already authenticated via your IdP can use the direct link to automatically log in to your organization without entering their email address at the login page. -### Assigning User Roles {#assigning-user-roles} +### Assigning user roles {#assigning-user-roles} Users will appear in your ClickHouse Cloud console after they are assigned to your IdP application and log in for the first time. At least one SSO user should be assigned the Admin role in your organization. Use social login or `https://console.clickhouse.cloud/?with=email` to log in with your original authentication method to update your SSO role. -### Removing Non-SSO Users {#removing-non-sso-users} +### Removing non-SSO users {#removing-non-sso-users} Once you have SSO users set up and have assigned at least one user the Admin role, the Admin can remove users using other methods (e.g. social authentication or user ID + password). Google authentication will continue to work after SSO is set up. User ID + password users will be automatically redirected to SSO based on their email domain unless users use `https://console.clickhouse.cloud/?with=email`. -### Managing Users {#managing-users} +### Managing users {#managing-users} ClickHouse Cloud currently implements SAML for SSO. We have not yet implemented SCIM to manage users. This means SSO users must be assigned to the application in your IdP to access your ClickHouse Cloud organization. Users must log in to ClickHouse Cloud once to appear in the **Users** area in the organization. When users are removed in your IdP, they will not be able to log in to ClickHouse Cloud using SSO. However, the SSO user will still show in your organization until an administrator manually removes the user. -### Multi-Org SSO {#multi-org-sso} +### Multi-org SSO {#multi-org-sso} ClickHouse Cloud supports multi-organization SSO by providing a separate connection for each organization. Use the direct link (`https://console.clickhouse.cloud/?connection={organizationid}`) to log in to each respective organization. Be sure to log out of one organization before logging into another. -## Additional Information {#additional-information} +## Additional information {#additional-information} Security is our top priority when it comes to authentication. For this reason, we made a few decisions when implementing SSO that we need you to know. -- **We only process service provider initiated authentication flows.** Users must navigate to `https://console.clickhouse.cloud` and enter an email address to be redirected to your identity provider. Instructions to add a bookmark application or shortcut are provided for your convenience so your users don't need to remember the URL.
+- **We only process service provider-initiated authentication flows.** Users must navigate to `https://console.clickhouse.cloud` and enter an email address to be redirected to your identity provider. Instructions to add a bookmark application or shortcut are provided for your convenience so your users don't need to remember the URL. - **All users assigned to your app via your IdP must have the same email domain.** If you have vendors, contractors or consultants you would like to have access to your ClickHouse account, they must have an email address with the same domain (e.g. user@domain.com) as your employees. diff --git a/docs/cloud/security/setting-ip-filters.md b/docs/cloud/security/setting-ip-filters.md index 0158a4d765b..5b698b4f749 100644 --- a/docs/cloud/security/setting-ip-filters.md +++ b/docs/cloud/security/setting-ip-filters.md @@ -9,7 +9,7 @@ import Image from '@theme/IdealImage'; import ip_filtering_after_provisioning from '@site/static/images/cloud/security/ip-filtering-after-provisioning.png'; import ip_filter_add_single_ip from '@site/static/images/cloud/security/ip-filter-add-single-ip.png'; -## Setting IP Filters {#setting-ip-filters} +## Setting IP filters {#setting-ip-filters} IP access lists filter traffic to ClickHouse services or API keys by specifying which source addresses are permitted to connect. These lists are configurable for each service and each API key. Lists can be configured during service or API key creation, or afterward. diff --git a/docs/concepts/olap.md b/docs/concepts/olap.md index f0033debb20..253bd85310b 100644 --- a/docs/concepts/olap.md +++ b/docs/concepts/olap.md @@ -19,7 +19,7 @@ keywords: ['OLAP'] **Online** …in real-time. -## OLAP from the Business Perspective {#olap-from-the-business-perspective} +## OLAP from the business perspective {#olap-from-the-business-perspective} In recent years business people started to realize the value of data. Companies who make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in. @@ -27,7 +27,7 @@ In a business sense, OLAP allows companies to continuously plan, analyze, and re ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and so an in-house data warehouse scenario is also viable. -## OLAP from the Technical Perspective {#olap-from-the-technical-perspective} +## OLAP from the technical perspective {#olap-from-the-technical-perspective} All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but by doing it less frequently. The latter usually handles a continuous stream of transactions, constantly modifying the current state of data. 
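To make the report-building workload described above concrete, here is a small hedged sketch of a typical OLAP-style query; the `web_events` table and its columns are assumptions for illustration, not part of these docs.

```sql
-- An OLAP-style report: scan a large history once and aggregate it.
-- The web_events table and its columns are assumed for this sketch.
SELECT
    toStartOfMonth(event_time) AS month,
    count() AS page_views,
    uniq(user_id) AS unique_users
FROM web_events
GROUP BY month
ORDER BY month;
```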
diff --git a/docs/concepts/why-clickhouse-is-so-fast.md b/docs/concepts/why-clickhouse-is-so-fast.md index 272865cc4d4..d3fb45a5bfe 100644 --- a/docs/concepts/why-clickhouse-is-so-fast.md +++ b/docs/concepts/why-clickhouse-is-so-fast.md @@ -15,7 +15,7 @@ We will next explain in more detail what makes ClickHouse so fast, especially co From an architectural perspective, databases consist (at least) of a storage layer and a query processing layer. While the storage layer is responsible for saving, loading, and maintaining the table data, the query processing layer executes user queries. Compared to other databases, ClickHouse provides innovations in both layers that enable extremely fast inserts and Select queries. -## Storage Layer: Concurrent inserts are isolated from each other {#storage-layer-concurrent-inserts-are-isolated-from-each-other} +## Storage layer: concurrent inserts are isolated from each other {#storage-layer-concurrent-inserts-are-isolated-from-each-other} @@ -29,7 +29,7 @@ This approach has several advantages: All data processing can be [offloaded to b 🤿 Deep dive into this in the [On-Disk Format](/docs/academic_overview#3-1-on-disk-format) section of the web version of our VLDB 2024 paper. -## Storage Layer: Concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated} +## Storage layer: concurrent inserts and selects are isolated {#storage-layer-concurrent-inserts-and-selects-are-isolated} @@ -37,7 +37,7 @@ Inserts are fully isolated from SELECT queries, and merging inserted data parts 🤿 Deep dive into this in the [Storage Layer](/docs/academic_overview#3-storage-layer) section of the web version of our VLDB 2024 paper. -## Storage Layer: Merge-time computation {#storage-layer-merge-time-computation} +## Storage layer: merge-time computation {#storage-layer-merge-time-computation} @@ -57,7 +57,7 @@ On the other hand, the majority of the runtime of merges is consumed by loading 🤿 Deep dive into this in the [Merge-time Data Transformation](/docs/academic_overview#3-3-merge-time-data-transformation) section of the web version of our VLDB 2024 paper. -## Storage Layer: Data pruning {#storage-layer-data-pruning} +## Storage layer: data pruning {#storage-layer-data-pruning} @@ -73,7 +73,7 @@ All three techniques aim to skip as many rows during full-column reads as possib 🤿 Deep dive into this in the [Data Pruning](/docs/academic_overview#3-2-data-pruning) section of the web version of our VLDB 2024 paper. -## Storage Layer: Data compression {#storage-layer-data-compression} +## Storage layer: data compression {#storage-layer-data-compression} diff --git a/docs/data-modeling/projections.md b/docs/data-modeling/projections.md index 44231b1795a..1f45aa49917 100644 --- a/docs/data-modeling/projections.md +++ b/docs/data-modeling/projections.md @@ -21,7 +21,7 @@ queries by creating a reordering of data by attributes of interest. This can be: 1. A complete reordering 2. A subset of the original table with a different order -3. A precomputed aggregation (similar to a Materialized View) but with an ordering +3. A precomputed aggregation (similar to a materialized view) but with an ordering aligned to the aggregation.
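As a hedged sketch of the third option above (a precomputed aggregation with a matching order), the statements below assume the `uk.uk_price_paid` table used elsewhere in these docs; the projection name is illustrative.

```sql
-- Add a projection that pre-aggregates the average price per town and year.
ALTER TABLE uk.uk_price_paid
    ADD PROJECTION prj_avg_price_by_town_year
    (
        SELECT
            town,
            toYear(date),
            avg(price)
        GROUP BY town, toYear(date)
    );

-- Build the projection for rows that were inserted before it was added.
ALTER TABLE uk.uk_price_paid MATERIALIZE PROJECTION prj_avg_price_by_town_year;
```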
@@ -70,8 +70,8 @@ users should be aware of and thus should be deployed sparingly. - Projections don't currently support `optimize_read_in_order` for the (hidden) target table. - Lightweight updates and deletes are not supported for tables with projections. -- Materialized Views can be chained: the target table of one Materialized View - can be the source table of another Materialized View, and so on. This is not +- Materialized views can be chained: the target table of one materialized view + can be the source table of another materialized view, and so on. This is not possible with projections. - Projections don't support joins, but Materialized Views do. - Projections don't support filters (`WHERE` clause), but Materialized Views do. diff --git a/docs/data-modeling/schema-design.md b/docs/data-modeling/schema-design.md index 0027e84390a..51e8dbee58f 100644 --- a/docs/data-modeling/schema-design.md +++ b/docs/data-modeling/schema-design.md @@ -139,7 +139,7 @@ Compression in ClickHouse will be impacted by 3 main factors: the ordering key, The largest initial improvement in compression and query performance can be obtained through a simple process of type optimization. A few simple rules can be applied to optimize the schema: - **Use strict types** - Our initial schema used Strings for many columns which are clearly numerics. Usage of the correct types will ensure the expected semantics when filtering and aggregating. The same applies to date types, which have been correctly provided in the Parquet files. -- **Avoid Nullable Columns** - By default the above columns have been assumed to be Null. The Nullable type allows queries to determine the difference between an empty and Null value. This creates a separate column of UInt8 type. This additional column has to be processed every time a user works with a nullable column. This leads to additional storage space used and almost always negatively affects query performance. Only use Nullable if there is a difference between the default empty value for a type and Null. For example, a value of 0 for empty values in the `ViewCount` column will likely be sufficient for most queries and not impact results. If empty values should be treated differently, they can often also be excluded from queries with a filter. +- **Avoid nullable columns** - By default the above columns have been assumed to be Null. The Nullable type allows queries to determine the difference between an empty and Null value. This creates a separate column of UInt8 type. This additional column has to be processed every time a user works with a nullable column. This leads to additional storage space used and almost always negatively affects query performance. Only use Nullable if there is a difference between the default empty value for a type and Null. For example, a value of 0 for empty values in the `ViewCount` column will likely be sufficient for most queries and not impact results. If empty values should be treated differently, they can often also be excluded from queries with a filter. Use the minimal precision for numeric types - ClickHouse has a number of numeric types designed for different numeric ranges and precision. Always aim to minimize the number of bits used to represent a column. As well as integers of different size e.g. Int16, ClickHouse offers unsigned variants whose minimum value is 0. These can allow fewer bits to be used for a column e.g. UInt16 has a maximum value of 65535, twice that of an Int16. Prefer these types over larger signed variants if possible.
- **Minimal precision for date types** - ClickHouse supports a number of date and datetime types. Date and Date32 can be used for storing pure dates, with the latter supporting a larger date range at the expense of more bits. DateTime and DateTime64 provide support for date times. DateTime is limited to second granularity and uses 32 bits. DateTime64, as the name suggests, uses 64 bits but provides support up to nanosecond granularity. As ever, choose the more coarse version acceptable for queries, minimizing the number of bits needed. - **Use LowCardinality** - Numbers, strings, Date or DateTime columns with a low number of unique values can potentially be encoded using the LowCardinality type. This dictionary encodes values, reducing the size on disk. Consider this for columns with less than 10k unique values. diff --git a/docs/deployment-guides/horizontal-scaling.md b/docs/deployment-guides/horizontal-scaling.md index 7215e598799..1aac487a345 100644 --- a/docs/deployment-guides/horizontal-scaling.md +++ b/docs/deployment-guides/horizontal-scaling.md @@ -19,7 +19,7 @@ This example architecture is designed to provide scalability. It includes three ## Environment {#environment} -### Architecture Diagram {#architecture-diagram} +### Architecture diagram {#architecture-diagram} Architecture diagram for 2 shards and 1 replica @@ -41,7 +41,7 @@ Install Clickhouse on three servers following the [instructions for your archive -## chnode1 configuration {#chnode1-configuration} +## Chnode1 configuration {#chnode1-configuration} For `chnode1`, there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files, you will see that most of the configuration is the same between `chnode1` and `chnode2`; the differences will be highlighted. @@ -183,7 +183,7 @@ Up above a few files ClickHouse Keeper was configured. This configuration file ``` -## chnode2 configuration {#chnode2-configuration} +## Chnode2 configuration {#chnode2-configuration} As the configuration is very similar on `chnode1` and `chnode2`, only the differences will be pointed out here. @@ -309,7 +309,7 @@ The macros configuration has one of the differences between `chnode1` and `chnod ``` -## chnode3 configuration {#chnode3-configuration} +## Chnode3 configuration {#chnode3-configuration} As `chnode3` is not storing data and is only used for ClickHouse Keeper to provide the third node in the quorum, `chnode3` has only two configuration files, one to configure the network and logging, and one to configure ClickHouse Keeper. 
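Pulling the type-optimization rules from the schema-design section above into one place, here is a minimal sketch of a table definition that applies them; the table and most column names are illustrative, with only `ViewCount` echoing the column discussed above.

```sql
-- Strict numeric types, no Nullable (0 serves as the empty value), narrow
-- unsigned integers, second-granularity DateTime, and LowCardinality strings.
CREATE TABLE posts_typed
(
    Id           UInt32,
    PostType     LowCardinality(String),
    CreationDate DateTime,
    Score        Int32,
    ViewCount    UInt32 DEFAULT 0,
    AnswerCount  UInt16
)
ENGINE = MergeTree
ORDER BY (PostType, CreationDate);
```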
@@ -481,7 +481,7 @@ SELECT * FROM db1.table1_dist; ``` -## More information about: {#more-information-about} +## More information about {#more-information-about} - The [Distributed Table Engine](/engines/table-engines/special/distributed.md) - [ClickHouse Keeper](/guides/sre/keeper/index.md) diff --git a/docs/deployment-guides/index.md b/docs/deployment-guides/index.md index 7543137f33b..81f83e1c016 100644 --- a/docs/deployment-guides/index.md +++ b/docs/deployment-guides/index.md @@ -4,7 +4,7 @@ title: 'Deployment Guides Overview' description: 'Landing page for the deployment and scaling section' --- -# Deployment and Scaling +# Deployment and scaling This section covers the following topics: @@ -13,4 +13,4 @@ This section covers the following topics: | [Introduction](/architecture/introduction) | | [Scaling Out](/architecture/horizontal-scaling) | | [Replication for fault tolerance](/architecture/replication) | -| [Cluster Deployment](/architecture/cluster-deployment) | +| [Cluster deployment](/architecture/cluster-deployment) | diff --git a/docs/deployment-guides/replicated.md b/docs/deployment-guides/replicated.md index 2fd2463ea28..405f8b7155c 100644 --- a/docs/deployment-guides/replicated.md +++ b/docs/deployment-guides/replicated.md @@ -20,7 +20,7 @@ In this architecture, there are five servers configured. Two are used to host co ## Environment {#environment} -### Architecture Diagram {#architecture-diagram} +### Architecture diagram {#architecture-diagram} Architecture diagram for 1 shard and 2 replicas with ReplicatedMergeTree @@ -46,7 +46,7 @@ Install ClickHouse Keeper on the three servers `clickhouse-keeper-01`, `clickhou -## clickhouse-01 configuration {#clickhouse-01-configuration} +## Clickhouse-01 configuration {#clickhouse-01-configuration} For clickhouse-01 there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files you will see that most of the configuration is the same between clickhouse-01 and clickhouse-02; the differences will be highlighted. @@ -143,7 +143,7 @@ This configuration file `use-keeper.xml` is configuring ClickHouse Server to use ``` -## clickhouse-02 configuration {#clickhouse-02-configuration} +## Clickhouse-02 configuration {#clickhouse-02-configuration} As the configuration is very similar on clickhouse-01 and clickhouse-02 only the differences will be pointed out here. @@ -232,7 +232,7 @@ This file is the same on both clickhouse-01 and clickhouse-02. ``` -## clickhouse-keeper-01 configuration {#clickhouse-keeper-01-configuration} +## Clickhouse-keeper-01 configuration {#clickhouse-keeper-01-configuration} @@ -286,7 +286,7 @@ If for any reason a Keeper node is replaced or rebuilt, do not reuse an existing ``` -## clickhouse-keeper-02 configuration {#clickhouse-keeper-02-configuration} +## Clickhouse-keeper-02 configuration {#clickhouse-keeper-02-configuration} There is only one line difference between `clickhouse-keeper-01` and `clickhouse-keeper-02`. `server_id` is set to `2` on this node. @@ -334,7 +334,7 @@ There is only one line difference between `clickhouse-keeper-01` and `clickhouse ``` -## clickhouse-keeper-03 configuration {#clickhouse-keeper-03-configuration} +## Clickhouse-keeper-03 configuration {#clickhouse-keeper-03-configuration} There is only one line difference between `clickhouse-keeper-01` and `clickhouse-keeper-03`. `server_id` is set to `3` on this node. 
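As a rough sketch of where these deployment guides lead, the statements below create a replicated table on every node and a Distributed table that routes queries across the shards; the cluster name is illustrative, and the schema simply mirrors the `db1.table1_dist` table queried above.

```sql
-- Replicated storage table; {shard} and {replica} come from the per-node macros.
CREATE TABLE db1.table1 ON CLUSTER cluster_2S_1R
(
    id      UInt64,
    column1 String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/table1', '{replica}')
ORDER BY id;

-- Distributed table that fans reads and writes out over the shards.
CREATE TABLE db1.table1_dist ON CLUSTER cluster_2S_1R
AS db1.table1
ENGINE = Distributed('cluster_2S_1R', 'db1', 'table1', rand());
```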
diff --git a/docs/dictionary/index.md b/docs/dictionary/index.md index 7eb5e1e8655..5bf2bb9bb5b 100644 --- a/docs/dictionary/index.md +++ b/docs/dictionary/index.md @@ -314,7 +314,7 @@ LIMIT 4 Peak memory usage: 666.82 MiB. ``` -## Advanced Dictionary Topics {#advanced-dictionary-topics} +## Advanced dictionary topics {#advanced-dictionary-topics} ### Choosing the Dictionary `LAYOUT` {#choosing-the-dictionary-layout} diff --git a/docs/faq/general/columnar-database.md b/docs/faq/general/columnar-database.md index 8e5202d499f..41f8c69497f 100644 --- a/docs/faq/general/columnar-database.md +++ b/docs/faq/general/columnar-database.md @@ -10,7 +10,7 @@ import Image from '@theme/IdealImage'; import RowOriented from '@site/static/images/row-oriented.gif'; import ColumnOriented from '@site/static/images/column-oriented.gif'; -# What Is a Columnar Database? {#what-is-a-columnar-database} +# What is a columnar database? {#what-is-a-columnar-database} A columnar database stores the data of each column independently. This allows reading data from disk only for those columns that are used in any given query. The cost is that operations that affect whole rows become proportionally more expensive. The synonym for a columnar database is a column-oriented database management system. ClickHouse is a typical example of such a system. diff --git a/docs/faq/general/dbms-naming.md b/docs/faq/general/dbms-naming.md index 8376ba4cfdf..5c54e43fe07 100644 --- a/docs/faq/general/dbms-naming.md +++ b/docs/faq/general/dbms-naming.md @@ -6,7 +6,7 @@ slug: /faq/general/dbms-naming description: 'Learn about What does "ClickHouse" mean?' --- -# What Does "ClickHouse" Mean? {#what-does-clickhouse-mean} +# What does "ClickHouse" mean? {#what-does-clickhouse-mean} It's a combination of "**Click**stream" and "Data ware**House**". It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet, and it still does the job. You can read more about this use case on [ClickHouse history](../../about-us/history.md) page. diff --git a/docs/faq/general/index.md b/docs/faq/general/index.md index d735adcdfae..abe0a8decd2 100644 --- a/docs/faq/general/index.md +++ b/docs/faq/general/index.md @@ -7,7 +7,7 @@ title: 'General Questions About ClickHouse' description: 'Index page listing general questions about ClickHouse' --- -# General Questions About ClickHouse +# General questions about ClickHouse - [What is ClickHouse?](../../intro.md) - [Why is ClickHouse so fast?](../../concepts/why-clickhouse-is-so-fast.md) diff --git a/docs/faq/general/mapreduce.md b/docs/faq/general/mapreduce.md index 369632883db..b056ea32858 100644 --- a/docs/faq/general/mapreduce.md +++ b/docs/faq/general/mapreduce.md @@ -7,7 +7,7 @@ description: 'This page explains why you would use ClickHouse over MapReduce' keywords: ['MapReduce'] --- -# Why Not Use Something Like MapReduce? {#why-not-use-something-like-mapreduce} +# Why not use something like MapReduce? {#why-not-use-something-like-mapreduce} We can refer to systems like MapReduce as distributed computing systems in which the reduce operation is based on distributed sorting. The most common open-source solution in this class is [Apache Hadoop](http://hadoop.apache.org). 
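To make the contrast concrete, here is a hedged sketch of the classic map-reduce word-count job expressed as a single ClickHouse aggregation; the `documents` table and `body` column are assumptions for illustration.

```sql
-- Word count as one aggregation instead of separate map, shuffle, and reduce stages.
SELECT
    arrayJoin(splitByWhitespace(body)) AS word,
    count() AS occurrences
FROM documents
GROUP BY word
ORDER BY occurrences DESC
LIMIT 10;
```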
diff --git a/docs/faq/general/ne-tormozit.md b/docs/faq/general/ne-tormozit.md index b7eb9dbda62..ec09494e9a8 100644 --- a/docs/faq/general/ne-tormozit.md +++ b/docs/faq/general/ne-tormozit.md @@ -7,7 +7,7 @@ description: 'This page explains what "Не тормозит" means' keywords: ['Yandex'] --- -# What Does "Не тормозит" Mean? {#what-does-ne-tormozit-mean} +# What does "Не тормозит" mean? {#what-does-ne-tormozit-mean} We often get this question when people see vintage (limited production) ClickHouse t-shirts. They have the words **"ClickHouse не тормозит"** written in big bold text on the front. diff --git a/docs/faq/general/olap.md b/docs/faq/general/olap.md index c786e01b212..1d3c9a99c13 100644 --- a/docs/faq/general/olap.md +++ b/docs/faq/general/olap.md @@ -20,7 +20,7 @@ Analytical Online : ...in real-time. -## OLAP from the Business Perspective {#olap-from-the-business-perspective} +## OLAP from the business perspective {#olap-from-the-business-perspective} In recent years, business people started to realize the value of data. Companies who make their decisions blindly, more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be remotely useful for making business decisions and need mechanisms to timely analyze them. Here's where OLAP database management systems (DBMS) come in. @@ -28,7 +28,7 @@ In a business sense, OLAP allows companies to continuously plan, analyze, and re ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and an in-house data warehouse scenario is also viable. -## OLAP from the Technical Perspective {#olap-from-the-technical-perspective} +## OLAP from the technical perspective {#olap-from-the-technical-perspective} All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). Former focuses on building reports, each based on large volumes of historical data, but doing it not so frequently. While the latter usually handle a continuous stream of transactions, constantly modifying the current state of data. diff --git a/docs/faq/integration/json-import.md b/docs/faq/integration/json-import.md index 6e1776a3a18..6363b725a52 100644 --- a/docs/faq/integration/json-import.md +++ b/docs/faq/integration/json-import.md @@ -26,7 +26,7 @@ $ echo '{"foo":"bar"}' | clickhouse-client --query="INSERT INTO test FORMAT JSO Instead of inserting data manually, you might consider to use an [integration tool](../../integrations/index.mdx) instead. -## Useful Settings {#useful-settings} +## Useful settings {#useful-settings} - `input_format_skip_unknown_fields` allows to insert JSON even if there were additional fields not present in table schema (by discarding them). - `input_format_import_nested_json` allows to insert nested JSON objects into columns of [Nested](../../sql-reference/data-types/nested-data-structures/index.md) type. 
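A minimal sketch of the first setting in action, reusing the `test` table from this FAQ; the `JSONEachRow` format and the extra field are assumptions for illustration.

```sql
-- With the setting enabled, the unknown "unexpected" field is discarded
-- instead of failing the insert.
SET input_format_skip_unknown_fields = 1;

INSERT INTO test FORMAT JSONEachRow
{"foo": "bar", "unexpected": 1}
```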
diff --git a/docs/faq/integration/oracle-odbc.md b/docs/faq/integration/oracle-odbc.md index 9cf9e0d2a9b..630e99c089c 100644 --- a/docs/faq/integration/oracle-odbc.md +++ b/docs/faq/integration/oracle-odbc.md @@ -6,7 +6,7 @@ toc_priority: 20 description: 'This page provides guidance on what to do if you have a problem with encodings when using Oracle via ODBC' --- -# What If I Have a Problem with Encodings When Using Oracle Via ODBC? {#oracle-odbc-encodings} +# What if I have a problem with encodings when using Oracle via ODBC? {#oracle-odbc-encodings} If you use Oracle as a source of ClickHouse external dictionaries via Oracle ODBC driver, you need to set the correct value for the `NLS_LANG` environment variable in `/etc/default/clickhouse`. For more information, see the [Oracle NLS_LANG FAQ](https://www.oracle.com/technetwork/products/globalization/nls-lang-099431.html). diff --git a/docs/faq/operations/delete-old-data.md b/docs/faq/operations/delete-old-data.md index 4cbd8b65f08..2db4ff949b4 100644 --- a/docs/faq/operations/delete-old-data.md +++ b/docs/faq/operations/delete-old-data.md @@ -6,7 +6,7 @@ toc_priority: 20 description: 'This page answers the question of whether it is possible to delete old records from a ClickHouse table' --- -# Is It Possible to Delete Old Records from a ClickHouse Table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table} +# Is it possible to delete old records from a ClickHouse table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table} The short answer is “yes”. ClickHouse has multiple mechanisms that allow freeing up disk space by removing old data. Each mechanism is aimed for different scenarios. diff --git a/docs/faq/operations/production.md b/docs/faq/operations/production.md index 14505193002..193e01358cc 100644 --- a/docs/faq/operations/production.md +++ b/docs/faq/operations/production.md @@ -6,7 +6,7 @@ toc_priority: 10 description: 'This page provides guidance on which ClickHouse version to use in production' --- -# Which ClickHouse Version to Use in Production? {#which-clickhouse-version-to-use-in-production} +# Which ClickHouse version to use in production? {#which-clickhouse-version-to-use-in-production} First of all, let's discuss why people ask this question in the first place. There are two key reasons: @@ -15,7 +15,7 @@ First of all, let's discuss why people ask this question in the first place. The The second reason is more fundamental, so we'll start with that one and then get back to navigating through various ClickHouse releases. -## Which ClickHouse Version Do You Recommend? {#which-clickhouse-version-do-you-recommend} +## Which ClickHouse version do you recommend? {#which-clickhouse-version-do-you-recommend} It's tempting to hire consultants or trust some known experts to get rid of responsibility for your production environment. You install some specific ClickHouse version that someone else recommended; if there's some issue with it - it's not your fault, it's someone else's. This line of reasoning is a big trap. No external person knows better than you what's going on in your company's production environment. @@ -46,7 +46,7 @@ When you have your pre-production environment and testing infrastructure in plac As you might have noticed, there's nothing specific to ClickHouse in the approach described above - people do that for any piece of infrastructure they rely on if they take their production environment seriously. -## How to Choose Between ClickHouse Releases? 
{#how-to-choose-between-clickhouse-releases} +## How to choose between ClickHouse releases? {#how-to-choose-between-clickhouse-releases} If you look into the contents of the ClickHouse package repository, you'll see two kinds of packages: diff --git a/docs/faq/troubleshooting.md b/docs/faq/troubleshooting.md index ddb2f074267..4b1221d7dea 100644 --- a/docs/faq/troubleshooting.md +++ b/docs/faq/troubleshooting.md @@ -4,7 +4,7 @@ slug: /faq/troubleshooting description: 'How to troubleshoot common ClickHouse Cloud error messages.' --- -## ClickHouse Cloud Troubleshooting {#clickhouse-cloud-troubleshooting} +## ClickHouse Cloud troubleshooting {#clickhouse-cloud-troubleshooting} ### Unable to access a ClickHouse Cloud service {#unable-to-access-a-clickhouse-cloud-service} diff --git a/docs/faq/use-cases/index.md b/docs/faq/use-cases/index.md index e86365bdc28..6331eb4d6f0 100644 --- a/docs/faq/use-cases/index.md +++ b/docs/faq/use-cases/index.md @@ -6,7 +6,7 @@ title: 'Questions About ClickHouse Use Cases' description: 'Landing page listing common questions about ClickHouse use cases' --- -# Questions About ClickHouse Use Cases +# Questions about ClickHouse use cases - [Can I use ClickHouse as a time-series database?](/knowledgebase/time-series) - [Can I use ClickHouse as a key-value storage?](/knowledgebase/key-value) diff --git a/docs/faq/use-cases/key-value.md b/docs/faq/use-cases/key-value.md index 49e5e134e6a..b044eb38ef1 100644 --- a/docs/faq/use-cases/key-value.md +++ b/docs/faq/use-cases/key-value.md @@ -6,7 +6,7 @@ toc_priority: 101 description: 'Answers the frequently asked question of whether or not ClickHouse can be used as a key-value storage?' --- -# Can I Use ClickHouse As a Key-Value Storage? {#can-i-use-clickhouse-as-a-key-value-storage} +# Can I use ClickHouse as a key-value storage? {#can-i-use-clickhouse-as-a-key-value-storage} The short answer is **"no"**. The key-value workload is among top positions in the list of cases when **NOT** to use ClickHouse. It's an [OLAP](../../faq/general/olap.md) system after all, while there are many excellent key-value storage systems out there. diff --git a/docs/faq/use-cases/time-series.md b/docs/faq/use-cases/time-series.md index dc4ea344caa..1db9ac6cba3 100644 --- a/docs/faq/use-cases/time-series.md +++ b/docs/faq/use-cases/time-series.md @@ -6,7 +6,7 @@ toc_priority: 101 description: 'Page describing how to use ClickHouse as a time-series database' --- -# Can I Use ClickHouse As a Time-Series Database? {#can-i-use-clickhouse-as-a-time-series-database} +# Can I use ClickHouse as a time-series database? 
{#can-i-use-clickhouse-as-a-time-series-database} _Note: Please see the blog [Working with Time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse) for additional examples of using ClickHouse for time series analysis._ diff --git a/docs/getting-started/example-datasets/brown-benchmark.md b/docs/getting-started/example-datasets/brown-benchmark.md index bde0445dfdb..59f07d3197e 100644 --- a/docs/getting-started/example-datasets/brown-benchmark.md +++ b/docs/getting-started/example-datasets/brown-benchmark.md @@ -91,7 +91,7 @@ clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbe clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv ``` -## Run benchmark queries: {#run-benchmark-queries} +## Run benchmark queries {#run-benchmark-queries} ```sql USE mgbench; diff --git a/docs/getting-started/example-datasets/cell-towers.md b/docs/getting-started/example-datasets/cell-towers.md index 22bda6de53a..5626cb4bf89 100644 --- a/docs/getting-started/example-datasets/cell-towers.md +++ b/docs/getting-started/example-datasets/cell-towers.md @@ -40,7 +40,7 @@ Here is a preview of the dashboard created in this guide: Dashboard of cell towers by radio type in mcc 204 -## Get the Dataset {#get-the-dataset} +## Get the dataset {#get-the-dataset} This dataset is from [OpenCelliD](https://www.opencellid.org/) - The world's largest Open Database of Cell Towers. @@ -167,7 +167,7 @@ Based on the above query and the [MCC list](https://en.wikipedia.org/wiki/Mobile You may want to create a [Dictionary](../../sql-reference/dictionaries/index.md) in ClickHouse to decode these values. -## Use case: Incorporate geo data {#use-case} +## Use case: incorporate geo data {#use-case} Using the [`pointInPolygon`](/sql-reference/functions/geo/coordinates.md/#pointinpolygon) function. @@ -312,7 +312,7 @@ To build a Superset dashboard using the OpenCelliD dataset you should: If **ClickHouse Connect** is not one of your options, then you will need to install it. The command is `pip install clickhouse-connect`, and more info is [available here](https://pypi.org/project/clickhouse-connect/). ::: -#### Add your connection details: {#add-your-connection-details} +#### Add your connection details {#add-your-connection-details} :::tip Make sure that you set **SSL** on when connecting to ClickHouse Cloud or other ClickHouse systems that enforce the use of SSL. diff --git a/docs/getting-started/example-datasets/github.md b/docs/getting-started/example-datasets/github.md index 0b25051b181..2298e485a99 100644 --- a/docs/getting-started/example-datasets/github.md +++ b/docs/getting-started/example-datasets/github.md @@ -2414,7 +2414,7 @@ FORMAT PrettyCompactMonoBlock 3 rows in set. Elapsed: 0.170 sec. Processed 611.53 thousand rows, 41.76 MB (3.60 million rows/s., 246.07 MB/s.) ``` -## Unsolved Questions {#unsolved-questions} +## Unsolved questions {#unsolved-questions} ### Git blame {#git-blame} diff --git a/docs/getting-started/example-datasets/menus.md b/docs/getting-started/example-datasets/menus.md index 3c8d6c77ce5..bace0087323 100644 --- a/docs/getting-started/example-datasets/menus.md +++ b/docs/getting-started/example-datasets/menus.md @@ -14,7 +14,7 @@ The data is in public domain. The data is from library's archive and it may be incomplete and difficult for statistical analysis. Nevertheless it is also very yummy. 
The size is just 1.3 million records about dishes in the menus — it's a very small data volume for ClickHouse, but it's still a good example. -## Download the Dataset {#download-dataset} +## Download the dataset {#download-dataset} Run the command: @@ -28,7 +28,7 @@ md5sum 2021_08_01_07_01_17_data.tgz Replace the link to the up to date link from http://menus.nypl.org/data if needed. Download size is about 35 MB. -## Unpack the Dataset {#unpack-dataset} +## Unpack the dataset {#unpack-dataset} ```bash tar xvf 2021_08_01_07_01_17_data.tgz @@ -42,7 +42,7 @@ The data is normalized consisted of four tables: - `MenuPage` — Information about the pages in the menus, because every page belongs to some menu. - `MenuItem` — An item of the menu. A dish along with its price on some menu page: links to dish and menu page. -## Create the Tables {#create-tables} +## Create the tables {#create-tables} We use [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices. @@ -109,7 +109,7 @@ CREATE TABLE menu_item ) ENGINE = MergeTree ORDER BY id; ``` -## Import the Data {#import-data} +## Import the data {#import-data} Upload data into ClickHouse, run: @@ -128,7 +128,7 @@ We disable [input_format_null_as_default](/operations/settings/formats#input_for The setting [date_time_input_format best_effort](/operations/settings/formats#date_time_input_format) allows to parse [DateTime](../../sql-reference/data-types/datetime.md) fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed. -## Denormalize the Data {#denormalize-data} +## Denormalize the data {#denormalize-data} Data is presented in multiple tables in [normalized form](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms). It means you have to perform [JOIN](/sql-reference/statements/select/join) if you want to query, e.g. dish names from menu items. For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing `JOIN` every time. It is called "denormalized" data. @@ -180,7 +180,7 @@ FROM menu_item JOIN menu ON menu_page.menu_id = menu.id; ``` -## Validate the Data {#validate-data} +## Validate the data {#validate-data} Query: @@ -196,7 +196,7 @@ Result: └─────────┘ ``` -## Run Some Queries {#run-queries} +## Run some queries {#run-queries} ### Averaged historical prices of dishes {#query-averaged-historical-prices} @@ -240,7 +240,7 @@ Result: Take it with a grain of salt. -### Burger Prices {#query-burger-prices} +### Burger prices {#query-burger-prices} Query: @@ -354,6 +354,6 @@ Result: At least they have caviar with vodka. Very nice. -## Online Playground {#playground} +## Online playground {#playground} The data is uploaded to ClickHouse Playground, [example](https://sql.clickhouse.com?query_id=KB5KQJJFNBKHE5GBUJCP1B). diff --git a/docs/getting-started/example-datasets/metrica.md b/docs/getting-started/example-datasets/metrica.md index e17b38016ab..b06074140bd 100644 --- a/docs/getting-started/example-datasets/metrica.md +++ b/docs/getting-started/example-datasets/metrica.md @@ -6,7 +6,7 @@ slug: /getting-started/example-datasets/metrica title: 'Anonymized Web Analytics' --- -# Anonymized Web Analytics Data +# Anonymized web analytics data This dataset consists of two tables containing anonymized web analytics data with hits (`hits_v1`) and visits (`visits_v1`). @@ -14,7 +14,7 @@ The tables can be downloaded as compressed `tsv.xz` files. 
In addition to the sa ## Download and ingest the data {#download-and-ingest-the-data} -### Download the hits compressed TSV file: {#download-the-hits-compressed-tsv-file} +### Download the hits compressed TSV file {#download-the-hits-compressed-tsv-file} ```bash curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv @@ -41,7 +41,7 @@ Or for hits_100m_obfuscated clickhouse-client --query="CREATE TABLE default.hits_100m_obfuscated (WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, Refresh UInt8, RefererCategoryID UInt16, RefererRegionID UInt32, URLCategoryID UInt16, URLRegionID UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, OriginalURL String, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), LocalEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, RemoteIP UInt32, WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming UInt32, DNSTiming UInt32, ConnectTiming UInt32, ResponseStartTiming UInt32, ResponseEndTiming UInt32, FetchTiming UInt32, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192" ``` -### Import the hits data: {#import-the-hits-data} +### Import the hits data {#import-the-hits-data} ```bash cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000 @@ -57,7 +57,7 @@ clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1" 8873898 ``` -### Download the visits compressed TSV file: {#download-the-visits-compressed-tsv-file} +### Download the visits compressed TSV file {#download-the-visits-compressed-tsv-file} ```bash curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv @@ -131,7 +131,7 @@ FORMAT PrettyCompact" └────────────┴─────────┴────────┘ ``` -## Next Steps {#next-steps} +## Next steps {#next-steps} [A Practical Introduction to 
Sparse Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes.md) uses the hits dataset to discuss the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices. diff --git a/docs/getting-started/example-datasets/nyc-taxi.md b/docs/getting-started/example-datasets/nyc-taxi.md index 680fbe2fbb6..2471a4137bb 100644 --- a/docs/getting-started/example-datasets/nyc-taxi.md +++ b/docs/getting-started/example-datasets/nyc-taxi.md @@ -56,7 +56,7 @@ ENGINE = MergeTree PRIMARY KEY (pickup_datetime, dropoff_datetime); ``` -## Load the Data directly from Object Storage {#load-the-data-directly-from-object-storage} +## Load the data directly from object storage {#load-the-data-directly-from-object-storage} Users' can grab a small subset of the data (3 million rows) for getting familiar with it. The data is in TSV files in object storage, which is easily streamed into ClickHouse Cloud using the `s3` table function. @@ -126,7 +126,7 @@ FROM gcs( -## Sample Queries {#sample-queries} +## Sample queries {#sample-queries} The following queries are executed on the sample described above. Users can run the sample queries on the full dataset in [sql.clickhouse.com](https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgRlJPTSBueWNfdGF4aS50cmlwcw&chart=eyJ0eXBlIjoibGluZSIsImNvbmZpZyI6eyJ0aXRsZSI6IlRlbXBlcmF0dXJlIGJ5IGNvdW50cnkgYW5kIHllYXIiLCJ4YXhpcyI6InllYXIiLCJ5YXhpcyI6ImNvdW50KCkiLCJzZXJpZXMiOiJDQVNUKHBhc3Nlbmdlcl9jb3VudCwgJ1N0cmluZycpIn19), modifying the queries below to use the table `nyc_taxi.trips`. @@ -183,7 +183,7 @@ GROUP BY passenger_count ORDER BY passenger_count ASC ``` -## Download of Prepared Partitions {#download-of-prepared-partitions} +## Download of prepared partitions {#download-of-prepared-partitions} :::note The following steps provide information about the original dataset, and a method for loading prepared partitions into a self-managed ClickHouse server environment. @@ -209,7 +209,7 @@ $ clickhouse-client --query "select count(*) from datasets.trips_mergetree" If you will run the queries described below, you have to use the full table name, `datasets.trips_mergetree`. ::: -## Results on Single Server {#results-on-single-server} +## Results on single server {#results-on-single-server} Q1: diff --git a/docs/getting-started/example-datasets/nypd_complaint_data.md b/docs/getting-started/example-datasets/nypd_complaint_data.md index ea0e240f7ee..9d6fc7462ab 100644 --- a/docs/getting-started/example-datasets/nypd_complaint_data.md +++ b/docs/getting-started/example-datasets/nypd_complaint_data.md @@ -365,7 +365,7 @@ of `ORDER BY` or `PRIMARY KEY` must be specified. Here are some guidelines on d columns to includes in `ORDER BY`, and more information is in the *Next Steps* section at the end of this document. -### Order By and Primary Key clauses {#order-by-and-primary-key-clauses} +### Order by and primary key clauses {#order-by-and-primary-key-clauses} - The `ORDER BY` tuple should include fields that are used in query filters - To maximize compression on disk the `ORDER BY` tuple should be ordered by ascending cardinality @@ -485,7 +485,7 @@ table: NYPD_Complaint 1 row in set. Elapsed: 0.001 sec. ``` -## Preprocess and Import Data {#preprocess-import-data} +## Preprocess and import data {#preprocess-import-data} We will use `clickhouse-local` tool for data preprocessing and `clickhouse-client` to upload it. 
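As a hedged illustration of the `ORDER BY` guidelines above (columns used in filters first, ordered by ascending cardinality), the sketch below uses illustrative column names rather than the guide's exact schema.

```sql
-- Low-cardinality columns that appear in query filters lead the ORDER BY tuple.
CREATE TABLE NYPD_Complaint_sketch
(
    borough         LowCardinality(String),
    offense_level   LowCardinality(String),
    complaint_begin DateTime64(0)
)
ENGINE = MergeTree
ORDER BY (borough, offense_level, complaint_begin);
```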
@@ -539,7 +539,7 @@ cat ${HOME}/NYPD_Complaint_Data_Current__Year_To_Date_.tsv \ | clickhouse-client --query='INSERT INTO NYPD_Complaint FORMAT TSV' ``` -## Validate the Data {#validate-data} +## Validate the data {#validate-data} :::note The dataset changes once or more per year, your counts may not match what is in this document. @@ -580,7 +580,7 @@ Result: ``` -## Run Some Queries {#run-queries} +## Run some queries {#run-queries} ### Query 1. Compare the number of complaints by month {#query-1-compare-the-number-of-complaints-by-month} @@ -618,7 +618,7 @@ Query id: 7fbd4244-b32a-4acf-b1f3-c3aa198e74d9 12 rows in set. Elapsed: 0.006 sec. Processed 208.99 thousand rows, 417.99 KB (37.48 million rows/s., 74.96 MB/s.) ``` -### Query 2. Compare total number of complaints by Borough {#query-2-compare-total-number-of-complaints-by-borough} +### Query 2. Compare total number of complaints by borough {#query-2-compare-total-number-of-complaints-by-borough} Query: @@ -648,6 +648,6 @@ Query id: 8cdcdfd4-908f-4be0-99e3-265722a2ab8d 6 rows in set. Elapsed: 0.008 sec. Processed 208.99 thousand rows, 209.43 KB (27.14 million rows/s., 27.20 MB/s.) ``` -## Next Steps {#next-steps} +## Next steps {#next-steps} [A Practical Introduction to Sparse Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes.md) discusses the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices. diff --git a/docs/getting-started/example-datasets/ontime.md b/docs/getting-started/example-datasets/ontime.md index 16827ccf957..10de6e2ef54 100644 --- a/docs/getting-started/example-datasets/ontime.md +++ b/docs/getting-started/example-datasets/ontime.md @@ -125,7 +125,7 @@ CREATE TABLE `ontime` ORDER BY (Year, Quarter, Month, DayofMonth, FlightDate, IATA_CODE_Reporting_Airline); ``` -## Import from Raw Data {#import-from-raw-data} +## Import from raw data {#import-from-raw-data} Downloading data: diff --git a/docs/getting-started/example-datasets/uk-price-paid.md b/docs/getting-started/example-datasets/uk-price-paid.md index 1a1fa2d14ca..a80137b76b9 100644 --- a/docs/getting-started/example-datasets/uk-price-paid.md +++ b/docs/getting-started/example-datasets/uk-price-paid.md @@ -14,7 +14,7 @@ This data contains prices paid for real-estate property in England and Wales. Th - Description of the fields: https://www.gov.uk/guidance/about-the-price-paid-data - Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0. -## Create the Table {#create-table} +## Create the table {#create-table} ```sql CREATE DATABASE uk; @@ -40,7 +40,7 @@ ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2); ``` -## Preprocess and Insert the Data {#preprocess-import-data} +## Preprocess and insert the data {#preprocess-import-data} We will use the `url` function to stream the data into ClickHouse. We need to preprocess some of the incoming data first, which includes: - splitting the `postcode` to two different columns - `postcode1` and `postcode2`, which is better for storage and queries @@ -93,7 +93,7 @@ FROM url( Wait for the data to insert - it will take a minute or two depending on the network speed. -## Validate the Data {#validate-data} +## Validate the data {#validate-data} Let's verify it worked by seeing how many rows were inserted: @@ -112,11 +112,11 @@ WHERE name = 'uk_price_paid' Notice the size of the table is just 221.43 MiB! 
-## Run Some Queries {#run-queries} +## Run some queries {#run-queries} Let's run some queries to analyze the data: -### Query 1. Average Price Per Year {#average-price} +### Query 1. Average price per year {#average-price} ```sql runnable SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80 ) FROM uk.uk_price_paid GROUP BY year ORDER BY year ``` -### Query 2. Average Price per Year in London {#average-price-london} +### Query 2. Average price per year in London {#average-price-london} ```sql runnable SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100 ) FROM uk.uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year ``` Something happened to home prices in 2020! But that is probably not a surprise... -### Query 3. The Most Expensive Neighborhoods {#most-expensive-neighborhoods} +### Query 3. The most expensive neighborhoods {#most-expensive-neighborhoods} ```sql runnable SELECT town, district, count() AS c, round(avg(price)) AS price, bar(price, 0, 5000000, 100) FROM uk.uk_price_paid GROUP BY town, district HAVING c >= 100 ORDER BY price DESC LIMIT 100 ``` We can speed up these queries with projections. See ["Projections"](/data-modeling/projections) for examples with this dataset. -### Test it in the Playground {#playground} +### Test it in the playground {#playground} The dataset is also available in the [Online Playground](https://sql.clickhouse.com?query_id=TRCWH5ZETY4SEEK8ISCCAX). diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md index f452a728ce6..21de90ba480 100644 --- a/docs/getting-started/index.md +++ b/docs/getting-started/index.md @@ -8,7 +8,7 @@ slug: /getting-started/example-datasets/ title: 'Tutorials and Example Datasets' --- -# Tutorials and Example Datasets +# Tutorials and example datasets We have a lot of resources for helping you get started and learn how ClickHouse works: diff --git a/docs/getting-started/install/_snippets/_macos.md b/docs/getting-started/install/_snippets/_macos.md index 7b2eb975d4d..d3b21560ff4 100644 --- a/docs/getting-started/install/_snippets/_macos.md +++ b/docs/getting-started/install/_snippets/_macos.md @@ -16,7 +16,7 @@ the ClickHouse community [homebrew formula](https://formulae.brew.sh/cask/clickh brew install --cask clickhouse ``` -## Fix the developer verification error in MacOS {#fix-developer-verification-error-macos} +## Fix the developer verification error in macOS {#fix-developer-verification-error-macos} If you install ClickHouse using `brew`, you may encounter an error from MacOS. By default, MacOS will not run applications or tools created by a developer who cannot be verified. diff --git a/docs/getting-started/playground.md b/docs/getting-started/playground.md index 60fe41cba74..90371035308 100644 --- a/docs/getting-started/playground.md +++ b/docs/getting-started/playground.md @@ -7,7 +7,7 @@ slug: /getting-started/playground title: 'ClickHouse Playground' --- -# ClickHouse Playground +# ClickHouse playground [ClickHouse Playground](https://sql.clickhouse.com) allows people to experiment with ClickHouse by running queries instantly, without setting up their server or cluster. Several example datasets are available in Playground.
diff --git a/docs/guides/best-practices/avoidnullablecolumns.md b/docs/guides/best-practices/avoidnullablecolumns.md index bcd9f6073a1..b7666559481 100644 --- a/docs/guides/best-practices/avoidnullablecolumns.md +++ b/docs/guides/best-practices/avoidnullablecolumns.md @@ -1,7 +1,7 @@ --- slug: /optimize/avoid-nullable-columns -sidebar_label: 'Avoid Nullable Columns' -title: 'Avoid Nullable Columns' +sidebar_label: 'Avoid nullable columns' +title: 'Avoid nullable columns' description: 'Why Nullable Columns should be avoided in ClickHouse' --- diff --git a/docs/guides/best-practices/index.md b/docs/guides/best-practices/index.md index 1b7acbac54a..6ff0bd04c5f 100644 --- a/docs/guides/best-practices/index.md +++ b/docs/guides/best-practices/index.md @@ -5,7 +5,7 @@ description: 'Overview page of Performance and Optimizations' title: 'Performance and Optimizations' --- -# Performance and Optimizations +# Performance and optimizations This section contains tips and best practices for improving performance with ClickHouse. We recommend users read [Core Concepts](/parts) as a precursor to this section, @@ -22,7 +22,7 @@ which covers the main concepts required to improve performance. | [Bulk Inserts](/optimize/bulk-inserts) | Explains the benefits of using bulk inserts in ClickHouse. | | [Asynchronous Inserts](/optimize/asynchronous-inserts) | Focuses on ClickHouse's asynchronous inserts feature. It likely explains how asynchronous inserts work (batching data on the server for efficient insertion) and their benefits (improved performance by offloading insert processing). It might also cover enabling asynchronous inserts and considerations for using them effectively in your ClickHouse environment. | | [Avoid Mutations](/optimize/avoid-mutations) | Discusses the importance of avoiding mutations (updates and deletes) in ClickHouse. It recommends using append-only inserts for optimal performance and suggests alternative approaches for handling data changes. | -| [Avoid Nullable Columns](/optimize/avoid-nullable-columns) | Discusses why you may want to avoid Nullable columns to save space and increase performance. Demonstrates how to set a default value for a column. | +| [Avoid nullable columns](/optimize/avoid-nullable-columns) | Discusses why you may want to avoid nullable columns to save space and increase performance. Demonstrates how to set a default value for a column. | | [Avoid Optimize Final](/optimize/avoidoptimizefinal) | Explains how the `OPTIMIZE TABLE ... FINAL` query is resource-intensive and suggests alternative approaches to optimize ClickHouse performance. | | [Analyzer](/operations/analyzer) | Looks at the ClickHouse Analyzer, a tool for analyzing and optimizing queries. Discusses how the Analyzer works, its benefits (e.g., identifying performance bottlenecks), and how to use it to improve your ClickHouse queries' efficiency. | | [Query Profiling](/operations/optimizing-performance/sampling-query-profiler) | Explains ClickHouse's Sampling Query Profiler, a tool that helps analyze query execution.
| diff --git a/docs/guides/best-practices/query-optimization.md b/docs/guides/best-practices/query-optimization.md index 4ed659e9c7b..842cc5bb450 100644 --- a/docs/guides/best-practices/query-optimization.md +++ b/docs/guides/best-practices/query-optimization.md @@ -11,7 +11,7 @@ import Image from '@theme/IdealImage'; # A simple guide for query optimization -This section aims to illustrate through common scenarios how to use different performance and optimization techniques, such as [analyzer](/operations/analyzer), [query profiling](/operations/optimizing-performance/sampling-query-profiler) or [avoid Nullable Columns](/optimize/avoid-nullable-columns), in order to improve your ClickHouse query performances. +This section aims to illustrate through common scenarios how to use different performance and optimization techniques, such as [analyzer](/operations/analyzer), [query profiling](/operations/optimizing-performance/sampling-query-profiler) or [avoid nullable columns](/optimize/avoid-nullable-columns), in order to improve your ClickHouse query performance. ## Understand query performance {#understand-query-performance} diff --git a/docs/guides/best-practices/skipping-indexes.md b/docs/guides/best-practices/skipping-indexes.md index 60d2c7ed0e6..75bbe0b52b7 100644 --- a/docs/guides/best-practices/skipping-indexes.md +++ b/docs/guides/best-practices/skipping-indexes.md @@ -10,7 +10,7 @@ import simple_skip from '@site/static/images/guides/best-practices/simple_skip.p import bad_skip from '@site/static/images/guides/best-practices/bad_skip.png'; import Image from '@theme/IdealImage'; -# Understanding ClickHouse Data Skipping Indexes +# Understanding ClickHouse data skipping indexes ## Introduction {#introduction} @@ -22,7 +22,7 @@ In a traditional relational database, one approach to this problem is to attach Instead, ClickHouse provides a different type of index, which in specific circumstances can significantly improve query speed. These structures are labeled "Skip" indexes because they enable ClickHouse to skip reading significant chunks of data that are guaranteed to have no matching values. -## Basic Operation {#basic-operation} +## Basic operation {#basic-operation} Users can only employ Data Skipping Indexes on the MergeTree family of tables. Each data skipping index has four primary arguments: @@ -113,7 +113,7 @@ example, the debug log shows that the skip index dropped all but two granules: ```sql default.skip_table (933d4b2c-8cea-4bf9-8c93-c56e900eefd1) (SelectExecutor): Index `vix` has dropped 6102/6104 granules. ``` -## Skip Index Types {#skip-index-types} +## Skip index types {#skip-index-types} ### minmax {#minmax} @@ -130,7 +130,7 @@ an unlimited number of discrete values). This set contains all values in the bl The cost, performance, and effectiveness of this index is dependent on the cardinality within blocks. If each block contains a large number of unique values, either evaluating the query condition against a large index set will be very expensive, or the index will not be applied because the index is empty due to exceeding max_size. -### Bloom Filter Types {#bloom-filter-types} +### Bloom filter types {#bloom-filter-types} A *Bloom filter* is a data structure that allows space-efficient testing of set membership at the cost of a slight chance of false positives. A false positive is not a significant concern in the case of skip indexes because the only disadvantage is reading a few unnecessary blocks.
However, the potential for false positives does mean that the indexed expression should be expected to be true, otherwise valid data may be skipped. @@ -149,7 +149,7 @@ This index works only with String, FixedString, and Map datatypes. The input exp ``` This index can also be useful for text searches, particularly languages without word breaks, such as Chinese. -## Skip Index Functions {#skip-index-functions} +## Skip index functions {#skip-index-functions} The core purpose of data-skipping indexes is to limit the amount of data analyzed by popular queries. Given the analytic nature of ClickHouse data, the pattern of those queries in most cases includes functional expressions. Accordingly, skip indexes must interact correctly with common functions to be efficient. This can happen either when: * data is inserted and the index is defined as a functional expression (with the result of the expression stored in the index files), or @@ -158,7 +158,7 @@ The core purpose of data-skipping indexes is to limit the amount of data analyze Each type of skip index works on a subset of available ClickHouse functions appropriate to the index implementation listed [here](/engines/table-engines/mergetree-family/mergetree/#functions-support). In general, set indexes and Bloom filter based indexes (another type of set index) are both unordered and therefore do not work with ranges. In contrast, minmax indexes work particularly well with ranges since determining whether ranges intersect is very fast. The efficacy of partial match functions LIKE, startsWith, endsWith, and hasToken depend on the index type used, the index expression, and the particular shape of the data. -## Skip Index Settings {#skip-index-settings} +## Skip index settings {#skip-index-settings} There are two available settings that apply to skip indexes. @@ -170,7 +170,7 @@ queries. In circumstances where querying a table is too expensive unless a skip names will return an exception for any query that does not use the listed index. This would prevent poorly written queries from consuming server resources. -## Skip Best Practices {#skip-best-practices} +## Skip best practices {#skip-best-practices} Skip indexes are not intuitive, especially for users accustomed to secondary row-based indexes from the RDMS realm or inverted indexes from document stores. To get any benefit, applying a ClickHouse data skipping index must avoid enough granule reads to offset the cost of calculating the index. Critically, if a value occurs even once in an indexed block, it means the entire block must be read into memory and evaluated, and the index cost has been needlessly incurred. 
diff --git a/docs/guides/best-practices/sparse-primary-indexes.md b/docs/guides/best-practices/sparse-primary-indexes.md index f4565584fa6..ae246d1b11d 100644 --- a/docs/guides/best-practices/sparse-primary-indexes.md +++ b/docs/guides/best-practices/sparse-primary-indexes.md @@ -33,7 +33,7 @@ import sparsePrimaryIndexes15a from '@site/static/images/guides/best-practices/s import sparsePrimaryIndexes15b from '@site/static/images/guides/best-practices/sparse-primary-indexes-15b.png'; import Image from '@theme/IdealImage'; -# A Practical Introduction to Primary Indexes in ClickHouse +# A practical introduction to primary indexes in ClickHouse ## Introduction {#introduction} @@ -52,7 +52,7 @@ For ClickHouse [secondary data skipping indexes](/engines/table-engines/mergetre ::: -### Data Set {#data-set} +### Data set {#data-set} Throughout this guide we will use a sample anonymized web traffic data set. @@ -66,7 +66,7 @@ With these three columns we can already formulate some typical web analytics que - "What are the top 10 users that most frequently clicked a specific URL?" - "What are the most popular times (e.g. days of the week) at which a user clicks on a specific URL?" -### Test Machine {#test-machine} +### Test machine {#test-machine} All runtime numbers given in this document are based on running ClickHouse 22.2.1 locally on a MacBook Pro with the Apple M1 Pro chip and 16GB of RAM. @@ -157,7 +157,7 @@ ClickHouse client's result output indicates that ClickHouse executed a full tabl To make this (way) more efficient and (much) faster, we need to use a table with a appropriate primary key. This will allow ClickHouse to automatically (based on the primary key's column(s)) create a sparse primary index which can then be used to significantly speed up the execution of our example query. -## ClickHouse Index Design {#clickhouse-index-design} +## ClickHouse index design {#clickhouse-index-design} ### An index design for massive data scales {#an-index-design-for-massive-data-scales} diff --git a/docs/guides/creating-tables.md b/docs/guides/creating-tables.md index 924955d1c39..fe1ecf915df 100644 --- a/docs/guides/creating-tables.md +++ b/docs/guides/creating-tables.md @@ -47,7 +47,7 @@ The table engine determines: There are many engines to choose from, but for a simple table on a single-node ClickHouse server, [MergeTree](/engines/table-engines/mergetree-family/mergetree.md) is your likely choice. ::: -## A Brief Intro to Primary Keys {#a-brief-intro-to-primary-keys} +## A brief intro to primary keys {#a-brief-intro-to-primary-keys} Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!): diff --git a/docs/guides/developer/alternative-query-languages.md b/docs/guides/developer/alternative-query-languages.md index 0ba281b61ce..9ca05f7ac0d 100644 --- a/docs/guides/developer/alternative-query-languages.md +++ b/docs/guides/developer/alternative-query-languages.md @@ -24,7 +24,7 @@ Standard SQL is the default query language of ClickHouse. SET dialect = 'clickhouse' ``` -## Pipelined Relational Query Language (PRQL) {#pipelined-relational-query-language-prql} +## Pipelined relational query language (PRQL) {#pipelined-relational-query-language-prql} @@ -48,7 +48,7 @@ aggregate { Under the hood, ClickHouse uses transpilation from PRQL to SQL to run PRQL queries. 
-## Kusto Query Language (KQL) {#kusto-query-language-kql} +## Kusto query language (KQL) {#kusto-query-language-kql} diff --git a/docs/guides/developer/cascading-materialized-views.md b/docs/guides/developer/cascading-materialized-views.md index 85d11a15624..c9a9a0d2f2d 100644 --- a/docs/guides/developer/cascading-materialized-views.md +++ b/docs/guides/developer/cascading-materialized-views.md @@ -5,9 +5,9 @@ description: 'How to use multiple materialized views from a source table.' keywords: ['materialized view', 'aggregation'] --- -# Cascading Materialized Views +# Cascading materialized views -This example demonstrates how to create a Materialized View, and then how to cascade a second Materialized View on to the first. In this page, you will see how to do it, many of the possibilities, and the limitations. Different use cases can be answered by creating a Materialized view using a second Materialized view as the source. +This example demonstrates how to create a materialized view, and then how to cascade a second materialized view onto the first. In this page, you will see how to do it, many of the possibilities, and the limitations. Different use cases can be answered by creating a materialized view using a second materialized view as the source. @@ -54,7 +54,7 @@ You can create a materialized view on a Null table. So the data written to the t ## Monthly aggregated table and materialized view {#monthly-aggregated-table-and-materialized-view} -For the first Materialized View, we need to create the `Target` table, for this example, it will be `analytics.monthly_aggregated_data` and we will store the sum of the views by month and domain name. +For the first materialized view, we need to create the `Target` table, for this example, it will be `analytics.monthly_aggregated_data` and we will store the sum of the views by month and domain name. ```sql CREATE TABLE analytics.monthly_aggregated_data ( @@ -67,7 +67,7 @@ ENGINE = AggregatingMergeTree ORDER BY (domain_name, month) ``` -The Materialized View that will forward the data on the target table will look like this: +The materialized view that will forward the data on the target table will look like this: ```sql CREATE MATERIALIZED VIEW analytics.monthly_aggregated_data_mv @@ -103,7 +103,7 @@ ORDER BY (domain_name, year) This step defines the cascade. The `FROM` statement will use the `monthly_aggregated_data` table, this means the data flow will be: 1. The data comes to the `hourly_data` table. -2. ClickHouse will forward the data received to the first Materialized View `monthly_aggregated_data` table, +2. ClickHouse will forward the data received to the first materialized view `monthly_aggregated_data` table, 3. Finally, the data received in step 2 will be forwarded to the `year_aggregated_data`. ```sql diff --git a/docs/guides/developer/deduplication.md b/docs/guides/developer/deduplication.md index af77b949b72..24ad29c694e 100644 --- a/docs/guides/developer/deduplication.md +++ b/docs/guides/developer/deduplication.md @@ -10,7 +10,7 @@ import deduplication from '@site/static/images/guides/developer/de_duplication.p import Image from '@theme/IdealImage'; -# Deduplication Strategies +# Deduplication strategies **Deduplication** refers to the process of ***removing duplicate rows of a dataset***. In an OLTP database, this is done easily because each row has a unique primary key-but at the cost of slower inserts. Every inserted row needs to first be searched for and, if found, needs to be replaced.
@@ -164,7 +164,7 @@ Grouping as shown in the query above can actually be more efficient (in terms of Our [Deleting and Updating Data training module](https://learn.clickhouse.com/visitor_catalog_class/show/1328954/?utm_source=clickhouse&utm_medium=docs) expands on this example, including how to use a `version` column with `ReplacingMergeTree`. -## Using CollapsingMergeTree for Updating Columns Frequently {#using-collapsingmergetree-for-updating-columns-frequently} +## Using CollapsingMergeTree for updating columns frequently {#using-collapsingmergetree-for-updating-columns-frequently} Updating a column involves deleting an existing row and replacing it with new values. As you have already seen, this type of mutation in ClickHouse happens _eventually_ - during merges. If you have a lot of rows to update, it can actually be more efficient to avoid `ALTER TABLE..UPDATE` and instead just insert the new data alongside the existing data. We could add a column that denotes whether or not the data is stale or new... and there is actually a table engine that already implements this behavior very nicely, especially considering that it deletes the stale data automatically for you. Let's see how it works. @@ -248,7 +248,7 @@ INSERT INTO hackernews_views(id, author, sign) VALUES ``` ::: -## Real-time Updates from Multiple Threads {#real-time-updates-from-multiple-threads} +## Real-time updates from multiple threads {#real-time-updates-from-multiple-threads} With a `CollapsingMergeTree` table, rows cancel each other using a sign column, and the state of a row is determined by the last row inserted. But this can be problematic if you are inserting rows from different threads where rows can be inserted out of order. Using the "last" row does not work in this situation. diff --git a/docs/guides/developer/index.md b/docs/guides/developer/index.md index 76ac534f25b..aad644e8cda 100644 --- a/docs/guides/developer/index.md +++ b/docs/guides/developer/index.md @@ -5,7 +5,7 @@ description: 'Overview of the advanced guides' title: 'Advanced Guides' --- -# Advanced Guides +# Advanced guides This section contains the following advanced guides: diff --git a/docs/guides/developer/lightweight-update.md b/docs/guides/developer/lightweight-update.md index 06760711571..1e319dc689e 100644 --- a/docs/guides/developer/lightweight-update.md +++ b/docs/guides/developer/lightweight-update.md @@ -6,9 +6,13 @@ keywords: ['lightweight update'] description: 'Provides a description of lightweight updates' --- -## Lightweight Update {#lightweight-update} +# Lightweight update {#lightweight-update} -When lightweight updates are enabled, updated rows are marked as updated immediately and subsequent `SELECT` queries will automatically return with the changed values. When lightweight updates are not enabled, you may have to wait for your mutations to be applied via a background process to see the changed values. +## Introduction {#introduction} + +When lightweight updates are enabled, updated rows are marked as updated immediately and subsequent `SELECT` queries will +automatically return with the changed values. When lightweight updates are not enabled, you may have to wait for your +mutations to be applied via a background process to see the changed values. Lightweight updates can be enabled for `MergeTree`-family tables by enabling the query-level setting `apply_mutations_on_fly`. 
@@ -16,6 +20,10 @@ Lightweight updates can be enabled for `MergeTree`-family tables by enabling the ```sql SET apply_mutations_on_fly = 1; ``` +## How lightweight updates work {#how-lightweight-updates-work} + +[`ALTER TABLE ... UPDATE`](/sql-reference/statements/alter/update) queries in ClickHouse are implemented as mutations. A mutation is a heavyweight operation that rewrites parts, either synchronously or asynchronously. + ## Example {#example} Let's create a table and run some mutations: diff --git a/docs/guides/developer/mutations.md b/docs/guides/developer/mutations.md index 04d8e24661f..6f19b0cadd7 100644 --- a/docs/guides/developer/mutations.md +++ b/docs/guides/developer/mutations.md @@ -2,19 +2,21 @@ slug: /guides/developer/mutations sidebar_label: 'Updating and Deleting Data' sidebar_position: 1 -keywords: ['UPDATE', 'DELETE'] +keywords: ['UPDATE', 'DELETE', 'mutations'] title: 'Updating and deleting ClickHouse data' description: 'Describes how to perform update and delete operations in ClickHouse' show_related_blogs: false --- -# Updating and deleting ClickHouse data +# Updating and deleting ClickHouse data with mutations -Although ClickHouse is geared toward high volume analytic workloads, it is possible in some situations to modify or delete existing data. These operations are labeled "mutations" and are executed using the `ALTER TABLE` command. You can also `DELETE` a row using the lightweight -delete capability of ClickHouse. +Although ClickHouse is geared toward high volume analytic workloads, it is possible in some situations to modify or +delete existing data. These operations are labeled "mutations" and are executed using the `ALTER TABLE` command. :::tip -If you need to perform frequent updates, consider using [deduplication](../developer/deduplication.md) in ClickHouse, which allows you to update and/or delete rows without generating a mutation event. +If you need to perform frequent updates, consider using [deduplication](../developer/deduplication.md) in ClickHouse, which allows you to update +and/or delete rows without generating a mutation event. Alternatively, use [lightweight updates](/guides/developer/lightweight-update) +or [lightweight deletes](/guides/developer/lightweight-delete). ::: ## Updating data {#updating-data} diff --git a/docs/guides/developer/replacing-merge-tree.md b/docs/guides/developer/replacing-merge-tree.md index 9b8f3f79ca9..cb9d33b01ea 100644 --- a/docs/guides/developer/replacing-merge-tree.md +++ b/docs/guides/developer/replacing-merge-tree.md @@ -312,15 +312,15 @@ ORDER BY year ASC As shown, partitioning has significantly improved query performance in this case by allowing the deduplication process to occur at a partition level in parallel. -## Merge Behavior Considerations {#merge-behavior-considerations} +## Merge behavior considerations {#merge-behavior-considerations} ClickHouse's merge selection mechanism goes beyond simple merging of parts. Below, we examine this behavior in the context of ReplacingMergeTree, including configuration options for enabling more aggressive merging of older data and considerations for larger parts. -### Merge Selection Logic {#merge-selection-logic} +### Merge selection logic {#merge-selection-logic} While merging aims to minimize the number of parts, it also balances this goal against the cost of write amplification. Consequently, some ranges of parts are excluded from merging if they would lead to excessive write amplification, based on internal calculations.
This behavior helps prevent unnecessary resource usage and extends the lifespan of storage components. -### Merging Behavior on Large Parts {#merging-behavior-on-large-parts} +### Merging behavior on large parts {#merging-behavior-on-large-parts} The ReplacingMergeTree engine in ClickHouse is optimized for managing duplicate rows by merging data parts, keeping only the latest version of each row based on a specified unique key. However, when a merged part reaches the max_bytes_to_merge_at_max_space_in_pool threshold, it will no longer be selected for further merging, even if min_age_to_force_merge_seconds is set. As a result, automatic merges can no longer be relied upon to remove duplicates that may accumulate with ongoing data insertion. @@ -328,11 +328,11 @@ To address this, users can invoke OPTIMIZE FINAL to manually merge parts and rem For a more sustainable solution that maintains performance, partitioning the table is recommended. This can help prevent data parts from reaching the maximum merge size and reduces the need for ongoing manual optimizations. -### Partitioning and Merging Across Partitions {#partitioning-and-merging-across-partitions} +### Partitioning and merging across partitions {#partitioning-and-merging-across-partitions} As discussed in Exploiting Partitions with ReplacingMergeTree, we recommend partitioning tables as a best practice. Partitioning isolates data for more efficient merges and avoids merging across partitions, particularly during query execution. This behavior is enhanced in versions from 23.12 onward: if the partition key is a prefix of the sorting key, merging across partitions is not performed at query time, leading to faster query performance. -### Tuning Merges for Better Query Performance {#tuning-merges-for-better-query-performance} +### Tuning merges for better query performance {#tuning-merges-for-better-query-performance} By default, min_age_to_force_merge_seconds and min_age_to_force_merge_on_partition_only are set to 0 and false, respectively, disabling these features. In this configuration, ClickHouse will apply standard merging behavior without forcing merges based on partition age. @@ -340,7 +340,7 @@ If a value for min_age_to_force_merge_seconds is specified, ClickHouse will igno This behavior can be further tuned by setting min_age_to_force_merge_on_partition_only=true, requiring all parts in the partition to be older than min_age_to_force_merge_seconds for aggressive merging. This configuration allows older partitions to merge down to a single part over time, which consolidates data and maintains query performance. -### Recommended Settings {#recommended-settings} +### Recommended settings {#recommended-settings} :::warning Tuning merge behavior is an advanced operation. We recommend consulting with ClickHouse support before enabling these settings in production workloads. diff --git a/docs/guides/developer/ttl.md b/docs/guides/developer/ttl.md index 4b29ca64cfc..49361a6c6cc 100644 --- a/docs/guides/developer/ttl.md +++ b/docs/guides/developer/ttl.md @@ -10,7 +10,7 @@ show_related_blogs: true import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge'; -# Manage Data with TTL (Time-to-live) +# Manage data with TTL (time-to-live) ## Overview of TTL {#overview-of-ttl} @@ -24,7 +24,7 @@ TTL (time-to-live) refers to the capability of having rows or columns moved, del TTL can be applied to entire tables or specific columns. 
::: -## TTL Syntax {#ttl-syntax} +## TTL syntax {#ttl-syntax} The `TTL` clause can appear after a column definition and/or at the end of the table definition. Use the `INTERVAL` clause to define a length of time (which needs to be a `Date` or `DateTime` data type). For example, the following table has two columns with `TTL` clauses: @@ -48,7 +48,7 @@ ORDER BY tuple() TTL rules can be altered or deleted. See the [Manipulations with Table TTL](/sql-reference/statements/alter/ttl.md) page for more details. ::: -## Triggering TTL Events {#triggering-ttl-events} +## Triggering TTL events {#triggering-ttl-events} The deleting or aggregating of expired rows is not immediate - it only occurs during table merges. If you have a table that's not actively merging (for whatever reason), there are two settings that trigger TTL events: @@ -67,7 +67,7 @@ OPTIMIZE TABLE example1 FINAL `OPTIMIZE` initializes an unscheduled merge of the parts of your table, and `FINAL` forces a reoptimization if your table is already a single part. ::: -## Removing Rows {#removing-rows} +## Removing rows {#removing-rows} To remove entire rows from a table after a certain amount of time, define the TTL rule at the table level: @@ -100,7 +100,7 @@ TTL time + INTERVAL 1 MONTH DELETE WHERE event != 'error', time + INTERVAL 6 MONTH DELETE WHERE event = 'error' ``` -## Removing Columns {#removing-columns} +## Removing columns {#removing-columns} Instead of deleting the entire row, suppose you want just the balance and address columns to expire. Let's modify the `customers` table and add a TTL for both columns to be 2 hours: @@ -110,7 +110,7 @@ MODIFY COLUMN balance Int32 TTL timestamp + INTERVAL 2 HOUR, MODIFY COLUMN address String TTL timestamp + INTERVAL 2 HOUR ``` -## Implementing a Rollup {#implementing-a-rollup} +## Implementing a rollup {#implementing-a-rollup} Suppose we want to delete rows after a certain amount of time but hang on to some of the data for reporting purposes. We don't want all the details - just a few aggregated results of historical data. This can be implemented by adding a `GROUP BY` clause to your `TTL` expression, along with some columns in your table to store the aggregated results. Suppose in the following `hits` table we want to delete old rows, but hang on to the sum and maximum of the `hits` columns before removing the rows. We will need a field to store those values in, and we will need to add a `GROUP BY` clause to the `TTL` clause that rolls up the sum and maximum: diff --git a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md index 6b8cd015399..24620bf6eb1 100644 --- a/docs/guides/developer/understanding-query-execution-with-the-analyzer.md +++ b/docs/guides/developer/understanding-query-execution-with-the-analyzer.md @@ -12,7 +12,7 @@ import analyzer4 from '@site/static/images/guides/developer/analyzer4.png'; import analyzer5 from '@site/static/images/guides/developer/analyzer5.png'; import Image from '@theme/IdealImage'; -# Understanding Query Execution with the Analyzer +# Understanding query execution with the analyzer ClickHouse processes queries extremely quickly, but the execution of a query is not a simple story. Let's try to understand how a `SELECT` query gets executed. To illustrate it, let's add some data in a table in ClickHouse: @@ -241,7 +241,7 @@ GROUP BY type You can now see all the inputs, functions, aliases, and data types that are being used. 
You can see some of the optimizations that the planner is going to apply [here](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/QueryPlan/Optimizations/Optimizations.h). -## Query Pipeline {#query-pipeline} +## Query pipeline {#query-pipeline} A query pipeline is generated from the query plan. The query pipeline is very similar to the query plan, with the difference that it's not a tree but a graph. It highlights how ClickHouse is going to execute a query and what resources are going to be used. Analyzing the query pipeline is very useful to see where the bottleneck is in terms of inputs/outputs. Let's take our previous query and look at the query pipeline execution: diff --git a/docs/guides/examples/aggregate_function_combinators/anyIf.md b/docs/guides/examples/aggregate_function_combinators/anyIf.md index f3de0478ce9..daa2e4e5ab3 100644 --- a/docs/guides/examples/aggregate_function_combinators/anyIf.md +++ b/docs/guides/examples/aggregate_function_combinators/anyIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be aggregate function to select the first encountered element from a given column that matches the given condition. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `anyIf` to select the first `transaction_id`s which are above and diff --git a/docs/guides/examples/aggregate_function_combinators/argMaxIf.md b/docs/guides/examples/aggregate_function_combinators/argMaxIf.md index 566f7ee29ab..9e0968a6c4d 100644 --- a/docs/guides/examples/aggregate_function_combinators/argMaxIf.md +++ b/docs/guides/examples/aggregate_function_combinators/argMaxIf.md @@ -18,7 +18,7 @@ The `argMaxIf` function is useful when you need to find the value associated wit the maximum value in a dataset, but only for rows that satisfy a specific condition. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of product sales to demonstrate how `argMaxIf` works. We'll find the product name that has the highest price, but diff --git a/docs/guides/examples/aggregate_function_combinators/argMinIf.md b/docs/guides/examples/aggregate_function_combinators/argMinIf.md index 76772f77c3c..2399d51525d 100644 --- a/docs/guides/examples/aggregate_function_combinators/argMinIf.md +++ b/docs/guides/examples/aggregate_function_combinators/argMinIf.md @@ -18,7 +18,7 @@ The `argMinIf` function is useful when you need to find the value associated with the minimum value in a dataset, but only for rows that satisfy a specific condition. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores product prices and their timestamps, and we'll use `argMinIf` to find the lowest price for each product when it's in stock. diff --git a/docs/guides/examples/aggregate_function_combinators/avgIf.md b/docs/guides/examples/aggregate_function_combinators/avgIf.md index 235f6150067..937f180d7cd 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgIf.md +++ b/docs/guides/examples/aggregate_function_combinators/avgIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate the arithmetic mean of values for rows where the condition is true, using the `avgIf` aggregate combinator function. 
-## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `avgIf` to calculate the average sale amount for successful transactions. diff --git a/docs/guides/examples/aggregate_function_combinators/avgMap.md b/docs/guides/examples/aggregate_function_combinators/avgMap.md index 1dcdbd0e51b..51f73f3cf48 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMap.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the arithmetic mean of values in a Map according to each key, using the `avgMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/avgMerge.md b/docs/guides/examples/aggregate_function_combinators/avgMerge.md index 72fb2c9dbf5..34a9827561f 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMerge.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMerge.md @@ -14,7 +14,7 @@ The [`Merge`](/sql-reference/aggregate-functions/combinators#-state) combinator can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to produce a final result by combining partial aggregate states. -## Example Usage {#example-usage} +## Example usage {#example-usage} The `Merge` combinator is closely related to the `State` combinator. Refer to ["avgState example usage"](/examples/aggregate-function-combinators/avgState/#example-usage) diff --git a/docs/guides/examples/aggregate_function_combinators/avgMergeState.md b/docs/guides/examples/aggregate_function_combinators/avgMergeState.md index d4c132b16a3..916e21fb12a 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgMergeState.md +++ b/docs/guides/examples/aggregate_function_combinators/avgMergeState.md @@ -18,7 +18,7 @@ can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to merge partial aggregate states of type `AverageFunction(avg, T)` and return a new intermediate aggregation state. 
-## Example Usage {#example-usage} +## Example usage {#example-usage} The `MergeState` combinator is particularly useful for multi-level aggregation scenarios where you want to combine pre-aggregated states and maintain them as @@ -43,7 +43,7 @@ ORDER BY (region, server_id, timestamp); ``` We'll create a server-level aggregation target table and define an Incremental -Materialized View acting as an insert trigger to it: +materialized view acting as an insert trigger to it: ```sql CREATE TABLE server_performance @@ -88,7 +88,7 @@ AS SELECT FROM server_performance GROUP BY region, datacenter; --- datacenter level table and Materialized View +-- datacenter level table and materialized view CREATE TABLE datacenter_performance ( diff --git a/docs/guides/examples/aggregate_function_combinators/avgResample.md b/docs/guides/examples/aggregate_function_combinators/avgResample.md index 029efdecdc4..bdbeb9f91d5 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgResample.md +++ b/docs/guides/examples/aggregate_function_combinators/avgResample.md @@ -15,7 +15,7 @@ combinator can be applied to the [`count`](/sql-reference/aggregate-functions/re aggregate function to count values of a specified key column in a fixed number of intervals (`N`). -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Basic example {#basic-example} diff --git a/docs/guides/examples/aggregate_function_combinators/avgState.md b/docs/guides/examples/aggregate_function_combinators/avgState.md index fa6536ffb8e..e0e2317701d 100644 --- a/docs/guides/examples/aggregate_function_combinators/avgState.md +++ b/docs/guides/examples/aggregate_function_combinators/avgState.md @@ -15,7 +15,7 @@ can be applied to the [`avg`](/sql-reference/aggregate-functions/reference/avg) function to produce an intermediate state of `AggregateFunction(avg, T)` type where `T` is the specified type for the average. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll look at how we can use the `AggregateFunction` type, together with the `avgState` function to aggregate website traffic data. @@ -49,7 +49,7 @@ ENGINE = AggregatingMergeTree() ORDER BY page_id; ``` -Create an Incremental Materialized View that will act as an insert trigger to +Create an Incremental materialized view that will act as an insert trigger to new data and store the intermediate state data in the target table defined above: ```sql diff --git a/docs/guides/examples/aggregate_function_combinators/countIf.md b/docs/guides/examples/aggregate_function_combinators/countIf.md index 1bd7c343e85..55caea9a253 100644 --- a/docs/guides/examples/aggregate_function_combinators/countIf.md +++ b/docs/guides/examples/aggregate_function_combinators/countIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to count the number of rows where the condition is true, using the `countIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores user login attempts, and we'll use `countIf` to count the number of successful logins. 
diff --git a/docs/guides/examples/aggregate_function_combinators/countResample.md b/docs/guides/examples/aggregate_function_combinators/countResample.md index c20f0aca74d..f90bb6a168c 100644 --- a/docs/guides/examples/aggregate_function_combinators/countResample.md +++ b/docs/guides/examples/aggregate_function_combinators/countResample.md @@ -15,7 +15,7 @@ combinator can be applied to the [`count`](/sql-reference/aggregate-functions/re aggregate function to count values of a specified key column in a fixed number of intervals (`N`). -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Basic example {#basic-example} diff --git a/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md b/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md index fd98073b543..dd7350258fc 100644 --- a/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md +++ b/docs/guides/examples/aggregate_function_combinators/groupArrayDistinct.md @@ -14,7 +14,7 @@ The [`groupArrayDistinct`](/sql-reference/aggregate-functions/combinators#-forea can be applied to the [`groupArray`](/sql-reference/aggregate-functions/reference/sum) aggregate function to create an array of distinct argument values. -## Example Usage {#example-usage} +## Example usage {#example-usage} For this example we'll make use of the `hits` dataset available in our [SQL playground](https://sql.clickhouse.com/). diff --git a/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md b/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md index 9abea133bb0..38176eaa49f 100644 --- a/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md +++ b/docs/guides/examples/aggregate_function_combinators/groupArrayResample.md @@ -17,7 +17,7 @@ and construct the resulting array by selecting one representative value (corresponding to the minimum key) from the data points falling into each interval. It creates a downsampled view of the data rather than collecting all values. -## Example Usage {#example-usage} +## Example usage {#example-usage} Let's look at an example. We'll create a table which contains the `name`, `age` and `wage` of employees, and we'll insert some data into it: diff --git a/docs/guides/examples/aggregate_function_combinators/maxMap.md b/docs/guides/examples/aggregate_function_combinators/maxMap.md index dd2e524ac04..e1ffe4907fb 100644 --- a/docs/guides/examples/aggregate_function_combinators/maxMap.md +++ b/docs/guides/examples/aggregate_function_combinators/maxMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the maximum value in a Map according to each key, using the `maxMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. 
We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md b/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md index 2b758124b06..729637ce814 100644 --- a/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/maxSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the maximum value across all input values. It returns the result with type `SimpleAggregateState`. -## Example Usage {#example-usage} +## Example usage {#example-usage} The example given in [`minSimpleState`](/examples/aggregate-function-combinators/minSimpleState/#example-usage) demonstrates a usage of both `maxSimpleState` and `minSimpleState`. diff --git a/docs/guides/examples/aggregate_function_combinators/minMap.md b/docs/guides/examples/aggregate_function_combinators/minMap.md index e72d0d86954..e8843244f64 100644 --- a/docs/guides/examples/aggregate_function_combinators/minMap.md +++ b/docs/guides/examples/aggregate_function_combinators/minMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the minimum value in a Map according to each key, using the `minMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md index fdad0a2e374..0d7fa86a4ac 100644 --- a/docs/guides/examples/aggregate_function_combinators/minSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/minSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the minimum value across all input values. It returns the result with type [`SimpleAggregateFunction`](/docs/sql-reference/data-types/simpleaggregatefunction). -## Example Usage {#example-usage} +## Example usage {#example-usage} Let's look at a practical example using a table that tracks daily temperature readings. For each location, we want to maintain the lowest temperature recorded. @@ -49,7 +49,7 @@ ENGINE = AggregatingMergeTree() ORDER BY location_id; ``` -Create an Incremental Materialized View that will act as an insert trigger +Create an Incremental materialized view that will act as an insert trigger for inserted data and maintains the minimum, maximum temperatures per location. ```sql @@ -74,7 +74,7 @@ INSERT INTO raw_temperature_readings (location_id, location_name, temperature) V (4, 'East', 8); ``` -These readings are automatically processed by the Materialized View. Let's check +These readings are automatically processed by the materialized view. 
Let's check the current state: ```sql diff --git a/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md b/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md index 651734121f2..3df6548f982 100644 --- a/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md +++ b/docs/guides/examples/aggregate_function_combinators/quantilesTimingArrayIf.md @@ -15,7 +15,7 @@ combinator can be applied to the [`quantilesTiming`](/sql-reference/aggregate-fu function to calculate quantiles of timing values in arrays for rows where the condition is true, using the `quantilesTimingArrayIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores API response times for different endpoints, and we'll use `quantilesTimingArrayIf` to calculate response time quantiles for successful requests. diff --git a/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md b/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md index 6d33707032c..4dc1ae5743b 100644 --- a/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md +++ b/docs/guides/examples/aggregate_function_combinators/quantilesTimingIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate quantiles of timing values for rows where the condition is true, using the `quantilesTimingIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores API response times for different endpoints, and we'll use `quantilesTimingIf` to calculate response time quantiles for successful requests. diff --git a/docs/guides/examples/aggregate_function_combinators/sumArray.md b/docs/guides/examples/aggregate_function_combinators/sumArray.md index 3569d914cc9..6ce78553bfd 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumArray.md +++ b/docs/guides/examples/aggregate_function_combinators/sumArray.md @@ -18,7 +18,7 @@ aggregate combinator function. The `sumArray` function is useful when you need to calculate the total sum of all elements across multiple arrays in a dataset. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of daily sales across different product categories to demonstrate how `sumArray` works. We'll calculate the total diff --git a/docs/guides/examples/aggregate_function_combinators/sumForEach.md b/docs/guides/examples/aggregate_function_combinators/sumForEach.md index 1b3be84900e..8184ccf01f2 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumForEach.md +++ b/docs/guides/examples/aggregate_function_combinators/sumForEach.md @@ -15,7 +15,7 @@ can be applied to the [`sum`](/sql-reference/aggregate-functions/reference/sum) function which operates on row values to an aggregate function which operates on array columns, applying the aggregate to each element in the array across rows. -## Example Usage {#example-usage} +## Example usage {#example-usage} For this example we'll make use of the `hits` dataset available in our [SQL playground](https://sql.clickhouse.com/). 
diff --git a/docs/guides/examples/aggregate_function_combinators/sumIf.md b/docs/guides/examples/aggregate_function_combinators/sumIf.md index 324868f6344..6161be33acf 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumIf.md +++ b/docs/guides/examples/aggregate_function_combinators/sumIf.md @@ -14,7 +14,7 @@ The [`If`](/sql-reference/aggregate-functions/combinators#-if) combinator can be function to calculate the sum of values for rows where the condition is true, using the `sumIf` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores sales data with success flags, and we'll use `sumIf` to calculate the total sales amount for successful transactions. diff --git a/docs/guides/examples/aggregate_function_combinators/sumMap.md b/docs/guides/examples/aggregate_function_combinators/sumMap.md index 17829dbb144..fda6b895388 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumMap.md +++ b/docs/guides/examples/aggregate_function_combinators/sumMap.md @@ -14,7 +14,7 @@ The [`Map`](/sql-reference/aggregate-functions/combinators#-map) combinator can function to calculate the sum of values in a Map according to each key, using the `sumMap` aggregate combinator function. -## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll create a table that stores status codes and their counts for different timeslots, where each row contains a Map of status codes to their corresponding counts. We'll use diff --git a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md index d048462ba85..35b8759ea35 100644 --- a/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md +++ b/docs/guides/examples/aggregate_function_combinators/sumSimpleState.md @@ -14,7 +14,7 @@ The [`SimpleState`](/sql-reference/aggregate-functions/combinators#-simplestate) function to return the sum across all input values. It returns the result with type [`SimpleAggregateFunction`](/docs/sql-reference/data-types/simpleaggregatefunction). -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Tracking upvotes and downvotes {#tracking-post-votes} @@ -51,7 +51,7 @@ ENGINE = AggregatingMergeTree() ORDER BY post_id; ``` -We then create a Materialized View with `SimpleAggregateFunction` type columns: +We then create a materialized view with `SimpleAggregateFunction` type columns: ```sql CREATE MATERIALIZED VIEW mv_vote_processor TO vote_aggregates @@ -79,7 +79,7 @@ INSERT INTO raw_votes VALUES (3, 'downvote'); ``` -Query the Materialized View using the `SimpleState` combinator: +Query the materialized view using the `SimpleState` combinator: ```sql SELECT diff --git a/docs/guides/examples/aggregate_function_combinators/uniqArray.md b/docs/guides/examples/aggregate_function_combinators/uniqArray.md index 8d2181e829a..89f73200e71 100644 --- a/docs/guides/examples/aggregate_function_combinators/uniqArray.md +++ b/docs/guides/examples/aggregate_function_combinators/uniqArray.md @@ -19,7 +19,7 @@ The `uniqArray` function is useful when you need to count unique elements across multiple arrays in a dataset. It's equivalent to using `uniq(arrayJoin())`, where `arrayJoin` first flattens the arrays and then `uniq` counts the unique elements. 
-## Example Usage {#example-usage} +## Example usage {#example-usage} In this example, we'll use a sample dataset of user interests across different categories to demonstrate how `uniqArray` works. We'll compare it with diff --git a/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md b/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md index 04caee8afc3..31470be7e31 100644 --- a/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md +++ b/docs/guides/examples/aggregate_function_combinators/uniqArrayIf.md @@ -21,7 +21,7 @@ condition is true, using the `uniqArrayIf` aggregate combinator function. This is useful when you want to count unique elements in an array based on specific conditions without having to use `arrayJoin`. -## Example Usage {#example-usage} +## Example usage {#example-usage} ### Count unique products viewed by segment type and engagement level {#count-unique-products} diff --git a/docs/guides/inserting-data.md b/docs/guides/inserting-data.md index 2c9a9e2fd5a..4e26d2c13ff 100644 --- a/docs/guides/inserting-data.md +++ b/docs/guides/inserting-data.md @@ -10,7 +10,7 @@ show_related_blogs: true import postgres_inserts from '@site/static/images/guides/postgres-inserts.png'; import Image from '@theme/IdealImage'; -## Basic Example {#basic-example} +## Basic example {#basic-example} You can use the familiar `INSERT INTO TABLE` command with ClickHouse. Let's insert some data into the table that we created in the start guide ["Creating Tables in ClickHouse"](./creating-tables). @@ -38,7 +38,7 @@ user_id message timestamp 102 Sort your data based on your commonly-used queries 2024-11-13 00:00:00 2.718 ``` -## Inserting into ClickHouse vs. OLTP Databases {#inserting-into-clickhouse-vs-oltp-databases} +## Inserting into ClickHouse vs. OLTP databases {#inserting-into-clickhouse-vs-oltp-databases} As an OLAP (Online Analytical Processing) database, ClickHouse is optimized for high performance and scalability, allowing potentially millions of rows to be inserted per second. This is achieved through a combination of a highly parallelized architecture and efficient column-oriented compression, but with compromises on immediate consistency. @@ -51,7 +51,7 @@ These transactions can potentially involve a small number of rows at a time, wit To achieve high insert performance while maintaining strong consistency guarantees, users should adhere to the simple rules described below when inserting data into ClickHouse. Following these rules will help to avoid issues users commonly encounter the first time they use ClickHouse, and try to replicate an insert strategy that works for OLTP databases. -## Best Practices for Inserts {#best-practices-for-inserts} +## Best practices for inserts {#best-practices-for-inserts} ### Insert in large batch sizes {#insert-in-large-batch-sizes} @@ -118,7 +118,7 @@ These are optimized to ensure that inserts are performed correctly and natively See [Clients and Drivers](/interfaces/cli) for a full list of available ClickHouse clients and drivers. -### Prefer the Native format {#prefer-the-native-format} +### Prefer the native format {#prefer-the-native-format} ClickHouse supports many [input formats](/interfaces/formats) at insert (and query) time. This is a significant difference with OLTP databases and makes loading data from external sources much easier - especially when coupled with [table functions](/sql-reference/table-functions) and the ability to load data from files on disk.
diff --git a/docs/guides/joining-tables.md b/docs/guides/joining-tables.md index 1d588e5129b..f6e6073de5a 100644 --- a/docs/guides/joining-tables.md +++ b/docs/guides/joining-tables.md @@ -123,7 +123,7 @@ WHERE (VoteTypeId = 2) AND (PostId IN ( Peak memory usage: 250.66 MiB. ``` -## Choosing a join algorithm {#choosing-a-join-algorithm} +## Choosing a JOIN algorithm {#choosing-a-join-algorithm} ClickHouse supports a number of [join algorithms](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1). These algorithms typically trade memory usage for performance. The following provides an overview of the ClickHouse join algorithms based on their relative memory consumption and execution time: diff --git a/docs/guides/manage-and-deploy-index.md b/docs/guides/manage-and-deploy-index.md index 68159deef47..388bfcaa4c1 100644 --- a/docs/guides/manage-and-deploy-index.md +++ b/docs/guides/manage-and-deploy-index.md @@ -4,7 +4,7 @@ description: 'Overview page for Manage and Deploy' slug: /guides/manage-and-deploy-index --- -# Manage and Deploy +# Manage and deploy This section contains the following topics: @@ -12,7 +12,7 @@ This section contains the following topics: |-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------| | [Deployment and Scaling](/deployment-guides/index) | Working deployment examples based on the advice provided to ClickHouse users by the ClickHouse Support and Services organization. | | [Separation of Storage and Compute](/guides/separation-storage-compute) | Guide exploring how you can use ClickHouse and S3 to implement an architecture with separated storage and compute. | -| [Sizing and Hardware Recommendations](/guides/sizing-and-hardware-recommendations) | Guide discussing general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. | +| [Sizing and hardware recommendations](/guides/sizing-and-hardware-recommendations) | Guide discussing general recommendations regarding hardware, compute, memory, and disk configurations for open-source users. | | [Configuring ClickHouse Keeper](/guides/sre/keeper/clickhouse-keeper) | Information and examples on how to configure ClickHouse Keeper. | | [Network ports](/guides/sre/network-ports) | List of network ports used by ClickHouse. | | [Re-balancing Shards](/guides/sre/scaling-clusters) | Recommendations on re-balancing shards. | diff --git a/docs/guides/separation-storage-compute.md b/docs/guides/separation-storage-compute.md index 8fe6d258ac8..b404f55e7f1 100644 --- a/docs/guides/separation-storage-compute.md +++ b/docs/guides/separation-storage-compute.md @@ -10,7 +10,7 @@ import Image from '@theme/IdealImage'; import BucketDetails from '@site/docs/_snippets/_S3_authentication_and_bucket.md'; import s3_bucket_example from '@site/static/images/guides/s3_bucket_example.png'; -# Separation of Storage and Compute +# Separation of storage and compute ## Overview {#overview} @@ -170,7 +170,7 @@ For fault tolerance, you can use multiple ClickHouse server nodes distributed ac Replication with S3 disks can be accomplished by using the `ReplicatedMergeTree` table engine. See the following guide for details: - [Replicating a single shard across two AWS regions using S3 Object Storage](/integrations/s3#s3-multi-region).
-## Further Reading {#further-reading} +## Further reading {#further-reading} - [SharedMergeTree table engine](/cloud/reference/shared-merge-tree) - [SharedMergeTree announcement blog](https://clickhouse.com/blog/clickhouse-cloud-boosts-performance-with-sharedmergetree-and-lightweight-updates) diff --git a/docs/guides/sre/keeper/index.md b/docs/guides/sre/keeper/index.md index b25fa32da25..116f19cc1f0 100644 --- a/docs/guides/sre/keeper/index.md +++ b/docs/guides/sre/keeper/index.md @@ -160,7 +160,7 @@ If you don't have the symlink (`clickhouse-keeper`) you can create it or specify clickhouse keeper --config /etc/your_path_to_config/config.xml ``` -### Four Letter Word Commands {#four-letter-word-commands} +### Four letter word commands {#four-letter-word-commands} ClickHouse Keeper also provides 4lw commands which are almost the same with Zookeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively. @@ -409,7 +409,7 @@ AIOWriteBytes 0 Number of bytes written with Linux or FreeBSD AIO interf ... ``` -### HTTP Control {#http-control} +### HTTP control {#http-control} ClickHouse Keeper provides an HTTP interface to check if a replica is ready to receive traffic. It may be used in cloud environments, such as [Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes). @@ -672,11 +672,11 @@ curl 127.0.0.1:9363/metrics Please also see the ClickHouse Cloud [Prometheus integration](/integrations/prometheus). -## ClickHouse Keeper User Guide {#clickhouse-keeper-user-guide} +## ClickHouse Keeper user guide {#clickhouse-keeper-user-guide} This guide provides simple and minimal settings to configure ClickHouse Keeper with an example on how to test distributed operations. This example is performed using 3 nodes on Linux. -### 1. Configure Nodes with Keeper settings {#1-configure-nodes-with-keeper-settings} +### 1. Configure nodes with Keeper settings {#1-configure-nodes-with-keeper-settings} 1. Install 3 ClickHouse instances on 3 hosts (`chnode1`, `chnode2`, `chnode3`). (View the [Quick Start](/getting-started/install/install.mdx) for details on installing ClickHouse.) @@ -947,7 +947,7 @@ this avoids having to wait several minutes for Keeper garbage collection to remove path entries as each time a path is created a new `uuid` is used in that path; paths are never reused. -### Example Environment {#example-environment} +### Example environment {#example-environment} A three node cluster that will be configured to have ClickHouse Keeper on all three nodes, and ClickHouse on two of the nodes. This provides ClickHouse Keeper with three nodes (including a tiebreaker node), and @@ -1340,7 +1340,7 @@ Sometimes it's necessary to extend experimental keeper node into a cluster. Here To get confident with the process, here's a [sandbox repository](https://github.com/ClickHouse/keeper-extend-cluster). 
-## Unsupported Features {#unsupported-features} +## Unsupported features {#unsupported-features} While ClickHouse Keeper aims to be fully compatible with ZooKeeper, there are some features that are currently not implemented (although development is ongoing): diff --git a/docs/guides/sre/scaling-clusters.md b/docs/guides/sre/scaling-clusters.md index d2663f78f12..b9a6bfecbcd 100644 --- a/docs/guides/sre/scaling-clusters.md +++ b/docs/guides/sre/scaling-clusters.md @@ -6,7 +6,7 @@ description: 'ClickHouse does not support automatic shard rebalancing, so we pro title: 'Rebalancing Data' --- -# Rebalancing Data +# Rebalancing data ClickHouse does not support automatic shard rebalancing. However, there are ways to rebalance shards in order of preference: diff --git a/docs/guides/sre/user-management/configuring-ldap.md b/docs/guides/sre/user-management/configuring-ldap.md index e660e6ac459..e2ca9b41230 100644 --- a/docs/guides/sre/user-management/configuring-ldap.md +++ b/docs/guides/sre/user-management/configuring-ldap.md @@ -8,7 +8,7 @@ description: 'Describes how to configure ClickHouse to use LDAP for authenticati import SelfManaged from '@site/docs/_snippets/_self_managed_only_no_roadmap.md'; -# Configuring ClickHouse to Use LDAP for Authentication and Role Mapping +# Configuring ClickHouse to use LDAP for authentication and role mapping diff --git a/docs/guides/sre/user-management/index.md b/docs/guides/sre/user-management/index.md index c25d07c17ca..c1d3c587358 100644 --- a/docs/guides/sre/user-management/index.md +++ b/docs/guides/sre/user-management/index.md @@ -7,7 +7,7 @@ keywords: ['ClickHouse Cloud', 'Access Control', 'User Management', 'RBAC', 'Sec description: 'Describes access control and account management in ClickHouse Cloud' --- -# Creating Users and Roles in ClickHouse +# Creating users and roles in ClickHouse ClickHouse supports access control management based on [RBAC](https://en.wikipedia.org/wiki/Role-based_access_control) approach. @@ -33,7 +33,7 @@ You can't manage the same access entity by both configuration methods simultaneo ::: :::note -If you are looking to manage ClickHouse Cloud Console users, please refer to this [page](/cloud/security/cloud-access-management) +If you are looking to manage ClickHouse Cloud console users, please refer to this [page](/cloud/security/cloud-access-management) ::: To see all users, roles, profiles, etc. and all their grants use [`SHOW ACCESS`](/sql-reference/statements/show#show-access) statement. @@ -48,13 +48,13 @@ If you just started using ClickHouse, consider the following scenario: 2. Log in to the `default` user account and create all the required users. Don't forget to create an administrator account (`GRANT ALL ON *.* TO admin_user_account WITH GRANT OPTION`). 3. [Restrict permissions](/operations/settings/permissions-for-queries) for the `default` user and disable SQL-driven access control and account management for it. -### Properties of Current Solution {#access-control-properties} +### Properties of current solution {#access-control-properties} - You can grant permissions for databases and tables even if they do not exist. - If a table is deleted, all the privileges that correspond to this table are not revoked. This means that even if you create a new table with the same name later, all the privileges remain valid. To revoke privileges corresponding to the deleted table, you need to execute, for example, the `REVOKE ALL PRIVILEGES ON db.table FROM ALL` query. - There are no lifetime settings for privileges. 
-### User Account {#user-account-management} +### User account {#user-account-management} A user account is an access entity that allows to authorize someone in ClickHouse. A user account contains: @@ -75,7 +75,7 @@ Management queries: - [SHOW CREATE USER](/sql-reference/statements/show#show-create-user) - [SHOW USERS](/sql-reference/statements/show#show-users) -### Settings Applying {#access-control-settings-applying} +### Settings applying {#access-control-settings-applying} Settings can be configured differently: for a user account, in its granted roles and in settings profiles. At user login, if a setting is configured for different access entities, the value and constraints of this setting are applied as follows (from higher to lower priority): @@ -106,7 +106,7 @@ Management queries: Privileges can be granted to a role by the [GRANT](/sql-reference/statements/grant.md) query. To revoke privileges from a role ClickHouse provides the [REVOKE](/sql-reference/statements/revoke.md) query. -#### Row Policy {#row-policy-management} +#### Row policy {#row-policy-management} Row policy is a filter that defines which of the rows are available to a user or a role. Row policy contains filters for one particular table, as well as a list of roles and/or users which should use this row policy. @@ -122,7 +122,7 @@ Management queries: - [SHOW CREATE ROW POLICY](/sql-reference/statements/show#show-create-row-policy) - [SHOW POLICIES](/sql-reference/statements/show#show-policies) -### Settings Profile {#settings-profiles-management} +### Settings profile {#settings-profiles-management} Settings profile is a collection of [settings](/operations/settings/index.md). Settings profile contains settings and constraints, as well as a list of roles and/or users to which this profile is applied. @@ -149,7 +149,7 @@ Management queries: - [SHOW QUOTA](/sql-reference/statements/show#show-quota) - [SHOW QUOTAS](/sql-reference/statements/show#show-quotas) -### Enabling SQL-driven Access Control and Account Management {#enabling-access-control} +### Enabling SQL-driven access control and account management {#enabling-access-control} - Setup a directory for configuration storage. @@ -160,7 +160,7 @@ Management queries: By default, SQL-driven access control and account management is disabled for all users. You need to configure at least one user in the `users.xml` configuration file and set the values of the [`access_management`](/operations/settings/settings-users.md#access_management-user-setting), `named_collection_control`, `show_named_collections`, and `show_named_collections_secrets` settings to 1. -## Defining SQL Users and Roles {#defining-sql-users-and-roles} +## Defining SQL users and roles {#defining-sql-users-and-roles} :::tip If you are working in ClickHouse Cloud, please see [Cloud access management](/cloud/security/cloud-access-management). diff --git a/docs/guides/sre/user-management/ssl-user-auth.md b/docs/guides/sre/user-management/ssl-user-auth.md index 606c3c41330..c7135ff5261 100644 --- a/docs/guides/sre/user-management/ssl-user-auth.md +++ b/docs/guides/sre/user-management/ssl-user-auth.md @@ -6,7 +6,7 @@ title: 'Configuring SSL User Certificate for Authentication' description: 'This guide provides simple and minimal settings to configure authentication with SSL user certificates.' 
--- -# Configuring SSL User Certificate for Authentication +# Configuring SSL user certificate for authentication import SelfManaged from '@site/docs/_snippets/_self_managed_only_no_roadmap.md'; diff --git a/docs/integrations/data-ingestion/clickpipes/kafka.md b/docs/integrations/data-ingestion/clickpipes/kafka.md index 95e7001ef85..bc9e174c084 100644 --- a/docs/integrations/data-ingestion/clickpipes/kafka.md +++ b/docs/integrations/data-ingestion/clickpipes/kafka.md @@ -80,7 +80,7 @@ without an embedded schema id, then the specific schema ID or subject must be sp 9. Finally, you can configure permissions for the internal ClickPipes user. **Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role: - - `Full access`: with the full access to the cluster. It might be useful if you use Materialized View or Dictionary with the destination table. + - `Full access`: with the full access to the cluster. It might be useful if you use materialized view or Dictionary with the destination table. - `Only destination table`: with the `INSERT` permissions to the destination table only. Permissions diff --git a/docs/integrations/data-ingestion/clickpipes/kinesis.md b/docs/integrations/data-ingestion/clickpipes/kinesis.md index 772e4a5ab24..90d27825030 100644 --- a/docs/integrations/data-ingestion/clickpipes/kinesis.md +++ b/docs/integrations/data-ingestion/clickpipes/kinesis.md @@ -62,7 +62,7 @@ You have familiarized yourself with the [ClickPipes intro](./index.md) and setup 8. Finally, you can configure permissions for the internal ClickPipes user. **Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role: - - `Full access`: with the full access to the cluster. It might be useful if you use Materialized View or Dictionary with the destination table. + - `Full access`: with the full access to the cluster. It might be useful if you use materialized view or Dictionary with the destination table. - `Only destination table`: with the `INSERT` permissions to the destination table only. Permissions diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/index.md b/docs/integrations/data-ingestion/clickpipes/mysql/index.md index d0966fc46e7..2c6fd8a3f3e 100644 --- a/docs/integrations/data-ingestion/clickpipes/mysql/index.md +++ b/docs/integrations/data-ingestion/clickpipes/mysql/index.md @@ -49,7 +49,7 @@ Once your source MySQL database is set up, you can continue creating your ClickP Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/). [//]: # ( TODO update image here) -1. In the ClickHouse Cloud Console, navigate to your ClickHouse Cloud Service. +1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service. 
ClickPipes service diff --git a/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md b/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md index 90a78354300..39ffe0e75da 100644 --- a/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md +++ b/docs/integrations/data-ingestion/clickpipes/mysql/source/gcp.md @@ -65,14 +65,14 @@ Connect to your Cloud SQL MySQL instance as the root user and execute the follow ## Configure network access {#configure-network-access-gcp-mysql} If you want to restrict traffic to your Cloud SQL instance, please add the [documented static NAT IPs](../../index.md#list-of-static-ips) to the allowlisted IPs of your Cloud SQL MySQL instance. -This can be done either by editing the instance or by heading over to the `Connections` tab in the sidebar in Cloud Console. +This can be done either by editing the instance or by heading over to the `Connections` tab in the sidebar in Cloud console. IP allowlisting in GCP MySQL ## Download and Use Root CA certificate {#download-root-ca-certificate-gcp-mysql} To connect to your Cloud SQL instance, you need to download the root CA certificate. -1. Go to your Cloud SQL instance in the Cloud Console. +1. Go to your Cloud SQL instance in the Cloud console. 2. Click on `Connections` in the sidebar. 3. Click on the `Security` tab. 4. In the `Manage server CA certificates` section, click on the `DOWNLOAD CERTIFICATES` button at the bottom. diff --git a/docs/integrations/data-ingestion/clickpipes/object-storage.md b/docs/integrations/data-ingestion/clickpipes/object-storage.md index 6ab6284b697..de03658b58c 100644 --- a/docs/integrations/data-ingestion/clickpipes/object-storage.md +++ b/docs/integrations/data-ingestion/clickpipes/object-storage.md @@ -67,7 +67,7 @@ You can also map [virtual columns](../../sql-reference/table-functions/s3#virtua 7. Finally, you can configure permissions for the internal ClickPipes user. **Permissions:** ClickPipes will create a dedicated user for writing data into a destination table. You can select a role for this internal user using a custom role or one of the predefined role: - - `Full access`: with the full access to the cluster. Required if you use Materialized View or Dictionary with the destination table. + - `Full access`: with the full access to the cluster. Required if you use materialized view or Dictionary with the destination table. - `Only destination table`: with the `INSERT` permissions to the destination table only. Permissions diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md index 6e68fe9e57c..e67883efaff 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/deduplication.md @@ -181,7 +181,7 @@ LIMIT 10 #### Refreshable Material view {#refreshable-material-view} -Another approach is to use a [Refreshable Materialized View](/materialized-view/refreshable-materialized-view), which enables you to schedule query execution for deduplicating rows and storing the results in a destination table. With each scheduled refresh, the destination table is replaced with the latest query results. +Another approach is to use a [refreshable materialized view](/materialized-view/refreshable-materialized-view), which enables you to schedule query execution for deduplicating rows and storing the results in a destination table. 
With each scheduled refresh, the destination table is replaced with the latest query results. The key advantage of this method is that the query using the FINAL keyword runs only once during the refresh, eliminating the need for subsequent queries on the destination table to use FINAL. diff --git a/docs/integrations/data-ingestion/clickpipes/postgres/index.md b/docs/integrations/data-ingestion/clickpipes/postgres/index.md index ff7513b65e8..af6a1a4a7e6 100644 --- a/docs/integrations/data-ingestion/clickpipes/postgres/index.md +++ b/docs/integrations/data-ingestion/clickpipes/postgres/index.md @@ -56,7 +56,7 @@ Once your source Postgres database is set up, you can continue creating your Cli Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/). [//]: # ( TODO update image here) -1. In the ClickHouse Cloud Console, navigate to your ClickHouse Cloud Service. +1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service. ClickPipes service diff --git a/docs/integrations/data-ingestion/dbms/dynamodb/index.md b/docs/integrations/data-ingestion/dbms/dynamodb/index.md index 8ce600102db..972cd108ae1 100644 --- a/docs/integrations/data-ingestion/dbms/dynamodb/index.md +++ b/docs/integrations/data-ingestion/dbms/dynamodb/index.md @@ -61,12 +61,12 @@ The snapshot data from DynamoDB will look something this: } ``` -Observe that the data is in a nested format. We will need to flatten this data before loading it into ClickHouse. This can be done using the `JSONExtract` function in ClickHouse in a Materialized View. +Observe that the data is in a nested format. We will need to flatten this data before loading it into ClickHouse. This can be done using the `JSONExtract` function in ClickHouse in a materialized view. We will want to create three tables: 1. A table to store the raw data from DynamoDB 2. A table to store the final flattened data (destination table) -3. A Materialized View to flatten the data +3. A materialized view to flatten the data For the example DynamoDB data above, the ClickHouse tables would look like this: diff --git a/docs/integrations/data-ingestion/emqx/index.md b/docs/integrations/data-ingestion/emqx/index.md index 731dbe74151..430a8f3ced3 100644 --- a/docs/integrations/data-ingestion/emqx/index.md +++ b/docs/integrations/data-ingestion/emqx/index.md @@ -107,7 +107,7 @@ Start at the [EMQX Cloud sign up](https://accounts.emqx.com/signup?continue=http ### Create an MQTT cluster {#create-an-mqtt-cluster} -Once logged in, click on "Cloud Console" under the account menu and you will be able to see the green button to create a new deployment. +Once logged in, click on "Cloud console" under the account menu and you will be able to see the green button to create a new deployment. EMQX Cloud Create Deployment Step 1 showing deployment options diff --git a/docs/integrations/data-ingestion/gcs/index.md b/docs/integrations/data-ingestion/gcs/index.md index 548b6c95489..6c136b603f9 100644 --- a/docs/integrations/data-ingestion/gcs/index.md +++ b/docs/integrations/data-ingestion/gcs/index.md @@ -624,7 +624,7 @@ formatReadableSize(total_bytes): 36.42 MiB 1 row in set. Elapsed: 0.002 sec. ``` -#### Verify in Google Cloud Console {#verify-in-google-cloud-console} +#### Verify in Google Cloud console {#verify-in-google-cloud-console} Looking at the buckets you will see that a folder was created in each bucket with the name that was used in the `storage.xml` configuration file. 
Expand the folders and you will see many files, representing the data partitions. #### Bucket for replica one {#bucket-for-replica-one} diff --git a/docs/integrations/data-ingestion/google-dataflow/java-runner.md b/docs/integrations/data-ingestion/google-dataflow/java-runner.md index 08926d6c423..dc6a4f07eaa 100644 --- a/docs/integrations/data-ingestion/google-dataflow/java-runner.md +++ b/docs/integrations/data-ingestion/google-dataflow/java-runner.md @@ -14,7 +14,7 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; The Dataflow Java Runner lets you execute custom Apache Beam pipelines on Google Cloud's Dataflow service. This approach provides maximum flexibility and is well-suited for advanced ETL workflows. -## How It Works {#how-it-works} +## How it works {#how-it-works} 1. **Pipeline Implementation** To use the Java Runner, you need to implement your Beam pipeline using the `ClickHouseIO` - our official Apache Beam connector. For code examples and instructions on how to use the `ClickHouseIO`, please visit [ClickHouse Apache Beam](/integrations/apache-beam). diff --git a/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md b/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md index 5dd1db34834..9918f44f7d4 100644 --- a/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md +++ b/docs/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md @@ -136,7 +136,7 @@ job: ### Monitor the Job {#monitor-the-job} -Navigate to the [Dataflow Jobs tab](https://console.cloud.google.com/dataflow/jobs) in your Google Cloud Console to +Navigate to the [Dataflow Jobs tab](https://console.cloud.google.com/dataflow/jobs) in your Google Cloud console to monitor the status of the job. You'll find the job details, including progress and any errors: DataFlow console showing a running BigQuery to ClickHouse job diff --git a/docs/intro.md b/docs/intro.md index f7473e88d93..27679f8e1a3 100644 --- a/docs/intro.md +++ b/docs/intro.md @@ -65,7 +65,7 @@ Because the block-wise storage and transfer from disk to memory is aligned with Column-oriented database structure -## Data Replication and Integrity {#data-replication-and-integrity} +## Data replication and integrity {#data-replication-and-integrity} ClickHouse uses an asynchronous multi-master replication scheme to ensure that data is stored redundantly on multiple nodes. After being written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases. @@ -73,7 +73,7 @@ ClickHouse uses an asynchronous multi-master replication scheme to ensure that d ClickHouse implements user account management using SQL queries and allows for role-based access control configuration similar to what can be found in ANSI SQL standard and popular relational database management systems. -## SQL Support {#sql-support} +## SQL support {#sql-support} ClickHouse supports a [declarative query language based on SQL](/sql-reference) that is identical to the ANSI SQL standard in many cases. 
Supported query clauses include [GROUP BY](/sql-reference/statements/select/group-by), [ORDER BY](/sql-reference/statements/select/order-by), subqueries in [FROM](/sql-reference/statements/select/from), [JOIN](/sql-reference/statements/select/join) clause, [IN](/sql-reference/operators/in) operator, [window functions](/sql-reference/window-functions) and scalar subqueries. @@ -100,12 +100,12 @@ OLAP scenarios require real-time responses on top of large datasets for complex - Only a few columns are selected to answer any particular query - Results must be returned in milliseconds or seconds -## Column-Oriented vs Row-Oriented Databases {#column-oriented-vs-row-oriented-databases} +## Column-oriented vs row-oriented databases {#column-oriented-vs-row-oriented-databases} In a row-oriented DBMS, data is stored in rows, with all the values related to a row physically stored next to each other. In a column-oriented DBMS, data is stored in columns, with values from the same columns stored together. -## Why Column-Oriented Databases Work Better in the OLAP Scenario {#why-column-oriented-databases-work-better-in-the-olap-scenario} +## Why column-oriented databases work better in the OLAP scenario {#why-column-oriented-databases-work-better-in-the-olap-scenario} Column-oriented databases are better suited to OLAP scenarios: they are at least 100 times faster in processing most queries. The reasons are explained in detail below, but the fact is easier to demonstrate visually: @@ -123,7 +123,7 @@ Helpful articles to dive deeper into this topic include: - [Distinctive Features of ClickHouse](/about-us/distinctive-features.md) - [FAQ: Why is ClickHouse so fast?](/knowledgebase/why-clickhouse-is-so-fast) -## Processing Analytical Queries in Real Time {#processing-analytical-queries-in-real-time} +## Processing analytical queries in real time {#processing-analytical-queries-in-real-time} In a row-oriented DBMS, data is stored in this order: @@ -156,7 +156,7 @@ Different orders for storing data are better suited to different scenarios. The The higher the load on the system, the more important it is to customize the system set up to match the requirements of the usage scenario, and the more fine grained this customization becomes. There is no system that is equally well-suited to significantly different scenarios. If a system is adaptable to a wide set of scenarios, under a high load, the system will handle all the scenarios equally poorly, or will work well for just one or few of possible scenarios. -### Key Properties of OLAP Scenario {#key-properties-of-olap-scenario} +### Key properties of the OLAP scenario {#key-properties-of-olap-scenario} - Tables are "wide," meaning they contain a large number of columns. - Datasets are large and queries require high throughput when processing a single query (up to billions of rows per second per server). diff --git a/docs/managing-data/deleting-data/index.md b/docs/managing-data/deleting-data/index.md index 37d94625dcb..718507f9671 100644 --- a/docs/managing-data/deleting-data/index.md +++ b/docs/managing-data/deleting-data/index.md @@ -8,11 +8,11 @@ keywords: ['delete', 'truncate', 'drop', 'lightweight delete'] In this section of the documentation, we will explore how to delete data in ClickHouse. 
-| Page                                                                | Description                                                                                                                    |
-|---------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
+| Page                                                         | Description                                                                                                                    |
+|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| [Overview](/deletes/overview)                                | Provides an overview of the various ways to delete data in ClickHouse.                                                         |
-| [Lightweight Deletes](/guides/developer/lightweight-delete) | Learn how to use the Lightweight Delete to delete data.                                                                        |
-| [Delete Mutations](/managing-data/delete_mutations)         | Learn about Delete Mutations.                                                                                                  |
-| [Truncate Table](/managing-data/truncate)                   | Learn about how to use Truncate, which allows the data in a table or database to be removed, while preserving its existence.  |
-| [Drop Partitions](/managing-data/drop_partition)            | Learn about Dropping Partitions in ClickHouse.                                                                                 |
+| [Lightweight deletes](/guides/developer/lightweight-delete) | Learn how to use lightweight deletes to delete data.                                                                           |
+| [Delete mutations](/managing-data/delete_mutations)         | Learn about delete mutations.                                                                                                  |
+| [Truncate table](/managing-data/truncate)                   | Learn how to use truncate, which allows the data in a table or database to be removed while preserving its existence.         |
+| [Drop partitions](/managing-data/drop_partition)            | Learn about dropping partitions in ClickHouse.                                                                                 |
diff --git a/docs/managing-data/deleting-data/overview.md b/docs/managing-data/deleting-data/overview.md
index 946ad268a2c..224923946ba 100644
--- a/docs/managing-data/deleting-data/overview.md
+++ b/docs/managing-data/deleting-data/overview.md
@@ -66,6 +66,6 @@ ALTER TABLE posts (DROP PARTITION '2008')
Read more about [DROP PARTITION](/sql-reference/statements/alter/partition).
-## More Resources {#more-resources}
+## More resources {#more-resources}
- [Handling Updates and Deletes in ClickHouse](https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse)
diff --git a/docs/managing-data/drop_partition.md b/docs/managing-data/drop_partition.md
index d221c34b0d3..d3476ac4280 100644
--- a/docs/managing-data/drop_partition.md
+++ b/docs/managing-data/drop_partition.md
@@ -31,7 +31,7 @@ Read about setting the partition expression in a section [How to set the partiti
In ClickHouse, users should principally consider partitioning to be a data management feature, not a query optimization technique. By separating data logically based on a key, each partition can be operated on independently e.g. deleted. This allows users to move partitions, and thus subsets, between [storage tiers](/integrations/s3#storage-tiers) efficiently on time or [expire data/efficiently delete from the cluster](/sql-reference/statements/alter/partition).
-## Drop Partitions {#drop-partitions}
+## Drop partitions {#drop-partitions}
`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition.
diff --git a/docs/managing-data/updating-data/index.md b/docs/managing-data/updating-data/index.md
index 566ce47dcc7..ea49014f1c5 100644
--- a/docs/managing-data/updating-data/index.md
+++ b/docs/managing-data/updating-data/index.md
@@ -7,9 +7,9 @@ keywords: ['update', 'updating data']
In this section of the documentation, you will learn how you can update your data.
-| Page                                                                 | Description                                                                                                                                                        |
-|----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [Overview](/updating-data/overview)                                  | Provides an overview of the differences in updating data between ClickHouse and OLTP databases, as well as the various methods available to do so in ClickHouse.  |
-| [Update Mutations](/managing-data/update_mutations)                  | Learn how to update using Update Mutations.                                                                                                                        |
-| [Lightweight Updates](/guides/developer/lightweight-update)          | Learn how to update using Lightweight Updates.                                                                                                                     |
-| [ReplacingMergeTree](/guides/replacing-merge-tree)                   | Learn how to update using the ReplacingMergeTree.                                                                                                                  |
+| Page                                                         | Description                                                                                                                                                        |
+|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [Overview](/updating-data/overview)                          | Provides an overview of the differences in updating data between ClickHouse and OLTP databases, as well as the various methods available to do so in ClickHouse.  |
+| [Update mutations](/managing-data/update_mutations)          | Learn how to update using update mutations.                                                                                                                        |
+| [Lightweight updates](/guides/developer/lightweight-update)  | Learn how to update using lightweight updates.                                                                                                                     |
+| [ReplacingMergeTree](/guides/replacing-merge-tree)           | Learn how to update using the ReplacingMergeTree.                                                                                                                  |
diff --git a/docs/managing-data/updating-data/overview.md b/docs/managing-data/updating-data/overview.md
index 361585dcb5b..b6deaa72cd0 100644
--- a/docs/managing-data/updating-data/overview.md
+++ b/docs/managing-data/updating-data/overview.md
@@ -28,7 +28,7 @@ In summary, update operations should be issued carefully, and the mutations queu
Here is a summary of the different ways to update data in ClickHouse:
-## Update Mutations {#update-mutations}
+## Update mutations {#update-mutations}
Update mutations can be issued through a `ALTER TABLE ... UPDATE` command e.g.
@@ -40,7 +40,7 @@ These are extremely IO-heavy, rewriting all the parts that match the `WHERE` exp
Read more about [update mutations](/sql-reference/statements/alter/update).
-## Lightweight Updates {#lightweight-updates}
+## Lightweight updates {#lightweight-updates}
Lightweight updates provide a mechanism to update rows such that they are updated immediately, and subsequent `SELECT` queries will automatically return with the changed values (this incurs an overhead and will slow queries). This effectively addresses the atomicity limitation of normal mutations. We show an example below:
@@ -77,7 +77,7 @@ Note that for lightweight updates, a mutation is still used to update the data;
Read more about [lightweight updates](/guides/developer/lightweight-update).
-## Collapsing Merge Tree {#collapsing-merge-tree}
+## Collapsing merge tree {#collapsing-merge-tree}
Stemming from the idea that updates are expensive but inserts can be leveraged to perform updates, the
[`CollapsingMergeTree`](/engines/table-engines/mergetree-family/collapsingmergetree) table engine
@@ -124,6 +124,6 @@ for [`CollapsingMergeTree`](/engines/table-engines/mergetree-family/collapsingme
for a more comprehensive overview.
::: -## More Resources {#more-resources} +## More resources {#more-resources} - [Handling Updates and Deletes in ClickHouse](https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse) diff --git a/docs/managing-data/updating-data/update_mutations.md b/docs/managing-data/updating-data/update_mutations.md index 7c9e42fc1db..123505a9440 100644 --- a/docs/managing-data/updating-data/update_mutations.md +++ b/docs/managing-data/updating-data/update_mutations.md @@ -1,7 +1,7 @@ --- slug: /managing-data/update_mutations -sidebar_label: 'Update Mutations' -title: 'Update Mutations' +sidebar_label: 'Update mutations' +title: 'Update mutations' hide_title: false description: 'Page describing update mutations - ALTER queries that manipulate table data through updates' --- diff --git a/docs/materialized-view/incremental-materialized-view.md b/docs/materialized-view/incremental-materialized-view.md index 650f4e205c6..39fdc47cac3 100644 --- a/docs/materialized-view/incremental-materialized-view.md +++ b/docs/materialized-view/incremental-materialized-view.md @@ -1,6 +1,6 @@ --- slug: /materialized-view/incremental-materialized-view -title: 'Incremental Materialized View' +title: 'Incremental materialized view' description: 'How to use incremental materialized views to speed up queries' keywords: ['incremental materialized views', 'speed up queries', 'query optimization'] score: 10000 @@ -13,7 +13,7 @@ import Image from '@theme/IdealImage'; Incremental Materialized Views (Materialized Views) allow users to shift the cost of computation from query time to insert time, resulting in faster `SELECT` queries. -Unlike in transactional databases like Postgres, a ClickHouse Materialized View is just a trigger that runs a query on blocks of data as they are inserted into a table. The result of this query is inserted into a second "target" table. Should more rows be inserted, results will again be sent to the target table where the intermediate results will be updated and merged. This merged result is the equivalent of running the query over all of the original data. +Unlike in transactional databases like Postgres, a ClickHouse materialized view is just a trigger that runs a query on blocks of data as they are inserted into a table. The result of this query is inserted into a second "target" table. Should more rows be inserted, results will again be sent to the target table where the intermediate results will be updated and merged. This merged result is the equivalent of running the query over all of the original data. The principal motivation for Materialized Views is that the results inserted into the target table represent the results of an aggregation, filtering, or transformation on rows. These results will often be a smaller representation of the original data (a partial sketch in the case of aggregations). This, along with the resulting query for reading the results from the target table being simple, ensures query times are faster than if the same computation was performed on the original data, shifting computation (and thus query latency) from query time to insert time. @@ -75,7 +75,7 @@ Peak memory usage: 363.22 MiB. This query is already fast thanks to ClickHouse, but can we do better? -If we want to compute this at insert time using a Materialized View, we need a table to receive the results. This table should only keep 1 row per day. If an update is received for an existing day, the other columns should be merged into the existing day's row. 
For this merge of incremental states to happen, partial states must be stored for the other columns. +If we want to compute this at insert time using a materialized view, we need a table to receive the results. This table should only keep 1 row per day. If an update is received for an existing day, the other columns should be merged into the existing day's row. For this merge of incremental states to happen, partial states must be stored for the other columns. This requires a special engine type in ClickHouse: the [SummingMergeTree](/engines/table-engines/mergetree-family/summingmergetree). This replaces all the rows with the same ordering key with one row which contains summed values for the numeric columns. The following table will merge any rows with the same date, summing any numerical columns: @@ -90,7 +90,7 @@ ENGINE = SummingMergeTree ORDER BY Day ``` -To demonstrate our Materialized View, assume our votes table is empty and have yet to receive any data. Our Materialized View performs the above `SELECT` on data inserted into `votes`, with the results sent to `up_down_votes_per_day`: +To demonstrate our materialized view, assume our votes table is empty and have yet to receive any data. Our materialized view performs the above `SELECT` on data inserted into `votes`, with the results sent to `up_down_votes_per_day`: ```sql CREATE MATERIALIZED VIEW up_down_votes_per_day_mv TO up_down_votes_per_day AS @@ -206,17 +206,17 @@ LIMIT 10 Peak memory usage: 658.84 MiB. ``` -As before, we can create a Materialized View which executes the above query as new posts are inserted into our `posts` table. +As before, we can create a materialized view which executes the above query as new posts are inserted into our `posts` table. -For the purposes of example, and to avoid loading the posts data from S3, we will create a duplicate table `posts_null` with the same schema as `posts`. However, this table will not store any data and simply be used by the Materialized View when rows are inserted. To prevent storage of data, we can use the [`Null` table engine type](/engines/table-engines/special/null). +For the purposes of example, and to avoid loading the posts data from S3, we will create a duplicate table `posts_null` with the same schema as `posts`. However, this table will not store any data and simply be used by the materialized view when rows are inserted. To prevent storage of data, we can use the [`Null` table engine type](/engines/table-engines/special/null). ```sql CREATE TABLE posts_null AS posts ENGINE = Null ``` -The Null table engine is a powerful optimization - think of it as `/dev/null`. Our Materialized View will compute and store our summary statistics when our `posts_null` table receives rows at insert time - it's just a trigger. However, the raw data will not be stored. While in our case, we probably still want to store the original posts, this approach can be used to compute aggregates while avoiding storage overhead of the raw data. +The Null table engine is a powerful optimization - think of it as `/dev/null`. Our materialized view will compute and store our summary statistics when our `posts_null` table receives rows at insert time - it's just a trigger. However, the raw data will not be stored. While in our case, we probably still want to store the original posts, this approach can be used to compute aggregates while avoiding storage overhead of the raw data. 
-The Materialized View thus becomes: +The materialized view thus becomes: ```sql CREATE MATERIALIZED VIEW post_stats_mv TO post_stats_per_day AS @@ -247,7 +247,7 @@ ORDER BY Day While earlier the `SummingMergeTree` was sufficient to store counts, we require a more advanced engine type for other functions: the [`AggregatingMergeTree`](/engines/table-engines/mergetree-family/aggregatingmergetree). To ensure ClickHouse knows that aggregate states will be stored, we define the `Score_quantiles` and `AvgCommentCount` as the type `AggregateFunction`, specifying the function source of the partial states and the type of their source columns. Like the `SummingMergeTree`, rows with the same `ORDER BY` key value will be merged (`Day` in the above example). -To populate our `post_stats_per_day` via our Materialized View, we can simply insert all rows from `posts` into `posts_null`: +To populate our `post_stats_per_day` via our materialized view, we can simply insert all rows from `posts` into `posts_null`: ```sql INSERT INTO posts_null SELECT * FROM posts @@ -255,7 +255,7 @@ INSERT INTO posts_null SELECT * FROM posts 0 rows in set. Elapsed: 13.329 sec. Processed 119.64 million rows, 76.99 GB (8.98 million rows/s., 5.78 GB/s.) ``` -> In production, you would likely attach the Materialized View to the `posts` table. We have used `posts_null` here to demonstrate the null table. +> In production, you would likely attach the materialized view to the `posts` table. We have used `posts_null` here to demonstrate the null table. Our final query needs to utilize the `Merge` suffix for our functions (as the columns store partial aggregation states): @@ -280,9 +280,9 @@ The above focuses primarily on using Materialized Views to incrementally update In some situations, we may wish to only insert a subset of the rows and columns on insertion. In this case, our `posts_null` table could receive inserts, with a `SELECT` query filtering rows prior to insertion into the `posts` table. For example, suppose we wished to transform a `Tags` column in our `posts` table. This contains a pipe delimited list of tag names. By converting these into an array, we can more easily aggregate by individual tag values. -> We could perform this transformation when running an `INSERT INTO SELECT`. The Materialized View allows us to encapsulate this logic in ClickHouse DDL and keep our `INSERT` simple, with the transformation applied to any new rows. +> We could perform this transformation when running an `INSERT INTO SELECT`. The materialized view allows us to encapsulate this logic in ClickHouse DDL and keep our `INSERT` simple, with the transformation applied to any new rows. -Our Materialized View for this transformation is shown below: +Our materialized view for this transformation is shown below: ```sql CREATE MATERIALIZED VIEW posts_mv TO posts AS @@ -327,9 +327,9 @@ WHERE UserId = 8592047 Peak memory usage: 217.08 MiB. ``` -While fast (the data is small for ClickHouse), we can tell this requires a full table scan from the number of rows processed - 90.38 million. For larger datasets, we can use a Materialized View to lookup our ordering key values `PostId` for filtering column `UserId`. These values can then be used to perform an efficient lookup. +While fast (the data is small for ClickHouse), we can tell this requires a full table scan from the number of rows processed - 90.38 million. For larger datasets, we can use a materialized view to lookup our ordering key values `PostId` for filtering column `UserId`. 
These values can then be used to perform an efficient lookup. -In this example, our Materialized View can be very simple, selecting only the `PostId` and `UserId` from `comments` on insert. These results are in turn sent to a table `comments_posts_users` which is ordered by `UserId`. We create a null version of the `Comments` table below and use this to populate our view and `comments_posts_users` table: +In this example, our materialized view can be very simple, selecting only the `PostId` and `UserId` from `comments` on insert. These results are in turn sent to a table `comments_posts_users` which is ordered by `UserId`. We create a null version of the `Comments` table below and use this to populate our view and `comments_posts_users` table: ```sql CREATE TABLE comments_posts_users ( @@ -371,23 +371,23 @@ WHERE PostId IN ( Materialized views can be chained, allowing complex workflows to be established. For a practical example, we recommend reading this [blog post](https://clickhouse.com/blog/chaining-materialized-views). -## Materialized Views and JOINs {#materialized-views-and-joins} +## Materialized views and JOINs {#materialized-views-and-joins} :::note Refreshable Materialized Views The following applies to Incremental Materialized Views only. Refreshable Materialized Views execute their query periodically over the full target dataset and fully support JOINs. Consider using them for complex JOINs if a reduction in result freshness can be tolerated. ::: -Incremental Materialized views in ClickHouse fully support `JOIN` operations, but with one crucial constraint: **the Materialized View only triggers on inserts to the source table (the left-most table in the query).** Right-side tables in JOINs do not trigger updates, even if their data changes. This behavior is especially important when building **Incremental** Materialized Views, where data is aggregated or transformed during insert time. +Incremental Materialized views in ClickHouse fully support `JOIN` operations, but with one crucial constraint: **the materialized view only triggers on inserts to the source table (the left-most table in the query).** Right-side tables in JOINs do not trigger updates, even if their data changes. This behavior is especially important when building **Incremental** Materialized Views, where data is aggregated or transformed during insert time. -When an Incremental Materialized View is defined using a `JOIN`, the left-most table in the `SELECT` query acts as the source. When new rows are inserted into this table, ClickHouse executes the Materialized View query *only* with those newly inserted rows. Right-side tables in the JOIN are read in full during this execution, but changes to them alone do not trigger the view. +When an Incremental materialized view is defined using a `JOIN`, the left-most table in the `SELECT` query acts as the source. When new rows are inserted into this table, ClickHouse executes the materialized view query *only* with those newly inserted rows. Right-side tables in the JOIN are read in full during this execution, but changes to them alone do not trigger the view. This behavior makes JOINs in Materialized Views similar to a snapshot join against static dimension data. -This works well for enriching data with reference or dimension tables. However, any updates to the right-side tables (e.g., user metadata) will not retroactively update the Materialized View. To see updated data, new inserts must arrive in the source table. 
+This works well for enriching data with reference or dimension tables. However, any updates to the right-side tables (e.g., user metadata) will not retroactively update the materialized view. To see updated data, new inserts must arrive in the source table. ### Example {#materialized-views-and-joins-example} -Let's walk through a concrete example using the [Stack Overflow dataset](/data-modeling/schema-design). We'll use a Materialized View to compute **daily badges per user**, including the display name of the user from the `users` table. +Let's walk through a concrete example using the [Stack Overflow dataset](/data-modeling/schema-design). We'll use a materialized view to compute **daily badges per user**, including the display name of the user from the `users` table. As a reminder, our table schemas are: @@ -427,7 +427,7 @@ INSERT INTO users SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/stackoverflow/parquet/users.parquet'); ``` -The Materialized View and its associated target table are defined as: +The materialized view and its associated target table are defined as: ```sql CREATE TABLE daily_badges_by_user @@ -456,7 +456,7 @@ GROUP BY Day, b.UserId, u.DisplayName; ``` :::note Grouping and Ordering Alignment -The `GROUP BY` clause in the Materialized View must include `DisplayName`, `UserId`, and `Day` to match the `ORDER BY` in the `SummingMergeTree` target table. This ensures rows are correctly aggregated and merged. Omitting any of these can lead to incorrect results or inefficient merges. +The `GROUP BY` clause in the materialized view must include `DisplayName`, `UserId`, and `Day` to match the `ORDER BY` in the `SummingMergeTree` target table. This ensures rows are correctly aggregated and merged. Omitting any of these can lead to incorrect results or inefficient merges. ::: If we now populate the badges, the view will be triggered - populating our `daily_badges_by_user` table. @@ -517,10 +517,10 @@ WHERE DisplayName = 'gingerwizard' ``` :::warning -Notice the latency of the insert here. The inserted user row is joined against the entire `users` table, significantly impacting insert performance. We propose approaches to address this below in ["Using Source Table in Filters and Joins"](/materialized-view/incremental-materialized-view#using-source-table-in-filters-and-joins-in-materialized-views). +Notice the latency of the insert here. The inserted user row is joined against the entire `users` table, significantly impacting insert performance. We propose approaches to address this below in ["Using source table in filters and joins"](/materialized-view/incremental-materialized-view#using-source-table-in-filters-and-joins-in-materialized-views). ::: -Conversely, if we insert a badge for a new user, followed by the row for the user, our Materialized View will fail to capture the users' metrics. +Conversely, if we insert a badge for a new user, followed by the row for the user, our materialized view will fail to capture the users' metrics. ```sql INSERT INTO badges VALUES (53505059, 23923286, 'Good Answer', now(), 'Bronze', 0); @@ -555,9 +555,9 @@ WHERE DisplayName = 'brand_new_user' Note, however, that this result is incorrect. -### Best Practices for JOINs in Materialized Views {#join-best-practices} +### Best practices for JOINs in materialized views {#join-best-practices} -- **Use the left-most table as the trigger.** Only the table on the left side of the `SELECT` statement triggers the Materialized View. 
Changes to right-side tables will not trigger updates. +- **Use the left-most table as the trigger.** Only the table on the left side of the `SELECT` statement triggers the materialized view. Changes to right-side tables will not trigger updates. - **Pre-insert joined data.** Ensure that data in joined tables exists before inserting rows into the source table. The JOIN is evaluated at insert time, so missing data will result in unmatched rows or nulls. @@ -573,11 +573,11 @@ Note, however, that this result is incorrect. - **Consider insert volume and frequency.** JOINs work well in moderate insert workloads. For high-throughput ingestion, consider using staging tables, pre-joins, or other approaches such as Dictionaries and [Refreshable Materialized Views](/materialized-view/refreshable-materialized-view). -### Using Source Table in Filters and Joins {#using-source-table-in-filters-and-joins-in-materialized-views} +### Using source table in filters and joins {#using-source-table-in-filters-and-joins-in-materialized-views} -When working with Materialized Views in ClickHouse, it's important to understand how the source table is treated during the execution of the Materialized View's query. Specifically, the source table in the Materialized View's query is replaced with the inserted block of data. This behavior can lead to some unexpected results if not properly understood. +When working with Materialized Views in ClickHouse, it's important to understand how the source table is treated during the execution of the materialized view's query. Specifically, the source table in the materialized view's query is replaced with the inserted block of data. This behavior can lead to some unexpected results if not properly understood. -#### Example Scenario {#example-scenario} +#### Example scenario {#example-scenario} Consider the following setup: @@ -620,15 +620,15 @@ SELECT * FROM mvw2; In the above example, we have two Materialized Views `mvw1` and `mvw2` that perform similar operations but with a slight difference in how they reference the source table `t0`. -In `mvw1`, table `t0` is directly referenced inside a `(SELECT * FROM t0)` subquery on the right side of the JOIN. When data is inserted into `t0`, the Materialized View's query is executed with the inserted block of data replacing `t0`. This means that the JOIN operation is performed only on the newly inserted rows, not the entire table. +In `mvw1`, table `t0` is directly referenced inside a `(SELECT * FROM t0)` subquery on the right side of the JOIN. When data is inserted into `t0`, the materialized view's query is executed with the inserted block of data replacing `t0`. This means that the JOIN operation is performed only on the newly inserted rows, not the entire table. In the second case with joining `vt0`, the view reads all the data from `t0`. This ensures that the JOIN operation considers all rows in `t0`, not just the newly inserted block. -The key difference lies in how ClickHouse handles the source table in the Materialized View's query. When a Materialized View is triggered by an insert, the source table (`t0` in this case) is replaced by the inserted block of data. This behavior can be leveraged to optimize queries but also requires careful consideration to avoid unexpected results. +The key difference lies in how ClickHouse handles the source table in the materialized view's query. When a materialized view is triggered by an insert, the source table (`t0` in this case) is replaced by the inserted block of data. 
This behavior can be leveraged to optimize queries but also requires careful consideration to avoid unexpected results. -### Use Cases and Caveats {#use-cases-and-caveats} +### Use cases and caveats {#use-cases-and-caveats} -In practice, you may use this behavior to optimize Materialized Views that only need to process a subset of the source table's data. For example, you can use a subquery to filter the source table before joining it with other tables. This can help reduce the amount of data processed by the Materialized View and improve performance. +In practice, you may use this behavior to optimize Materialized Views that only need to process a subset of the source table's data. For example, you can use a subquery to filter the source table before joining it with other tables. This can help reduce the amount of data processed by the materialized view and improve performance. ```sql CREATE TABLE t0 (id UInt32, value String) ENGINE = MergeTree() ORDER BY id; @@ -646,9 +646,9 @@ ON t0.id = t1.id; In this example, the set built from the `IN (SELECT id FROM t0)` subquery has only the newly inserted rows, which can help to filter `t1` against it. -#### Example with Stack Overflow {#example-with-stack-overflow} +#### Example with stack overflow {#example-with-stack-overflow} -Consider our [earlier Materialized View example](/materialized-view/incremental-materialized-view#example) to compute **daily badges per user**, including the user's display name from the `users` table. +Consider our [earlier materialized view example](/materialized-view/incremental-materialized-view#example) to compute **daily badges per user**, including the user's display name from the `users` table. ```sql CREATE MATERIALIZED VIEW daily_badges_by_user_mv TO daily_badges_by_user @@ -725,7 +725,7 @@ In the above operation, only one row is retrieved from the users table for the u `UNION ALL` queries are commonly used to combine data from multiple source tables into a single result set. -While `UNION ALL` is not directly supported in Incremental Materialized Views, you can achieve the same outcome by creating a separate Materialized View for each `SELECT` branch and writing their results to a shared target table. +While `UNION ALL` is not directly supported in Incremental Materialized Views, you can achieve the same outcome by creating a separate materialized view for each `SELECT` branch and writing their results to a shared target table. For our example, we'll use the Stack Overflow dataset. Consider the `badges` and `comments` tables below, which represent the badges earned by a user and the comments they make on posts: @@ -809,7 +809,7 @@ ENGINE = AggregatingMergeTree ORDER BY UserId ``` -Wanting this table to update as new rows are inserted into either `badges` or `comments`, a naive approach to this problem may be to try and create a Materialized View with the previous union query: +Wanting this table to update as new rows are inserted into either `badges` or `comments`, a naive approach to this problem may be to try and create a materialized view with the previous union query: ```sql CREATE MATERIALIZED VIEW user_activity_mv TO user_activity AS @@ -881,7 +881,7 @@ GROUP BY UserId; 1 row in set. Elapsed: 0.005 sec. 
``` -To solve this, we simply create a Materialized View for each SELECT statement: +To solve this, we simply create a materialized view for each SELECT statement: ```sql DROP TABLE user_activity_mv; @@ -1083,7 +1083,7 @@ Enabling `parallel_view_processing=1` can significantly improve insert throughpu - **Need for strict execution order**: In rare workflows where the order of view execution matters (e.g., chained dependencies), parallel execution may lead to inconsistent state or race conditions. While possible to design around this, such setups are fragile and may break with future versions. :::note Historical defaults and stability -Sequential execution was the default for a long time, in part due to error handling complexities. Historically, a failure in one Materialized View could prevent others from executing. Newer versions have improved this by isolating failures per block, but sequential execution still provides clearer failure semantics. +Sequential execution was the default for a long time, in part due to error handling complexities. Historically, a failure in one materialized view could prevent others from executing. Newer versions have improved this by isolating failures per block, but sequential execution still provides clearer failure semantics. ::: In general, enable `parallel_view_processing=1` when: @@ -1184,8 +1184,8 @@ Peak memory usage: 989.53 KiB. In ClickHouse, CTEs are inlined which means they are effectively copy-pasted into the query during optimization and **not** materialized. This means: -- If your CTE references a different table from the source table (i.e., the one the Materialized View is attached to), and is used in a `JOIN` or `IN` clause, it will behave like a subquery or join, not a trigger. -- The Materialized View will still only trigger on inserts into the main source table, but the CTE will be re-executed on every insert, which may cause unnecessary overhead, especially if the referenced table is large. +- If your CTE references a different table from the source table (i.e., the one the materialized view is attached to), and is used in a `JOIN` or `IN` clause, it will behave like a subquery or join, not a trigger. +- The materialized view will still only trigger on inserts into the main source table, but the CTE will be re-executed on every insert, which may cause unnecessary overhead, especially if the referenced table is large. For example, @@ -1196,6 +1196,6 @@ WITH recent_users AS ( SELECT * FROM stackoverflow.posts WHERE OwnerUserId IN (SELECT Id FROM recent_users) ``` -In this case, the users CTE is re-evaluated on every insert into posts, and the Materialized View will not update when new users are inserted - only when posts are. +In this case, the users CTE is re-evaluated on every insert into posts, and the materialized view will not update when new users are inserted - only when posts are. -Generally, use CTEs for logic that operates on the same source table the Materialized View is attached to or ensure that referenced tables are small and unlikely to cause performance bottlenecks. Alternatively, consider [the same optimizations as JOINs with Materialized Views](/materialized-view/incremental-materialized-view#join-best-practices). +Generally, use CTEs for logic that operates on the same source table the materialized view is attached to or ensure that referenced tables are small and unlikely to cause performance bottlenecks. 
Alternatively, consider [the same optimizations as JOINs with Materialized Views](/materialized-view/incremental-materialized-view#join-best-practices). diff --git a/docs/materialized-view/index.md b/docs/materialized-view/index.md index 6fa99f1b448..e6b3062f911 100644 --- a/docs/materialized-view/index.md +++ b/docs/materialized-view/index.md @@ -7,8 +7,8 @@ keywords: ['materialized views', 'speed up queries', 'query optimization', 'refr | Page | Description | |-------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [Incremental Materialized View](/materialized-view/incremental-materialized-view) | Allow users to shift the cost of computation from query time to insert time, resulting in faster `SELECT` queries. | -| [Refreshable Materialized View](/materialized-view/refreshable-materialized-view) | Conceptually similar to incremental materialized views but require the periodic execution of the query over the full dataset - the results of which are stored in a target table for querying. | +| [Incremental materialized view](/materialized-view/incremental-materialized-view) | Allow users to shift the cost of computation from query time to insert time, resulting in faster `SELECT` queries. | +| [Refreshable materialized view](/materialized-view/refreshable-materialized-view) | Conceptually similar to incremental materialized views but require the periodic execution of the query over the full dataset - the results of which are stored in a target table for querying. | diff --git a/docs/materialized-view/refreshable-materialized-view.md b/docs/materialized-view/refreshable-materialized-view.md index 1d797380ac7..9b2362b868d 100644 --- a/docs/materialized-view/refreshable-materialized-view.md +++ b/docs/materialized-view/refreshable-materialized-view.md @@ -1,6 +1,6 @@ --- slug: /materialized-view/refreshable-materialized-view -title: 'Refreshable Materialized View' +title: 'Refreshable materialized view' description: 'How to use materialized views to speed up queries' keywords: ['refreshable materialized view', 'refresh', 'materialized views', 'speed up queries', 'query optimization'] --- diff --git a/docs/migrations/bigquery/equivalent-concepts.md b/docs/migrations/bigquery/equivalent-concepts.md index f12bab2bb28..48352b93c42 100644 --- a/docs/migrations/bigquery/equivalent-concepts.md +++ b/docs/migrations/bigquery/equivalent-concepts.md @@ -9,7 +9,7 @@ show_related_blogs: true import bigquery_1 from '@site/static/images/migrations/bigquery-1.png'; import Image from '@theme/IdealImage'; -# BigQuery vs ClickHouse Cloud: Equivalent and different concepts +# BigQuery vs ClickHouse Cloud: equivalent and different concepts ## Resource organization {#resource-organization} @@ -21,7 +21,7 @@ The way resources are organized in ClickHouse Cloud is similar to [BigQuery's re Similar to BigQuery, organizations are the root nodes in the ClickHouse cloud resource hierarchy. The first user you set up in your ClickHouse Cloud account is automatically assigned to an organization owned by the user. The user may invite additional users to the organization. 
-### BigQuery Projects vs ClickHouse Cloud Services {#bigquery-projects-vs-clickhouse-cloud-services} +### BigQuery projects vs ClickHouse Cloud services {#bigquery-projects-vs-clickhouse-cloud-services} Within organizations, you can create services loosely equivalent to BigQuery projects because stored data in ClickHouse Cloud is associated with a service. There are [several service types available](/cloud/manage/cloud-tiers) in ClickHouse Cloud. Each ClickHouse Cloud service is deployed in a specific region and includes: @@ -29,15 +29,15 @@ Within organizations, you can create services loosely equivalent to BigQuery pro 2. An object storage folder where the service stores all the data. 3. An endpoint (or multiple endpoints created via ClickHouse Cloud UI console) - a service URL that you use to connect to the service (for example, `https://dv2fzne24g.us-east-1.aws.clickhouse.cloud:8443`) -### BigQuery Datasets vs ClickHouse Cloud Databases {#bigquery-datasets-vs-clickhouse-cloud-databases} +### BigQuery datasets vs ClickHouse Cloud databases {#bigquery-datasets-vs-clickhouse-cloud-databases} ClickHouse logically groups tables into databases. Like BigQuery datasets, ClickHouse databases are logical containers that organize and control access to table data. -### BigQuery Folders {#bigquery-folders} +### BigQuery folders {#bigquery-folders} ClickHouse Cloud currently has no concept equivalent to BigQuery folders. -### BigQuery Slot reservations and Quotas {#bigquery-slot-reservations-and-quotas} +### BigQuery slot reservations and quotas {#bigquery-slot-reservations-and-quotas} Like BigQuery slot reservations, you can [configure vertical and horizontal autoscaling](/manage/scaling#configuring-vertical-auto-scaling) in ClickHouse Cloud. For vertical autoscaling, you can set the minimum and maximum size for the memory and CPU cores of the compute nodes for a service. The service will then scale as needed within those bounds. These settings are also available during the initial service creation flow. Each compute node in the service has the same size. You can change the number of compute nodes within a service with [horizontal scaling](/manage/scaling#manual-horizontal-scaling). @@ -78,7 +78,7 @@ When presented with multiple options for ClickHouse types, consider the actual r ## Query acceleration techniques {#query-acceleration-techniques} -### Primary and Foreign keys and Primary index {#primary-and-foreign-keys-and-primary-index} +### Primary and foreign keys and primary index {#primary-and-foreign-keys-and-primary-index} In BigQuery, a table can have [primary key and foreign key constraints](https://cloud.google.com/bigquery/docs/information-schema-table-constraints). Typically, primary and foreign keys are used in relational databases to ensure data integrity. A primary key value is normally unique for each row and is not `NULL`. Each foreign key value in a row must be present in the primary key column of the primary key table or be `NULL`. In BigQuery, these constraints are not enforced, but the query optimizer may use this information to optimize queries better. 
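For readers less familiar with ClickHouse DDL, here is a minimal sketch (using a hypothetical `orders` table, not an example taken from either product's documentation) of how the primary index is declared:

```sql
-- The ORDER BY clause defines the sparse primary index used to skip data at read time.
-- As in BigQuery, neither uniqueness nor foreign-key relationships are enforced.
CREATE TABLE orders
(
    order_id    UInt64,
    customer_id UInt64,
    created_at  DateTime,
    amount      Decimal(18, 2)
)
ENGINE = MergeTree
ORDER BY (customer_id, created_at);
```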
diff --git a/docs/migrations/postgres/appendix.md b/docs/migrations/postgres/appendix.md index 99e0286be0c..6e4b762cfbd 100644 --- a/docs/migrations/postgres/appendix.md +++ b/docs/migrations/postgres/appendix.md @@ -13,7 +13,7 @@ import Image from '@theme/IdealImage'; Users coming from OLTP systems who are used to ACID transactions should be aware that ClickHouse makes deliberate compromises in not fully providing these in exchange for performance. ClickHouse semantics can deliver high durability guarantees and high write throughput if well understood. We highlight some key concepts below that users should be familiar with prior to working with ClickHouse from Postgres. -### Shards vs Replicas {#shards-vs-replicas} +### Shards vs replicas {#shards-vs-replicas} Sharding and replication are two strategies used for scaling beyond one Postgres instance when storage and/or compute become a bottleneck to performance. Sharding in Postgres involves splitting a large database into smaller, more manageable pieces across multiple nodes. However, Postgres does not support sharding natively. Instead, sharding can be achieved using extensions such as [Citus](https://www.citusdata.com/), in which Postgres becomes a distributed database capable of scaling horizontally. This approach allows Postgres to handle higher transaction rates and larger datasets by spreading the load across several machines. Shards can be row or schema-based in order to provide flexibility for workload types, such as transactional or analytical. Sharding can introduce significant complexity in terms of data management and query execution as it requires coordination across multiple machines and consistency guarantees. diff --git a/docs/migrations/postgres/data-modeling-techniques.md b/docs/migrations/postgres/data-modeling-techniques.md index 4a6d57a486d..9a4b17ecee8 100644 --- a/docs/migrations/postgres/data-modeling-techniques.md +++ b/docs/migrations/postgres/data-modeling-techniques.md @@ -71,7 +71,7 @@ PARTITION BY toYear(CreationDate) For a full description of partitioning see ["Table partitions"](/partitions). -### Applications of Partitions {#applications-of-partitions} +### Applications of partitions {#applications-of-partitions} Partitioning in ClickHouse has similar applications as in Postgres but with some subtle differences. More specifically: @@ -114,7 +114,7 @@ Ok. - **Query optimization** - While partitions can assist with query performance, this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is only typically useful if the partitioning key is not in the primary key and you are filtering by it. However, queries that need to cover many partitions may perform worse than if no partitioning is used (as there may possibly be more parts as a result of partitioning). The benefit of targeting a single partition will be even less pronounced, or even non-existent, if the partitioning key is already an early entry in the primary key. Partitioning can also be used to [optimize GROUP BY queries](/engines/table-engines/mergetree-family/custom-partitioning-key#group-by-optimisation-using-partition-key) if values in each partition are unique. However, in general, users should ensure the primary key is optimized and only consider partitioning as a query optimization technique in exceptional cases where access patterns target a specific, predictable subset of the data, e.g., partitioning by day, with most queries in the last day.
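The pattern above can be summarized with a minimal sketch, reusing the `posts` table and the `toYear(CreationDate)` partitioning key shown earlier on this page (the query predicates are hypothetical):

```sql
-- A filter on the partitioning expression lets ClickHouse skip whole partitions;
-- queries spanning many years read many partitions and may gain little or nothing.
SELECT count()
FROM posts
WHERE toYear(CreationDate) = 2024;

-- Partitions also serve data management: expiring a year of data is a cheap metadata operation.
ALTER TABLE posts DROP PARTITION 2024;
```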
-### Recommendations for Partitions {#recommendations-for-partitions} +### Recommendations for partitions {#recommendations-for-partitions} Users should consider partitioning as a data management technique. It is ideal when data needs to be expired from the cluster when operating with time series data, e.g., the oldest partition can [simply be dropped](/sql-reference/statements/alter/partition#drop-partitionpart). diff --git a/docs/use-cases/observability/build-your-own/schema-design.md b/docs/use-cases/observability/build-your-own/schema-design.md index 496ce09f563..0e6ec581907 100644 --- a/docs/use-cases/observability/build-your-own/schema-design.md +++ b/docs/use-cases/observability/build-your-own/schema-design.md @@ -225,7 +225,7 @@ Materialized columns will, by default, not be returned in a `SELECT *`. This is [Materialized views](/materialized-views) provide a more powerful means of applying SQL filtering and transformations to logs and traces. -Materialized Views allow users to shift the cost of computation from query time to insert time. A ClickHouse Materialized View is just a trigger that runs a query on blocks of data as they are inserted into a table. The results of this query are inserted into a second "target" table. +Materialized views allow users to shift the cost of computation from query time to insert time. A ClickHouse materialized view is just a trigger that runs a query on blocks of data as they are inserted into a table. The results of this query are inserted into a second "target" table. Materialized view diff --git a/docs/whats-new/changelog/2021.md b/docs/whats-new/changelog/2021.md index f107acb1791..391f8e08d1f 100644 --- a/docs/whats-new/changelog/2021.md +++ b/docs/whats-new/changelog/2021.md @@ -989,7 +989,7 @@ description: 'Changelog for 2021' * Fix limit/offset settings for distributed queries (ignore on the remote nodes). [#24940](https://github.com/ClickHouse/ClickHouse/pull/24940) ([Azat Khuzhin](https://github.com/azat)). * Fix possible heap-buffer-overflow in `Arrow` format. [#24922](https://github.com/ClickHouse/ClickHouse/pull/24922) ([Kruglov Pavel](https://github.com/Avogar)). * Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). [#24885](https://github.com/ClickHouse/ClickHouse/pull/24885) ([Pavel Kovalenko](https://github.com/Jokser)). -* Fix "Missing columns" exception when joining Distributed Materialized View. [#24870](https://github.com/ClickHouse/ClickHouse/pull/24870) ([Azat Khuzhin](https://github.com/azat)). +* Fix "Missing columns" exception when joining distributed materialized view. [#24870](https://github.com/ClickHouse/ClickHouse/pull/24870) ([Azat Khuzhin](https://github.com/azat)). * Allow `NULL` values in postgresql compatibility protocol. Closes [#22622](https://github.com/ClickHouse/ClickHouse/issues/22622). [#24857](https://github.com/ClickHouse/ClickHouse/pull/24857) ([Kseniia Sumarokova](https://github.com/kssenii)). * Fix bug when exception `Mutation was killed` can be thrown to the client on mutation wait when the mutation is not loaded into memory yet. [#24809](https://github.com/ClickHouse/ClickHouse/pull/24809) ([alesapin](https://github.com/alesapin)). * Fixed bug in deserialization of random generator state which might cause some data types such as `AggregateFunction(groupArraySample(N), T))` to behave in a non-deterministic way.
[#24538](https://github.com/ClickHouse/ClickHouse/pull/24538) ([tavplubix](https://github.com/tavplubix)). @@ -1000,7 +1000,7 @@ description: 'Changelog for 2021' * When user authentication is managed by LDAP. Fixed potential deadlock that can happen during LDAP role (re)mapping, when LDAP group is mapped to a nonexistent local role. [#24431](https://github.com/ClickHouse/ClickHouse/pull/24431) ([Denis Glazachev](https://github.com/traceon)). * In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes [#23905](https://github.com/ClickHouse/ClickHouse/issues/23905). [#24399](https://github.com/ClickHouse/ClickHouse/pull/24399) ([Ivan](https://github.com/abyss7)). * Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. [#24321](https://github.com/ClickHouse/ClickHouse/pull/24321) ([Amos Bird](https://github.com/amosbird)). -* Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with Materialized View. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). +* Fixed a bug in moving materialized view from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with materialized view. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). * Allow empty HTTP headers. Fixes [#23901](https://github.com/ClickHouse/ClickHouse/issues/23901). [#24285](https://github.com/ClickHouse/ClickHouse/pull/24285) ([Ivan](https://github.com/abyss7)). * Correct processing of mutations (ALTER UPDATE/DELETE) in Memory tables. Closes [#24274](https://github.com/ClickHouse/ClickHouse/issues/24274). [#24275](https://github.com/ClickHouse/ClickHouse/pull/24275) ([flynn](https://github.com/ucasfl)). * Make column LowCardinality property in JOIN output the same as in the input, close [#23351](https://github.com/ClickHouse/ClickHouse/issues/23351), close [#20315](https://github.com/ClickHouse/ClickHouse/issues/20315). [#24061](https://github.com/ClickHouse/ClickHouse/pull/24061) ([Vladimir](https://github.com/vdimir)). @@ -1109,7 +1109,7 @@ description: 'Changelog for 2021' * Fixed the behavior when query `SYSTEM RESTART REPLICA` or `SYSTEM SYNC REPLICA` is being processed infinitely. This was detected on server with extremely little amount of RAM. [#24457](https://github.com/ClickHouse/ClickHouse/pull/24457) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). * Fix incorrect monotonicity of `toWeek` function. This fixes [#24422](https://github.com/ClickHouse/ClickHouse/issues/24422) . This bug was introduced in [#5212](https://github.com/ClickHouse/ClickHouse/pull/5212), and was exposed later by smarter partition pruner. [#24446](https://github.com/ClickHouse/ClickHouse/pull/24446) ([Amos Bird](https://github.com/amosbird)). * Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. [#24321](https://github.com/ClickHouse/ClickHouse/pull/24321) ([Amos Bird](https://github.com/amosbird)). -* Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). 
Now inner table is moved to new database together with Materialized View. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). +* Fixed a bug in moving materialized view from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with materialized view. Fixes [#23926](https://github.com/ClickHouse/ClickHouse/issues/23926). [#24309](https://github.com/ClickHouse/ClickHouse/pull/24309) ([tavplubix](https://github.com/tavplubix)). * Allow empty HTTP headers in client requests. Fixes [#23901](https://github.com/ClickHouse/ClickHouse/issues/23901). [#24285](https://github.com/ClickHouse/ClickHouse/pull/24285) ([Ivan](https://github.com/abyss7)). * Set `max_threads = 1` to fix mutation fail of `Memory` tables. Closes [#24274](https://github.com/ClickHouse/ClickHouse/issues/24274). [#24275](https://github.com/ClickHouse/ClickHouse/pull/24275) ([flynn](https://github.com/ucasFL)). * Fix typo in implementation of `Memory` tables, this bug was introduced at [#15127](https://github.com/ClickHouse/ClickHouse/issues/15127). Closes [#24192](https://github.com/ClickHouse/ClickHouse/issues/24192). [#24193](https://github.com/ClickHouse/ClickHouse/pull/24193) ([张中南](https://github.com/plugine)). @@ -1245,7 +1245,7 @@ description: 'Changelog for 2021' * Correct aliases handling if subquery was optimized to constant. Fixes [#22924](https://github.com/ClickHouse/ClickHouse/issues/22924). Fixes [#10401](https://github.com/ClickHouse/ClickHouse/issues/10401). [#23191](https://github.com/ClickHouse/ClickHouse/pull/23191) ([Maksim Kita](https://github.com/kitaisreal)). * Server might fail to start if `data_type_default_nullable` setting is enabled in default profile, it's fixed. Fixes [#22573](https://github.com/ClickHouse/ClickHouse/issues/22573). [#23185](https://github.com/ClickHouse/ClickHouse/pull/23185) ([tavplubix](https://github.com/tavplubix)). * Fixed a crash on shutdown which happened because of wrong accounting of current connections. [#23154](https://github.com/ClickHouse/ClickHouse/pull/23154) ([Vitaly Baranov](https://github.com/vitlibar)). -* Fixed `Table .inner_id... doesn't exist` error when selecting from Materialized View after detaching it from Atomic database and attaching back. [#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)). +* Fixed `Table .inner_id... doesn't exist` error when selecting from materialized view after detaching it from Atomic database and attaching back. [#23047](https://github.com/ClickHouse/ClickHouse/pull/23047) ([tavplubix](https://github.com/tavplubix)). * Fix error `Cannot find column in ActionsDAG result` which may happen if subquery uses `untuple`. Fixes [#22290](https://github.com/ClickHouse/ClickHouse/issues/22290). [#22991](https://github.com/ClickHouse/ClickHouse/pull/22991) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix usage of constant columns of type `Map` with nullable values. [#22939](https://github.com/ClickHouse/ClickHouse/pull/22939) ([Anton Popov](https://github.com/CurtizJ)). * fixed `formatDateTime()` on `DateTime64` and "%C" format specifier fixed `toDateTime64()` for large values and non-zero scale. [#22937](https://github.com/ClickHouse/ClickHouse/pull/22937) ([Vasily Nemkov](https://github.com/Enmk)). 
@@ -1764,7 +1764,7 @@ description: 'Changelog for 2021' * Uninitialized memory read was possible in encrypt/decrypt functions if empty string was passed as IV. This closes [#19391](https://github.com/ClickHouse/ClickHouse/issues/19391). [#19397](https://github.com/ClickHouse/ClickHouse/pull/19397) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix possible buffer overflow in Uber H3 library. See https://github.com/uber/h3/issues/392. This closes [#19219](https://github.com/ClickHouse/ClickHouse/issues/19219). [#19383](https://github.com/ClickHouse/ClickHouse/pull/19383) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix system.parts _state column (LOGICAL_ERROR when querying this column, due to incorrect order). [#19346](https://github.com/ClickHouse/ClickHouse/pull/19346) ([Azat Khuzhin](https://github.com/azat)). -* Fixed possible wrong result or segfault on aggregation when Materialized View and its target table have different structure. Fixes [#18063](https://github.com/ClickHouse/ClickHouse/issues/18063). [#19322](https://github.com/ClickHouse/ClickHouse/pull/19322) ([tavplubix](https://github.com/tavplubix)). +* Fixed possible wrong result or segfault on aggregation when materialized view and its target table have different structure. Fixes [#18063](https://github.com/ClickHouse/ClickHouse/issues/18063). [#19322](https://github.com/ClickHouse/ClickHouse/pull/19322) ([tavplubix](https://github.com/tavplubix)). * Fix error `Cannot convert column now64() because it is constant but values of constants are different in source and result`. Continuation of [#7156](https://github.com/ClickHouse/ClickHouse/issues/7156). [#19316](https://github.com/ClickHouse/ClickHouse/pull/19316) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix bug when concurrent `ALTER` and `DROP` queries may hang while processing ReplicatedMergeTree table. [#19237](https://github.com/ClickHouse/ClickHouse/pull/19237) ([alesapin](https://github.com/alesapin)). * Fixed `There is no checkpoint` error when inserting data through http interface using `Template` or `CustomSeparated` format. Fixes [#19021](https://github.com/ClickHouse/ClickHouse/issues/19021). [#19072](https://github.com/ClickHouse/ClickHouse/pull/19072) ([tavplubix](https://github.com/tavplubix)). @@ -1803,7 +1803,7 @@ description: 'Changelog for 2021' * Add functions `countMatches`/`countMatchesCaseInsensitive`. [#17459](https://github.com/ClickHouse/ClickHouse/pull/17459) ([Azat Khuzhin](https://github.com/azat)). * Implement `countSubstrings()`/`countSubstringsCaseInsensitive()`/`countSubstringsCaseInsensitiveUTF8()` (Count the number of substring occurrences). [#17347](https://github.com/ClickHouse/ClickHouse/pull/17347) ([Azat Khuzhin](https://github.com/azat)). * Add information about used databases, tables and columns in system.query_log. Add `query_kind` and `normalized_query_hash` fields. [#17726](https://github.com/ClickHouse/ClickHouse/pull/17726) ([Amos Bird](https://github.com/amosbird)). -* Add a setting `optimize_on_insert`. When enabled, do the same transformation for INSERTed block of data as if merge was done on this block (e.g. Replacing, Collapsing, Aggregating...). This setting is enabled by default. This can influence Materialized View and MaterializeMySQL behaviour (see detailed description). This closes [#10683](https://github.com/ClickHouse/ClickHouse/issues/10683). [#16954](https://github.com/ClickHouse/ClickHouse/pull/16954) ([Kruglov Pavel](https://github.com/Avogar)). 
+* Add a setting `optimize_on_insert`. When enabled, do the same transformation for INSERTed block of data as if merge was done on this block (e.g. Replacing, Collapsing, Aggregating...). This setting is enabled by default. This can influence materialized view and MaterializeMySQL behaviour (see detailed description). This closes [#10683](https://github.com/ClickHouse/ClickHouse/issues/10683). [#16954](https://github.com/ClickHouse/ClickHouse/pull/16954) ([Kruglov Pavel](https://github.com/Avogar)). * Kerberos Authentication for HDFS. [#16621](https://github.com/ClickHouse/ClickHouse/pull/16621) ([Ilya Golshtein](https://github.com/ilejn)). * Support `SHOW SETTINGS` statement to show parameters in system.settings. `SHOW CHANGED SETTINGS` and `LIKE/ILIKE` clause are also supported. [#18056](https://github.com/ClickHouse/ClickHouse/pull/18056) ([Jianmei Zhang](https://github.com/zhangjmruc)). * Function `position` now supports `POSITION(needle IN haystack)` syntax for SQL compatibility. This closes [#18701](https://github.com/ClickHouse/ClickHouse/issues/18701). ... [#18779](https://github.com/ClickHouse/ClickHouse/pull/18779) ([Jianmei Zhang](https://github.com/zhangjmruc)). diff --git a/docs/whats-new/changelog/2022.md b/docs/whats-new/changelog/2022.md index 1b9fc200cc1..4d7dcf4d498 100644 --- a/docs/whats-new/changelog/2022.md +++ b/docs/whats-new/changelog/2022.md @@ -95,7 +95,7 @@ Refer to this issue on GitHub for more details: https://github.com/ClickHouse/Cl * Fix functions `arrayFirstOrNull` and `arrayLastOrNull` or null when the array contains `Nullable` elements. [#43274](https://github.com/ClickHouse/ClickHouse/pull/43274) ([Duc Canh Le](https://github.com/canhld94)). * Fix incorrect `UserTimeMicroseconds`/`SystemTimeMicroseconds` accounting related to Kafka tables. [#42791](https://github.com/ClickHouse/ClickHouse/pull/42791) ([Azat Khuzhin](https://github.com/azat)). * Do not suppress exceptions in `web` disks. Fix retries for the `web` disk. [#42800](https://github.com/ClickHouse/ClickHouse/pull/42800) ([Azat Khuzhin](https://github.com/azat)). -* Fixed (logical) race condition between inserts and dropping materialized views. A race condition happened when a Materialized View was dropped at the same time as an INSERT, where the MVs were present as a dependency of the insert at the begining of the execution, but the table has been dropped by the time the insert chain tries to access it, producing either an `UNKNOWN_TABLE` or `TABLE_IS_DROPPED` exception, and stopping the insertion. After this change, we avoid these exceptions and just continue with the insert if the dependency is gone. [#43161](https://github.com/ClickHouse/ClickHouse/pull/43161) ([AlfVII](https://github.com/AlfVII)). +* Fixed (logical) race condition between inserts and dropping materialized views. A race condition happened when a materialized view was dropped at the same time as an INSERT, where the MVs were present as a dependency of the insert at the beginning of the execution, but the table has been dropped by the time the insert chain tries to access it, producing either an `UNKNOWN_TABLE` or `TABLE_IS_DROPPED` exception, and stopping the insertion. After this change, we avoid these exceptions and just continue with the insert if the dependency is gone. [#43161](https://github.com/ClickHouse/ClickHouse/pull/43161) ([AlfVII](https://github.com/AlfVII)). * Fix undefined behavior in the `quantiles` function, which might lead to uninitialized memory. Found by fuzzer.
This closes [#44066](https://github.com/ClickHouse/ClickHouse/issues/44066). [#44067](https://github.com/ClickHouse/ClickHouse/pull/44067) ([Alexey Milovidov](https://github.com/alexey-milovidov)). * Additional check on zero uncompressed size is added to `CompressionCodecDelta`. [#43255](https://github.com/ClickHouse/ClickHouse/pull/43255) ([Nikita Taranov](https://github.com/nickitat)). * Flatten arrays from Parquet to avoid an issue with inconsistent data in arrays. These incorrect files can be generated by Apache Iceberg. [#43297](https://github.com/ClickHouse/ClickHouse/pull/43297) ([Arthur Passos](https://github.com/arthurpassos)). @@ -1789,7 +1789,7 @@ Refer to this issue on GitHub for more details: https://github.com/ClickHouse/Cl * Out of band `offset` and `limit` settings may be applied incorrectly for views. Close [#33289](https://github.com/ClickHouse/ClickHouse/issues/33289) [#33518](https://github.com/ClickHouse/ClickHouse/pull/33518) ([hexiaoting](https://github.com/hexiaoting)). * Fix an exception `Block structure mismatch` which may happen during insertion into table with default nested `LowCardinality` column. Fixes [#33028](https://github.com/ClickHouse/ClickHouse/issues/33028). [#33504](https://github.com/ClickHouse/ClickHouse/pull/33504) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). * Fix dictionary expressions for `range_hashed` range min and range max attributes when created using DDL. Closes [#30809](https://github.com/ClickHouse/ClickHouse/issues/30809). [#33478](https://github.com/ClickHouse/ClickHouse/pull/33478) ([Maksim Kita](https://github.com/kitaisreal)). -* Fix possible use-after-free for INSERT into Materialized View with concurrent DROP ([Azat Khuzhin](https://github.com/azat)). +* Fix possible use-after-free for INSERT into materialized view with concurrent DROP ([Azat Khuzhin](https://github.com/azat)). * Do not try to read pass EOF (to workaround for a bug in the Linux kernel), this bug can be reproduced on kernels (3.14..5.9), and requires `index_granularity_bytes=0` (i.e. turn off adaptive index granularity). [#33372](https://github.com/ClickHouse/ClickHouse/pull/33372) ([Azat Khuzhin](https://github.com/azat)). * The commands `SYSTEM SUSPEND` and `SYSTEM ... THREAD FUZZER` missed access control. It is fixed. Author: Kevin Michel. [#33333](https://github.com/ClickHouse/ClickHouse/pull/33333) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fix when `COMMENT` for dictionaries does not appear in `system.tables`, `system.dictionaries`. Allow to modify the comment for `Dictionary` engine. Closes [#33251](https://github.com/ClickHouse/ClickHouse/issues/33251). [#33261](https://github.com/ClickHouse/ClickHouse/pull/33261) ([Maksim Kita](https://github.com/kitaisreal)). diff --git a/docs/whats-new/changelog/2024.md b/docs/whats-new/changelog/2024.md index d74962ebe45..1f32611026e 100644 --- a/docs/whats-new/changelog/2024.md +++ b/docs/whats-new/changelog/2024.md @@ -1239,7 +1239,7 @@ description: 'Changelog for 2024' * Fix analyzer: only interpolate expression should be used for DAG [#64096](https://github.com/ClickHouse/ClickHouse/pull/64096) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)). * Fix azure backup writing multipart blocks by 1 MiB (read buffer size) instead of `max_upload_part_size` (in non-native copy case) [#64117](https://github.com/ClickHouse/ClickHouse/pull/64117) ([Kseniia Sumarokova](https://github.com/kssenii)). 
* Correctly fallback during backup copy [#64153](https://github.com/ClickHouse/ClickHouse/pull/64153) ([Antonio Andelic](https://github.com/antonio2368)). -* Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). +* Prevent LOGICAL_ERROR on CREATE TABLE as materialized view [#64174](https://github.com/ClickHouse/ClickHouse/pull/64174) ([Raúl Marín](https://github.com/Algunenano)). * Query Cache: Consider identical queries against different databases as different [#64199](https://github.com/ClickHouse/ClickHouse/pull/64199) ([Robert Schulze](https://github.com/rschu1ze)). * Ignore `text_log` for Keeper [#64218](https://github.com/ClickHouse/ClickHouse/pull/64218) ([Antonio Andelic](https://github.com/antonio2368)). * Fix Logical error: Bad cast for Buffer table with prewhere. [#64388](https://github.com/ClickHouse/ClickHouse/pull/64388) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). @@ -1588,7 +1588,7 @@ description: 'Changelog for 2024' * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with `Not enough privileges`. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). #### New Feature {#new-feature-10} -* Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). +* Added new syntax which allows to specify definer user in view/materialized view. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) [#60439](https://github.com/ClickHouse/ClickHouse/pull/60439) ([pufit](https://github.com/pufit)). * Try to detect file format automatically during schema inference if it's unknown in `file/s3/hdfs/url/azureBlobStorage` engines. Closes [#50576](https://github.com/ClickHouse/ClickHouse/issues/50576). [#59092](https://github.com/ClickHouse/ClickHouse/pull/59092) ([Kruglov Pavel](https://github.com/Avogar)). * Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. [#58486](https://github.com/ClickHouse/ClickHouse/pull/58486) ([Julia Kartseva](https://github.com/jkartseva)). * Allow to set up a quota for maximum sequential login failures. [#54737](https://github.com/ClickHouse/ClickHouse/pull/54737) ([Alexey Gerasimchuck](https://github.com/Demilivor)). 
diff --git a/docs/whats-new/changelog/index.md b/docs/whats-new/changelog/index.md index e221f116fba..adfec2b8bea 100644 --- a/docs/whats-new/changelog/index.md +++ b/docs/whats-new/changelog/index.md @@ -8,6 +8,7 @@ title: '2025 Changelog' --- ### Table of Contents +**[ClickHouse release v25.5, 2025-05-22](#255)**
**[ClickHouse release v25.4, 2025-04-22](#254)**
**[ClickHouse release v25.3 LTS, 2025-03-20](#253)**
**[ClickHouse release v25.2, 2025-02-27](#252)**
@@ -22,6 +23,192 @@ title: '2025 Changelog' **[Changelog for 2017](https://clickhouse.com/docs/whats-new/changelog/2017/)**
+### ClickHouse release 25.5, 2025-05-22 {#255} + +#### Backward Incompatible Change +* Function `geoToH3` now accepts the input in the order (lat, lon, res) (which is common for other geometric functions). Users who wish to retain the previous result order (lon, lat, res) can set setting `geotoh3_argument_order = 'lon_lat'`. [#78852](https://github.com/ClickHouse/ClickHouse/pull/78852) ([Pratima Patel](https://github.com/pratimapatel2008)). +* Add a filesystem cache setting `allow_dynamic_cache_resize`, by default `false`, to allow dynamic resize of filesystem cache. Why: in certain environments (ClickHouse Cloud) all the scaling events happen through the restart of the process and we would love this feature to be explicitly disabled to have more control over the behaviour + as a safety measure. This PR is marked as backward incompatible, because in older versions dynamic cache resize worked by default without special setting. [#79148](https://github.com/ClickHouse/ClickHouse/pull/79148) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Removed support for legacy index types `annoy` and `usearch`. Both have been stubs for a long time, i.e. every attempt to use the legacy indexes returned an error anyways. If you still have `annoy` and `usearch` indexes, please drop them. [#79802](https://github.com/ClickHouse/ClickHouse/pull/79802) ([Robert Schulze](https://github.com/rschu1ze)). +* Remove `format_alter_commands_with_parentheses` server setting. The setting was introduced and disabled by default in 24.2. It was enabled by default in 25.2. As there are no LTS versions that don't support the new format, we can remove the setting. [#79970](https://github.com/ClickHouse/ClickHouse/pull/79970) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)). +* Enable `DeltaLake` storage `delta-kernel-rs` implementation by default. [#79541](https://github.com/ClickHouse/ClickHouse/pull/79541) ([Kseniia Sumarokova](https://github.com/kssenii)). +* If reading from an `URL` involves multiple redirects, setting `enable_url_encoding` is correctly applied across all redirects in the chain. [#79563](https://github.com/ClickHouse/ClickHouse/pull/79563) ([Shankar Iyer](https://github.com/shankar-iyer)). Setting `enble_url_encoding` default value is now set to `false`. [#80088](https://github.com/ClickHouse/ClickHouse/pull/80088) ([Shankar Iyer](https://github.com/shankar-iyer)). + +#### New Feature +* Support scalar correlated subqueries in the WHERE clause. Closes [#6697](https://github.com/ClickHouse/ClickHouse/issues/6697). [#79600](https://github.com/ClickHouse/ClickHouse/pull/79600) ([Dmitry Novik](https://github.com/novikd)). Support correlated subqueries in the projection list in simple cases. [#79925](https://github.com/ClickHouse/ClickHouse/pull/79925) ([Dmitry Novik](https://github.com/novikd)). [#76078](https://github.com/ClickHouse/ClickHouse/pull/76078) ([Dmitry Novik](https://github.com/novikd)). Now it covers 100% of TPC-H test suite. +* Vector search using the vector similarity index is now beta (from previously experimental). [#80164](https://github.com/ClickHouse/ClickHouse/pull/80164) ([Robert Schulze](https://github.com/rschu1ze)). +* Support geo types in `Parquet` format. This closes [#75317](https://github.com/ClickHouse/ClickHouse/issues/75317). [#79777](https://github.com/ClickHouse/ClickHouse/pull/79777) ([scanhex12](https://github.com/scanhex12)). 
+* New functions `sparseGrams`, `sparseGramsHashes`, `sparseGramsHashesUTF8`, `sparseGramsUTF8` for calculating "sparse-ngrams" - a robust algorithm for extracting substrings for indexing and search. [#79517](https://github.com/ClickHouse/ClickHouse/pull/79517) ([scanhex12](https://github.com/scanhex12)). +* `clickhouse-local` (and its shorthand alias, `ch`) now uses an implicit `FROM table` when there is input data for processing. This closes [#65023](https://github.com/ClickHouse/ClickHouse/issues/65023). Also enabled format inference in clickhouse-local if `--input-format` is not specified and it processes a regular file. [#79085](https://github.com/ClickHouse/ClickHouse/pull/79085) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add `stringBytesUniq` and `stringBytesEntropy` functions to search for possibly random or encrypted data. [#79350](https://github.com/ClickHouse/ClickHouse/pull/79350) ([Sachin Kumar Singh](https://github.com/sachinkumarsingh092)). +* Added functions for encoding and decoding base32. [#79809](https://github.com/ClickHouse/ClickHouse/pull/79809) ([Joanna Hulboj](https://github.com/jh0x)). +* Add `getServerSetting` and `getMergeTreeSetting` functions. Closes #78318. [#78439](https://github.com/ClickHouse/ClickHouse/pull/78439) ([NamNguyenHoai](https://github.com/NamHoaiNguyen)). +* Add new `iceberg_enable_version_hint` setting to leverage `version-hint.text` file. [#78594](https://github.com/ClickHouse/ClickHouse/pull/78594) ([Arnaud Briche](https://github.com/arnaudbriche)). +* Gives the possibility to truncate specific tables from a database, filtered with the `LIKE` keyword. [#78597](https://github.com/ClickHouse/ClickHouse/pull/78597) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Support `_part_starting_offset` virtual column in `MergeTree`-family tables. This column represents the cumulative row count of all preceding parts, calculated at query time based on the current part list. The cumulative values are retained throughout query execution and remain effective even after part pruning. Related internal logic has been refactored to support this behavior. [#79417](https://github.com/ClickHouse/ClickHouse/pull/79417) ([Amos Bird](https://github.com/amosbird)). +* Add functions `divideOrNull`, `moduloOrNull`, `intDivOrNull`, `positiveModuloOrNull` to return NULL when the right argument is zero. [#78276](https://github.com/ClickHouse/ClickHouse/pull/78276) ([kevinyhzou](https://github.com/KevinyhZou)). +* ClickHouse vector search now supports both pre-filtering and post-filtering and provides related settings for finer control. (issue [#78161](https://github.com/ClickHouse/ClickHouse/issues/78161)). [#79854](https://github.com/ClickHouse/ClickHouse/pull/79854) ([Shankar Iyer](https://github.com/shankar-iyer)). +* Add [`icebergHash`](https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements) and [`icebergBucket`](https://iceberg.apache.org/spec/#bucket-transform-details) functions. Support data files pruning in `Iceberg` tables partitioned with [`bucket transform`](https://iceberg.apache.org/spec/#partitioning). [#79262](https://github.com/ClickHouse/ClickHouse/pull/79262) ([Daniil Ivanik](https://github.com/divanik)). + +#### Experimental Feature +* New `Time`/`Time64` data types: `Time` (HHH:MM:SS) and `Time64` (HHH:MM:SS.``) and some basic cast functions and functions to interact with other data types.
Also, changed the existing function's name toTime to toTimeWithFixedDate because the function toTime is required for the cast function. [#75735](https://github.com/ClickHouse/ClickHouse/pull/75735) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +72459). +* Hive metastore catalog for Iceberg datalake. [#77677](https://github.com/ClickHouse/ClickHouse/pull/77677) ([scanhex12](https://github.com/scanhex12)). +* Indexes of type `full_text` were renamed to `gin`. This follows the more familiar terminology of PostgreSQL and other databases. Existing indexes of type `full_text` remain loadable but they will throw an exception (suggesting `gin` indexes instead) when one tries to use them in searches. [#79024](https://github.com/ClickHouse/ClickHouse/pull/79024) ([Robert Schulze](https://github.com/rschu1ze)). + +#### Performance Improvement +* Change the Compact part format to save marks for each substream to be able to read individual subcolumns. Old Compact format is still supported for reads and can be enabled for writes using MergeTree setting `write_marks_for_substreams_in_compact_parts`. It's disabled by default for safer upgrades as it changes the compact parts storage. It will be enabled by default in one of the next releases. [#77940](https://github.com/ClickHouse/ClickHouse/pull/77940) ([Pavel Kruglov](https://github.com/Avogar)). +* Allow moving conditions with subcolumns to prewhere. [#79489](https://github.com/ClickHouse/ClickHouse/pull/79489) ([Pavel Kruglov](https://github.com/Avogar)). +* Speed up secondary indices by evaluating their expressions on multiple granules at once. [#64109](https://github.com/ClickHouse/ClickHouse/pull/64109) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Enable `compile_expressions` (JIT compiler for fragments of ordinary expressions) by default. This closes [#51264](https://github.com/ClickHouse/ClickHouse/issues/51264) and [#56386](https://github.com/ClickHouse/ClickHouse/issues/56386) and [#66486](https://github.com/ClickHouse/ClickHouse/issues/66486). [#79907](https://github.com/ClickHouse/ClickHouse/pull/79907) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* New setting introduced: `use_skip_indexes_in_final_exact_mode`. If a query on a `ReplacingMergeTree` table has FINAL clause, reading only table ranges based on skip indexes may produce incorrect result. This setting can ensure that correct results are returned by scanning newer parts that have overlap with primary key ranges returned by the skip index. Set to 0 to disable, 1 to enable. [#78350](https://github.com/ClickHouse/ClickHouse/pull/78350) ([Shankar Iyer](https://github.com/shankar-iyer)). +* Object storage cluster table functions (e.g. `s3Cluster`) will now assign files to replicas for reading based on consistent hash to improve cache locality. [#77326](https://github.com/ClickHouse/ClickHouse/pull/77326) ([Andrej Hoos](https://github.com/adikus)). +* Improve performance of `S3Queue`/`AzureQueue` by allowing INSERTs data in parallel (can be enabled with `parallel_inserts=true` queue setting). Previously S3Queue/AzureQueue can only do first part of pipeline in parallel (downloading, parsing), INSERT was single-threaded. And `INSERT`s are almost always the bottleneck. Now it will scale almost linear with `processing_threads_num`. [#77671](https://github.com/ClickHouse/ClickHouse/pull/77671) ([Azat Khuzhin](https://github.com/azat)). More fair max_processed_files_before_commit in S3Queue/AzureQueue. 
[#79363](https://github.com/ClickHouse/ClickHouse/pull/79363) ([Azat Khuzhin](https://github.com/azat)). +* Introduced threshold (regulated by setting `parallel_hash_join_threshold`) to fall back to the `hash` algorithm when the size of the right table is below the threshold. [#76185](https://github.com/ClickHouse/ClickHouse/pull/76185) ([Nikita Taranov](https://github.com/nickitat)). +* Now we use number of replicas to determine task size for reading with parallel replicas enabled. This provides better work distribution between replicas when the amount of data to read is not really big. [#78695](https://github.com/ClickHouse/ClickHouse/pull/78695) ([Nikita Taranov](https://github.com/nickitat)). +* Allow parallel merging of `uniqExact` states during the final stage of distributed aggregation. [#78703](https://github.com/ClickHouse/ClickHouse/pull/78703) ([Nikita Taranov](https://github.com/nickitat)). +* Fix possible performance degradation of the parallel merging of `uniqExact` states for aggregation with key. [#78724](https://github.com/ClickHouse/ClickHouse/pull/78724) ([Nikita Taranov](https://github.com/nickitat)). +* Reduce the number of List Blobs API calls to Azure storage. [#78860](https://github.com/ClickHouse/ClickHouse/pull/78860) ([Julia Kartseva](https://github.com/jkartseva)). +* Fix performance of the distributed INSERT SELECT with parallel replicas. [#79441](https://github.com/ClickHouse/ClickHouse/pull/79441) ([Azat Khuzhin](https://github.com/azat)). +* Prevent `LogSeriesLimiter` from doing cleanup on every construction, avoiding lock contention and performance regressions in high-concurrency scenarios. [#79864](https://github.com/ClickHouse/ClickHouse/pull/79864) ([filimonov](https://github.com/filimonov)). +* Speedup queries with trivial count optimization. [#79945](https://github.com/ClickHouse/ClickHouse/pull/79945) ([Raúl Marín](https://github.com/Algunenano)). +* Better inlining for some operations with `Decimal`. [#79999](https://github.com/ClickHouse/ClickHouse/pull/79999) ([Konstantin Bogdanov](https://github.com/thevar1able)). +* Set `input_format_parquet_bloom_filter_push_down` to true by default. Also, fix a mistake in the settings changes history. [#80058](https://github.com/ClickHouse/ClickHouse/pull/80058) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Optimized `ALTER ... DELETE` mutations for parts in which all rows should be deleted. Now, in such cases an empty part is created instead of original without executing a mutation. [#79307](https://github.com/ClickHouse/ClickHouse/pull/79307) ([Anton Popov](https://github.com/CurtizJ)). +* Avoid extra copying of the block during insertion into Compact part when possible. [#79536](https://github.com/ClickHouse/ClickHouse/pull/79536) ([Pavel Kruglov](https://github.com/Avogar)). +* Add setting `input_format_max_block_size_bytes` to limit blocks created in input formats in bytes. It can help to avoid high memory usage during data import when rows contains large values. [#79495](https://github.com/ClickHouse/ClickHouse/pull/79495) ([Pavel Kruglov](https://github.com/Avogar)). +* Remove guard pages for threads and async_socket_for_remote/use_hedge_requests. Change the allocation method in `FiberStack` from `mmap` to `aligned_alloc`. Since this splits VMAs and under heavy load vm.max_map_count can be reached. [#79147](https://github.com/ClickHouse/ClickHouse/pull/79147) ([Sema Checherinda](https://github.com/CheSema)). +* Lazy Materialization with parallel replicas. 
[#79401](https://github.com/ClickHouse/ClickHouse/pull/79401) ([Igor Nikonov](https://github.com/devcrafter)). + +#### Improvement +* Added an ability to apply lightweight deletes on the fly (with settings `lightweight_deletes_sync = 0`, `apply_mutations_on_fly = 1`. [#79281](https://github.com/ClickHouse/ClickHouse/pull/79281) ([Anton Popov](https://github.com/CurtizJ)). +* If data in the pretty format is displayed in the terminal, and a subsequent block has the same column widths, it can continue from the previous block, glue it to the previous block by moving the cursor up. This closes [#79333](https://github.com/ClickHouse/ClickHouse/issues/79333). The feature is controlled by the new setting, `output_format_pretty_glue_chunks`. [#79339](https://github.com/ClickHouse/ClickHouse/pull/79339) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Extend the `isIPAddressInRange` function to `String`, `IPv4`, `IPv6`, `Nullable(String)`, `Nullable(IPv4)`, and `Nullable(IPv6)` data types. [#78364](https://github.com/ClickHouse/ClickHouse/pull/78364) ([YjyJeff](https://github.com/YjyJeff)). +* Allow changing `PostgreSQL` engine connection pooler settings dynamically. [#78414](https://github.com/ClickHouse/ClickHouse/pull/78414) ([Samay Sharma](https://github.com/samay-sharma)). +* Allow to specify `_part_offset` in normal projection. This is the first step to build projection index. It can be used with [#58224](https://github.com/ClickHouse/ClickHouse/issues/58224) and can help improve #63207. [#78429](https://github.com/ClickHouse/ClickHouse/pull/78429) ([Amos Bird](https://github.com/amosbird)). +* Add new columns (`create_query` and `source`) for `system.named_collections`. Closes [#78179](https://github.com/ClickHouse/ClickHouse/issues/78179). [#78582](https://github.com/ClickHouse/ClickHouse/pull/78582) ([MikhailBurdukov](https://github.com/MikhailBurdukov)). +* Added a new field `condition` to system table `system.query_condition_cache`. It stores the plaintext condition whose hash is used as a key in the query condition cache. [#78671](https://github.com/ClickHouse/ClickHouse/pull/78671) ([Robert Schulze](https://github.com/rschu1ze)). +* Vector similarity indexes can now be created on top of `BFloat16` columns. [#78850](https://github.com/ClickHouse/ClickHouse/pull/78850) ([Robert Schulze](https://github.com/rschu1ze)). +* Support unix timestapms with fractional part in best effort `DateTime64` parsing. [#78908](https://github.com/ClickHouse/ClickHouse/pull/78908) ([Pavel Kruglov](https://github.com/Avogar)). +* In the storage `DeltaLake` delta-kernel implementation, fix for column mapping mode, add tests for schema evolution. [#78921](https://github.com/ClickHouse/ClickHouse/pull/78921) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Improve insert into `Variant` column in Values format by better conversion of values. [#78923](https://github.com/ClickHouse/ClickHouse/pull/78923) ([Pavel Kruglov](https://github.com/Avogar)). +* The `tokens` function was extended to accept an additional "tokenizer" argument plus further tokenizer-specific arguments. [#79001](https://github.com/ClickHouse/ClickHouse/pull/79001) ([Elmi Ahmadov](https://github.com/ahmadov)). +* The `SHOW CLUSTER` statement now expands macros (if any) in its argument. [#79006](https://github.com/ClickHouse/ClickHouse/pull/79006) ([arf42](https://github.com/arf42)). +* Hash functions now support `NULL`s inside arrays, tuples, and maps. 
(issues [#48365](https://github.com/ClickHouse/ClickHouse/issues/48365) and [#48623](https://github.com/ClickHouse/ClickHouse/issues/48623)). [#79008](https://github.com/ClickHouse/ClickHouse/pull/79008) ([Michael Kolupaev](https://github.com/al13n321)). +* Update cctz to 2025a. [#79043](https://github.com/ClickHouse/ClickHouse/pull/79043) ([Raúl Marín](https://github.com/Algunenano)). +* Change the default stderr processing for UDFs to "log_last". It's better for usability. [#79066](https://github.com/ClickHouse/ClickHouse/pull/79066) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Make tabs undo-able in the Web UI. This closes [#71284](https://github.com/ClickHouse/ClickHouse/issues/71284). [#79084](https://github.com/ClickHouse/ClickHouse/pull/79084) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Remove settings during `recoverLostReplica` same as it was done in: https://github.com/ClickHouse/ClickHouse/pull/78637. [#79113](https://github.com/ClickHouse/ClickHouse/pull/79113) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Add profile events: `ParquetReadRowGroups` and `ParquetPrunedRowGroups` to profile parquet index prune. [#79180](https://github.com/ClickHouse/ClickHouse/pull/79180) ([flynn](https://github.com/ucasfl)). +* Support `ALTER`ing database on cluster. [#79242](https://github.com/ClickHouse/ClickHouse/pull/79242) ([Tuan Pham Anh](https://github.com/tuanpach)). +* Explicitly skip missed runs of statistics collection for QueryMetricLog, otherwise the log will take a long time to catch up with the current time. [#79257](https://github.com/ClickHouse/ClickHouse/pull/79257) ([Mikhail Artemenko](https://github.com/Michicosun)). +* Some small optimizations for reading `Arrow`-based formats. [#79308](https://github.com/ClickHouse/ClickHouse/pull/79308) ([Bharat Nallan](https://github.com/bharatnc)). +* The setting `allow_archive_path_syntax` was marked as experimental by mistake. Add a test to prevent having experimental settings enabled by default. [#79320](https://github.com/ClickHouse/ClickHouse/pull/79320) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Made page cache settings adjustable on a per-query level. This is needed for faster experimentation and for the possibility of fine-tuning for high-throughput and low-latency queries. [#79337](https://github.com/ClickHouse/ClickHouse/pull/79337) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Do not print number tips in pretty formats for numbers that look like most of the 64-bit hashes. This closes [#79334](https://github.com/ClickHouse/ClickHouse/issues/79334). [#79338](https://github.com/ClickHouse/ClickHouse/pull/79338) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Colors of graphs on the advanced dashboards will be calculated from the hash of the corresponding query. This makes it easier to remember and locate a graph while scrolling the dashboard. [#79341](https://github.com/ClickHouse/ClickHouse/pull/79341) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add asynchronous metric, `FilesystemCacheCapacity` - total capacity in the `cache` virtual filesystem. This is useful for global infrastructure monitoring. [#79348](https://github.com/ClickHouse/ClickHouse/pull/79348) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Optimize access to system.parts (read columns/indexes size only when requested). [#79352](https://github.com/ClickHouse/ClickHouse/pull/79352) ([Azat Khuzhin](https://github.com/azat)). 
+* Calculate the relevant fields for query `'SHOW CLUSTER '` instead of all fields. [#79368](https://github.com/ClickHouse/ClickHouse/pull/79368) ([Tuan Pham Anh](https://github.com/tuanpach)). +* Allow to specify storage settings for `DatabaseCatalog`. [#79407](https://github.com/ClickHouse/ClickHouse/pull/79407) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Support local storage in `DeltaLake`. [#79416](https://github.com/ClickHouse/ClickHouse/pull/79416) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Add a query level setting to enable delta-kernel-rs: `allow_experimental_delta_kernel_rs`. [#79418](https://github.com/ClickHouse/ClickHouse/pull/79418) ([Kseniia Sumarokova](https://github.com/kssenii)). +* Fix possible endless loop when listing blobs from Azure/S3 blob storage. [#79425](https://github.com/ClickHouse/ClickHouse/pull/79425) ([Alexander Gololobov](https://github.com/davenger)). +* Add filesystem cache setting `max_size_ratio_to_total_space`. [#79460](https://github.com/ClickHouse/ClickHouse/pull/79460) ([Kseniia Sumarokova](https://github.com/kssenii)). +* For `clickhouse-benchmark` reconfigure `reconnect` option to take 0, 1 or N as values for reconnecting accordingly. [#79465](https://github.com/ClickHouse/ClickHouse/pull/79465) ([Sachin Kumar Singh](https://github.com/sachinkumarsingh092)). +* Allow `ALTER TABLE ... MOVE|REPLACE PARTITION` for tables on different `plain_rewritable` disks. [#79566](https://github.com/ClickHouse/ClickHouse/pull/79566) ([Julia Kartseva](https://github.com/jkartseva)). +* The vector similarity index is now also used if the reference vector is of type `Array(BFloat16)`. [#79745](https://github.com/ClickHouse/ClickHouse/pull/79745) ([Shankar Iyer](https://github.com/shankar-iyer)). +* Add last_error_message, last_error_trace and query_id to the system.error_log table. Related ticket [#75816](https://github.com/ClickHouse/ClickHouse/issues/75816). [#79836](https://github.com/ClickHouse/ClickHouse/pull/79836) ([Andrei Tinikov](https://github.com/Dolso)). +* Enable sending crash reports by default. This can be turned off in the server's configuration file. [#79838](https://github.com/ClickHouse/ClickHouse/pull/79838) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* System table `system.functions` now shows in which ClickHouse version functions were first introduced. [#79839](https://github.com/ClickHouse/ClickHouse/pull/79839) ([Robert Schulze](https://github.com/rschu1ze)). +* Added `access_control_improvements.enable_user_name_access_type` setting. This setting allows enabling/disabling of precise grants for users/roles, introduced in https://github.com/ClickHouse/ClickHouse/pull/72246. You may want to turn this setting off in case you have a cluster with the replicas older than 25.1. [#79842](https://github.com/ClickHouse/ClickHouse/pull/79842) ([pufit](https://github.com/pufit)). +* Proper implementation of `ASTSelectWithUnionQuery::clone()` method now takes into account `is_normalized` field as well. This might help with [#77569](https://github.com/ClickHouse/ClickHouse/issues/77569). [#79909](https://github.com/ClickHouse/ClickHouse/pull/79909) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)). +* Fix the inconsistent formatting of certain queries with the EXCEPT operator. If the left-hand side of the EXCEPT operator ends with `*`, the formatted query loses parentheses and is then parsed as a `*` with the `EXCEPT` modifier. These queries are found by the fuzzer and are unlikely to be found in practice. 
This closes [#79950](https://github.com/ClickHouse/ClickHouse/issues/79950). [#79952](https://github.com/ClickHouse/ClickHouse/pull/79952) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Small improvement in `JSON` type parsing by using cache of variants deserialization order. [#79984](https://github.com/ClickHouse/ClickHouse/pull/79984) ([Pavel Kruglov](https://github.com/Avogar)). +* Add setting `s3_slow_all_threads_after_network_error`. [#80035](https://github.com/ClickHouse/ClickHouse/pull/80035) ([Vitaly Baranov](https://github.com/vitlibar)). +* The logging level about the selected parts to merge was wrong (Information). Closes [#80061](https://github.com/ClickHouse/ClickHouse/issues/80061). [#80062](https://github.com/ClickHouse/ClickHouse/pull/80062) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* trace-visualizer: add runtime/share in tooltips and status messages. [#79040](https://github.com/ClickHouse/ClickHouse/pull/79040) ([Sergei Trifonov](https://github.com/serxa)). +* trace-visualizer: load data from clickhouse server. [#79042](https://github.com/ClickHouse/ClickHouse/pull/79042) ([Sergei Trifonov](https://github.com/serxa)). +* Add metrics on failing merges. [#79228](https://github.com/ClickHouse/ClickHouse/pull/79228) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)). +* `clickhouse-benchmark` will display percentage based on the max iterations if specified. [#79346](https://github.com/ClickHouse/ClickHouse/pull/79346) ([Alexey Milovidov](https://github.com/alexey-milovidov)). +* Add system.parts table visualizer. [#79437](https://github.com/ClickHouse/ClickHouse/pull/79437) ([Sergei Trifonov](https://github.com/serxa)). +* Add tool for query latency analyzing. [#79978](https://github.com/ClickHouse/ClickHouse/pull/79978) ([Sergei Trifonov](https://github.com/serxa)). + +#### Bug Fix (user-visible misbehavior in an official stable release) +* Fix renames of columns missing in part. [#76346](https://github.com/ClickHouse/ClickHouse/pull/76346) ([Anton Popov](https://github.com/CurtizJ)). +* A materialized view can start too late, e.g. after the Kafka table that streams to it. [#72123](https://github.com/ClickHouse/ClickHouse/pull/72123) ([Ilya Golshtein](https://github.com/ilejn)). +* Fix `SELECT` query rewriting during `VIEW` creation with enabled analyzer. closes [#75956](https://github.com/ClickHouse/ClickHouse/issues/75956). [#76356](https://github.com/ClickHouse/ClickHouse/pull/76356) ([Dmitry Novik](https://github.com/novikd)). +* Fix applying `async_insert` from server (via `apply_settings_from_server`) (previously leads to `Unknown packet 11 from server` errors on the client). [#77578](https://github.com/ClickHouse/ClickHouse/pull/77578) ([Azat Khuzhin](https://github.com/azat)). +* Fixed refreshable materialized view in Replicated database not working on newly added replicas. [#77774](https://github.com/ClickHouse/ClickHouse/pull/77774) ([Michael Kolupaev](https://github.com/al13n321)). +* Fixed refreshable materialized views breaking backups. [#77893](https://github.com/ClickHouse/ClickHouse/pull/77893) ([Michael Kolupaev](https://github.com/al13n321)). +* Fix old firing logical error for `transform`. [#78247](https://github.com/ClickHouse/ClickHouse/pull/78247) ([Yarik Briukhovetskyi](https://github.com/yariks5s)). +* Fix some cases where secondary index was not applied with analyzer. Fixes [#65607](https://github.com/ClickHouse/ClickHouse/issues/65607) , fixes [#69373](https://github.com/ClickHouse/ClickHouse/issues/69373). 
[#78485](https://github.com/ClickHouse/ClickHouse/pull/78485) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix dumping profile events (`NetworkSendElapsedMicroseconds`/`NetworkSendBytes`) for the HTTP protocol with compression enabled (the error should not be more than the buffer size, usually around 1MiB). [#78516](https://github.com/ClickHouse/ClickHouse/pull/78516) ([Azat Khuzhin](https://github.com/azat)).
+* Fix the analyzer producing a LOGICAL_ERROR when JOIN ... USING involves an ALIAS column; an appropriate error is now produced. [#78618](https://github.com/ClickHouse/ClickHouse/pull/78618) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix analyzer: CREATE VIEW ... ON CLUSTER fails if SELECT contains positional arguments. [#78663](https://github.com/ClickHouse/ClickHouse/pull/78663) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fix a `Block structure mismatch` error in case of `INSERT SELECT` into a table function with schema inference if `SELECT` has scalar subqueries. [#78677](https://github.com/ClickHouse/ClickHouse/pull/78677) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)).
+* Fix analyzer: with `prefer_global_in_and_join=1`, the `in` function in a `SELECT` query over a Distributed table should be replaced by `globalIn`. [#78749](https://github.com/ClickHouse/ClickHouse/pull/78749) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
+* Fixed several types of `SELECT` queries that read from tables with the `MongoDB` engine or the `mongodb` table function: queries with implicit conversion of a const value in the `WHERE` clause (e.g. `WHERE datetime = '2025-03-10 00:00:00'`); queries with `LIMIT` and `GROUP BY`. Previously, they could return the wrong result. [#78777](https://github.com/ClickHouse/ClickHouse/pull/78777) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix conversion between different JSON types. Now it is performed by a simple cast through conversion to/from String. It is less efficient but 100% accurate. [#78807](https://github.com/ClickHouse/ClickHouse/pull/78807) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix a logical error during conversion of the Dynamic type to Interval. [#78813](https://github.com/ClickHouse/ClickHouse/pull/78813) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix column rollback on a JSON parsing error. [#78836](https://github.com/ClickHouse/ClickHouse/pull/78836) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix a 'bad cast' error when joining using a constant alias column. [#78848](https://github.com/ClickHouse/ClickHouse/pull/78848) ([Vladimir Cherkasov](https://github.com/vdimir)).
+* Don't allow PREWHERE in a materialized view on columns with different types in the view and the target table. [#78889](https://github.com/ClickHouse/ClickHouse/pull/78889) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix a logical error during parsing of bad binary data of a Variant column. [#78982](https://github.com/ClickHouse/ClickHouse/pull/78982) ([Pavel Kruglov](https://github.com/Avogar)).
+* Throw an exception when the Parquet batch size is set to 0. Previously, when `output_format_parquet_batch_size = 0`, ClickHouse would hang; this behavior is now fixed. [#78991](https://github.com/ClickHouse/ClickHouse/pull/78991) ([daryawessely](https://github.com/daryawessely)).
+* Fix deserialization of variant discriminators with the basic format in compact parts. It was introduced in https://github.com/ClickHouse/ClickHouse/pull/55518. [#79000](https://github.com/ClickHouse/ClickHouse/pull/79000) ([Pavel Kruglov](https://github.com/Avogar)).
+* Dictionaries of type `complex_key_ssd_cache` now reject zero or negative `block_size` and `write_buffer_size` parameters (issue [#78314](https://github.com/ClickHouse/ClickHouse/issues/78314)). [#79028](https://github.com/ClickHouse/ClickHouse/pull/79028) ([Elmi Ahmadov](https://github.com/ahmadov)).
+* Avoid using Field for non-aggregated columns in SummingMergeTree. It could lead to unexpected errors with Dynamic/Variant types used in SummingMergeTree. [#79051](https://github.com/ClickHouse/ClickHouse/pull/79051) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix reading from a materialized view with a Distributed destination table and a different header in the analyzer. [#79059](https://github.com/ClickHouse/ClickHouse/pull/79059) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fixes a bug where `arrayUnion()` returned extra (incorrect) values on tables that had batch inserts. Fixes [#75057](https://github.com/ClickHouse/ClickHouse/issues/75057). [#79079](https://github.com/ClickHouse/ClickHouse/pull/79079) ([Peter Nguyen](https://github.com/petern48)).
+* Fix a segfault in `OpenSSLInitializer`. Closes [#79092](https://github.com/ClickHouse/ClickHouse/issues/79092). [#79097](https://github.com/ClickHouse/ClickHouse/pull/79097) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Always set a prefix for S3 ListObject. [#79114](https://github.com/ClickHouse/ClickHouse/pull/79114) ([Azat Khuzhin](https://github.com/azat)).
+* Fixes a bug where `arrayUnion()` returned extra (incorrect) values on tables that had batch inserts. Fixes [#79157](https://github.com/ClickHouse/ClickHouse/issues/79157). [#79158](https://github.com/ClickHouse/ClickHouse/pull/79158) ([Peter Nguyen](https://github.com/petern48)).
+* Fix a logical error after filter pushdown. [#79164](https://github.com/ClickHouse/ClickHouse/pull/79164) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)).
+* Fix the DeltaLake table engine with the delta-kernel implementation being used with HTTP-based endpoints, fix NOSIGN. Closes [#78124](https://github.com/ClickHouse/ClickHouse/issues/78124). [#79203](https://github.com/ClickHouse/ClickHouse/pull/79203) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Keeper fix: avoid triggering watches on failed multi requests. [#79247](https://github.com/ClickHouse/ClickHouse/pull/79247) ([Antonio Andelic](https://github.com/antonio2368)).
+* Forbid Dynamic and JSON types in IN. With the current implementation of `IN`, they can lead to incorrect results. Proper support for these types in `IN` is complicated and may be added in the future. [#79282](https://github.com/ClickHouse/ClickHouse/pull/79282) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix the check for duplicate paths in JSON type parsing. [#79317](https://github.com/ClickHouse/ClickHouse/pull/79317) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix SecureStreamSocket connection issues. [#79383](https://github.com/ClickHouse/ClickHouse/pull/79383) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Fix loading of `plain_rewritable` disks containing data. [#79439](https://github.com/ClickHouse/ClickHouse/pull/79439) ([Julia Kartseva](https://github.com/jkartseva)).
+* Fix a crash in dynamic subcolumns discovery in Wide parts in MergeTree. [#79466](https://github.com/ClickHouse/ClickHouse/pull/79466) ([Pavel Kruglov](https://github.com/Avogar)).
+* Verify the table name's length only for initial create queries. Do not verify this for secondary creates to avoid backward compatibility issues.
[#79488](https://github.com/ClickHouse/ClickHouse/pull/79488) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
+* Fixed the `Block structure mismatch` error in several cases with tables with sparse columns. [#79491](https://github.com/ClickHouse/ClickHouse/pull/79491) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix two cases of "Logical Error: Can't set alias of * of Asterisk on alias". [#79505](https://github.com/ClickHouse/ClickHouse/pull/79505) ([Raúl Marín](https://github.com/Algunenano)).
+* Fix using incorrect paths when renaming an Atomic database. [#79569](https://github.com/ClickHouse/ClickHouse/pull/79569) ([Tuan Pham Anh](https://github.com/tuanpach)).
+* Fix ORDER BY on a JSON column together with other columns. [#79591](https://github.com/ClickHouse/ClickHouse/pull/79591) ([Pavel Kruglov](https://github.com/Avogar)).
+* Fix result duplication when reading from remote with both `use_hedged_requests` and `allow_experimental_parallel_reading_from_replicas` disabled. [#79599](https://github.com/ClickHouse/ClickHouse/pull/79599) ([Eduard Karacharov](https://github.com/korowa)).
+* Fix a crash in the delta-kernel implementation when using Unity Catalog. [#79677](https://github.com/ClickHouse/ClickHouse/pull/79677) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Resolve macros for autodiscovery clusters. [#79696](https://github.com/ClickHouse/ClickHouse/pull/79696) ([Anton Ivashkin](https://github.com/ianton-ru)).
+* Handle incorrectly configured `page_cache_limits` suitably. [#79805](https://github.com/ClickHouse/ClickHouse/pull/79805) ([Bharat Nallan](https://github.com/bharatnc)).
+* Fixes the result of SQL function `formatDateTime` if a variable-size formatter (e.g. `%W`, i.e. a weekday such as `Monday` or `Tuesday`) is followed by a compound formatter (a formatter that prints multiple components at once, e.g. `%D`, i.e. the American date `05/04/25`). [#79835](https://github.com/ClickHouse/ClickHouse/pull/79835) ([Robert Schulze](https://github.com/rschu1ze)).
+* IcebergS3 supports count optimization, but IcebergS3Cluster did not. As a result, the `count()` result returned in cluster mode could be a multiple of the number of replicas. [#79844](https://github.com/ClickHouse/ClickHouse/pull/79844) ([wxybear](https://github.com/wxybear)).
+* Fixes the AMBIGUOUS_COLUMN_NAME error with lazy materialization when no columns are used for query execution until the projection. For example, `SELECT * FROM t ORDER BY rand() LIMIT 5`. [#79926](https://github.com/ClickHouse/ClickHouse/pull/79926) ([Igor Nikonov](https://github.com/devcrafter)).
+* Hide the password for the query `CREATE DATABASE datalake ENGINE = DataLakeCatalog(\'http://catalog:8181\', \'admin\', \'password\')`. [#79941](https://github.com/ClickHouse/ClickHouse/pull/79941) ([Han Fei](https://github.com/hanfei1991)).
+* Allow specifying an alias in JOIN USING. Specify this alias if the column was renamed (e.g., because of ARRAY JOIN). Fixes [#73707](https://github.com/ClickHouse/ClickHouse/issues/73707). [#79942](https://github.com/ClickHouse/ClickHouse/pull/79942) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Allow materialized views with UNIONs to work correctly on new replicas. [#80037](https://github.com/ClickHouse/ClickHouse/pull/80037) ([Samay Sharma](https://github.com/samay-sharma)).
+* Format specifier `%e` in SQL function `parseDateTime` now recognizes single-digit days (e.g. `3`), whereas it previously required space padding (e.g. ` 3`). This makes its behavior compatible with MySQL.
To retain the previous behaviour, set the setting `parsedatetime_e_requires_space_padding = 1`; see the short example at the end of this section (issue [#78243](https://github.com/ClickHouse/ClickHouse/issues/78243)). [#80057](https://github.com/ClickHouse/ClickHouse/pull/80057) ([Robert Schulze](https://github.com/rschu1ze)).
+* Fix warnings `Cannot find 'kernel' in '[...]/memory.stat'` in ClickHouse's log (issue [#77410](https://github.com/ClickHouse/ClickHouse/issues/77410)). [#80129](https://github.com/ClickHouse/ClickHouse/pull/80129) ([Robert Schulze](https://github.com/rschu1ze)).
+* Check the stack size in FunctionComparison to avoid a stack overflow crash. [#78208](https://github.com/ClickHouse/ClickHouse/pull/78208) ([Julia Kartseva](https://github.com/jkartseva)).
+* Fix a race during SELECT from `system.workloads`. [#78743](https://github.com/ClickHouse/ClickHouse/pull/78743) ([Sergei Trifonov](https://github.com/serxa)).
+* Fix: lazy materialization in distributed queries. [#78815](https://github.com/ClickHouse/ClickHouse/pull/78815) ([Igor Nikonov](https://github.com/devcrafter)).
+* Fix `Array(Bool)` to `Array(FixedString)` conversion. [#78863](https://github.com/ClickHouse/ClickHouse/pull/78863) ([Nikita Taranov](https://github.com/nickitat)).
+* Make Parquet version selection less confusing. [#78818](https://github.com/ClickHouse/ClickHouse/pull/78818) ([Michael Kolupaev](https://github.com/al13n321)).
+* Fix `ReservoirSampler` self-merging. [#79031](https://github.com/ClickHouse/ClickHouse/pull/79031) ([Nikita Taranov](https://github.com/nickitat)).
+* Fix storage of the insertion table in the client context. [#79046](https://github.com/ClickHouse/ClickHouse/pull/79046) ([Pervakov Grigorii](https://github.com/GrigoryPervakov)).
+* Fix the destruction order of data members of `AggregatingSortedAlgorithm` and `SummingSortedAlgorithm`. [#79056](https://github.com/ClickHouse/ClickHouse/pull/79056) ([Nikita Taranov](https://github.com/nickitat)).
+* `enable_user_name_access_type` must not affect the `DEFINER` access type. [#80026](https://github.com/ClickHouse/ClickHouse/pull/80026) ([pufit](https://github.com/pufit)).
+* A query to a system database could hang if the system database metadata is located in Keeper. [#79304](https://github.com/ClickHouse/ClickHouse/pull/79304) ([Mikhail Artemenko](https://github.com/Michicosun)).
+
+#### Build/Testing/Packaging Improvement
+* Make it possible to reuse the built `chcache` binary instead of always rebuilding it. [#78851](https://github.com/ClickHouse/ClickHouse/pull/78851) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
+* Add NATS pause waiting. [#78987](https://github.com/ClickHouse/ClickHouse/pull/78987) ([Dmitry Novikov](https://github.com/dmitry-sles-novikov)).
+* Fix for incorrectly publishing the ARM build as amd64compat. [#79122](https://github.com/ClickHouse/ClickHouse/pull/79122) ([Alexander Gololobov](https://github.com/davenger)).
+* Use ahead-of-time generated assembly for OpenSSL. [#79386](https://github.com/ClickHouse/ClickHouse/pull/79386) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Fixes to allow building with `clang20`. [#79588](https://github.com/ClickHouse/ClickHouse/pull/79588) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* `chcache`: Rust caching support. [#78691](https://github.com/ClickHouse/ClickHouse/pull/78691) ([Konstantin Bogdanov](https://github.com/thevar1able)).
+* Add unwind information for `zstd` assembly files. [#79288](https://github.com/ClickHouse/ClickHouse/pull/79288) ([Michael Kolupaev](https://github.com/al13n321)).
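+
+As a short illustration of the `parseDateTime` `%e` change noted above, the sketch below assumes a server running this release with default settings; the exact output is indicative only:
+
+```sql
+-- With the new default, %e accepts a single-digit day without space padding:
+SELECT parseDateTime('2025-05-3', '%Y-%m-%e');
+-- Expected: 2025-05-03 00:00:00
+
+-- To restore the previous behaviour, where %e required a space-padded day (' 3'):
+SET parsedatetime_e_requires_space_padding = 1;
+SELECT parseDateTime('2025-05- 3', '%Y-%m-%e');
+-- Expected: 2025-05-03 00:00:00 (note the space before the day in the input)
+```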
+ + ### ClickHouse release 25.4, 2025-04-22 {#254} #### Backward Incompatible Change diff --git a/docs/whats-new/roadmap.md b/docs/whats-new/roadmap.md index d30f4c68a49..93f7d5313dd 100644 --- a/docs/whats-new/roadmap.md +++ b/docs/whats-new/roadmap.md @@ -5,13 +5,13 @@ sidebar_position: 50 description: 'Present and past ClickHouse road maps' --- -## Current Roadmap {#current-roadmap} +## Current roadmap {#current-roadmap} The current roadmap is published for open discussion: - [2025](https://github.com/ClickHouse/ClickHouse/issues/74046) -## Previous Roadmaps {#previous-roadmaps} +## Previous roadmaps {#previous-roadmaps} - [2024](https://github.com/ClickHouse/ClickHouse/issues/58392) - [2023](https://github.com/ClickHouse/ClickHouse/issues/44767) diff --git a/docs/whats-new/security-changelog.md b/docs/whats-new/security-changelog.md index 1c79dca658c..2f02469a7e5 100644 --- a/docs/whats-new/security-changelog.md +++ b/docs/whats-new/security-changelog.md @@ -1,12 +1,12 @@ --- slug: /whats-new/security-changelog sidebar_position: 20 -sidebar_label: 'Security Changelog' -title: 'Security Changelog' +sidebar_label: 'Security changelog' +title: 'Security changelog' description: 'Security changelog detailing security related updates and changes' --- -# Security Changelog +# Security changelog ## Fixed in ClickHouse v25.1.5.5, 2025-01-05 {#fixed-in-clickhouse-release-2025-01-05} diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md b/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md index 59e5bac14db..058cd1c3c64 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/chdb/install/bun.md @@ -42,7 +42,9 @@ var result = query("SELECT version()", "CSV"); console.log(result); // 23.10.1.1 ``` + ### Session.Query(query, *format) {#sessionqueryquery-format} + ```javascript const sess = new Session('./chdb-bun-tmp'); diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx index 02b9c0a8828..e1afdf787f2 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/get-started/cloud-quick-start.mdx @@ -26,8 +26,6 @@ import client_details from '@site/static/images/_snippets/client_details.png'; import new_rows_from_csv from '@site/static/images/_snippets/new_rows_from_csv.png'; import SQLConsoleDetail from '@site/i18n/jp/docusaurus-plugin-content-docs/current/_snippets/_launch_sql_console.md'; -```md - # ClickHouse Cloud クイックスタート ClickHouse を始める最も迅速で簡単な方法は、[ClickHouse Cloud](https://console.clickhouse.cloud) に新しいサービスを作成することです。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md index 41113920eed..b9c1b2dd71a 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/manage/billing/marketplace/index.md @@ -8,8 +8,6 @@ - 'GCP' --- - - このセクションでは、マーケットプレイスに関連する請求トピックについて詳しく説明します。 | ページ | 説明 | diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md index 1dc21f9d681..a266bd73f75 100644 --- 
a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md @@ -146,7 +146,7 @@ ClickHouse Cloudの安定した使用を確保し、ベストプラクティス [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1)、[Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11)、および[NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1)クライアントが、Dynamic、Variant、およびJSONタイプリクエストをサポートしました。 -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBTは、`1.8.7`リリースで[リフレッシュ可能なマテリアライズドビュー](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7)をサポートしています。 @@ -297,7 +297,7 @@ ClickHouse Cloudは、いくつかの請求およびスケーリングイベン 多要素認証を使用している顧客は、電話を失ったりトークンを誤って削除した場合に使用できる回復コードを取得できるようになりました。初めてMFAに登録する顧客には、設定時にコードが提供されます。既存のMFAを持っている顧客は、既存のMFAトークンを削除し新しいトークンを追加することで回復コードを取得できます。 -### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more! {#clickpipes-update-custom-certificates-latency-insights-and-more} ClickPipes、データをClickHouseサービスに取り込むための最も簡単な方法に関する最新の更新情報をお知らせできることを嬉しく思います!これらの新機能は、データ取り込みの制御を強化し、パフォーマンスメトリクスへの可視化を提供することを目的としています。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md index 1ea5b38d0dd..6e72e55f9b9 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/cloud/security/personal-data-access.md @@ -39,7 +39,7 @@ ClickHouseが収集する個人データやその使用方法については、C 注意: `OrgID`を含むURLは、特定のアカウントの`OrgID`を反映するように更新する必要があります。 -### Current Customers {#current-customers} +### Current customers {#current-customers} 弊社とアカウントをお持ちで、セルフサービスオプションで個人データの問題が解決しない場合、プライバシーポリシーに基づきデータ主体アクセス要求を提出できます。そのためには、ClickHouseアカウントにログインし、[サポートケース](https://console.clickhouse.cloud/support)を開いてください。これにより、あなたの身元を確認し、リクエストに対応するプロセスをスムーズに進めることができます。 diff --git a/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md b/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md index 689171ef683..9072126e4e5 100644 --- a/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md +++ b/i18n/jp/docusaurus-plugin-content-docs/current/managing-data/core-concepts/shards.md @@ -52,6 +52,7 @@ CREATE TABLE uk.uk_price_paid_simple_dist ON CLUSTER test_cluster price UInt32 ) ENGINE = Distributed('test_cluster', 'uk', 'uk_price_paid_simple', rand()) +``` `ON CLUSTER` 句により、DDL ステートメントは [分散 DDL ステートメント](/sql-reference/distributed-ddl) となり、ClickHouse に `test_cluster` [クラスター定義](/architecture/horizontal-scaling#replication-and-sharding-configuration) にリストされているすべてのサーバーでテーブルを作成するよう指示します。分散 DDL には、[クラスターアーキテクチャ](/architecture/horizontal-scaling#architecture-diagram) において追加の [Keeper](https://clickhouse.com/clickhouse/keeper) コンポーネントが必要です。 diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md b/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md index 567c07d020c..0d0e7992055 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md +++ 
b/i18n/zh/docusaurus-plugin-content-docs/current/cloud/reference/changelog.md @@ -201,7 +201,7 @@ Users can schedule upgrades for their services. This feature is supported for En [Golang](https://github.com/ClickHouse/clickhouse-go/releases/tag/v2.30.1), [Python](https://github.com/ClickHouse/clickhouse-connect/releases/tag/v0.8.11), and [NodeJS](https://github.com/ClickHouse/clickhouse-js/releases/tag/1.10.1) clients added support for Dynamic, Variant, and JSON types. -### DBT support for Refreshable Materialized Views {#dbt-support-for-refreshable-materialized-views} +### DBT support for refreshable materialized views {#dbt-support-for-refreshable-materialized-views} DBT now [supports Refreshable Materialized Views](https://github.com/ClickHouse/dbt-clickhouse/releases/tag/v1.8.7) in the `1.8.7` release. @@ -360,7 +360,7 @@ Compute-compute separation allows you to designate specific services as read-wri Customers using multi-factor authentication can now obtain recovery codes that can be used in the event of a lost phone or accidentally deleted token. Customers enrolling in MFA for the first time will be provided the code on set up. Customers with existing MFA can obtain a recovery code by removing their existing MFA token and adding a new one. -### ClickPipes Update: Custom Certificates, Latency Insights, and More! {#clickpipes-update-custom-certificates-latency-insights-and-more} +### ClickPipes update: custom certificates, latency insights, and more! {#clickpipes-update-custom-certificates-latency-insights-and-more} We're excited to share the latest updates for ClickPipes, the easiest way to ingest data into your ClickHouse service! These new features are designed to enhance your control over data ingestion and provide greater visibility into performance metrics. @@ -414,7 +414,7 @@ ClickPipes is the easiest way to ingest data into ClickHouse Cloud. We're happy ## July 18, 2024 {#july-18-2024} -### Prometheus Endpoint for Metrics is now Generally Available {#prometheus-endpoint-for-metrics-is-now-generally-available} +### Prometheus endpoint for metrics is now generally available {#prometheus-endpoint-for-metrics-is-now-generally-available} In our last cloud changelog, we announced the Private Preview for exporting [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud. This feature allows you to use the [ClickHouse Cloud API](/cloud/manage/api/api-overview) to get your metrics into tools like [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. We're happy to announce that this feature is now **Generally Available**. Please see [our docs](/integrations/prometheus) to learn more about this feature. @@ -449,13 +449,13 @@ This release also includes support for subscriptions via the [Microsoft Azure Ma If you'd like any specific region to be supported, please [contact us](https://clickhouse.com/support/program). -### Query Log Insights {#query-log-insights} +### Query log insights {#query-log-insights} Our new Query Insights UI in the Cloud Console makes ClickHouse's built-in query log a lot easier to use. ClickHouse's `system.query_log` table is a key source of information for query optimization, debugging, and monitoring overall cluster health and performance. There's just one caveat: with 70+ fields and multiple records per query, interpreting the query log represents a steep learning curve. This initial version of query insights provides a blueprint for future work to simplify query debugging and optimization patterns. 
We'd love to hear your feedback as we continue to iterate on this feature, so please reach out—your input will be greatly appreciated! ClickHouse Cloud Query Insights UI showing query performance metrics and analysis -### Prometheus Endpoint for Metrics (Private Preview) {#prometheus-endpoint-for-metrics-private-preview} +### Prometheus endpoint for metrics (private preview) {#prometheus-endpoint-for-metrics-private-preview} Perhaps one of our most requested features: you can now export [Prometheus](https://prometheus.io/) metrics from ClickHouse Cloud to [Grafana](https://grafana.com/) and [Datadog](https://www.datadoghq.com/) for visualization. Prometheus provides an open-source solution to monitor ClickHouse and set up custom alerts. Access to Prometheus metrics for your ClickHouse Cloud service is available via the [ClickHouse Cloud API](/integrations/prometheus). This feature is currently in Private Preview. Please reach out to the [support team](https://clickhouse.com/support/program) to enable this feature for your organization. diff --git a/styles/ClickHouse/Headings.yml b/styles/ClickHouse/Headings.yml index 9bee1cb4225..45c1a2b21ad 100644 --- a/styles/ClickHouse/Headings.yml +++ b/styles/ClickHouse/Headings.yml @@ -7,62 +7,180 @@ match: $sentence indicators: - ":" exceptions: - - ClickHouse - - Cloud + - API + - ANN + - APPEND + - DELETE FROM + - ALTER DELETE + - DROP PARTITION + - TRUNCATE + - Arrow + - AWS + - AWS + - Apache + - AggregatingMergeTree + - Amazon + - Amazon Web Services - Azure + - Azure Blob Storage + - B2B + - BigQuery + - Bring Your Own Cloud + - Bun + - BYOC + - BYOC + - CPU + - CSV + - CDC + - CMEK - CLI + - ClickHouse + - ClickHouse Cloud + - ClickHouse Keeper + - ClickPipe + - ClickPipes + - ClickPipes Connector + - ClickStack + - ClickStack + - CloudFormation - Cosmos + - Customer Managed Encryption Keys + - DBT + - DBT + - DDL + - DNS + - Docker - Docker + - Docker Compose + - DigitalOcean Spaces + - Duo SAML + - EDOT + - Elastic Agent - Emmet - - gRPC + - FAQ + - Filebeat + - Frequently Asked Questions + - GA + - GCS + - GET + - GitHub + - Go + - Google + - Google Cloud + - Google Cloud Platform + - Google Cloud Storage + - HTTP + - Helm + - HyperDX + - HyperDX - I + - IdP + - JWT + - Java + - JSON + - JSON + - Kafka + - Kafka + - Kafka Connect + - KMS - Kubernetes + - LZ4 - Linux - - macOS - - Marketplace + - MERGETREE + - Microsoft + - Middle East - MongoDB + - NPM + - NodeJS + - OLAP + - OLTP + - ORC + - OSS + - OTel + - Okta + - OpenTelemetry + - Pandas + - Parquet + - PlanetScale + - Postgres + - PostgreSQL + - PSC + - Private Link + - Private Service Connect + - PrivateLink + - Prometheus + - Python + - RBAC - REPL + - REPLACE + - REST + - Role-Based Access Control + - SAML + - SOC + - SQL + - SQL + - SSO + - S3 + - SaaS + - SDK - Studio + - TDE + - TTL + - TiDB + - Time To Live + - Transparent Data Encryption - TypeScript + - U.S. 
+ - UDF + - UI - URLs - - Visual + - UAE + - User Defined Functions + - VLDB - VS - - Windows - - JSON - - MergeTree - - ReplacingMergeTree - - AggregatingMergeTree - - DigitalOcean Spaces - - Azure Blob Storage - VPC - - BYOC - - TiDB - - PlanetScale - Vitess + - Visual + - Windows + - Yandex.Metrica + - ZSTD + - chDB + - ch_bucket_us_east1 + - ch_bucket_us_east4 + - gRPC + - macOS + - MergeTree - MySQL - - ClickPipe - - ClickPipes - - Postgres + - ReplacingMergeTree + - SLA + - SLAs + - Beta + - Preview + - Private Preview - CDC - - FAQ - - Amazon Web Services - - AWS - - Frequently Asked Questions - - PostgreSQL - - TTL - - ClickStack - - OpenTelemetry - - Filebeat - - Elastic Agent - - HyperDX - - Helm - - EDOT - - SDK - - Docker - - Time To Live - - Docker Compose - - Kafka - - Google Cloud Run - - NPM - - OTel - - SQL \ No newline at end of file + - DB + - URL + - Build + - Testing + - Packaging + - Tier + - Tiers + - Overview + - Console + - Endpoints + - Backups + - Thresholds + - Keys + - Routing + - Cloud + - Change Data Capture + - PostLinks + - PostHistory + - DateTime + - Stack Overflow + - Homebrew + - WSL + - London + - Query 2. Average price per year in London + - V24.5 changelog for Cloud + - V24.6 changelog for Cloud