diff --git a/sql-statements/sql-statement-admin-alter-ddl.md b/sql-statements/sql-statement-admin-alter-ddl.md index 3f8d77eed77b0..158f861d12e41 100644 --- a/sql-statements/sql-statement-admin-alter-ddl.md +++ b/sql-statements/sql-statement-admin-alter-ddl.md @@ -23,7 +23,7 @@ The following are the supported parameters for different DDL jobs and their corr - `ADD INDEX`: - `THREAD`: the concurrency of the DDL job. The initial value is set by `tidb_ddl_reorg_worker_cnt`. - `BATCH_SIZE`: the batch size. The initial value is set by [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size). - - `MAX_WRITE_SPEED`: the maximum bandwidth limit for importing index records into each TiKV. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). + - `MAX_WRITE_SPEED`: the maximum bandwidth limit per TiDB node for importing index records into each TiKV node. The initial value is set by [`tidb_ddl_reorg_max_write_speed`](/system-variables.md#tidb_ddl_reorg_max_write_speed-new-in-v6512-v755-and-v850). Currently, the preceding parameters only work for `ADD INDEX` jobs that are submitted and running after [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is disabled. diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 1bb57618741ae..be0b4273bc333 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -151,7 +151,7 @@ The supported options are described as follows: | `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. 
When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | | `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | | `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed from each TiDB node to each TiKV node. By default, there is no speed limit. For example, if you have 10 TiDB nodes, specifying this option as `1MiB` limits the total write speed to each TiKV node to 10 MiB/s. | | `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. 
`"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | | `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | | `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. | @@ -327,12 +327,21 @@ IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql'; #### Limit the write speed to TiKV -To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL statement: +Importing data may impact the performance of foreground workloads. 
In such scenarios, it is recommended to limit the write speed to TiKV with `MAX_WRITE_SPEED`. + +For example, the following SQL statement limits the write speed from each TiDB node to each TiKV node to 10 MiB/s: ```sql IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` +If you are importing data with the Distributed eXecution Framework (DXF) and Global Sort enabled, you can configure `MAX_WRITE_SPEED` as follows to mitigate the impact: + +1. Import a small dataset without a speed limit, and monitor the average import speed in Grafana: TiDB > Import Into > Total encode/deliver/import-kv speed > Import KV. +2. Determine the upper limit of `MAX_WRITE_SPEED` using this formula: + - (Import Speed) x (Number of Replicas) / (Number of TiDB Nodes) / min(Number of TiKV Nodes, THREAD) +3. Set `MAX_WRITE_SPEED` to a value lower than the calculated upper limit, for example, 4 to 8 times lower, to ensure foreground workload performance. + ## `IMPORT INTO ... FROM SELECT` usage `IMPORT INTO ... FROM SELECT` lets you import the query result of a `SELECT` statement to an empty table in TiDB. You can also use it to import historical data queried with [`AS OF TIMESTAMP`](/as-of-timestamp.md). diff --git a/system-variables.md b/system-variables.md index c31bde4afb2b9..c7e95ef62ad5f 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1741,11 +1741,12 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Type: String - Default value: `0` - Range: `[0, 1PiB]` -- This variable limits the write bandwidth for each TiKV node and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads. 
+- This variable limits the write bandwidth from each TiDB node to TiKV and only takes effect when index creation acceleration is enabled (controlled by the [`tidb_ddl_enable_fast_reorg`](#tidb_ddl_enable_fast_reorg-new-in-v630) variable). When the data size in your cluster is quite large (such as billions of rows), limiting the write bandwidth for index creation can effectively reduce the impact on application workloads. - The default value `0` means no write bandwidth limit. - You can specify the value of this variable either with a unit or without a unit. - When you specify the value without a unit, the default unit is bytes per second. For example, `67108864` represents `64MiB` per second. - When you specify the value with a unit, supported units include KiB, MiB, GiB, and TiB. For example, `'1GiB'` represents 1 GiB per second, and `'256MiB'` represents 256 MiB per second.
### tidb_ddl_reorg_worker_cnt diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index d9fba98de07e5..9c91d360a0209 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -280,7 +280,7 @@ The `security` section specifies certificates and keys for TLS connections withi #### `store-write-bwlimit` -- Limits the bandwidth in which TiDB Lightning writes data into each TiKV node in the physical import mode. +- Limits the bandwidth to write data into TiKV for each TiDB Lightning instance in the physical import mode. - Default value: `0`, which means no limit. #### `disk-quota` diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 6caf6e4357817..958c99ac38b97 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -57,8 +57,8 @@ duplicate-resolution = 'none' # The directory of local KV sorting. sorted-kv-dir = "./some-dir" -# Limits the bandwidth in which TiDB Lightning writes data into each TiKV -# node in the physical import mode. 0 by default, which means no limit. +# Limits the bandwidth to write data into TiKV for each TiDB Lightning instance +# in the physical import mode. # store-write-bwlimit = "128MiB" # Specifies whether Physical Import Mode adds indexes via SQL. The default value is `false`, which means that TiDB Lightning will encode both row data and index data into KV pairs and import them into TiKV together. This mechanism is consistent with that of the historical versions. If you set it to `true`, it means that TiDB Lightning adds indexes via SQL after importing the row data. 
@@ -206,7 +206,7 @@ By default, TiDB Lightning pauses the cluster scheduling for the minimum range p ```toml [tikv-importer] -# Limits the bandwidth in which TiDB Lightning writes data into each TiKV node in the physical import mode. +# Limits the bandwidth at which each TiDB Lightning instance writes data into TiKV in the physical import mode. store-write-bwlimit = "128MiB" [tidb]
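To make the `MAX_WRITE_SPEED` upper-limit formula above concrete, here is a quick sanity check. All numbers are hypothetical and chosen only for illustration: an observed import speed of 600 MiB/s, 3 replicas, 4 TiDB nodes, 6 TiKV nodes, and `THREAD=8`, so `min(Number of TiKV Nodes, THREAD)` is 6.

```shell
# Hypothetical inputs (assumptions for illustration, not values from the docs):
#   measured import speed (Grafana > Import Into > Import KV): 600 MiB/s
#   number of replicas:  3
#   number of TiDB nodes: 4
#   min(number of TiKV nodes, THREAD) = min(6, 8) = 6
# Upper limit = 600 * 3 / 4 / 6
echo $(( 600 * 3 / 4 / 6 ))   # prints 75 (MiB/s)
```

With a calculated upper limit of 75 MiB/s, setting `MAX_WRITE_SPEED` 4 to 8 times lower, for example `WITH MAX_WRITE_SPEED='16MiB'`, leaves headroom for foreground workloads.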