pingcap · hfxsd · Jun 23, 2025 · Jun 23, 2025 · Jun 23, 2025 · Jun 23, 2025
diff --git a/TOC.md b/TOC.md
@@ -423,6 +423,7 @@
     - [Use the `tidb_snapshot` System Variable](/read-historical-data.md)
   - Best Practices
     - [Use TiDB](/best-practices/tidb-best-practices.md)
+    - [Index Best Practices](/best-practices/index-best-practices.md)
     - [Java Application Development](/best-practices/java-app-best-practices.md)
     - [Use HAProxy](/best-practices/haproxy-best-practices.md)
     - [Highly Concurrent Write](/best-practices/high-concurrency-best-practices.md)

diff --git a/best-practices/index-best-practices.md b/best-practices/index-best-practices.md
@@ -0,0 +1,325 @@
+---
+title: Index Best Practices
-title: Index Best Practices
+title: Best Practices for Managing Indexes and Identifying Unused Indexes
-title: Index Best Practices
+title: Best Practices for Managing Indexes and Identifying Unused Indexes
+summary: This document summarizes best practices for using indexes in TiDB.
-summary: This document summarizes best practices for using indexes in TiDB.
+summary: This document summarizes best practices for managing indexes and identifying unused indexes.
-summary: This document summarizes best practices for using indexes in TiDB.
+summary: This document summarizes best practices for managing indexes and identifying unused indexes.
+---
+
+# Index Best Practices
-# Index Best Practices
+# Best Practices for Managing Indexes and Identifying Unused Indexes
-# Index Best Practices
+# Best Practices for Managing Indexes and Identifying Unused Indexes
+
+Indexes are essential for optimizing database query performance, reducing the need to scan large amounts of data. However, as applications evolve, business logic changes, and data volume grows, indexing inefficiencies emerge, including the following:
+
+- Unused indexes: these indexes are once relevant but are no longer selected by the query optimizer, consuming storage and adding unnecessary overhead to write operations.
+- Inefficient indexes: some indexes are used by the optimizer but scan more data than expected, increasing disk I/O and slowing down query performance.
+
+If left unaddressed, these indexing issues can cause higher storage costs, degraded performance, and operational inefficiencies. In a distributed SQL database like TiDB, indexing inefficiencies have an even greater impact due to the scale of distributed queries and the complexity of multi-node coordination. That is why regular index audits are crucial for maintaining an optimized database. 
+
+Proactively identifying and optimizing indexes helps:
+
+- Reduce storage overhead: removing unused indexes frees up disk space and reduces long-term storage costs.
+- Improve write performance: write-heavy workloads (such as `INSERT`, `UPDATE`, and `DELETE`) perform better when unnecessary index maintenance is eliminated.
+- Optimize query execution: efficient indexes reduce the number of rows scanned, improving query speed and response times.
+- Streamline database management: fewer and well-optimized indexes simplify backups, recovery, and schema changes.
+
+TiDB v8.0.0 introduces the [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) table to help you track index usage patterns and make data-driven decisions. 
+
+Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
-TiDB v8.0.0 introduces the [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) table to help you track index usage patterns and make data-driven decisions. 
-
-Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
+Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
+
+TiDB v8.0.0 introduces the [`INFORMATION_SCHEMA.TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`mysql.schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) table to help you track index usage patterns and make data-driven decisions. 
-TiDB v8.0.0 introduces the [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) table to help you track index usage patterns and make data-driven decisions. 
-
-Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
+Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
+
+TiDB v8.0.0 introduces the [`INFORMATION_SCHEMA.TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`mysql.schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) table to help you track index usage patterns and make data-driven decisions. 
+
+This document describes the tools that you can use to detect and eliminate unused or inefficient indexes, thus improving TiDB's performance and stability.
+
+## TiDB index optimization: a data-driven approach
+
+Indexes are essential for query performance, but removing them without proper analysis can lead to unexpected regressions or even system instability. To ensure safe and effective index management, TiDB provides built-in observability tools that allow you to:
+
+- Track index usage in real-time: identify how often an index is accessed and whether it contributes to performance improvements.
+- Detect unused indexes: locate indexes that have not been used since the database is last restarted.
+- Assess index efficiency: evaluate whether an index filters data effectively or causes excessive I/O overhead.
+- Safely test index removal: temporarily make an index invisible before deleting it to ensure no queries depend on it.
+
+TiDB simplifies index optimization by introducing the following powerful tools:
+
+- `TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
+- `schema_unused_indexes`: lists indexes that have not been used since the database is last restarted.
- `TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
- `schema_unused_indexes`: lists indexes that have not been used since the database is last restarted.
+- `INFORMATION_SCHEMA.TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
+- `mysql.schema_unused_indexes`: lists indexes that have not been used since the database is last restarted.
- `TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
- `schema_unused_indexes`: lists indexes that have not been used since the database is last restarted.
+- `INFORMATION_SCHEMA.TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
+- `mysql.schema_unused_indexes`: lists indexes that have not been used since the database is last restarted.
+- Invisible indexes: allows you to test the impact of removing an index before permanently deleting it.
+
+By using these observability tools, you can confidently clean up redundant indexes without risking performance degradation.
+
+## Track index usage with `TIDB_INDEX_USAGE`
+
+Introduced in [TiDB v8.0.0](/releases/release-8.0.0.md), the `TIDB_INDEX_USAGE` system table provides real-time insights into how indexes are used, helping you optimize query performance and remove unnecessary indexes.
+
+With `TIDB_INDEX_USAGE`, you can:
+
+- Detect unused indexes: identify indexes that have not been accessed by queries, helping determine which ones can be safely removed.
+- Analyze index efficiency: track how frequently an index is used and whether it contributes to efficient query execution.
+- Evaluate query patterns: understand how indexes affect read operations, data scans, and key-value (KV) requests.
+
+Starting from [TiDB v8.4.0](/releases/release-8.4.0.md), the table also includes primary keys in clustered tables, offering deeper visibility into index performance.
+
+### Key metrics in `TIDB_INDEX_USAGE`
+
+Run the following SQL statement to check the fields in `TIDB_INDEX_USAGE`:
+
+```sql
+USE INFORMATION_SCHEMA;
+DESC TIDB_INDEX_USAGE;
+```
+
+```sql
++--------------------------+-------------+------+------+---------+-------+
+| Field                    | Type        | Null | Key  | Default | Extra |
++--------------------------+-------------+------+------+---------+-------+
+| TABLE_SCHEMA             | varchar(64) | YES  |      | NULL    |       |
+| TABLE_NAME               | varchar(64) | YES  |      | NULL    |       |
+| INDEX_NAME               | varchar(64) | YES  |      | NULL    |       |
+| QUERY_TOTAL              | bigint(21)  | YES  |      | NULL    |       |
+| KV_REQ_TOTAL             | bigint(21)  | YES  |      | NULL    |       |
+| ROWS_ACCESS_TOTAL        | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_0      | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_0_1    | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_1_10   | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_10_20  | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_20_50  | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_50_100 | bigint(21)  | YES  |      | NULL    |       |
+| PERCENTAGE_ACCESS_100    | bigint(21)  | YES  |      | NULL    |       |
+| LAST_ACCESS_TIME         | datetime    | YES  |      | NULL    |       |
++--------------------------+-------------+------+------+---------+-------+
+14 rows in set (0.00 sec)
+```
+
+For the explanations of these columns, see [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md).
+
+### Identify unused and inefficient indexes using `TIDB_INDEX_USAGE`
+
+Identify unused and inefficient indexes using `TIDB_INDEX_USAGE` as follows:
+
+- Unused indexes:
+
+    - If `QUERY_TOTAL = 0`, the index has not been used by any queries.
+    - If `LAST_ACCESS_TIME` shows a long time ago, the index might no longer be relevant.
+
+- Inefficient indexes:
+
+    - Large values in `PERCENTAGE_ACCESS_100` suggest full index scans, which might indicate an inefficient index.
+    - Compare `ROWS_ACCESS_TOTAL` and `QUERY_TOTAL` to determine whether the index scans too many rows relative to its usage.
+
+By leveraging `TIDB_INDEX_USAGE`, you can gain detailed insights into index performance, making it easier to remove unnecessary indexes and optimize query execution.
+
+### Considerations when using `TIDB_INDEX_USAGE`
+
+Take the following into consideration when you use `TIDB_INDEX_USAGE`.
+
+#### Data updates are delayed
+
+To minimize performance impact, `TIDB_INDEX_USAGE` does not update instantly. Index usage metrics might be delayed by up to 5 minutes. Take this delay into consideration when you analyze queries.
+
+#### Index usage data is not persisted
+
+`TIDB_INDEX_USAGE` stores data in memory, meaning it does not persist across node restarts.
+
+If a TiDB node is restarted, all index usage statistics from that node will be cleared.
+
+#### Track historical data
+
+You can periodically export index usage snapshots using the following SQL statement:
+
+```sql
+SELECT * FROM INFORMATION_SCHEMA.TIDB_INDEX_USAGE INTO OUTFILE '/backup/index_usage_snapshot.csv';
+```
+
+This allows for historical tracking by comparing snapshots over time to detect trends in index usage and make more informed pruning decisions.
+
+## Consolidate index usage data across TiDB nodes using `CLUSTER_TIDB_INDEX_USAGE`
+
+Because TiDB is a distributed SQL database, query workloads are spread across multiple nodes. Each TiDB node tracks its own local index usage. For a global view of index performance, TiDB provides the [`CLUSTER_TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md#cluster_tidb_index_usage) system table. This view consolidates index usage data from all TiDB nodes, ensuring that distributed query workloads are fully accounted for when optimizing indexing strategies.
+
+Unlike `TIDB_INDEX_USAGE`, which provides insights at the node level, this cluster-wide view `CLUSTER_TIDB_INDEX_USAGE` lets you:
+
+- Detect inconsistencies in index usage. For example, an index might be frequently used on some nodes but unused on others.
+- Analyze global index patterns for distributed queries, ensuring indexing decisions reflect real-world workload distribution.
+- Optimize indexing strategies across all nodes, improving query efficiency for multi-node deployments.
+
+Different TiDB nodes might experience different query workloads. An index that appears unused on some nodes might still be critical elsewhere. To segment index analysis by workload, run the following SQL statement:
+
+```sql
+SELECT INSTANCE, TABLE_NAME, INDEX_NAME, SUM(QUERY_TOTAL) AS total_queries
+FROM INFORMATION_SCHEMA.CLUSTER_TIDB_INDEX_USAGE
+GROUP BY INSTANCE, TABLE_NAME, INDEX_NAME
+ORDER BY total_queries DESC;
+```
+
+This helps determine whether an index is truly unused across all nodes or only for specific instances, allowing you to make informed decisions on index removal.
+
+### Key differences between `TIDB_INDEX_USAGE` and `CLUSTER_TIDB_INDEX_USAGE`
+
+The following tables show the key differences between `TIDB_INDEX_USAGE` and `CLUSTER_TIDB_INDEX_USAGE`.
+
+| Feature          | `TIDB_INDEX_USAGE`                                   | `CLUSTER_TIDB_INDEX_USAGE`                              |
+| ---------------- | ---------------------------------------------------- | ------------------------------------------------------- |
+| Scope            | Tracks index usage within a single database.         | Aggregates index usage across the entire TiDB cluster   |
+| Index tracking   | Data is local to each database.                      | Provides a centralized cluster-wide view                |
+| Primary use case | Debugs index usage at the database instance level    | Analyzes global index patterns and multi-node behavior  |
+
+### Considerations when using `CLUSTER_TIDB_INDEX_USAGE`
+
+Because this system table consolidates data from multiple nodes, consider the following:
+
+- Delayed data updates
+
+    The data is refreshed periodically to minimize performance impact. If index usage is analyzed immediately after a query execution, allow some time for the metrics to update.
-    The data is refreshed periodically to minimize performance impact. If index usage is analyzed immediately after a query execution, allow some time for the metrics to update.
+    To minimize performance impact, `CLUSTER_TIDB_INDEX_USAGE` does not update instantly. Index usage metrics might be delayed by up to 5 minutes. Take this delay into consideration when you analyze queries.
-    The data is refreshed periodically to minimize performance impact. If index usage is analyzed immediately after a query execution, allow some time for the metrics to update.
+    To minimize performance impact, `CLUSTER_TIDB_INDEX_USAGE` does not update instantly. Index usage metrics might be delayed by up to 5 minutes. Take this delay into consideration when you analyze queries.
+
+- Memory-based storage
+
+    Like `TIDB_INDEX_USAGE`, this system table does not persist data across node restarts. If a node goes down, its recorded index usage data will be lost.
+
+By using `CLUSTER_TIDB_INDEX_USAGE`, you can gain a global perspective on index behavior, ensuring indexing strategies are aligned with distributed query workloads.
+
+## Identify unused indexes using `schema_unused_indexes`
+
+Manually analyzing index usage data can be time-consuming. To simplify this process, TiDB provides `schema_unused_indexes`, a system view that lists indexes that have not been used since the database is last restarted.
+
+This provides a quick way for you to:
+
+- Identify indexes that are no longer in use, reducing unnecessary storage costs.
+- Speed up DML operations by eliminating indexes that add overhead to `INSERT`, `UPDATE`, and `DELETE` queries.
+- Streamline index audits without needing to manually analyze query patterns.
+
+By using `schema_unused_indexes`, you can quickly identify unnecessary indexes and reduce database overhead with minimal effort.
+
+### How `schema_unused_indexes` works
+
+The `schema_unused_indexes` view is derived from `TIDB_INDEX_USAGE`, meaning it automatically filters out indexes that have recorded zero query activity since the last TiDB restart.
+
+To retrieve a list of unused indexes, run the following SQL statement:
+
+```sql
+SELECT * FROM sys.schema_unused_indexes;
+```
+
+A result similar to the following is returned:
+
+```
++-----------------+---------------+--------------------+
+| object_schema   | object_name   | index_name         |
++---------------- + ------------- + -------------------+
+| bookshop        | users         | nickname           |
+| bookshop        | ratings       | uniq_book_user_idx |
++---------------- + ------------- + -------------------+
+```
+
+### Considerations when using `schema_unused_indexes`
+
+#### Indexes are considered unused only since the last restart
+
+- If a TiDB node restarts, the usage tracking data is reset.
+- Ensure the system has been running long enough to capture a representative workload before relying on this data.
+
+#### Not all unused indexes can be dropped immediately
+
+Some indexes might be rarely used but still essential for specific queries, batch jobs, or reporting tasks. Before dropping an index, consider whether it supports the following:
+
+- Rare but essential queries, for example, monthly reports, analytics
+- Batch processing jobs that do not run daily
+- Ad-hoc troubleshooting queries
+
+If the index appears in important but infrequent queries, it is recommended to keep it or make it invisible first.
+
+You can use [invisible indexes](#safely-test-index-removal-with-invisible-indexes) to safely test whether an index can be removed without impacting performance.
+
+### Manually create the `schema_unused_indexes` view
+
+Because `TIDB_INDEX_USAGE` is cleared after a TiDB node restarts, ensure that the node has been running for a sufficient amount of time before making decisions. For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views. 
-Because `TIDB_INDEX_USAGE` is cleared after a TiDB node restarts, ensure that the node has been running for a sufficient amount of time before making decisions. For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views. 
+For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views. 
-Because `TIDB_INDEX_USAGE` is cleared after a TiDB node restarts, ensure that the node has been running for a sufficient amount of time before making decisions. For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views. 
+For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views. 
+
+For more information, see [Manually create the `schema_unused_indexes` view](/sys-schema/sys-schema-unused-indexes.md#manually-create-the-schema_unused_indexes-view). 
+
+## Safely test index removal with invisible indexes
+
+Removing an index without proper validation can lead to unexpected performance issues, especially if the index is infrequently used but still critical for certain queries. 
+
+To mitigate this risk, TiDB provides invisible indexes, allowing you to temporarily disable an index without deleting it. By using invisible indexes, you can validate index removal decisions without risk, ensuring a more controlled and predictable database optimization process.
+
+### What are invisible indexes?
+
+An invisible index remains in the database but is ignored by the TiDB optimizer. You can use [`ALTER TABLE ... INVISIBLE`](/sql-statements/sql-statement-alter-table.md) to make an index invisible to test whether the index is truly unnecessary without permanently removing it.
+
+Key benefits of invisible indexes include:
+
+- **Safe index testing**: queries will no longer use the index, but it can be quickly restored if needed.
+- **Zero disruption to index storage**: the index remains intact, ensuring no need for costly re-creation.
+- **Performance monitoring**: as a DBA, you can observe query behavior without the index before making a final decision.
+
+### Make an index invisible
+
+To make an index invisible without dropping it, run a SQL statement similar to the following:
+
+```sql
+ALTER TABLE bookshop.users ALTER INDEX nickname INVISIBLE;
+```
+
+After making the index invisible, observe the system's query performance:
+
+- If performance remains unchanged, the index is likely unnecessary and can be safely removed.
+- If query latency increases, the index might still be needed.
+
+### Considerations when using invisible indexes
+
+- **Test during off-peak hours**: monitor performance impact in a controlled environment.
+- **Use query monitoring tools**: analyze query execution plans before and after marking an index invisible.
+- **Confirm over multiple workloads**: ensure that the index is not needed for specific reports or scheduled queries.
+
+### How long can an index remain invisible?
+
+- OLTP workloads: monitor for at least one week to account for daily variations.
+- Batch processing or ETL workloads: allow one full reporting cycle, for example, a monthly financial report.
+- Ad-hoc analytical queries: use query logs to confirm that the index is not needed before dropping it.
+
+For safety, keep the index invisible for at least one full business cycle to ensure all workloads have been tested before making a final decision.
+
+## Top five best practices for index optimization
+
+To maintain high performance and efficient resource usage, regular index optimization is part of database maintenance. The following are the best practices for managing indexes effectively in TiDB.
+
+1. Monitor index usage regularly.
+
+    - Use [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) and [`CLUSTER_TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md#cluster_tidb_index_usage) to track index activity.
+    - Identify unused indexes with [`schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) and evaluate whether they can be removed.
+    - Monitor query execution plans to detect inefficient indexes that might cause excessive I/O.
+
+2. Validate before removing indexes.
+
+    - Use [`ALTER TABLE ... INVISIBLE`](/sql-statements/sql-statement-alter-table.md) to make an index invisible to temporarily disable an index and observe the impact before permanent deletion.
+    - If query performance remains stable, proceed with index removal.
+    - Ensure a sufficient observation period to account for all query patterns before making a final decision.
+
+3. Optimize existing indexes.
+
+    - Consolidating redundant indexes can reduce storage overhead and improve write performance. If multiple indexes serve similar queries, they might be candidates for merging into a single, more efficient index. 
+
+        - To find indexes with overlapping prefixes (which might indicate redundancy), run the following:
+
+            ```sql
+            SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, COLUMN_NAME, SEQ_IN_INDEX
+            FROM INFORMATION_SCHEMA.STATISTICS
+            WHERE TABLE_NAME = 'your_table'
+            ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, SEQ_IN_INDEX;
+            ```
+        
+        - If two indexes have the same leading columns, consider merging them into a composite index instead.
+
+    - Improve selectivity. Low-selectivity indexes (those filtering too many rows) can be optimized by:
+
+        - Adding additional columns to improve filtering efficiency.
+        - Changing index structure (for example, prefix indexes, composite indexes).
+
+    - Analyze index selectivity. Use `PERCENTAGE_ACCESS_*` fields in `TIDB_INDEX_USAGE` to evaluate how well an index filters data.
+
+4. Be mindful of DML performance impact.
+
+    - Avoid excessive indexing. Each additional index increases overhead on `INSERT`, `UPDATE`, and `DELETE` operations.
+    - Index only what is necessary for queries to minimize the maintenance cost on write-heavy workloads.
+
+5. Test and tune regularly.
+
+    - Perform index audits periodically, especially after significant workload changes.
+    - Use TiDB's execution plan analysis tools to verify whether indexes are being used optimally.
+    - When adding new indexes, test them in an isolated environment first to prevent unexpected regressions.
+
+By following these best practices, you can ensure efficient query execution, reduce unnecessary storage overhead, and maintain optimal database performance.