best practices: add index best practices #21218
base: master
Summary of Changes
Hello @hfxsd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces new documentation focused on best practices for index management and optimization within the TiDB database. The aim is to provide users with a detailed guide on how to leverage TiDB's features to improve query performance, reduce storage overhead, and streamline database operations through effective index auditing and maintenance.
Highlights
- New Documentation Added: A comprehensive new document, `best-practices/index-best-practices.md`, has been added. This document outlines best practices for managing and optimizing indexes in TiDB, covering topics such as identifying unused or inefficient indexes and safely testing index removal.
- Table of Contents Update: The main table of contents (`TOC.md`) has been updated to include a link to the newly added "Index Best Practices" document under the "Best Practices" section.
- Index Optimization Tools Explained: The new best practices guide details the use of TiDB's built-in observability tools, including `TIDB_INDEX_USAGE`, `CLUSTER_TIDB_INDEX_USAGE`, and `schema_unused_indexes`, along with the concept of invisible indexes, to help users effectively manage and optimize their database indexes.
Code Review
The code changes add a new best practices document for index management in TiDB. The document provides guidance on identifying and optimizing indexes, leveraging TiDB's observability tools, and safely testing index removal. The review focuses on ensuring clarity, adherence to the style guide, and providing suggestions for improved readability.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/bot-review
✅ AI review completed, 1 comment generated.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…into index-best-practices
@@ -0,0 +1,325 @@
---
title: Index Best Practices
The practice is mostly about how to observe and manage existing indexes, so the title seems too broad to me. I'd expect "index best practices" to cover creating new indexes as well.
Identifying unused indexes is part of "index best practice".
Rest LGTM
- Delayed data updates

The data is refreshed periodically to minimize performance impact. If index usage is analyzed immediately after a query execution, allow some time for the metrics to update.
The data shown in `CLUSTER_TIDB_INDEX_USAGE` is always synchronized with `TIDB_INDEX_USAGE`. There is no delay between these two tables, and both are stored in memory. Listing this under "Considerations when using `CLUSTER_TIDB_INDEX_USAGE`" may mislead users into thinking that `CLUSTER_TIDB_INDEX_USAGE` has a delay while `TIDB_INDEX_USAGE` does not.
Actually, both of them can be delayed for at most 5 minutes (ref https://docs.pingcap.com/tidb/stable/information-schema-tidb-index-usage/).
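As a rough illustration of the refresh delay discussed in this thread, a check like the following can tell whether index usage counters are likely to include a workload yet. The five-minute bound comes from the comment above; the constant `MAX_REFRESH_DELAY`, the function `usage_may_be_stale`, and its parameters are hypothetical, and the bound should be verified against the documentation for your TiDB version.

```python
from datetime import datetime, timedelta

# Upper bound on how long TIDB_INDEX_USAGE / CLUSTER_TIDB_INDEX_USAGE may lag
# behind query execution (assumption taken from the review comment above).
MAX_REFRESH_DELAY = timedelta(minutes=5)


def usage_may_be_stale(workload_finished_at: datetime, now: datetime,
                       max_delay: timedelta = MAX_REFRESH_DELAY) -> bool:
    """Return True if the index usage data might not include the workload yet."""
    return now - workload_finished_at < max_delay


if __name__ == "__main__":
    finished = datetime(2025, 1, 1, 12, 0, 0)
    print(usage_may_be_stale(finished, datetime(2025, 1, 1, 12, 2, 0)))  # True: within the window
    print(usage_may_be_stale(finished, datetime(2025, 1, 1, 12, 6, 0)))  # False: past the window
```

Because the same bound applies to both tables, the check does not need to distinguish between `TIDB_INDEX_USAGE` and `CLUSTER_TIDB_INDEX_USAGE`.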
### Manually create the `schema_unused_indexes` view

Because `TIDB_INDEX_USAGE` is cleared after a TiDB node restarts, ensure that the node has been running for a sufficient amount of time before making decisions. For clusters upgraded from an earlier version to TiDB v8.0.0 or later, you must manually create the system schema and the included views.
"Because `TIDB_INDEX_USAGE` is cleared after a TiDB node restarts, ensure that the node has been running for a sufficient amount of time before making decisions." This sentence duplicates "Ensure the system has been running long enough to capture a representative workload before relying on this data." above, and it has no connection to the section title "Manually create the `schema_unused_indexes` view".
+1
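The uptime concern raised in this thread can be sketched as a pre-check. Because `TIDB_INDEX_USAGE` is cleared when a TiDB node restarts, you only trust the usage data once every node has been up for at least one representative workload cycle. The sketch assumes you already have each instance's start time, for example from the `START_TIME` column of `INFORMATION_SCHEMA.CLUSTER_INFO`. The function `instances_not_warmed_up`, the instance addresses, and the seven-day window are hypothetical.

```python
from datetime import datetime, timedelta


def instances_not_warmed_up(start_times: dict[str, datetime], now: datetime,
                            min_uptime: timedelta) -> list[str]:
    """Return the TiDB instances whose uptime is shorter than min_uptime.

    An empty result means every instance has collected index usage data
    for at least min_uptime since its last restart.
    """
    return sorted(
        instance
        for instance, started in start_times.items()
        if now - started < min_uptime
    )


if __name__ == "__main__":
    now = datetime(2025, 1, 8, 0, 0, 0)
    starts = {
        "tidb-0:4000": datetime(2024, 12, 1, 0, 0, 0),  # up for more than a month
        "tidb-1:4000": datetime(2025, 1, 7, 0, 0, 0),   # restarted one day ago
    }
    # Hypothetical window covering one weekly business cycle.
    print(instances_not_warmed_up(starts, now, timedelta(days=7)))  # ['tidb-1:4000']
```

Checking every instance, not just one, matters because each TiDB node keeps its own in-memory usage data.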
TiDB v8.0.0 introduces the [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) table and the [`schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) view to help you track index usage patterns and make data-driven decisions.

Because indexes evolve with changing business logic, regular index audits are a standard part of database maintenance. TiDB provides built-in observability tools to help you detect, evaluate, and optimize indexes without risk.
Swapping the two paragraphs makes more sense to me.
- `TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
- `schema_unused_indexes`: lists indexes that have not been used since TiDB was last restarted.
- `TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
- `schema_unused_indexes`: lists indexes that have not been used since TiDB was last restarted.
- `INFORMATION_SCHEMA`.`TIDB_INDEX_USAGE`: monitors index usage patterns and query frequency.
- `sys`.`schema_unused_indexes`: lists indexes that have not been used since TiDB was last restarted.
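To illustrate what these tables are used for, the following sketch approximates the idea behind `schema_unused_indexes` on the client side. It sums `QUERY_TOTAL` per index across the rows of `CLUSTER_TIDB_INDEX_USAGE` (one row per TiDB instance and index) and reports indexes that no instance has used. It is not the view's actual definition. The row dictionaries stand in for query results, the set of indexes to audit is assumed to come from elsewhere (for example `INFORMATION_SCHEMA.TIDB_INDEXES`), and the column names should be checked against the documentation for your version. The function `unused_indexes` and all sample values are hypothetical.

```python
IndexKey = tuple[str, str, str]  # (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME)


def unused_indexes(all_indexes: set[IndexKey], usage_rows: list[dict]) -> list[IndexKey]:
    """Return the indexes that no TiDB instance has used, sorted for stable output.

    all_indexes: every secondary index you want to audit.
    usage_rows: rows shaped like CLUSTER_TIDB_INDEX_USAGE, one per instance and index.
    """
    totals: dict[IndexKey, int] = {}
    for row in usage_rows:
        key = (row["TABLE_SCHEMA"], row["TABLE_NAME"], row["INDEX_NAME"])
        # Sum across instances: an index counts as unused only if every node reports zero.
        totals[key] = totals.get(key, 0) + row["QUERY_TOTAL"]
    # Indexes with no usage rows at all are treated as unused (total of 0).
    return sorted(key for key in all_indexes if totals.get(key, 0) == 0)


if __name__ == "__main__":
    rows = [
        {"INSTANCE": "tidb-0:10080", "TABLE_SCHEMA": "test", "TABLE_NAME": "orders",
         "INDEX_NAME": "idx_status", "QUERY_TOTAL": 0},
        {"INSTANCE": "tidb-1:10080", "TABLE_SCHEMA": "test", "TABLE_NAME": "orders",
         "INDEX_NAME": "idx_status", "QUERY_TOTAL": 12},
        {"INSTANCE": "tidb-0:10080", "TABLE_SCHEMA": "test", "TABLE_NAME": "orders",
         "INDEX_NAME": "idx_created_at", "QUERY_TOTAL": 0},
        {"INSTANCE": "tidb-1:10080", "TABLE_SCHEMA": "test", "TABLE_NAME": "orders",
         "INDEX_NAME": "idx_created_at", "QUERY_TOTAL": 0},
    ]
    indexes = {("test", "orders", "idx_status"),
               ("test", "orders", "idx_created_at"),
               ("test", "orders", "idx_note")}
    print(unused_indexes(indexes, rows))
```

Aggregating across instances is the key design choice. `idx_status` looks unused on `tidb-0` alone but is used on `tidb-1`, so a per-node view would flag it incorrectly.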
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?