add changefeed doc #21273
base: release-8.1
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary of Changes
Hello @shiyuhang0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive documentation for the new TiDB Cloud Changefeed feature. It provides an overview of the changefeed's capabilities, how to manage it using the TiDB Cloud CLI, and detailed instructions for configuring data streaming to Apache Kafka, including various filter and Kafka-specific settings.
Highlights
- New Feature Documentation: Added a new overview document for the TiDB Cloud Changefeed feature, explaining its purpose (streaming data from TiDB Cloud to other services like Kafka), its current beta status, and general management operations.
- CLI Command Reference: Documented the TiDB Cloud CLI commands for interacting with changefeeds, covering listing, creating, pausing, resuming, editing, and deleting changefeeds.
- Apache Kafka Integration Guide: Provided a detailed guide on how to configure a changefeed to stream data to Apache Kafka, including network prerequisites, Kafka ACL authorization, and extensive configuration options for data formats (e.g., Canal-JSON, Avro, Open Protocol, Debezium), authentication, and topic partitioning strategies.
- Changefeed States and Limitations: Outlined the various states a changefeed can be in (e.g., `CREATING`, `RUNNING`, `PAUSED`, `WARNING`, `RUNNING_FAILED`) and noted current limitations such as the number of changefeeds per cluster and table filter rules.
Code Review
The code changes add documentation for the Changefeed feature in TiDB Cloud, including an overview and a guide for sinking data to Apache Kafka. The documentation covers restrictions, prerequisites, configuration, and other details. The style guide was followed, with minor suggestions for clarity and grammar.
- `<cluster-id>`: the ID of the TiDB Cloud cluster that you want to create the changefeed for.
- `<changefeed-name>`: the name of the changefeed, it is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed.
- type: the type of the changefeed, which is `KAFKA` in this case.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations.
There should be a valid link here. Please provide a valid reference or remove the empty link.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See the documentation for more information about the configurations.
- `<changefeed-name>`: the name of the changefeed, it is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed.
- type: the type of the changefeed, which is `KAFKA` in this case.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations.
- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See []() for more information about the configurations.
There should be a valid link here. Please provide a valid reference or remove the empty link.
- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See []() for more information about the configurations.
- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See the documentation for more information about the configurations.
TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka.
> **Note:**
>
> - Currently, you can manager changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
According to the style guide, when addressing users, use the second person ("you"). Consider rephrasing to directly address the user.
> - Currently, you can manager changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
> - Currently, you can manage changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
>
> TiDB Cloud currently only allows editing changefeeds in the paused status.

To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit with the TiDB Cloud CLI command:
According to the style guide, when addressing users, use the second person ("you"). Consider rephrasing to directly address the user.
To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit with the TiDB Cloud CLI command:
To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit it with the TiDB Cloud CLI command:
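The pause-then-edit rule quoted in this thread can be pictured as a small guard. This is only a sketch: `can_edit` and `EDITABLE_STATES` are hypothetical names, not part of the TiDB Cloud CLI, and the state names are the ones listed in the PR summary.

```python
# Hypothetical guard; state names are taken from the PR summary, not from an API.
EDITABLE_STATES = {"PAUSED"}


def can_edit(state: str) -> bool:
    """Return True only when the changefeed state allows editing."""
    return state.upper() in EDITABLE_STATES
```

A caller would pause the changefeed first and issue the edit command only once `can_edit` returns `True` for the reported state.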
ticloud serverless changefeed template --explain
```

The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `filter` configuration:
The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration, not a `filter` configuration.
The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `filter` configuration:
The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration:
- `tls_enable`: Whether to enable TLS for the connection.
- `compression`: The compression type for messages, support `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`.

"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512"
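The `compression` values and authentication types quoted in this hunk could be checked before the `kafka` JSON string is handed to the CLI. The sketch below is illustrative only: the flat layout and the `auth_type` key are assumptions, and `build_kafka_json` is a hypothetical helper. The template printed by `ticloud serverless changefeed template --explain` remains the authority on the real structure.

```python
import json

# Allowed values as quoted in the documentation under review.
COMPRESSION_TYPES = {"NONE", "GZIP", "LZ4", "SNAPPY", "ZSTD"}
AUTH_TYPES = {"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512"}


def build_kafka_json(tls_enable: bool, compression: str, auth_type: str) -> str:
    """Validate the enumerated options and serialize them to a JSON string."""
    if compression not in COMPRESSION_TYPES:
        raise ValueError(f"unsupported compression: {compression!r}")
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"unsupported auth type: {auth_type!r}")
    config = {
        "tls_enable": tls_enable,
        "compression": compression,
        "auth_type": auth_type,  # assumed key name, not confirmed by the source
    }
    return json.dumps(config)
```

Rejecting an unknown value locally gives a clearer error than waiting for the service to refuse the changefeed.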
- **Distribute changelogs by timestamp to Kafka partition**

If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`.. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming.
The phrase "is send" should be "is sent".
If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`.. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming.
If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelog with the same column values is sent to the same partition.
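For the `TS` distribution, the quoted paragraph says the consumer must sort data from multiple partitions by commitTs before consuming. Since each partition is already ordered, a k-way merge is enough. The sketch below assumes finite, already-fetched batches; `Changelog` and `merge_by_commit_ts` are hypothetical names, and a real consumer would also need to know how far every partition has progressed before emitting a record.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Changelog(NamedTuple):
    commit_ts: int  # commitTs carried by the row changelog
    payload: str    # placeholder for the decoded message body


def merge_by_commit_ts(partitions: Iterable[Iterable[Changelog]]) -> Iterator[Changelog]:
    """Merge per-partition streams, each already ordered by commit_ts, into one ordered stream."""
    return heapq.merge(*partitions, key=lambda c: c.commit_ts)


# Two partitions, each internally ordered by commit_ts.
p0 = [Changelog(1, "a"), Changelog(4, "d")]
p1 = [Changelog(2, "b"), Changelog(3, "c")]
ordered = [c.payload for c in merge_by_commit_ts([p0, p1])]
print(ordered)  # ['a', 'b', 'c', 'd']
```

`heapq.merge` is lazy and relies on the per-partition ordering that the paragraph guarantees, so no full sort of the combined data is needed.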
@shiyuhang0: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?