-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Hot shard design documentation #12413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I think it's also worth documenting the max shard size is dynamically calculated in
int64_t getMaxShardSize(double dbSizeEstimate) {
int64_t size = std::min((SERVER_KNOBS->MIN_SHARD_BYTES + (int64_t)std::sqrt(std::max<double>(dbSizeEstimate, 0)) *
SERVER_KNOBS->SHARD_BYTES_PER_SQRT_BYTES) *
SERVER_KNOBS->SHARD_BYTES_RATIO,
(int64_t)SERVER_KNOBS->MAX_SHARD_BYTES);
i.e., a formula of (MIN_SHARD_BYTES + sqrt(DB_size) * SHARD_BYTES_PER_SQRT_BYTES) * SHARD_BYTES_RATIO
, and then max'ed at MAX_SHARD_BYTES
.
88560df
to
dd60954
Compare
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this under
Server Knobs Related to Hot Shards
and probably group it withHot Shard Throttling Knobs
I want to call out that this is a separate feature. We will be testing it soon and once it makes production we can update the doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it sounds like this should go under a different section? or maybe make the section not read-specific
probably makes sense for the section to not be read-specific
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
design/hotshard.md
Outdated
|
||
### Storage Server Impact | ||
|
||
- The storage servers hosting the hot shard become saturated with write traffic, causing their write queues to grow ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933), [Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be blobWorker, not storage servers. Durability lag causes rate limiting at line 910.
design/hotshard.md
Outdated
|
||
- The storage servers hosting the hot shard become saturated with write traffic, causing their write queues to grow ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933), [Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973)). | ||
|
||
- Storage server metrics report high `bytesInput` and increasing queue depths, which are monitored by Ratekeeper ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sees to be line 835 and 845.
design/hotshard.md
Outdated
|
||
- Storage server metrics report high `bytesInput` and increasing queue depths, which are monitored by Ratekeeper ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933)). | ||
|
||
- If the storage server's write queue size exceeds thresholds relative to available space and target ratios, it becomes the bottleneck for the entire cluster ([Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 845?
design/hotshard.md
Outdated
|
||
### Cluster-Wide Rate Limiting | ||
|
||
- Ratekeeper continuously monitors storage server queues and bandwidth metrics to determine cluster-wide transaction rate limits ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933), [Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line numbers are wrong
design/hotshard.md
Outdated
|
||
- Ratekeeper continuously monitors storage server queues and bandwidth metrics to determine cluster-wide transaction rate limits ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933), [Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973)). | ||
|
||
- When a storage server's write queue becomes excessive, Ratekeeper sets the limit reason to `storage_server_write_queue_size` and reduces the cluster-wide transaction rate proportionally ([Ratekeeper:973](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L973), [Ratekeeper:1440](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L1440)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
design/hotshard.md
Outdated
|
||
- This causes **all transactions across the cluster** to be throttled, even those not touching the hot shard, because the cluster cannot commit faster than its slowest storage server can durably persist data. | ||
|
||
- The rate limit is calculated as: `limitTps = min(actualTps * maxBytesPerSecond / inputRate, maxBytesPerSecond * MAX_TRANSACTIONS_PER_BYTE)`, ensuring the cluster doesn't overwhelm the saturated storage server ([Ratekeeper:933](https://github.com/apple/foundationdb/blob/7.3.0/fdbserver/Ratekeeper.actor.cpp#L933)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 795
limitTps = std::min(actualTps * maxBytesPerSecond / std::max(1.0e-8, inputRate),
maxBytesPerSecond * SERVER_KNOBS->MAX_TRANSACTIONS_PER_BYTE);
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and thank you!
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
- Prevents moving tiny/idle shards during rebalancing | ||
- Location: [ServerKnobs:368](https://github.com/apple/foundationdb/blob/7.3.0/fdbclient/ServerKnobs.cpp#L368) | ||
|
||
**`DD_ENABLE_REBALANCE_STORAGE_QUEUE_WITH_LIGHT_WRITE_SHARD`** (bool, default: `true`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature flag is ENABLE_REBALANCE_STORAGE_QUEUE instead of DD_ENABLE_REBALANCE_STORAGE_QUEUE_WITH_LIGHT_WRITE_SHARD. This feature is off by default.
Hot shard documentation
[X]explains what a hot shard is
[X] - explains how to avoid a hot shard (e.g. structure data and/or access patterns in a certain way)
[X] - explains the server management side options for fixing a hot shard, should it arise. Please emphasize that the options are rather bad / near non-existent and hot shards should be avoided by design for this reason
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch
ormain
if this is the youngest branch)