A77: xDS Server-Side Rate Limiting #414


Open · wants to merge 9 commits into master

Conversation

@sergiitk (Member) commented Jan 24, 2024

Rate Limit Quota Service (RLQS) gRFC.

Rendered version: https://github.com/sergiitk/grfc-proposal/blob/a77-rlqs/A77-xds-rate-limiting-rlqs.md

@sergiitk sergiitk marked this pull request as ready for review November 6, 2024 17:33
@sergiitk sergiitk requested a review from markdroth November 6, 2024 17:33
@markdroth (Member) left a comment:

Thanks for writing this up! Overall, this looks good, but I have a few comments around making it clearer to readers.

Please let me know if you have any questions. Thanks!

@sergiitk sergiitk requested a review from markdroth April 29, 2025 20:10
@markdroth (Member) left a comment:

This is moving in the right direction!

Please let me know if you have any questions. Thanks!

- RLQS Client: Accessed when we get the first data plane RPC for a given
bucket and when a report timer fires. Notifies RLQS Filter State of
responses received from the RLQS server.
- Report Timers Map: Initialized at instantiation. Used to track discovered
@markdroth (Member):

Why is this a map? You don't explicitly say what the keys or values are, but from reading the description of the timer-firing event below, I'm guessing that the idea here is that the keys are timer durations and the values are the timers.

This implies that all buckets with the same reporting interval will use the same timer. However, it's unlikely that all buckets with the same reporting interval will have come into existence at the exact same moment, which implies that the first report for one of the buckets is going to wind up happening sooner than its scheduled time. For example, consider the following timeline:

  • t=0: Bucket A is created with reporting interval 5s. Timer is created for 5s.
  • t=3s: Bucket B is created with reporting interval 5s. There is already a timer for the 5s interval, so no new timer is created.
  • t=5s: Timer fires. It sends a load report covering 5s for bucket A but only 2s for bucket B.

Is it okay for the initial report for bucket B to be shorter than configured in this case? If not, then I don't see how we can share timers between the two buckets. And if we actually need a separate timer per bucket, then it seems like we should store the timers in the bucket map instead of having a separate map.

Actually, do we even need multiple timers here to begin with? Could we instead have a single timer that is set to fire when the next load report needs to be sent, and then have it scan through the bucket map to determine the duration at which to schedule the next load report?

Another thing to note here is that scanning through the whole bucket map seems like it could potentially waste a fair amount of CPU, especially if the reporting interval is short and the number of buckets is large. From the description of the timer-firing event below, it seems like you were planning to have the timer do that anyway, but it might be a good idea to consider whether there are better alternatives. For example, we could have each entry in the timer map store a pointer to each of the buckets that use that same reporting interval, so that when the timer fires, it already knows which buckets to include in the load report.


I've mentioned a bunch of possible implementations here, because I want us to think about the possible ways to do this as a way of nailing down what the actual requirements are. However, I want this document to specify those requirements without dictating the implementation.

I think we need to cover things like the following:

  • The RLQS Filter State is responsible for handling load report timers.
  • Implementations can (should?) use a single load report schedule for all buckets that have the same interval, even if that means that the initial report for a given bucket will sometimes be sooner than the configured interval.
  • We need to specify any scalability requirements -- e.g., if we need to support a large number of buckets and very short load reporting intervals, we should say that.

We shouldn't dictate exactly how the implementation works. If you want, we can suggest (in just a sentence or two) a couple of possible implementation options, but implementations should be free to do something else as long as they meet the stated requirements.
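
For example, the single-timer alternative could be sketched like this (a minimal sketch; `Bucket`, `send_report`, and `schedule_timer` are hypothetical, and this is not meant as the prescribed implementation):

import time

class RlqsFilterState:
  def __init__(self):
    self.buckets = {}  # bucket_id -> Bucket (hypothetical type)

  def on_report_timer(self):
    now = time.monotonic()
    # One shared timer: collect every bucket whose report is due.
    due = [b for b in self.buckets.values() if b.next_report_time <= now]
    for bucket in due:
      bucket.next_report_time = now + bucket.reporting_interval
    self.send_report(due)  # hypothetical: snapshots each bucket's stats, writes to the RLQS stream
    # Re-arm the single timer for the soonest upcoming deadline.
    if self.buckets:
      next_due = min(b.next_report_time for b in self.buckets.values())
      self.schedule_timer(next_due - now)  # hypothetical timer API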

@sergiitk (Member, Author) commented May 7, 2025:

Thanks for the feedback. Yeah, there's some room for improvement here. I'll try to make things clearer in this comment and update the gRFC text accordingly.

Is it okay for the initial report for bucket B to be shorter than configured in this case?

Yes, this is acceptable. I've always intended the RLQS protocol to be "best effort".

There are other special cases where before-the-timer reporting is explicitly allowed, e.g. replacing the assignment:

If the rate limiting strategy is different from the bucket's active assignment, or
the current bucket assignment has expired, the data plane must immediately
end the current assignment, report the bucket usage, and apply the new assignment.

It's the RLQS server's job to aggregate the requests as needed (the CAT cycle: Collect + Aggregate + Throttle), even if they come early.


scanning through the whole bucket map seems like it could potentially waste a fair amount of CPU, especially if the reporting interval is short and the number of buckets is large

This is covered under RLQS Bucket Map: "RLQS Bucket Map allows to retrieve all buckets that need to be reported to the RLQS server at a given reporting interval." I've recommended the "Buckets Per Interval Map" for that. However, I'll rethink the phrasing so it doesn't mandate a specific implementation decision.


Could we instead have a single timer that is set to fire when the next load report needs to be sent, and then have it scan through the bucket map to determine the duration at which to schedule the next load report?

You're right. I've actually considered this exact approach in Java. But it's a great example of why I should've focused on the requirements rather than the implementation details.

bucket, or when instructed by the RLQS Server.
- Entries deleted either by the RLQS server via `abandon_action`, or on report
timers if a bucket's active assignment expires.
- Buckets Per Interval Map: Initialized empty at instantiation, thread-safe.
@markdroth (Member):

See my comment above about focusing on requirements instead of implementation details for this kind of low-level thing.

But with regard to this particular implementation choice, if we already have a separate map by report interval to store the timers in, it's not clear to me why we are duplicating that map here. We should not need two separate maps to do the same thing.

@sergiitk (Member, Author) commented May 7, 2025:

if we already have a separate map by report interval to store the timers in, it's not clear to me why we are duplicating that map here

This is a basic lookup map so that we don't have to scan the full Bucket Map every time. As you pointed out, doing that every time is an unnecessary waste of CPU.

The idea was something like this:

def timer_fire(self, interval):
  # Look up only the buckets registered for this reporting interval,
  # instead of scanning the full Bucket Map.
  buckets_to_report: list[Bucket] = self.buckets_by_interval[interval]
  report = Report()
  for bucket in buckets_to_report:
    # Snapshot (and reset) the bucket's usage counters.
    report.add(bucket.snapshot_stats())

  self.rlqs_client.send_report(report)

However, as I pointed out in the other comment, this is too implementation-specific.

When a report timer fires, the RLQS Filter State retrieves all buckets with the
corresponding reporting interval from the RLQS Bucket Map. For each bucket, the
filter snapshots the current usage counters and resets them. The filter then
sends the snapshot to the RLQS server using the RLQS Client.
@markdroth (Member):

Are there requirements in terms of when exactly we reschedule the timer, and how that interacts with flow control on the RLQS stream?

Imagine that a bucket is configured for a 5-second load reporting interval, but when the load report timer fires, it takes 3 seconds to send the load report on the RLQS stream due to flow control. Do we schedule the next load report as soon as we start the write on the RLQS stream, in which case it will fire 2 seconds after the previous write finished? Or do we wait for the write to complete before we schedule the next load report, so that the next load report will start 5 seconds later but actually include 7 seconds worth of data?

What if the reporting interval is 5s but it takes 8s for the last report to clear flow control? We probably don't want to queue up writes indefinitely, since that would defeat the purpose of flow control.

We may instead want to structure this as a single load report timer with an ordered queue of all buckets for which load reports need to be sent by a given time. That way, when a write finishes, we can check if any load reports are overdue, in which case we can start them all immediately, and if not, we can schedule the next timer for the time at which the next load reports are due. But again, the document should discuss requirements at this level, not dictate the specific implementation.
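
A rough sketch of that ordered-queue idea (hypothetical `Bucket`, `snapshot_stats`, and timer APIs; shown only to pin down the intended behavior, not to dictate an implementation):

import heapq
import itertools
import time

class ReportScheduler:
  def __init__(self, rlqs_client):
    self.rlqs_client = rlqs_client
    self.queue = []                # min-heap of (due_time, seq, bucket)
    self._seq = itertools.count()  # tie-breaker so buckets are never compared

  def add_bucket(self, bucket):
    due = time.monotonic() + bucket.reporting_interval
    heapq.heappush(self.queue, (due, next(self._seq), bucket))

  def on_write_complete(self):
    # Called when the previous report clears flow control: send every
    # overdue report immediately, otherwise arm the timer for the next one.
    now = time.monotonic()
    overdue = []
    while self.queue and self.queue[0][0] <= now:
      _, _, bucket = heapq.heappop(self.queue)
      overdue.append(bucket)
      heapq.heappush(self.queue, (now + bucket.reporting_interval,
                                  next(self._seq), bucket))
    if overdue:
      self.rlqs_client.send_report([b.snapshot_stats() for b in overdue])
    elif self.queue:
      self.schedule_timer(self.queue[0][0] - now)  # hypothetical timer API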

@sergiitk (Member, Author):

Very good point.

that `bucket_id` is represented as a `Map<String, String>`, which may
introduce complexities in efficient cache sharding for certain programming
languages.
2. Incrementing `num_requests_allowed`/`num_requests_denied` bucket counters.
@markdroth (Member):

The wording here is good -- no need to change anything -- but note that the use case here is very similar to what we do in outlier detection, so implementations can probably use a similar approach to what we do there for avoiding lock contention.

@sergiitk (Member, Author):

Noted. I've discussed C++ implementation approaches for counter increments with @yashykt a while back, and we concluded that it's doable and can be done in an efficient manner. With Java, @ejona86 and I came up with a few efficient options too. I'll check out the outlier detection text to see if we can lift any applicable wording directly from it, or just add a link as a "suggested approach".
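
As one possible shape, a minimal sketch using a per-bucket lock (hypothetical `Bucket` type; real implementations would likely use atomics or striped counters instead):

import threading

class Bucket:
  def __init__(self, bucket_id):
    self.bucket_id = bucket_id
    self._lock = threading.Lock()
    self._num_requests_allowed = 0
    self._num_requests_denied = 0

  def record(self, allowed):
    # Data-plane increments contend only on this bucket's lock,
    # never on a filter-wide lock.
    with self._lock:
      if allowed:
        self._num_requests_allowed += 1
      else:
        self._num_requests_denied += 1

  def snapshot_stats(self):
    # Atomically read and reset, so consecutive reports cover
    # disjoint time windows.
    with self._lock:
      stats = (self._num_requests_allowed, self._num_requests_denied)
      self._num_requests_allowed = 0
      self._num_requests_denied = 0
    return stats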

Envoy provides two syntactically equivalent Unified Matcher
definitions: [`envoy.config.common.matcher.v3.Matcher`](https://github.com/envoyproxy/envoy/blob/e3da7ebb16ad01c2ac7662758a75dba5cdc024ce/api/envoy/config/common/matcher/v3/matcher.proto)
and [`xds.type.matcher.v3.Matcher`](https://github.com/cncf/xds/blob/b4127c9b8d78b77423fd25169f05b7476b6ea932/xds/type/matcher/v3/matcher.proto).
We will only support the latter, which is the preferred version for all new APIs
@markdroth (Member):

Any reason not to support both? I think there are other places where we'll want to use the same matcher implementation in the future that use the old type, and it seems like we should be able to handle either one.

@sergiitk (Member, Author):

I don't want to introduce a legacy dependency we have to support forever without a real need for it.

Note that CEL libraries will keep supporting the legacy format, but they're being migrated to the canonical protos, e.g. google/cel-cpp@aff8696. This means the canonical protos will be used internally, and legacy CEL operations will incur a performance overhead.

As for Java, the initial release of CEL-Java only supported the canonical CEL protos in the first place. It's now possible to convert the legacy protos to the canonical ones, but this requires adding separate dependencies (CEL Java V1alpha1 Utility, CEL Java Protobuf Adapter).

AFAIK Envoy will be upgrading its CEL intermediate lib to the canonical protos too.

@sergiitk (Member, Author):

Also: we can always add support for CEL later if necessary. But we won't be able to remove it once the genie is out of the bottle.

   1. [`HttpRequestHeaderMatchInput`](https://www.envoyproxy.io/docs/envoy/latest/api-v3/type/matcher/v3/http_inputs.proto#type-matcher-v3-httprequestheadermatchinput)
   2. [`HttpAttributesCelMatchInput`](https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/http_inputs.proto#envoy-v3-api-msg-xds-type-matcher-v3-httpattributescelmatchinput)
2. Custom Matchers:
   1. [`CelMatcher`](https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/cel.proto.html)
@markdroth (Member):

Doesn't this matcher require using HttpAttributesCelMatchInput as the input? If so, and if this is the only matcher we're going to support, then it doesn't really make sense to say that we're going to support HttpRequestHeaderMatchInput, since there's no possible way to use it.

We should probably also support any common matchers that are expected to be used with HttpRequestHeaderMatchInput.

@sergiitk (Member, Author) commented May 9, 2025:

HttpRequestHeaderMatchInput can be used with the built-in `value_match` (`xds.type.matcher.v3.StringMatcher`). I didn't specify it explicitly because it's not an extension, as the line above implies:

In this iteration, the following Unified Matcher extensions will be supported:

For performance reasons, CEL variables should be resolved on demand. CEL Runtime
provides different variable-resolution approaches depending on the language:

* CPP: [`BaseActivation::FindValue()`](https://github.com/google/cel-cpp/blob/9310c4910e598362695930f0e11b7f278f714755/eval/public/base_activation.h#L35)
@markdroth (Member):

I don't think we need to list the specific APIs in each language.

@sergiitk (Member, Author):

Normally I'd agree, but it's surprisingly difficult to find and identify these, just because the approaches are really different between languages. IMO it'll save the implementers some time.

Do you think there's a way to keep these without making them look like the prescribed implementation? Maybe under a different section?
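
For illustration only, the on-demand pattern these APIs share is roughly the following (a hypothetical Python sketch, not any concrete CEL library API):

class LazyActivation:
  def __init__(self, request):
    # Nothing is computed up front; each variable has a zero-arg resolver.
    self._resolvers = {
      'request.headers': lambda: dict(request.headers),
      'request.path': lambda: request.path,
    }
    self._cache = {}

  def find_value(self, name):
    # Resolve on first use, then memoize for the rest of the evaluation.
    if name not in self._cache:
      resolver = self._resolvers.get(name)
      if resolver is None:
        return None  # unknown variable
      self._cache[name] = resolver()
    return self._cache[name]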

@HiramSilvey commented:
Hi @sergiitk, apologies for posting this on the PR itself but I couldn't find a better way to get in contact. I just wanted to reach out to ask if this proposal will likely be merged or if there is still a chance it may not be accepted. Thanks so much.

@sergiitk (Member, Author) commented Jun 4, 2025:

@HiramSilvey I expect it to be merged. The high-level design was approved some time ago.
In the past few weeks we needed to reconsider some details and reach consensus on per-language implementation nuances of filter state retention. I'm planning to resume working on this PR sometime next week.
