A77: xDS Server-Side Rate Limiting #414


Open · wants to merge 9 commits into master

Conversation

@sergiitk (Member) commented Jan 24, 2024

Rate Limit Quota Service (RLQS) gRFC.

Rendered version: https://github.com/sergiitk/grfc-proposal/blob/a77-rlqs/A77-xds-rate-limiting-rlqs.md

@sergiitk sergiitk marked this pull request as ready for review November 6, 2024 17:33
@sergiitk sergiitk requested a review from markdroth November 6, 2024 17:33
@markdroth (Member) left a comment:

Thanks for writing this up! Overall, this looks good, but I have a few comments around making it clearer to readers.

Please let me know if you have any questions. Thanks!

@sergiitk sergiitk requested a review from markdroth April 29, 2025 20:10
@markdroth (Member) left a comment:

This is moving in the right direction!

Please let me know if you have any questions. Thanks!

- RLQS Client: Accessed when we get the first data plane RPC for a given
bucket and when a report timer fires. Notifies RLQS Filter State of
responses received from the RLQS server.
- Report Timers Map: Initialized at instantiation. Used to track discovered
@markdroth (Member):

Why is this a map? You don't explicitly say what the keys or values are, but from reading the description of the timer-firing event below, I'm guessing that the idea here is that the keys are timer durations and the values are the timers.

This implies that all buckets with the same reporting interval will use the same timer. However, it's unlikely that all buckets with the same reporting interval will have come into existence at the exact same moment, which implies that the first report for one of the buckets is going to wind up happening sooner than its scheduled time. For example, consider the following timeline:

  • t=0: Bucket A is created with reporting interval 5s. Timer is created for 5s.
  • t=3s: Bucket B is created with reporting interval 5s. There is already a timer for the 5s interval, so no new timer is created.
  • t=5s: Timer fires. It sends a load report covering 5s for bucket A but only 2s for bucket B.

Is it okay for the initial report for bucket B to be shorter than configured in this case? If not, then I don't see how we can share timers between the two buckets. And if we actually need a separate timer per bucket, then it seems like we should store the timers in the bucket map instead of having a separate map.

Actually, do we even need multiple timers here to begin with? Could we instead have a single timer that is set to fire when the next load report needs to be sent, and then have it scan through the bucket map to determine the duration at which to schedule the next load report?

Another thing to note here is that scanning through the whole bucket map seems like it could potentially waste a fair amount of CPU, especially if the reporting interval is short and the number of buckets is large. From the description of the timer-firing event below, it seems like you were planning to have the timer do that anyway, but it might be a good idea to consider whether there are better alternatives. For example, we could have each entry in the timer map store a pointer to each of the buckets that use that same reporting interval, so that when the timer fires, it already knows which buckets to include in the load report.


I've mentioned a bunch of possible implementations here, because I want us to think about the possible ways to do this as a way of nailing down what the actual requirements are. However, I want this document to specify those requirements without dictating the implementation.

I think we need to cover things like the following:

  • The RLQS Filter State is responsible for handling load report timers.
  • Implementations can (should?) use a single load report schedule for all buckets that have the same interval, even if that means that the initial report for a given bucket will sometimes be sooner than the configured interval.
  • We need to specify any scalability requirements -- e.g., if we need to support a large number of buckets and very short load reporting intervals, we should say that.

We shouldn't dictate exactly how the implementation works. If you want, we can suggest (in just a sentence or two) a couple of possible implementation options, but implementations should be free to do something else as long as they meet the stated requirements.
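
For example, the single-timer alternative could be sketched like this (a minimal sketch; `Bucket`, `send_report`, and `schedule_timer` are hypothetical, and this is not meant as the prescribed implementation):

import time

class RlqsFilterState:
  def __init__(self):
    self.buckets = {}  # bucket_id -> Bucket (hypothetical type)

  def on_report_timer(self):
    now = time.monotonic()
    # One shared timer: collect every bucket whose report is due.
    due = [b for b in self.buckets.values() if b.next_report_time <= now]
    for bucket in due:
      bucket.next_report_time = now + bucket.reporting_interval
    self.send_report(due)  # hypothetical: snapshots each bucket's stats, writes to the RLQS stream
    # Re-arm the single timer for the soonest upcoming deadline.
    if self.buckets:
      next_due = min(b.next_report_time for b in self.buckets.values())
      self.schedule_timer(next_due - now)  # hypothetical timer API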

@sergiitk (Member, Author) commented May 7, 2025:

Thanks for the feedback. Yeah, there's some room for improvement here. I'll try to make things clearer in this comment and update the gRFC text accordingly.

Is it okay for the initial report for bucket B to be shorter than configured in this case?

Yes, this is acceptable. I've always intended the RLQS protocol to be "best effort".

There are other special cases where before-the-timer reporting is explicitly allowed, e.g. replacing the assignment:

If the rate limiting strategy is different from the bucket's active assignment, or
the current bucket assignment has expired, the data plane must immediately
end the current assignment, report the bucket usage, and apply the new assignment.

It's the RLQS server's job to aggregate the requests as needed (the CAT cycle: Collect + Aggregate + Throttle), even if they come early.


scanning through the whole bucket map seems like it could potentially waste a fair amount of CPU, especially if the reporting interval is short and the number of buckets is large

This is covered under RLQS Bucket Map: "RLQS Bucket Map allows to retrieve all buckets that need to be reported to the RLQS server at a given reporting interval." I've recommended the "Buckets Per Interval Map" for that. However, I'll rethink the phrasing so it doesn't mandate a specific implementation decision.


Could we instead have a single timer that is set to fire when the next load report needs to be sent, and then have it scan through the bucket map to determine the duration at which to schedule the next load report?

You're right. I've actually considered this exact approach in Java. But it's a great example of why I should've focused on the requirements rather than the implementation details.

bucket, or when instructed by the RLQS Server.
- Entries deleted either by the RLQS server via `abandon_action`, or on report
timers if a bucket's active assignment expires.
- Buckets Per Interval Map: Initialized empty at instantiation, thread-safe.
@markdroth (Member):

See my comment above about focusing on requirements instead of implementation details for this kind of low-level thing.

But with regard to this particular implementation choice, if we already have a separate map by report interval to store the timers in, it's not clear to me why we are duplicating that map here. We should not need two separate maps to do the same thing.

@sergiitk (Member, Author) commented May 7, 2025:

if we already have a separate map by report interval to store the timers in, it's not clear to me why we are duplicating that map here

This is a basic lookup map so that we don't have to scan the full Bucket Map every time. As you pointed out, doing that every time is an unnecessary waste of CPU.

The idea was something like this:

def timer_fire(self, interval):
  # Look up only the buckets registered for this reporting interval,
  # instead of scanning the full Bucket Map.
  buckets_to_report: list[Bucket] = self.buckets_by_interval[interval]
  report = Report()
  for bucket in buckets_to_report:
    # Snapshot (and reset) the bucket's usage counters.
    report.add(bucket.snapshot_stats())

  self.rlqs_client.send_report(report)

However, as I pointed out in the other comment, this is too implementation-specific.

When a report timer fires, the RLQS Filter State retrieves all buckets with the
corresponding reporting interval from the RLQS Bucket Map. For each bucket, the
filter snapshots the current usage counters and resets them. The filter then
sends the snapshot to the RLQS server using the RLQS Client.
@markdroth (Member):

Are there requirements in terms of when exactly we reschedule the timer, and how that interacts with flow control on the RLQS stream?

Imagine that a bucket is configured for a 5-second load reporting interval, but when the load report timer fires, it takes 3 seconds to send the load report on the RLQS stream due to flow control. Do we schedule the next load report as soon as we start the write on the RLQS stream, in which case it will fire 2 seconds after the previous write finished? Or do we wait for the write to complete before we schedule the next load report, so that the next load report will start 5 seconds later but actually include 7 seconds worth of data?

What if the reporting interval is 5s but it takes 8s for the last report to clear flow control? We probably don't want to queue up writes indefinitely, since that would defeat the purpose of flow control.

We may instead want to structure this as a single load report timer with an ordered queue of all buckets for which load reports need to be sent by a given time. That way, when a write finishes, we can check if any load reports are overdue, in which case we can start them all immediately, and if not, we can schedule the next timer for the time at which the next load reports are due. But again, the document should discuss requirements at this level, not dictate the specific implementation.
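
A rough sketch of that ordered-queue idea (hypothetical `Bucket`, `snapshot_stats`, and timer APIs; shown only to pin down the intended behavior, not to dictate an implementation):

import heapq
import itertools
import time

class ReportScheduler:
  def __init__(self, rlqs_client):
    self.rlqs_client = rlqs_client
    self.queue = []                # min-heap of (due_time, seq, bucket)
    self._seq = itertools.count()  # tie-breaker so buckets are never compared

  def add_bucket(self, bucket):
    due = time.monotonic() + bucket.reporting_interval
    heapq.heappush(self.queue, (due, next(self._seq), bucket))

  def on_write_complete(self):
    # Called when the previous report clears flow control: send every
    # overdue report immediately, otherwise arm the timer for the next one.
    now = time.monotonic()
    overdue = []
    while self.queue and self.queue[0][0] <= now:
      _, _, bucket = heapq.heappop(self.queue)
      overdue.append(bucket)
      heapq.heappush(self.queue, (now + bucket.reporting_interval,
                                  next(self._seq), bucket))
    if overdue:
      self.rlqs_client.send_report([b.snapshot_stats() for b in overdue])
    elif self.queue:
      self.schedule_timer(self.queue[0][0] - now)  # hypothetical timer API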

@sergiitk (Member, Author):

Very good point.

that `bucket_id` is represented as a `Map<String, String>`, which may
introduce complexities in efficient cache sharding for certain programming
languages.
2. Incrementing `num_requests_allowed`/`num_requests_denied` bucket counters.
@markdroth (Member):

The wording here is good -- no need to change anything -- but note that the use case here is very similar to what we do in outlier detection, so implementations can probably use a similar approach to what we do there for avoiding lock contention.

@sergiitk (Member, Author):

Noted. I've discussed C++ implementation approaches for counter increments with @yashykt a while back, and we concluded that it's doable and can be done in an efficient manner. With Java, @ejona86 and I came up with a few efficient options too. I'll check out the outlier detection text to see if we can lift any applicable wording directly from it, or just add a link as a "suggested approach".
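
As one possible shape, a minimal sketch using a per-bucket lock (hypothetical `Bucket` type; real implementations would likely use atomics or striped counters instead):

import threading

class Bucket:
  def __init__(self, bucket_id):
    self.bucket_id = bucket_id
    self._lock = threading.Lock()
    self._num_requests_allowed = 0
    self._num_requests_denied = 0

  def record(self, allowed):
    # Data-plane increments contend only on this bucket's lock,
    # never on a filter-wide lock.
    with self._lock:
      if allowed:
        self._num_requests_allowed += 1
      else:
        self._num_requests_denied += 1

  def snapshot_stats(self):
    # Atomically read and reset, so consecutive reports cover
    # disjoint time windows.
    with self._lock:
      stats = (self._num_requests_allowed, self._num_requests_denied)
      self._num_requests_allowed = 0
      self._num_requests_denied = 0
    return stats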

Envoy provides two syntactically equivalent Unified Matcher
definitions: [`envoy.config.common.matcher.v3.Matcher`](https://github.com/envoyproxy/envoy/blob/e3da7ebb16ad01c2ac7662758a75dba5cdc024ce/api/envoy/config/common/matcher/v3/matcher.proto)
and [`xds.type.matcher.v3.Matcher`](https://github.com/cncf/xds/blob/b4127c9b8d78b77423fd25169f05b7476b6ea932/xds/type/matcher/v3/matcher.proto).
We will only support the latter, which is the preferred version for all new APIs
@markdroth (Member):

Any reason not to support both? I think there are other places where we'll want to use the same matcher implementation in the future that use the old type, and it seems like we should be able to handle either one.

@sergiitk (Member, Author):

I don't want to introduce a legacy dependency we have to support forever without a real need for it.

Note that CEL libraries will keep supporting the legacy format, but they're being migrated to the canonical protos, e.g. google/cel-cpp@aff8696. This means the canonical protos will be used internally, and legacy CEL operations will incur a performance overhead.

As for Java, the initial release of CEL-Java only supported the canonical CEL protos in the first place. It's now possible to convert the legacy protos to the canonical ones, but this requires adding separate dependencies (CEL Java V1alpha1 Utility, CEL Java Protobuf Adapter).

AFAIK Envoy will be upgrading its CEL intermediate lib to the canonical protos too.

@sergiitk (Member, Author):

Also: we can always add support for CEL later if necessary. But we won't be able to remove it once the genie is out of the bottle.

   1. [`HttpRequestHeaderMatchInput`](https://www.envoyproxy.io/docs/envoy/latest/api-v3/type/matcher/v3/http_inputs.proto#type-matcher-v3-httprequestheadermatchinput)
   2. [`HttpAttributesCelMatchInput`](https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/http_inputs.proto#envoy-v3-api-msg-xds-type-matcher-v3-httpattributescelmatchinput)
2. Custom Matchers:
   1. [`CelMatcher`](https://www.envoyproxy.io/docs/envoy/latest/xds/type/matcher/v3/cel.proto.html)
@markdroth (Member):

Doesn't this matcher require using HttpAttributesCelMatchInput as the input? If so, and if this is the only matcher we're going to support, then it doesn't really make sense to say that we're going to support HttpRequestHeaderMatchInput, since there's no possible way to use it.

We should probably also support any common matchers that are expected to be used with HttpRequestHeaderMatchInput.

@sergiitk (Member, Author) commented May 9, 2025:

HttpRequestHeaderMatchInput can be used with the built-in `value_match` (`xds.type.matcher.v3.StringMatcher`). I didn't specify it explicitly because it's not an extension, as the line above implies:

In this iteration, the following Unified Matcher extensions will be supported:

For performance reasons, CEL variables should be resolved on demand. CEL Runtime
provides different variable-resolution approaches depending on the language:

* CPP: [`BaseActivation::FindValue()`](https://github.com/google/cel-cpp/blob/9310c4910e598362695930f0e11b7f278f714755/eval/public/base_activation.h#L35)
@markdroth (Member):

I don't think we need to list the specific APIs in each language.

@sergiitk (Member, Author):

Normally I'd agree, but it's surprisingly difficult to find and identify these, just because the approaches are really different between languages. IMO it'll save the implementers some time.

Do you think there's a way to keep these without making them look like the prescribed implementation? Maybe under a different section?
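
For illustration only, the on-demand pattern these APIs share is roughly the following (a hypothetical Python sketch, not any concrete CEL library API):

class LazyActivation:
  def __init__(self, request):
    # Nothing is computed up front; each variable has a zero-arg resolver.
    self._resolvers = {
      'request.headers': lambda: dict(request.headers),
      'request.path': lambda: request.path,
    }
    self._cache = {}

  def find_value(self, name):
    # Resolve on first use, then memoize for the rest of the evaluation.
    if name not in self._cache:
      resolver = self._resolvers.get(name)
      if resolver is None:
        return None  # unknown variable
      self._cache[name] = resolver()
    return self._cache[name]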

@HiramSilvey commented:
Hi @sergiitk, apologies for posting this on the PR itself but I couldn't find a better way to get in contact. I just wanted to reach out to ask if this proposal will likely be merged or if there is still a chance it may not be accepted. Thanks so much.

@sergiitk (Member, Author) commented Jun 4, 2025:

@HiramSilvey I expect it to be merged. The high-level design was approved some time ago.
In the past few weeks we needed to reconsider some details and reach consensus on per-language implementation nuances of filter state retention. I'm planning to resume working on this PR sometime next week.
