Commit 8cd4cfa

feat: add LLM token rate-limiting feature (#602)
* feat: add basic llm token rate limit
* feat: optimize the basic token update method with cache
* feat: filter rules and optimize redis's key
* refactor: wrapper global object
* fix: fix dependency error
* feat: add predictive error temporal amortized throttling
* fix: fix correct error
* fix: fix duplicated token error
* examples: add new rate limiting application examples
* fix: add maximum binary search iterations
* refactor: simplify the usage steps of rate limiting
* test: fix test case errors
* refactor: update token encoding ways
* fix: fix the issue of excessive goroutines
* refactor: optimize the calculation method of token prediction
* refactor: optimize the code structure and introduce log auditing
* deps: update Redis to v8
* feat: support Redis configuration with multiple addresses
* fix: fix abnormal issues in token correction
* fix: fix issues with data dependency in token prediction
* refactor: delete the unused functions
* fix: fix token prediction accuracy and response header issues
* fix: fix missing response header issue in fixed window strategy
* feat: adapt to eino framework; fix initialization issues
* fix: fix go.yml test case path error
* feat: optimize response header information
* fix: fix go.yml error
* remove: remove dependency test cases
* feat: remove binding hit mechanism between input token and total token
* test: add identifier_checker and rule_collector unit test cases
* fix: fix lint error
* test: add context, request_info, util unit test cases
* fix: fix lint error
* test: add rule_filter unit test cases
* test: add resource benchmark test example
* feat: support multi-architecture Redis service
* feat: add metric logger
* test: add all unit test cases
* docs: add llm token rate limit integration steps
* docs: add llm token rate limit adapter usage
* docs: update llm token rate limit usage
* style: fix spelling errors
* refactor: remove config struct fields for loose coupling and update documentation
* feat: adapt llm_token_ratelimit component to datasource module
1 parent f7a7f95 commit 8cd4cfa

104 files changed: +22132 -42 lines changed

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -64,3 +64,9 @@ Temporary Items
 
 # coverage file
 coverage.html
+coverage.txt
+
+pkg/adapters/eino/*_test.go
+pkg/adapters/langchaingo/*_test.go
+
+.env

api/api.go

Lines changed: 3 additions & 1 deletion
@@ -138,7 +138,9 @@ func Entry(resource string, opts ...EntryOption) (*base.SentinelEntry, *base.Blo
 	}()
 
 	for _, opt := range opts {
-		opt(options)
+		if opt != nil {
+			opt(options)
+		}
 	}
 	if options.slotChain == nil {
 		options.slotChain = GlobalSlotChain()

api/init.go

Lines changed: 16 additions & 0 deletions
@@ -20,6 +20,7 @@ import (
 	"net/http"
 
 	"github.com/alibaba/sentinel-golang/core/config"
+	llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
 	"github.com/alibaba/sentinel-golang/core/log/metric"
 	"github.com/alibaba/sentinel-golang/core/system_metric"
 	metric_exporter "github.com/alibaba/sentinel-golang/exporter/metric"
@@ -134,6 +135,21 @@ func initCoreComponents() error {
 		return nil
 	}
 
+	if err := llmtokenratelimit.InitMetricLogger(&llmtokenratelimit.MetricLoggerConfig{
+		AppName:       config.AppName(),
+		LogDir:        config.LogBaseDir(),
+		MaxFileSize:   config.MetricLogSingleFileMaxSize(),
+		MaxFileAmount: config.MetricLogMaxFileAmount(),
+		FlushInterval: config.MetricLogFlushIntervalSec(),
+		UsePid:        config.LogUsePid(),
+	}); err != nil {
+		return err
+	}
+
+	if err := llmtokenratelimit.Init(config.LLMTokenRateLimit()); err != nil {
+		return err
+	}
+
 	return nil
 }
 

api/slot_chain.go

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,7 @@ import (
 	"github.com/alibaba/sentinel-golang/core/flow"
 	"github.com/alibaba/sentinel-golang/core/hotspot"
 	"github.com/alibaba/sentinel-golang/core/isolation"
+	llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
 	"github.com/alibaba/sentinel-golang/core/log"
 	"github.com/alibaba/sentinel-golang/core/stat"
 	"github.com/alibaba/sentinel-golang/core/system"
@@ -37,13 +38,15 @@ func BuildDefaultSlotChain() *base.SlotChain {
 
 	sc.AddRuleCheckSlot(system.DefaultAdaptiveSlot)
 	sc.AddRuleCheckSlot(flow.DefaultSlot)
+	sc.AddRuleCheckSlot(llmtokenratelimit.DefaultSlot)
 	sc.AddRuleCheckSlot(isolation.DefaultSlot)
 	sc.AddRuleCheckSlot(hotspot.DefaultSlot)
 	sc.AddRuleCheckSlot(circuitbreaker.DefaultSlot)
 
 	sc.AddStatSlot(stat.DefaultSlot)
 	sc.AddStatSlot(log.DefaultSlot)
 	sc.AddStatSlot(flow.DefaultStandaloneStatSlot)
+	sc.AddStatSlot(llmtokenratelimit.DefaultLLMTokenRatelimitStatSlot)
 	sc.AddStatSlot(hotspot.DefaultConcurrencyStatSlot)
 	sc.AddStatSlot(circuitbreaker.DefaultMetricStatSlot)
 	return sc

core/base/result.go

Lines changed: 8 additions & 6 deletions
@@ -28,16 +28,18 @@ const (
 	BlockTypeCircuitBreaking
 	BlockTypeSystemFlow
 	BlockTypeHotSpotParamFlow
+	BlockTypeLLMTokenRateLimit
 )
 
 var (
 	blockTypeMap = map[BlockType]string{
-		BlockTypeUnknown:          "BlockTypeUnknown",
-		BlockTypeFlow:             "BlockTypeFlowControl",
-		BlockTypeIsolation:        "BlockTypeIsolation",
-		BlockTypeCircuitBreaking:  "BlockTypeCircuitBreaking",
-		BlockTypeSystemFlow:       "BlockTypeSystem",
-		BlockTypeHotSpotParamFlow: "BlockTypeHotSpotParamFlow",
+		BlockTypeUnknown:           "BlockTypeUnknown",
+		BlockTypeFlow:              "BlockTypeFlowControl",
+		BlockTypeIsolation:         "BlockTypeIsolation",
+		BlockTypeCircuitBreaking:   "BlockTypeCircuitBreaking",
+		BlockTypeSystemFlow:        "BlockTypeSystem",
+		BlockTypeHotSpotParamFlow:  "BlockTypeHotSpotParamFlow",
+		BlockTypeLLMTokenRateLimit: "BlockTypeLLMTokenRateLimit",
 	}
 	blockTypeExisted = fmt.Errorf("block type existed")
 )

core/base/result_test.go

Lines changed: 9 additions & 6 deletions
@@ -70,19 +70,22 @@ func (t BlockType) stringSwitch() string {
 		return "System"
 	case BlockTypeHotSpotParamFlow:
 		return "HotSpotParamFlow"
+	case BlockTypeLLMTokenRateLimit:
+		return "LLMTokenRateLimit"
 	default:
 		return fmt.Sprintf("%d", t)
 	}
 }
 
 var (
 	blockTypeNames = []string{
-		BlockTypeUnknown:          "Unknown",
-		BlockTypeFlow:             "FlowControl",
-		BlockTypeIsolation:        "BlockTypeIsolation",
-		BlockTypeCircuitBreaking:  "CircuitBreaking",
-		BlockTypeSystemFlow:       "System",
-		BlockTypeHotSpotParamFlow: "HotSpotParamFlow",
+		BlockTypeUnknown:           "Unknown",
+		BlockTypeFlow:              "FlowControl",
+		BlockTypeIsolation:         "BlockTypeIsolation",
+		BlockTypeCircuitBreaking:   "CircuitBreaking",
+		BlockTypeSystemFlow:        "System",
+		BlockTypeHotSpotParamFlow:  "HotSpotParamFlow",
+		BlockTypeLLMTokenRateLimit: "LLMTokenRateLimit",
 	}
 	blockTypeErr = fmt.Errorf("block type err")
 )

core/config/config.go

Lines changed: 5 additions & 0 deletions
@@ -21,6 +21,7 @@ import (
 	"strconv"
 	"sync"
 
+	llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
 	"github.com/alibaba/sentinel-golang/logging"
 	"github.com/alibaba/sentinel-golang/util"
 	"github.com/pkg/errors"
@@ -262,3 +263,7 @@ func MetricStatisticIntervalMs() uint32 {
 func MetricStatisticSampleCount() uint32 {
 	return globalCfg.MetricStatisticSampleCount()
 }
+
+func LLMTokenRateLimit() *llmtokenratelimit.Config {
+	return globalCfg.LLMTokenRateLimit()
+}

core/config/entity.go

Lines changed: 7 additions & 0 deletions
@@ -19,6 +19,7 @@ import (
 	"fmt"
 
 	"github.com/alibaba/sentinel-golang/core/base"
+	llmtokenratelimit "github.com/alibaba/sentinel-golang/core/llm_token_ratelimit"
 	"github.com/alibaba/sentinel-golang/logging"
 	"github.com/pkg/errors"
 )
@@ -46,6 +47,8 @@ type SentinelConfig struct {
 	Stat StatConfig
 	// UseCacheTime indicates whether to cache time(ms)
 	UseCacheTime bool `yaml:"useCacheTime"`
+	// LLMTokenRateLimit represents configuration items related to llm token rate limit.
+	LLMTokenRateLimit *llmtokenratelimit.Config `yaml:"llmTokenRatelimit"`
 }
 
 // ExporterConfig represents configuration items related to exporter, like metric exporter.
@@ -259,3 +262,7 @@ func (entity *Entity) MetricStatisticIntervalMs() uint32 {
 func (entity *Entity) MetricStatisticSampleCount() uint32 {
 	return entity.Sentinel.Stat.MetricStatisticSampleCount
 }
+
+func (entity *Entity) LLMTokenRateLimit() *llmtokenratelimit.Config {
+	return entity.Sentinel.LLMTokenRateLimit
+}

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
#### Integration Steps

From the user's perspective, integrating the token rate limiting feature provided by Sentinel requires the following steps:

1. Prepare a Redis instance.

2. Configure and initialize Sentinel's runtime environment.
   1. Only initialization from a YAML file is supported.

3. Define resources (instrumentation points) with a fixed resource type: `ResourceType=ResTypeCommon` and `TrafficType=Inbound`.

4. Load rules according to the configuration described below. The rule configuration items include: resource name, rate limiting strategy, specific rule items, Redis configuration, error code, and error message. The following is an example rule configuration; the meaning of each field is detailed in "Configuration Description" below.

```go
_, err = llmtokenratelimit.LoadRules([]*llmtokenratelimit.Rule{
	{
		// Match every resource.
		Resource: ".*",
		Strategy: llmtokenratelimit.FixedWindow,
		SpecificItems: []llmtokenratelimit.SpecificItem{
			{
				Identifier: llmtokenratelimit.Identifier{
					Type:  llmtokenratelimit.Header,
					Value: ".*",
				},
				KeyItems: []llmtokenratelimit.KeyItem{
					{
						Key: ".*",
						// Allow 1000 total tokens...
						Token: llmtokenratelimit.Token{
							Number:        1000,
							CountStrategy: llmtokenratelimit.TotalTokens,
						},
						// ...per 60-second window.
						Time: llmtokenratelimit.Time{
							Unit:  llmtokenratelimit.Second,
							Value: 60,
						},
					},
				},
			},
		},
	},
})
```

5. Optional: Create an LLM instance and embed it into the provided adapter.
46+
47+
48+
#### Configuration Description
49+
50+
##### Configuration File
51+
52+
| Configuration Item | Type | Required | Default Value | Description |
53+
| :----------------- | :------- | :------- | :--------------------------------- | :----------------------------------------------------------- |
54+
| enabled | bool | No | false | Whether to enable the LLM Token Rate Limiting feature. Values: false (disable), true (enable) |
55+
| redis | object | No | | Redis instance connection information |
56+
| errorCode | int | No | 429 | Error code. If set to 0, it will be modified to 429 automatically |
57+
| errorMessage | string | No | "Too Many Requests" | Error message |
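
The defaulting rule for `errorCode` can be sketched as follows. The `config` struct and `withDefaults` function are hypothetical and mirror only the fields in the table above; treating an empty `errorMessage` as "use the default" is an assumption, since the table documents that behavior only for `errorCode`.

```go
package main

import "fmt"

// config mirrors the documented top-level fields. It is an illustrative
// type, not the actual llmtokenratelimit.Config.
type config struct {
	Enabled      bool
	ErrorCode    int
	ErrorMessage string
}

// withDefaults fills in the documented defaults.
func withDefaults(c config) config {
	// Documented: an error code of 0 is replaced by 429.
	if c.ErrorCode == 0 {
		c.ErrorCode = 429
	}
	// Assumption: an empty message falls back to the documented default.
	if c.ErrorMessage == "" {
		c.ErrorMessage = "Too Many Requests"
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", withDefaults(config{Enabled: true}))
	fmt.Printf("%+v\n", withDefaults(config{Enabled: true, ErrorCode: 503, ErrorMessage: "busy"}))
}
```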

Redis Configuration

| Configuration Item | Type                 | Required | Default Value                     | Description |
| :----------------- | :------------------- | :------- | :-------------------------------- | :---------- |
| addrs              | array of addr object | No       | [{name: "127.0.0.1", port: 6379}] | Redis node service. **See Notes for details** |
| username           | string               | No       | Empty string                      | Redis username |
| password           | string               | No       | Empty string                      | Redis password |
| dialTimeout        | int                  | No       | 0                                 | Maximum waiting time for establishing a Redis connection, unit: milliseconds |
| readTimeout        | int                  | No       | 0                                 | Maximum waiting time for responses from the Redis server, unit: milliseconds |
| writeTimeout       | int                  | No       | 0                                 | Maximum time for sending command data to the network connection, unit: milliseconds |
| poolTimeout        | int                  | No       | 0                                 | Maximum waiting time to obtain an idle connection from the connection pool, unit: milliseconds |
| poolSize           | int                  | No       | 10                                | Number of connections in the connection pool |
| minIdleConns       | int                  | No       | 5                                 | Minimum number of idle connections in the connection pool |
| maxRetries         | int                  | No       | 3                                 | Maximum number of retries when an operation fails |

Addr Configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| name               | string | No       | "127.0.0.1"   | Redis node service name. A complete [FQDN](https://en.wikipedia.org/wiki/Fully_qualified_domain_name) with service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
| port               | int    | No       | 6379          | Redis node service port |

##### Rule Configuration

**Feature: Supports dynamic loading via LoadRules**

| Configuration Item | Type                         | Required | Default Value  | Description |
| :----------------- | :--------------------------- | :------- | :------------- | :---------- |
| resource           | string                       | No       | ".*"           | Rule resource name, supports regular expressions. Values: ".*" (global match), user-defined regular expressions |
| strategy           | string                       | No       | "fixed-window" | Rate limiting strategy. Values: fixed-window, peta (Prediction Error Temporal Allocation) |
| encoding           | object                       | No       |                | Token encoding method. **Exclusive to PETA rate limiting strategy** |
| specificItems      | array of specificItem object | Yes      |                | Specific rule items |
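
Because `resource` is a regular expression, a rule can cover a family of resource names. The sketch below shows one way such matching could work with Go's standard `regexp` package. The `matchRule` helper is hypothetical, and anchoring the pattern so that it must match the whole name is an assumption; the table does not say whether partial matches count.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchRule reports whether name matches the rule pattern.
// Assumption: the whole name must match, so the pattern is anchored.
func matchRule(pattern, name string) (bool, error) {
	if pattern == "" {
		pattern = ".*" // documented default: global match
	}
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	for _, name := range []string{"chat-completion", "embedding"} {
		ok, err := matchRule("chat-.*", name)
		fmt.Println(name, ok, err)
	}
}
```

In a hot path, compiling each pattern once when rules are loaded and reusing the compiled `*regexp.Regexp` avoids paying the compilation cost per request.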

encoding configuration

| Configuration Item | Type   | Required | Default Value | Description    |
| :----------------- | :----- | :------- | :------------ | :------------- |
| provider           | string | No       | "openai"      | Model provider |
| model              | string | No       | "gpt-4"       | Model name     |

specificItem configuration

| Configuration Item | Type                    | Required | Default Value | Description |
| :----------------- | :---------------------- | :------- | :------------ | :---------- |
| identifier         | object                  | No       |               | Request identifier |
| keyItems           | array of keyItem object | Yes      |               | Key-value information for rule matching |

identifier configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| type               | string | No       | "all"         | Request identifier type. Values: all (global rate limiting), header |
| value              | string | No       | ".*"          | Request identifier value, supports regular expressions. Values: ".*" (global match), user-defined regular expressions |

keyItem configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| key                | string | No       | ".*"          | Specific rule item value, supports regular expressions. Values: ".*" (global match), user-defined regular expressions |
| token              | object | Yes      |               | Token quantity and calculation strategy configuration |
| time               | object | Yes      |               | Time unit and cycle configuration |

token configuration

| Configuration Item | Type   | Required | Default Value  | Description |
| :----------------- | :----- | :------- | :------------- | :---------- |
| number             | int    | Yes      |                | Token quantity, ≥ 0 |
| countStrategy      | string | No       | "total-tokens" | Token calculation strategy. Values: input-tokens, output-tokens, total-tokens |

time configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| unit               | string | Yes      |               | Time unit. Values: second, minute, hour, day |
| value              | int    | Yes      |               | Time value, ≥ 0 |

#### Configuration File Example

```YAML
version: "v1"
sentinel:
  app:
    name: sentinel-go-demo
  log:
    metric:
      maxFileCount: 7
  llmTokenRatelimit:
    enabled: true

    errorCode: 429
    errorMessage: "Too Many Requests"

    redis:
      addrs:
        - name: "127.0.0.1"
          port: 6379
      username: "redis"
      password: "redis"
      dialTimeout: 5000
      readTimeout: 5000
      writeTimeout: 5000
      poolTimeout: 5000
      poolSize: 10
      minIdleConns: 5
      maxRetries: 3
```

#### LLM Framework Adaptation
Currently, non-intrusive integration of Sentinel's Token Rate Limiting capability is supported for the Langchaingo and Eino frameworks, mainly for text generation. For usage methods, please refer to:
- pkg/adapters/langchaingo/wrapper.go
- pkg/adapters/eino/wrapper.go

#### Notes

- Since only input tokens can be predicted currently, **it is recommended to use PETA for rate limiting input tokens**
- PETA uses tiktoken-go to estimate the number of input tokens consumed, but it is necessary to download or preconfigure the `Byte Pair Encoding (BPE)` dictionary:
  - Online Mode
    - When used for the first time, tiktoken-go needs to download the encoding file via the internet
  - Offline Mode
    - Prepare the pre-cached tiktoken-go encoding files (**not directly downloaded files, but files processed by tiktoken-go**) in advance, and specify the file directory by configuring the TIKTOKEN_CACHE_DIR environment variable
- Rule Deduplication Description
  - In keyItems, if only the "number" differs, duplicates will be removed and the latest "number" will be retained
  - In specificItems, only deduplicated keyItems will be retained
  - In resource, only the latest resource will be retained
- Redis Configuration Description
  - **If the connected Redis is in cluster mode, the number of addresses in "addrs" must be ≥ 2; otherwise, it will default to Redis single-node mode, causing rate limiting to fail**
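
The keyItems deduplication rule in the notes above can be sketched as follows. The flattened `keyItem` struct and the `dedupe` function are hypothetical; they only illustrate "items that differ solely in `number` collapse into one, and the latest `number` wins", not the library's actual data structures.

```go
package main

import "fmt"

// keyItem is a flattened, illustrative version of the documented keyItem
// (key, token.number, token.countStrategy, time.unit, time.value).
type keyItem struct {
	Key           string
	Number        int
	CountStrategy string
	Unit          string
	Value         int
}

// dedupe keeps one item per (Key, CountStrategy, Unit, Value) combination.
// When several items differ only in Number, the last Number is retained.
func dedupe(items []keyItem) []keyItem {
	type id struct {
		key, strategy, unit string
		value               int
	}
	index := make(map[id]int) // identity -> position in out
	var out []keyItem
	for _, it := range items {
		k := id{it.Key, it.CountStrategy, it.Unit, it.Value}
		if i, ok := index[k]; ok {
			out[i].Number = it.Number // later rule overrides the quota
			continue
		}
		index[k] = len(out)
		out = append(out, it)
	}
	return out
}

func main() {
	items := []keyItem{
		{".*", 1000, "total-tokens", "second", 60},
		{".*", 2000, "total-tokens", "second", 60}, // differs only in Number
		{".*", 500, "input-tokens", "second", 60},
	}
	for _, it := range dedupe(items) {
		fmt.Printf("%+v\n", it)
	}
}
```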
