#### Integration Steps

To integrate Sentinel's LLM token rate limiting, complete the following steps:

1. Prepare a Redis instance.

2. Configure and initialize Sentinel's runtime environment.
   - Only initialization from a YAML file is supported.

3. Define resources (embed instrumentation points) with the fixed resource type `ResourceType=ResTypeCommon` and traffic type `TrafficType=Inbound`.

4. Load the rules with `llmtokenratelimit.LoadRules`. A rule specifies the resource name, the rate limiting strategy, the token encoding (PETA only), and the specific rule items. The Redis connection, error code, and error message are set in the YAML configuration file instead. The following is an example rule. Each field is described in "Configuration Description" below.

   ```go
   // A single catch-all rule: it matches any resource, header identifier,
   // and key, and allows at most 1000 total tokens per 60-second fixed window.
   if _, err := llmtokenratelimit.LoadRules([]*llmtokenratelimit.Rule{
       {
           Resource: ".*",
           Strategy: llmtokenratelimit.FixedWindow,
           SpecificItems: []llmtokenratelimit.SpecificItem{
               {
                   Identifier: llmtokenratelimit.Identifier{
                       Type:  llmtokenratelimit.Header,
                       Value: ".*",
                   },
                   KeyItems: []llmtokenratelimit.KeyItem{
                       {
                           Key: ".*",
                           Token: llmtokenratelimit.Token{
                               Number:        1000,
                               CountStrategy: llmtokenratelimit.TotalTokens,
                           },
                           Time: llmtokenratelimit.Time{
                               Unit:  llmtokenratelimit.Second,
                               Value: 60,
                           },
                       },
                   },
               },
           },
       },
   }); err != nil {
       log.Fatalf("failed to load LLM token rate limit rules: %v", err)
   }
   ```

5. (Optional) Create an LLM instance and wrap it with one of the provided adapters (see "LLM Framework Adaptation" below).

#### Configuration Description

##### Configuration File

The following items are set under `sentinel.llmTokenRatelimit` in the YAML configuration file.

| Configuration Item | Type   | Required | Default Value       | Description |
| :----------------- | :----- | :------- | :------------------ | :---------- |
| enabled            | bool   | No       | false               | Whether to enable LLM token rate limiting. Values: false (disabled), true (enabled) |
| redis              | object | No       |                     | Redis connection settings |
| errorCode          | int    | No       | 429                 | Error code returned when a request is rate limited. A value of 0 is replaced with 429 |
| errorMessage       | string | No       | "Too Many Requests" | Error message returned when a request is rate limited |
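
The default handling in the table above can be pictured as a small normalization step. This is a minimal sketch: the struct and function names are hypothetical, and only the default values come from the table. Treating an empty `errorMessage` as unset is an assumption.

```go
package main

import "fmt"

// llmLimitConfig mirrors the top-level llmTokenRatelimit items.
// Type and field names are hypothetical; only the defaults come from the table.
type llmLimitConfig struct {
	Enabled      bool // zero value false matches the documented default
	ErrorCode    int
	ErrorMessage string
}

// withDefaults fills in the documented defaults. Treating an empty
// errorMessage as unset is an assumption of this sketch.
func withDefaults(c llmLimitConfig) llmLimitConfig {
	if c.ErrorCode == 0 {
		c.ErrorCode = 429 // 0 is replaced with 429
	}
	if c.ErrorMessage == "" {
		c.ErrorMessage = "Too Many Requests"
	}
	return c
}

func main() {
	c := withDefaults(llmLimitConfig{Enabled: true})
	fmt.Println(c.Enabled, c.ErrorCode, c.ErrorMessage) // true 429 Too Many Requests
}
```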

`redis` configuration

| Configuration Item | Type                 | Required | Default Value                     | Description |
| :----------------- | :------------------- | :------- | :-------------------------------- | :---------- |
| addrs              | array of addr object | No       | [{name: "127.0.0.1", port: 6379}] | Redis node addresses. **See Notes for cluster mode** |
| username           | string               | No       | Empty string                      | Redis username |
| password           | string               | No       | Empty string                      | Redis password |
| dialTimeout        | int                  | No       | 0                                 | Timeout for establishing a Redis connection, in milliseconds |
| readTimeout        | int                  | No       | 0                                 | Timeout for reading a response from the Redis server, in milliseconds |
| writeTimeout       | int                  | No       | 0                                 | Timeout for writing a command to the network connection, in milliseconds |
| poolTimeout        | int                  | No       | 0                                 | Maximum time to wait for an idle connection from the connection pool, in milliseconds |
| poolSize           | int                  | No       | 10                                | Number of connections in the connection pool |
| minIdleConns       | int                  | No       | 5                                 | Minimum number of idle connections kept in the connection pool |
| maxRetries         | int                  | No       | 3                                 | Maximum number of retries when a command fails |

`addrs` entry configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| name               | string | No       | "127.0.0.1"   | Redis node host name or IP address. For a DNS name, use a complete [FQDN](https://en.wikipedia.org/wiki/Fully_qualified_domain_name), e.g., `my-redis.dns` or `redis.my-ns.svc.cluster.local` |
| port               | int    | No       | 6379          | Redis node port |
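
The `addrs` defaults above, together with the cluster-mode requirement in Notes, can be sketched as follows. The types and functions are hypothetical. Filling in an empty `name` or a zero `port` on each individual entry is an assumption. The two-or-more-addresses rule mirrors the Notes. It is also how go-redis's `NewUniversalClient` picks a cluster client when no sentinel master name is set, although this document does not say which client Sentinel uses.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// redisAddr mirrors one entry of the addrs list (hypothetical type).
type redisAddr struct {
	Name string
	Port int
}

// resolveAddrs applies the documented defaults and returns host:port strings.
// Defaulting an empty name or zero port per entry is an assumption.
func resolveAddrs(addrs []redisAddr) []string {
	if len(addrs) == 0 {
		addrs = []redisAddr{{Name: "127.0.0.1", Port: 6379}}
	}
	hostPorts := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if a.Name == "" {
			a.Name = "127.0.0.1"
		}
		if a.Port == 0 {
			a.Port = 6379
		}
		hostPorts = append(hostPorts, net.JoinHostPort(a.Name, strconv.Itoa(a.Port)))
	}
	return hostPorts
}

// clientMode encodes the rule from the Notes: cluster mode needs at least two
// addresses; with fewer, the client runs in single-node mode.
func clientMode(hostPorts []string) string {
	if len(hostPorts) >= 2 {
		return "cluster"
	}
	return "single-node"
}

func main() {
	local := resolveAddrs(nil)
	fmt.Println(local, clientMode(local)) // [127.0.0.1:6379] single-node

	// Hypothetical Kubernetes host names for a two-node cluster.
	cluster := resolveAddrs([]redisAddr{
		{Name: "redis-0.redis.my-ns.svc.cluster.local"},
		{Name: "redis-1.redis.my-ns.svc.cluster.local"},
	})
	fmt.Println(cluster, clientMode(cluster))
}
```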

##### Rule Configuration

Rules can be loaded dynamically via `LoadRules`.

| Configuration Item | Type                         | Required | Default Value  | Description |
| :----------------- | :--------------------------- | :------- | :------------- | :---------- |
| resource           | string                       | No       | ".*"           | Resource name the rule applies to; supports regular expressions. Values: ".*" (matches all resources) or a user-defined regular expression |
| strategy           | string                       | No       | "fixed-window" | Rate limiting strategy. Values: fixed-window, peta (Prediction Error Temporal Allocation) |
| encoding           | object                       | No       |                | Token encoding used to estimate input tokens. **Used only by the PETA strategy** |
| specificItems      | array of specificItem object | Yes      |                | Specific rule items |
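
The `resource` field (and likewise `identifier.value` and `key` below) accepts a regular expression, with ".*" matching everything. The sketch below shows such matching with Go's `regexp` package. Anchoring the pattern to the whole string is an assumption, since the document does not say whether partial matches count. The resource names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether s matches pattern as a whole. Anchoring with
// ^(?:...)$ is an assumption of this sketch; a real implementation would
// compile each pattern once when the rules are loaded, not on every call.
func fullMatch(pattern, s string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.MatchString(s), nil
}

func main() {
	for _, resource := range []string{"chat-completion", "embedding"} {
		all, _ := fullMatch(".*", resource)
		chat, _ := fullMatch("chat-.*", resource)
		fmt.Printf("%s: .*=%v chat-.*=%v\n", resource, all, chat)
	}
}
```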

`encoding` configuration

| Configuration Item | Type   | Required | Default Value | Description    |
| :----------------- | :----- | :------- | :------------ | :------------- |
| provider           | string | No       | "openai"      | Model provider |
| model              | string | No       | "gpt-4"       | Model name     |

`specificItems` entry configuration

| Configuration Item | Type                    | Required | Default Value | Description |
| :----------------- | :---------------------- | :------- | :------------ | :---------- |
| identifier         | object                  | No       |               | Request identifier |
| keyItems           | array of keyItem object | Yes      |               | Key items used for rule matching |

`identifier` configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| type               | string | No       | "all"         | Request identifier type. Values: all (global rate limiting), header |
| value              | string | No       | ".*"          | Request identifier value; supports regular expressions. Values: ".*" (matches all) or a user-defined regular expression |

`keyItems` entry configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| key                | string | No       | ".*"          | Value matched by this rule item; supports regular expressions. Values: ".*" (matches all) or a user-defined regular expression |
| token              | object | Yes      |               | Token quota and counting strategy |
| time               | object | Yes      |               | Time unit and window length |

`token` configuration

| Configuration Item | Type   | Required | Default Value  | Description |
| :----------------- | :----- | :------- | :------------- | :---------- |
| number             | int    | Yes      |                | Number of tokens allowed per time window, ≥ 0 |
| countStrategy      | string | No       | "total-tokens" | Which tokens are counted. Values: input-tokens, output-tokens, total-tokens |
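
The three `countStrategy` values decide which part of a call's token usage is charged against `number`. The sketch below uses a hypothetical `tokenUsage` type. It assumes that total tokens means input plus output, as OpenAI-style usage reports define it, and that an empty value falls back to the documented default.

```go
package main

import "fmt"

// tokenUsage holds the token counts reported for one LLM call (hypothetical type).
type tokenUsage struct {
	Input, Output int64
}

// countTokens returns how many tokens are charged against the quota for a
// given countStrategy value.
func countTokens(strategy string, u tokenUsage) (int64, error) {
	switch strategy {
	case "input-tokens":
		return u.Input, nil
	case "output-tokens":
		return u.Output, nil
	case "", "total-tokens": // empty falls back to the documented default
		return u.Input + u.Output, nil // assumes total = input + output
	default:
		return 0, fmt.Errorf("unknown countStrategy %q", strategy)
	}
}

func main() {
	u := tokenUsage{Input: 120, Output: 380}
	for _, s := range []string{"input-tokens", "output-tokens", "total-tokens"} {
		n, _ := countTokens(s, u)
		fmt.Println(s, n)
	}
}
```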

`time` configuration

| Configuration Item | Type   | Required | Default Value | Description |
| :----------------- | :----- | :------- | :------------ | :---------- |
| unit               | string | Yes      |               | Time unit. Values: second, minute, hour, day |
| value              | int    | Yes      |               | Number of time units in one window, ≥ 0 |

#### Configuration File Example

```yaml
version: "v1"
sentinel:
  app:
    name: sentinel-go-demo
  log:
    metric:
      maxFileCount: 7
  llmTokenRatelimit:
    enabled: true

    errorCode: 429
    errorMessage: "Too Many Requests"

    redis:
      addrs:
        - name: "127.0.0.1"
          port: 6379
      username: "redis"
      password: "redis"
      dialTimeout: 5000
      readTimeout: 5000
      writeTimeout: 5000
      poolTimeout: 5000
      poolSize: 10
      minIdleConns: 5
      maxRetries: 3
```

#### LLM Framework Adaptation

Sentinel's token rate limiting can currently be integrated non-intrusively with the Langchaingo and Eino frameworks, mainly for text generation. For usage, see:

- `pkg/adapters/langchaingo/wrapper.go`
- `pkg/adapters/eino/wrapper.go`

#### Notes

- Since only input tokens can currently be predicted, **PETA is recommended for rate limiting input tokens**.
- PETA uses tiktoken-go to estimate the number of input tokens a request consumes. This requires the `Byte Pair Encoding (BPE)` dictionary to be downloaded or preconfigured:
  - Online mode
    - On first use, tiktoken-go downloads the encoding file from the internet.
  - Offline mode
    - Prepare tiktoken-go's cached encoding files in advance (**not the raw downloaded files, but the files as processed by tiktoken-go**) and set the `TIKTOKEN_CACHE_DIR` environment variable to their directory.
- Rule deduplication
  - Within `keyItems`, entries that differ only in `number` are treated as duplicates, and only the latest `number` is kept.
  - Within `specificItems`, only the deduplicated `keyItems` are kept.
  - For `resource`, only the latest rule for a given resource is kept.
- Redis configuration
  - **If the Redis deployment runs in cluster mode, `addrs` must contain at least 2 addresses. Otherwise, the client falls back to single-node mode and rate limiting fails.**
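
The deduplication rules above can be sketched as follows. This is an illustration under stated assumptions, not Sentinel's code. The types are hypothetical and flatten away the `specificItems` level. The document does not state the order of the surviving entries, so this sketch keeps them in order of first appearance and takes the `number` from the latest duplicate.

```go
package main

import "fmt"

// keyItem is a hypothetical flattened form of one keyItems entry.
type keyItem struct {
	Key           string
	CountStrategy string
	Unit          string
	Value         int64
	Number        int64
}

// itemIdentity holds every field except Number, so entries that differ only
// in Number map to the same identity.
type itemIdentity struct {
	Key, CountStrategy, Unit string
	Value                    int64
}

// rule is a hypothetical simplification: one resource with its keyItems.
type rule struct {
	Resource string
	KeyItems []keyItem
}

// dedupKeyItems keeps one entry per identity, in order of first appearance,
// carrying the Number of the latest occurrence.
func dedupKeyItems(items []keyItem) []keyItem {
	index := make(map[itemIdentity]int)
	var out []keyItem
	for _, it := range items {
		id := itemIdentity{it.Key, it.CountStrategy, it.Unit, it.Value}
		if i, seen := index[id]; seen {
			out[i].Number = it.Number // the later entry wins
			continue
		}
		index[id] = len(out)
		out = append(out, it)
	}
	return out
}

// dedupRules deduplicates each rule's keyItems and keeps only the latest
// rule for each resource.
func dedupRules(rules []rule) []rule {
	index := make(map[string]int)
	var out []rule
	for _, r := range rules {
		r.KeyItems = dedupKeyItems(r.KeyItems)
		if i, seen := index[r.Resource]; seen {
			out[i] = r // the later rule replaces the earlier one
			continue
		}
		index[r.Resource] = len(out)
		out = append(out, r)
	}
	return out
}

func main() {
	items := []keyItem{
		{Key: ".*", CountStrategy: "total-tokens", Unit: "second", Value: 60, Number: 1000},
		{Key: ".*", CountStrategy: "total-tokens", Unit: "second", Value: 60, Number: 2000},
		{Key: ".*", CountStrategy: "input-tokens", Unit: "second", Value: 60, Number: 500},
	}
	deduped := dedupKeyItems(items)
	fmt.Println(len(deduped), deduped[0].Number) // 2 2000

	rules := dedupRules([]rule{{Resource: ".*", KeyItems: items}, {Resource: ".*", KeyItems: items[2:]}})
	fmt.Println(len(rules), len(rules[0].KeyItems)) // 1 1
}
```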