Skip to content

Commit cd109fd

Browse files
Pluggable BBR proposal
1 parent db2b7ce commit cd109fd

File tree

4 files changed

+463
-0
lines changed

4 files changed

+463
-0
lines changed
Lines changed: 251 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,251 @@
1+
# Pluggable Body-Based Routing (BBR) Framework
2+
3+
Author(s): @davidbreitgand @srampal
4+
5+
## Proposal Status
6+
7+
***Draft***
8+
9+
## Summary
10+
11+
The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances.
12+
13+
The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base.
14+
15+
This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution.
16+
17+
See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context amd reference.
18+
19+
## Goals
20+
21+
The pluggable BBR Framework aims at addressing the following goals
22+
23+
- Avoid monolithic architecture
24+
- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two
25+
- Enable organizing plugins into a topology for ordered and concurrent execution
26+
- Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance
27+
- Limit changes to the BBR feature to avoid any changes in the rest of the code base
28+
- Follow best practices and experience from the Scheduling subsystem
29+
pluggability effort. For example, extending the system to support the above
30+
should be through implementing well defined `Plugin` interfaces and registering
31+
them in the BBR subsystem; any configuration would be done in the
32+
same way (e.g., code and/or configuration file)
33+
- Reuse common code from EPP, such as `TypedName`, wherever make sense, but avoid reusing specialized code with non-BBR functionality to avoid abuse
34+
- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system
35+
- Provide reference plugin implementations.
36+
37+
## Non-Goals
38+
39+
- Modify existing GIE abstractions
40+
- Fully align plugins, registries, and factories across BBR and EPP
41+
- Dynamically reconfigure plugins and plugin topologies at runtime
42+
43+
## Proposal
44+
45+
### Overview
46+
47+
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
48+
In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency.
49+
50+
`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically.
51+
52+
Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin.
53+
54+
### Suggested Components
55+
56+
The sketch of the proposed framework is shown in the figure below.
57+
<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" />
58+
59+
### Suggested BBR Pluggable Framework Interfaces
60+
61+
```go
62+
// ------------------------------------ Defaults ------------------------------------------
63+
const DefaultPluginType = "MetadataExtractor"
64+
const DefaultPluginImplementation = "simple-model-selector"
65+
66+
// BBRPlugin defines the interface for plugins in the BBR framework
67+
type BBRPlugin interface {
68+
plugins.Plugin
69+
70+
// RequiresFullParsing indicates whether full body parsing is required
71+
// to facilitate efficient memory sharing across plugins in a chain.
72+
RequiresFullParsing() bool
73+
74+
// Execute runs the plugin logic on the request body.
75+
// A plugin's imnplementation logic CAN mutate the body of the message.
76+
// A plugin's implementation MUST return a map of headers
77+
// If no headers are set by the implementation, the map must be empty
78+
// A value of a header in an extended implementation NEED NOT to be identical to the value of that same header as would be set
79+
// in a default implementation.
80+
// Example: in the body of a request model is set to "semantic-model-selector",
81+
// which, say, stands for "select a best model for this request at minimal cost"
82+
// A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid
83+
// model name from the inventory of the backend models and also mutates the body accordingly
84+
// In contrast,
85+
Execute(requestBodyBytes []byte) (
86+
headers map[string]string,
87+
mutatedBodyBytes []byte,
88+
err error,
89+
)
90+
}
91+
92+
93+
// placeholder for BBRPlugin constructors
94+
type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type
95+
96+
// PluginRegistry defines operations for managing plugin factories and plugin instances
97+
type PluginRegistry interface {
98+
RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors
99+
RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first)
100+
GetFactory(typeKey string) (PluginFactoryFunc, error)
101+
GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error)
102+
GetFactories() map[string]PluginFactoryFunc
103+
GetPlugins() map[string]bbrplugins.BBRPlugin
104+
ListPlugins() []string
105+
ListFactories() []string
106+
CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error)
107+
ContainsFactory(typeKey string) bool
108+
ContainsPlugin(typeKey string) bool
109+
String() string //human readable string for logging
110+
}
111+
112+
// PluginsChain is used to define a specific order of execution of the BBRPlugin instances stored in the registry
113+
// The BBRPlugin instances
114+
type PluginsChain interface {
115+
AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first
116+
AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain
117+
GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry
118+
Length() int
119+
ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct
120+
ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise
121+
GetSharedMemory(which string) interface{} //returns an appropriate shared open-ai struct dependent on whether which
122+
//corresponds to Completion or ChatCompletion endpoint requested in the body
123+
Run(bodyBytes []byte, registry PluginRegistry) ([]byte, map[string]string, error) //return potentially mutated body and all headers map safely merged
124+
String() string
125+
}
126+
//NOTE: for simplicity, in the initial PR, PluginsChain instance will be defined request only
127+
```
128+
129+
### Defaults
130+
131+
```go
132+
133+
const (
134+
//A deafult plugin implementation of this plugin type will always be configured for request plugins chain
135+
//Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind`
136+
//MUST start wit an upper case letter, use CamelNotation, only aplhanumericals after the first letter
137+
PluginTypePattern = `^[A-Z][A-Za-z0-9]*$`
138+
MaxPluginTypeLength = 63
139+
DefaultPluginType = "MetaDataExtractor"
140+
// Even though BBRPlugin is not a K8s resource yet, let's make its naming compliant with K8s resource naming
141+
// Allows: lowercase letters, digits, hyphens, dots.
142+
// Must start and end with a lowercase alphanumeric character.
143+
// Middle characters group can contain lowercase alphanumerics, hyphens, and dots
144+
// Middle and rightmost groups are optional
145+
PluginNamePattern = `^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$`
146+
DefaultPluginName = "simple-model-extractor"
147+
MaxPluginNameLength = 253
148+
//Well-known custom header set to a model name
149+
ModelHeader = "X-Gateway-Model-Name"
150+
)
151+
```
152+
153+
### Current BBR reimplementation as BBRPlugin
154+
155+
```go
156+
/ ------------------------------------ DEFAULT PLUGIN IMPLEMENTATION ----------------------------------------------
157+
158+
type simpleModelExtractor struct { //implements the MetadataExtractor interface
159+
typedName plugins.TypedName
160+
requiresFullParsing bool
161+
}
162+
163+
// defaultMetaDataExtractor implements the MetadataExtractor interface and extracts only the mmodel name AS-IS
164+
type defaultMetaDataExtractor struct {
165+
typedName plugins.TypedName
166+
requiresFullParsing bool //this field will be used to determine whether shared struct should be created in this chain
167+
}
168+
169+
// NewSimpleModelExtractor is a factory that constructs SimpleModelExtractor plugin
170+
// A developer who wishes to create her own implementation, will implement the BBRPlugin interface and
171+
// use Registry and PluginsChain to register and execute the plugin (together with other plugins in a chain)
172+
func NewDefaultMetaDataExtractor() BBRPlugin {
173+
return &defaultMetaDataExtractor{
174+
typedName: plugins.TypedName{
175+
Type: DefaultPluginType,
176+
Name: "simple-model-extractor",
177+
},
178+
requiresFullParsing: false,
179+
}
180+
}
181+
182+
func (s *defaultMetaDataExtractor) RequiresFullParsing() bool {
183+
return s.requiresFullParsing
184+
}
185+
186+
func (s *defaultMetaDataExtractor) TypedName() plugins.TypedName {
187+
return s.typedName
188+
}
189+
190+
// Execute extracts the "model" from the JSON request body and sets X-Gateway-Model-Name header.
191+
// This implementation intentionally ignores metaDataKeys and does not mutate the body.
192+
// It expects the request body to be a JSON object containing a "model" field.
193+
// A nil for metaDataKeysToHeaders map SHOULD be specified by a caller for clarity
194+
// The metaDataKeysToHeaders is explicitly ignored in this implementation
195+
// This implementation is simply refactoring of the default BBR implementation to work with the pluggable framework
196+
func (s *defaultMetaDataExtractor) Execute(requestBodyBytes []byte) (
197+
headers map[string]string,
198+
mutatedBodyBytes []byte,
199+
err error) {
200+
201+
type RequestBody struct {
202+
Model string `json:"model"`
203+
}
204+
205+
h := make(map[string]string)
206+
207+
var requestBody RequestBody
208+
209+
if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil {
210+
// return original body on decode failure
211+
return nil, requestBodyBytes, err
212+
}
213+
214+
if requestBody.Model == "" {
215+
return nil, requestBodyBytes, fmt.Errorf("missing required field: model")
216+
}
217+
218+
// ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces
219+
h[ModelHeader] = requestBody.Model
220+
221+
// Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional.
222+
return h, requestBodyBytes, nil
223+
}
224+
225+
func (s *defaultMetaDataExtractor) String() string {
226+
return fmt.Sprintf(("BBRPlugin{%v/requiresFullParsing=%v}"), s.TypedName(), s.requiresFullParsing)
227+
}
228+
```
229+
230+
### Implementation Phases
231+
232+
The pluggable framework will be implemented iteratively over several phases.
233+
234+
1. Introduce `BBRPlugin` `MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
235+
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
236+
1. Introduce shared struct (shared among the plugins of a plugins chain)
237+
1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages
238+
1. Refactor metrics as needed to work with the new pluggable framework
239+
1. Implement configuration via manifests similar to those in EPP
240+
1. Implement `PluginsDAG` to allow for more complex topological order and concurrency.
241+
1. Continously learn lessons from this implementation and scheduling framework to improve the implementation
242+
1. Aim at aligning and cross-polination with the [AI GW WG]("https://github.com/kubernetes-sigs/wg-ai-gateway").
243+
244+
## Open Questions
245+
246+
1. More elaborate shared memory architecture for the best performance
247+
1. TBA
248+
249+
## Note
250+
251+
The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981)
31.1 KB
Loading

0 commit comments

Comments
 (0)