|
| 1 | +# Pluggable Body-Based Routing (BBR) Framework |
| 2 | + |
| 3 | +Author(s): @davidbreitgand @srampal |
| 4 | + |
| 5 | +## Proposal Status |
| 6 | + |
| 7 | +***Draft*** |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances. |
| 12 | + |
| 13 | +The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base. |
| 14 | + |
| 15 | +This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution. |
| 16 | + |
| 17 | +See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context and reference. |
| 18 | + |
| 19 | +## Goals |
| 20 | + |
| 21 | +The pluggable BBR Framework aims to address the following goals: |
| 22 | + |
| 23 | +- Avoid monolithic architecture |
| 24 | +- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two |
| 25 | +- Enable organizing plugins into a topology for ordered and concurrent execution |
| 26 | +- Avoid repeated body parsing across plugins in a topology, for the sake of performance |
| 27 | +- Limit changes to the BBR feature to avoid any changes in the rest of the code base |
| 28 | +- Follow best practices and experience from the Scheduling subsystem |
| 29 | + pluggability effort. For example, extending the system to support the above |
| 30 | + should be through implementing well-defined `Plugin` interfaces and registering |
| 31 | + them in the BBR subsystem; any configuration would be done in the |
| 32 | + same way (e.g., code and/or configuration file) |
| 33 | +- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but avoid reusing specialized code with non-BBR functionality to prevent misuse |
| 34 | +- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system |
| 35 | +- Provide reference plugin implementations. |
| 36 | + |
| 37 | +## Non-Goals |
| 38 | + |
| 39 | +- Modify existing GIE abstractions |
| 40 | +- Fully align plugins, registries, and factories across BBR and EPP |
| 41 | +- Dynamically reconfigure plugins and plugin topologies at runtime |
| 42 | + |
| 43 | +## Proposal |
| 44 | + |
| 45 | +### Overview |
| 46 | + |
| 47 | +The `BBRPlugin` interface embeds the `Plugin` interface adopted from EPP and must be implemented by every BBR plugin. Each plugin is identified by its `TypedName` (also adopted from EPP), where `TypedName().Type` is the string representing the plugin's type and `TypedName().Name` is the string representing the plugin's implementation. BBR is refactored to follow the registered factory pattern: a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and the concrete plugin instances those factories create. |
| 48 | +In addition, a `PluginsChain` interface defines the order in which plugins execute. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency. |
| 49 | + |
| 50 | +`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically. |
| 51 | + |
| 52 | +Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To limit parsing overhead, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into the appropriate struct from the official `openai-go` library (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain, and each `BBRPlugin` receives it by value. In the initial implementation, a plugin that needs to mutate the body MUST work on its own copy, and each plugin returns its mutated body separately. |
| 53 | + |
| 54 | +### Suggested Components |
| 55 | + |
| 56 | +The sketch of the proposed framework is shown in the figure below. |
| 57 | +<img src="./images/pluggable-framework-architecture-sketch.png" alt="Components of the proposed framework" width="1000" /> |
| 58 | + |
| 59 | +### Suggested BBR Pluggable Framework Interfaces |
| 60 | + |
| 61 | +```go |
| 62 | +// ------------------------------------ Defaults ------------------------------------------ |
| 63 | +const DefaultPluginType = "MetadataExtractor" |
| 64 | +const DefaultPluginImplementation = "simple-model-extractor" |
| 65 | + |
| 66 | +// BBRPlugin defines the interface for plugins in the BBR framework |
| 67 | +type BBRPlugin interface { |
| 68 | + plugins.Plugin |
| 69 | + |
| 70 | + // RequiresFullParsing indicates whether full body parsing is required |
| 71 | + // to facilitate efficient memory sharing across plugins in a chain. |
| 72 | + RequiresFullParsing() bool |
| 73 | + |
| 74 | + // Execute runs the plugin logic on the request body. |
| 75 | + // A plugin's implementation logic MAY mutate the body of the message. |
| 76 | + // A plugin's implementation MUST return a map of headers. |
| 77 | + // If no headers are set by the implementation, the map MUST be empty. |
| 78 | + // The value of a header in an extended implementation NEED NOT be identical to the value of that same header as set |
| 79 | + // by the default implementation. |
| 80 | + // Example: in the body of a request, model is set to "semantic-model-selector", |
| 81 | + // which, say, stands for "select the best model for this request at minimal cost". |
| 82 | + // A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid |
| 83 | + // model name from the inventory of the backend models and also mutates the body accordingly. |
| 84 | + // In contrast, the default implementation copies the model name from the body as is and does not mutate the body. |
| 85 | + Execute(requestBodyBytes []byte) ( |
| 86 | + headers map[string]string, |
| 87 | + mutatedBodyBytes []byte, |
| 88 | + err error, |
| 89 | + ) |
| 90 | +} |
| 91 | + |
| 92 | + |
| 93 | +// placeholder for BBRPlugin constructors |
| 94 | +type PluginFactoryFunc func() bbrplugins.BBRPlugin //concrete constructors are assigned to this type |
| 95 | + |
| 96 | +// PluginRegistry defines operations for managing plugin factories and plugin instances |
| 97 | +type PluginRegistry interface { |
| 98 | + RegisterFactory(typeKey string, factory PluginFactoryFunc) error //constructors |
| 99 | + RegisterPlugin(plugin bbrplugins.BBRPlugin) error //registers a plugin instance (the instance MUST be created via the factory first) |
| 100 | + GetFactory(typeKey string) (PluginFactoryFunc, error) |
| 101 | + GetPlugin(typeKey string) (bbrplugins.BBRPlugin, error) |
| 102 | + GetFactories() map[string]PluginFactoryFunc |
| 103 | + GetPlugins() map[string]bbrplugins.BBRPlugin |
| 104 | + ListPlugins() []string |
| 105 | + ListFactories() []string |
| 106 | + CreatePlugin(typeKey string) (bbrplugins.BBRPlugin, error) |
| 107 | + ContainsFactory(typeKey string) bool |
| 108 | + ContainsPlugin(typeKey string) bool |
| 109 | + String() string //human readable string for logging |
| 110 | +} |
| 111 | + |
| 112 | +// PluginsChain defines a specific order of execution for the BBRPlugin instances |
| 113 | +// stored in the registry. |
| 114 | +type PluginsChain interface { |
| 115 | + AddPlugin(typeKey string, registry PluginRegistry) error //to be added to the chain the plugin should be registered in the registry first |
| 116 | + AddPluginAtInd(typeKey string, i int, r PluginRegistry) error //only affects the instance of the plugin chain |
| 117 | + GetPlugin(index int, registry PluginRegistry) (bbrplugins.BBRPlugin, error) //retrieves i-th plugin as defined in the chain from the registry |
| 118 | + Length() int |
| 119 | + ParseChatCompletion(data []byte) (openai.ChatCompletionNewParams, error) //parses the bytes slice into an appropriate openai-go struct |
| 120 | + ParseCompletion(data []byte) (openai.CompletionNewParams, error) //likewise |
| 121 | + GetSharedMemory(which string) interface{} //returns the shared openai-go struct; `which` selects the struct |
| 122 | + //for the requested Completion or ChatCompletion endpoint |
| 123 | + Run(bodyBytes []byte, registry PluginRegistry) ([]byte, map[string]string, error) //return potentially mutated body and all headers map safely merged |
| 124 | + String() string |
| 125 | +} |
| 126 | +//NOTE: for simplicity, in the initial PR, a PluginsChain instance will be defined for requests only |
| 127 | +``` |
| 128 | + |
| 129 | +### Defaults |
| 130 | + |
| 131 | +```go |
| 132 | + |
| 133 | +const ( |
| 134 | + //A default plugin implementation of this plugin type will always be configured for the request plugins chain |
| 135 | + //Even though BBRPlugin type is not (yet) a K8s resource, it's logically akin to `kind` |
| 136 | + //MUST start with an upper case letter, use CamelCase notation, and contain only alphanumeric characters after the first letter |
| 137 | + PluginTypePattern = `^[A-Z][A-Za-z0-9]*$` |
| 138 | + MaxPluginTypeLength = 63 |
| 139 | + DefaultPluginType = "MetadataExtractor" |
| 140 | + // Even though BBRPlugin is not a K8s resource yet, let's make its naming compliant with K8s resource naming |
| 141 | + // Allows: lowercase letters, digits, hyphens, dots. |
| 142 | + // Must start and end with a lowercase alphanumeric character. |
| 143 | + // Middle characters group can contain lowercase alphanumerics, hyphens, and dots |
| 144 | + // Middle and rightmost groups are optional |
| 145 | + PluginNamePattern = `^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$` |
| 146 | + DefaultPluginName = "simple-model-extractor" |
| 147 | + MaxPluginNameLength = 253 |
| 148 | + //Well-known custom header set to a model name |
| 149 | + ModelHeader = "X-Gateway-Model-Name" |
| 150 | +) |
| 151 | +``` |
| 152 | + |
| 153 | +### Current BBR reimplementation as BBRPlugin |
| 154 | + |
| 155 | +```go |
| 156 | +// ------------------------------------ DEFAULT PLUGIN IMPLEMENTATION ---------------------------------------------- |
| 157 | + |
| 158 | +type simpleModelExtractor struct { //implements the MetadataExtractor interface |
| 159 | + typedName plugins.TypedName |
| 160 | + requiresFullParsing bool |
| 161 | +} |
| 162 | + |
| 163 | +// defaultMetaDataExtractor implements the MetadataExtractor interface and extracts only the model name AS-IS |
| 164 | +type defaultMetaDataExtractor struct { |
| 165 | + typedName plugins.TypedName |
| 166 | + requiresFullParsing bool //this field will be used to determine whether shared struct should be created in this chain |
| 167 | +} |
| 168 | + |
| 169 | +// NewDefaultMetaDataExtractor is a factory that constructs the default MetadataExtractor plugin. |
| 170 | +// A developer who wishes to create her own implementation implements the BBRPlugin interface and |
| 171 | +// uses the Registry and PluginsChain to register and execute the plugin (together with other plugins in a chain). |
| 172 | +func NewDefaultMetaDataExtractor() BBRPlugin { |
| 173 | + return &defaultMetaDataExtractor{ |
| 174 | + typedName: plugins.TypedName{ |
| 175 | + Type: DefaultPluginType, |
| 176 | + Name: "simple-model-extractor", |
| 177 | + }, |
| 178 | + requiresFullParsing: false, |
| 179 | + } |
| 180 | +} |
| 181 | + |
| 182 | +func (s *defaultMetaDataExtractor) RequiresFullParsing() bool { |
| 183 | + return s.requiresFullParsing |
| 184 | +} |
| 185 | + |
| 186 | +func (s *defaultMetaDataExtractor) TypedName() plugins.TypedName { |
| 187 | + return s.typedName |
| 188 | +} |
| 189 | + |
| 190 | +// Execute extracts the "model" field from the JSON request body and sets the X-Gateway-Model-Name header. |
| 191 | +// It does not mutate the body: the original requestBodyBytes are returned unchanged. |
| 192 | +// It expects the request body to be a JSON object containing a non-empty "model" field. |
| 193 | +// If the body cannot be decoded or "model" is missing, it returns nil headers, the original body, and an error. |
| 194 | +// No other metadata is extracted by this implementation. |
| 195 | +// This implementation simply refactors the default BBR implementation to work with the pluggable framework. |
| 196 | +func (s *defaultMetaDataExtractor) Execute(requestBodyBytes []byte) ( |
| 197 | + headers map[string]string, |
| 198 | + mutatedBodyBytes []byte, |
| 199 | + err error) { |
| 200 | + |
| 201 | + type RequestBody struct { |
| 202 | + Model string `json:"model"` |
| 203 | + } |
| 204 | + |
| 205 | + h := make(map[string]string) |
| 206 | + |
| 207 | + var requestBody RequestBody |
| 208 | + |
| 209 | + if err := json.Unmarshal(requestBodyBytes, &requestBody); err != nil { |
| 210 | + // return original body on decode failure |
| 211 | + return nil, requestBodyBytes, err |
| 212 | + } |
| 213 | + |
| 214 | + if requestBody.Model == "" { |
| 215 | + return nil, requestBodyBytes, fmt.Errorf("missing required field: model") |
| 216 | + } |
| 217 | + |
| 218 | + // ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces |
| 219 | + h[ModelHeader] = requestBody.Model |
| 220 | + |
| 221 | + // Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional. |
| 222 | + return h, requestBodyBytes, nil |
| 223 | +} |
| 224 | + |
| 225 | +func (s *defaultMetaDataExtractor) String() string { |
| 226 | + return fmt.Sprintf("BBRPlugin{%v/requiresFullParsing=%v}", s.TypedName(), s.requiresFullParsing) |
| 227 | +} |
| 228 | +``` |
| 229 | + |
| 230 | +### Implementation Phases |
| 231 | + |
| 232 | +The pluggable framework will be implemented iteratively over several phases. |
| 233 | + |
| 234 | +1. Introduce the `BBRPlugin` interface and the `MetadataExtractor` plugin type, the registry, the plugins chain, a sample plugin implementation (`SimpleModelExtraction`), and its factory. Plugin configuration will be implemented via environment variables set in the Helm chart |
| 235 | +1. Introduce a second plugin interface, `ModelSelector`, and a sample plugin implementation |
| 236 | +1. Introduce shared struct (shared among the plugins of a plugins chain) |
| 237 | +1. Introduce an interface for guardrail plugins with a simple reference implementation, and experiment with plugins chains on request and response messages |
| 238 | +1. Refactor metrics as needed to work with the new pluggable framework |
| 239 | +1. Implement configuration via manifests similar to those in EPP |
| 240 | +1. Implement `PluginsDAG` to allow for more complex topological order and concurrency. |
| 241 | +1. Continuously learn lessons from this implementation and the scheduling framework to improve the implementation |
| 242 | +1. Aim to align and cross-pollinate with the [AI GW WG](https://github.com/kubernetes-sigs/wg-ai-gateway). |
| 243 | + |
| 244 | +## Open Questions |
| 245 | + |
| 246 | +1. More elaborate shared memory architecture for the best performance |
| 247 | +1. TBA |
| 248 | + |
| 249 | +## Note |
| 250 | + |
| 251 | +The proposed interfaces may differ slightly from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981) |
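
As a rough illustration of how the registry and chain interfaces proposed above might fit together, the following is a minimal sketch, not the implementation from the PR. The names `Plugin`, `FactoryFunc`, `Registry`, `Chain`, and `modelExtractor` are hypothetical, trimmed-down stand-ins for `BBRPlugin`, `PluginFactoryFunc`, `PluginRegistry`, `PluginsChain`, and the default extractor. Two behaviors are assumptions that the proposal leaves open: headers are merged with a last-writer-wins policy, and a body returned by one plugin is passed on to the next plugin in the chain.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Plugin is a hypothetical, trimmed-down stand-in for BBRPlugin.
type Plugin interface {
	Execute(body []byte) (headers map[string]string, mutated []byte, err error)
}

// FactoryFunc plays the role of PluginFactoryFunc in the proposal.
type FactoryFunc func() Plugin

// Registry keeps factories and the plugin instances created from them.
type Registry struct {
	factories map[string]FactoryFunc
	plugins   map[string]Plugin
}

func NewRegistry() *Registry {
	return &Registry{factories: map[string]FactoryFunc{}, plugins: map[string]Plugin{}}
}

// RegisterFactory rejects a second factory for the same type key.
func (r *Registry) RegisterFactory(typeKey string, f FactoryFunc) error {
	if _, exists := r.factories[typeKey]; exists {
		return fmt.Errorf("factory %q is already registered", typeKey)
	}
	r.factories[typeKey] = f
	return nil
}

// CreatePlugin builds an instance via its factory and registers the instance.
func (r *Registry) CreatePlugin(typeKey string) (Plugin, error) {
	f, ok := r.factories[typeKey]
	if !ok {
		return nil, fmt.Errorf("no factory registered for %q", typeKey)
	}
	p := f()
	r.plugins[typeKey] = p
	return p, nil
}

// Chain holds an ordered list of plugin type keys.
type Chain struct{ order []string }

// AddPlugin appends a plugin that must already exist in the registry.
func (c *Chain) AddPlugin(typeKey string, r *Registry) error {
	if _, ok := r.plugins[typeKey]; !ok {
		return fmt.Errorf("plugin %q is not registered", typeKey)
	}
	c.order = append(c.order, typeKey)
	return nil
}

// Run executes the plugins in order. Assumptions: later headers overwrite
// earlier ones, and a non-nil returned body is fed to the next plugin.
func (c *Chain) Run(body []byte, r *Registry) ([]byte, map[string]string, error) {
	merged := map[string]string{}
	for _, key := range c.order {
		p, ok := r.plugins[key]
		if !ok {
			return body, nil, fmt.Errorf("plugin %q is not registered", key)
		}
		headers, mutated, err := p.Execute(body)
		if err != nil {
			return body, nil, fmt.Errorf("plugin %q: %w", key, err)
		}
		for k, v := range headers {
			merged[k] = v
		}
		if mutated != nil {
			body = mutated
		}
	}
	return body, merged, nil
}

// modelExtractor copies the "model" field into a header without mutating the body.
type modelExtractor struct{}

func (modelExtractor) Execute(body []byte) (map[string]string, []byte, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, body, err
	}
	if req.Model == "" {
		return nil, body, errors.New("missing required field: model")
	}
	return map[string]string{"X-Gateway-Model-Name": req.Model}, body, nil
}
```

In this sketch a caller would register a factory under a type key, create the instance through the registry, add that key to a chain, and then call `Run` with the raw request bytes. A stricter merge policy, such as rejecting conflicting header values, could be substituted in `Run` if last-writer-wins turns out to be unsuitable.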