Add Qwen3 235B A22B Instruct 2507 model for Vertex AI #388
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: addc493296
    "max_input_tokens": 262144,
    "max_output_tokens": 16384
  },
  "publishers/qwen/models/qwen3-235b-a22b-instruct-2507-maas": {
Route new Qwen MaaS model via Vertex OpenAPI path
This model is registered as format: "openai" on Vertex, so calls will go through fetchOpenAI, but that function only routes publishers/meta/... models to .../endpoints/openapi/chat/completions and sends all other publishers to :rawPredict after rewriting the model name. Because this new entry is another *-maas model (publishers/qwen/...-maas), it will follow the non-Meta branch and be invoked with the wrong Vertex path/model rewrite for chat-completions requests, causing runtime request failures for users selecting this model.
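The routing concern above can be illustrated with a minimal sketch. This is not the code in packages/proxy/src/proxy.ts: the function names and the publisher list are hypothetical, and the `:rawPredict` branch with its model-name rewrite is not modeled. It only shows why a check keyed on `publishers/meta/` would skip the new Qwen entry, and one possible way to widen it.

```typescript
// Hypothetical helpers for illustration only; the real logic lives in fetchOpenAI.

// Behavior as described in the review: only Meta models take the OpenAPI path.
function usesOpenApiEndpointBefore(model: string): boolean {
  return model.startsWith("publishers/meta/");
}

// One possible fix (an assumption, not necessarily what the PR does):
// keep an explicit list of publishers served through the OpenAPI endpoint.
const OPENAPI_PUBLISHERS: string[] = ["meta", "qwen"];

function usesOpenApiEndpointAfter(model: string): boolean {
  return OPENAPI_PUBLISHERS.some((publisher) =>
    model.startsWith(`publishers/${publisher}/`),
  );
}

const qwen = "publishers/qwen/models/qwen3-235b-a22b-instruct-2507-maas";
const llama = "publishers/meta/models/llama-3.3-70b-instruct-maas";

console.log(usesOpenApiEndpointBefore(qwen)); // false: falls through to the non-Meta branch
console.log(usesOpenApiEndpointAfter(qwen)); // true
console.log(usesOpenApiEndpointAfter(llama)); // true: Meta routing is unchanged
```

An explicit publisher list is more conservative than matching on the `-maas` suffix, since it only changes routing for publishers that have been tested.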
ibolmo left a comment
looks good. i just had a small reservation. did you happen to load this locally and hit the vertex api via the proxy?
packages/proxy/src/proxy.ts
Outdated
  // Use the OpenAPI endpoint.
  fullURL = new URL(
-   `${baseURL}/v1beta1/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
+   `${baseURL}/v1/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
not sure about this change. are there any models that still require v1beta1? should we keep the old behavior but add another statement, to be safe?
thanks! yes, I tested this fix with a meta model using the following:
% curl -s --max-time 5 -X POST http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: $BRAINTRUST_API_KEY" -d '{"model": "publishers/meta/models/llama-3.3-70b-instruct-maas", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}
'
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Hello. How can I assist you today?","role":"assistant"}}],"created":1770834581,"id":"lcqMafL4OtD52PgP9qvZwQs","model":"meta/llama-3.3-70b-instruct-maas","object":"chat.completion","system_fingerprint":"","usage":{"completion_tokens":10,"extra_properties":{"google":{"traffic_type":"ON_DEMAND"}},"prompt_tokens":37,"total_tokens":47}}
I also did some digging, and it seems v1 is stable and v1beta1 is no longer needed, but I'm happy to split them out and keep sending publishers/meta to v1beta1 so we don't introduce something new here.
you can leave it, but maybe try to use the old models and see if we're good
I ended up making it conditional to be on the safe side, so meta will remain on v1beta1 and qwen3 will be on v1
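A minimal sketch of that conditional, assuming the version is chosen from the model's publisher prefix. The function names, the example project, and the base URL below are placeholders for illustration and are not taken from proxy.ts.

```typescript
// Hypothetical sketch; names and signatures are illustrative.

// Meta models keep the previous v1beta1 path; everything else uses v1.
function vertexApiVersion(model: string): "v1" | "v1beta1" {
  return model.startsWith("publishers/meta/") ? "v1beta1" : "v1";
}

// Builds the OpenAPI chat-completions URL using the path shape from the diff.
function openApiChatCompletionsURL(
  baseURL: string,
  project: string,
  location: string,
  model: string,
): URL {
  const version = vertexApiVersion(model);
  return new URL(
    `${baseURL}/${version}/projects/${project}/locations/${location}/endpoints/openapi/chat/completions`,
  );
}

// Placeholder values for illustration.
const exampleBaseURL = "https://aiplatform.googleapis.com";
const qwenURL = openApiChatCompletionsURL(
  exampleBaseURL,
  "my-project",
  "global",
  "publishers/qwen/models/qwen3-235b-a22b-instruct-2507-maas",
);
const metaURL = openApiChatCompletionsURL(
  exampleBaseURL,
  "my-project",
  "global",
  "publishers/meta/models/llama-3.3-70b-instruct-maas",
);

console.log(qwenURL.pathname); // /v1/projects/my-project/locations/global/endpoints/openapi/chat/completions
console.log(metaURL.pathname); // /v1beta1/projects/my-project/locations/global/endpoints/openapi/chat/completions
```

Keeping the version choice in one small function means existing Meta traffic is untouched, and moving Meta to v1 later would be a one-line change.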
Deployment failed with the following error: View Documentation: https://vercel.com/docs/two-factor-authentication
Adding the Qwen3 235B A22B Instruct 2507 model for Vertex AI, using the details in: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/maas/qwen/qwen3-235b. Per the Vertex AI docs, the supported regions are global and us-south1, and I added those to the model as well.