ci: Add vLLM support to integration testing infrastructure #3128
Conversation
@ashwinb With this we'll need to run the record tests for 2 providers, but they can't be run in parallel (it works if you run them sequentially). To avoid conflicts, what would you think about removing the index.sqlite file altogether?
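For context, recording has to happen one provider at a time while index.sqlite is still shared. A minimal sketch of a sequential run (the script name and flags here are illustrative, not taken from this PR):

```bash
# Record for each provider one after the other so the two runs don't
# race on the shared index.sqlite (hypothetical script name and flags).
for setup in ollama vllm; do
  ./scripts/integration-tests.sh --inference-mode record --setup "$setup"
done
```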
I also dealt with
index.sqlite has been removed here: #3254
```bash
# Additional exclusions for vllm setup
if [[ "$TEST_SETUP" == "vllm" ]]; then
    EXCLUDE_TESTS="${EXCLUDE_TESTS} or test_inference_store_tool_calls or test_text_chat_completion_structured_output"
fi
```
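For context, a sketch of how an exclusion string like this is typically consumed, assuming the workflow turns EXCLUDE_TESTS into a pytest -k filter (the exact plumbing lives elsewhere in the CI script):

```bash
# EXCLUDE_TESTS is a list of test names joined with "or"; wrapping it in
# "not (...)" deselects those tests (assumed -k plumbing, for illustration).
EXCLUDE_TESTS="test_inference_store_tool_calls or test_text_chat_completion_structured_output"
pytest tests/integration -k "not (${EXCLUDE_TESTS})"
```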
what about adding these to the skips in the test files directly?
The problem here is the model: our skips in the test files are all based on provider.
I put the skips here so that they only apply in CI; anybody running the integration tests with a more capable model will still be able to use those tests.
If we can get to the point that this job is running, I'll happily test other models to see if I can get rid of this line altogether.
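To illustrate, since the exclusion lives only in the CI script, a local run against a stronger model still picks these tests up (plain pytest invocation; any provider/model selection flags are omitted here):

```bash
# Locally, nothing deselects these tests; they run as part of the suite
# (selection shown explicitly here just for illustration).
pytest tests/integration/inference \
  -k "test_inference_store_tool_calls or test_text_chat_completion_structured_output"
```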
yes please, CI with a model that passes more tests.
Having a gap between what CI tests and what developers see in the test suite is going to lead to bugs and confusion.
I've opened an alternative PR that instead uses qwen3: #3545.
I can close whichever one we don't want to go with.
tl;dr: I took this commit and removed all of the trivial changes that the ollama record CI job produced (these aren't needed).
- Update earth question to be more specific, with a multiple-choice format, to prevent Llama-3.2-1B-Instruct from rambling about other planets
- Skip test_text_chat_completion_structured_output as it sometimes times out during CI execution, again with Llama-3.2-1B-Instruct on vllm

Signed-off-by: Derek Higgins <[email protected]>
Add vLLM provider support to integration test CI workflows alongside the existing Ollama support. Configure provider-specific test execution where vLLM runs only inference-specific tests (excluding vision tests) while Ollama continues to run the full test suite. This enables comprehensive CI testing of both inference providers but keeps the vLLM footprint small; it can be expanded later if it proves not to be too disruptive.

Signed-off-by: Derek Higgins <[email protected]>
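A rough sketch of the provider-specific selection the commit message describes (illustrative only; the real logic lives in the workflow scripts, and names other than TEST_SETUP and EXCLUDE_TESTS are assumptions):

```bash
# vLLM runs only the inference tests and skips vision; Ollama keeps the
# full suite (TEST_SUBDIRS and the vision test name are hypothetical).
if [[ "$TEST_SETUP" == "vllm" ]]; then
  TEST_SUBDIRS="inference"
  EXCLUDE_TESTS="${EXCLUDE_TESTS} or test_vision"
else
  TEST_SUBDIRS="$(ls tests/integration)"
fi
```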
Closing this in favour of the qwen version (as per discussion in the community meeting).


- Introduces vLLM provider support to the record/replay testing framework
- Enables both recording and replay of vLLM API interactions alongside the existing Ollama support

The changes enable testing of vLLM functionality. vLLM tests focus on inference capabilities, while Ollama continues to exercise the full API surface, including vision features.
Related: #2888
--
See the alternative using qwen here: #3545