Migrate to OpenAI-compatible vLLM API format #168
Merged
FranardoHuang merged 6 commits into main from openai_format on Jan 30, 2026
Conversation
- Remove hardcoded API key from config.py (default to 'EMPTY')
- Fix hardcoded URL in chat_service.py audio_generator (use settings)
- Fix hardcoded paths in chat_service.py and rag_retriever.py (use dynamic paths)
- Add vLLM server configuration to .env.example with localhost defaults
- Rename 4090modelservice.md to docs/vllm-setup.md with improved docs
- Add API key generation instructions and remote server setup guide
- Update README with distributed architecture diagram and vLLM config section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
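The settings-based configuration described above might look roughly like this. This is a hedged sketch, not the project's actual config.py: the attribute names mirror the .env keys from this PR, but the localhost ports and the Settings class shape are assumptions. The 'EMPTY' default comes from the PR itself (vLLM's conventional placeholder when auth is disabled).

```python
import os

class Settings:
    """Read vLLM endpoints and API key from the environment, with
    localhost defaults, instead of hardcoding them in the source."""

    def __init__(self):
        # Port numbers here are illustrative assumptions.
        self.vllm_chat_url = os.getenv("VLLM_CHAT_URL", "http://localhost:8000/v1")
        self.vllm_embedding_url = os.getenv("VLLM_EMBEDDING_URL", "http://localhost:8001/v1")
        self.vllm_whisper_url = os.getenv("VLLM_WHISPER_URL", "http://localhost:8002/v1")
        # 'EMPTY' replaces the previously hardcoded key; vLLM accepts any
        # value when no --api-key is configured on the server.
        self.vllm_api_key = os.getenv("VLLM_API_KEY", "EMPTY")

settings = Settings()
```

Services then read `settings.vllm_chat_url` and friends rather than embedding URLs inline, which is what makes the localhost-vs-remote-server switch a pure .env change.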
Resolve conflicts by keeping openai_format's configuration-based approach:

- model.py: Keep vLLM client with settings config
- chat_service.py: Keep clean whitespace
- rag_retriever.py: Keep lazy-loading embedding client pattern

Incorporates main's changes:

- Add F1_racing course support
- Use dynamic paths for file locations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
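The lazy-loading embedding client pattern kept from the openai_format branch can be sketched as below. This is illustrative only: the class name, the factory indirection, and the property are assumptions, not the actual rag_retriever.py code. The point of the pattern is that importing the module does not require a reachable vLLM server.

```python
class RagRetriever:
    """Defer creating the embedding client until it is first used."""

    def __init__(self, client_factory):
        # client_factory would be something like:
        #   lambda: OpenAI(base_url=settings.vllm_embedding_url,
        #                  api_key=settings.vllm_api_key)
        self._client_factory = client_factory
        self._embedding_client = None

    @property
    def embedding_client(self):
        # Construct on first access only; reuse afterwards.
        if self._embedding_client is None:
            self._embedding_client = self._client_factory()
        return self._embedding_client
```

This keeps startup cheap and makes the retriever testable without a live embedding server, since tests can inject a stub factory.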
- Add start_vllm_servers.sh: tmux-based script to start all 3 vLLM servers sequentially with proper GPU assignments and startup monitoring
- Add stop_vllm_servers.sh: script to cleanly shut down all vLLM servers
- Update vllm-setup.md with GPU assignments (CUDA_VISIBLE_DEVICES) and automated script documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
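The per-server GPU pinning that start_vllm_servers.sh performs amounts to setting CUDA_VISIBLE_DEVICES differently for each process. A minimal Python sketch of that idea follows; the server names, GPU indices, and ports are illustrative assumptions — the real assignments live in scripts/start_vllm_servers.sh and docs/vllm-setup.md.

```python
import os

# Hypothetical mapping of the 3 vLLM servers to GPUs and ports.
SERVERS = [
    {"name": "chat",      "gpu": "0", "port": 8000},
    {"name": "embedding", "gpu": "1", "port": 8001},
    {"name": "whisper",   "gpu": "2", "port": 8002},
]

def server_env(server):
    """Environment for one vLLM server process, pinned to a single GPU
    via CUDA_VISIBLE_DEVICES (the same mechanism the shell script uses)."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = server["gpu"]
    return env
```

A launcher would pass each `server_env(...)` to its tmux pane or subprocess so every server sees exactly one GPU as device 0.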
FranardoHuang added a commit that referenced this pull request on Mar 12, 2026
* Openai API format

* Update

* Edit

* fix: remove hardcoded API keys and paths, add vLLM server documentation

  - Remove hardcoded API key from config.py (default to 'EMPTY')
  - Fix hardcoded URL in chat_service.py audio_generator (use settings)
  - Fix hardcoded paths in chat_service.py and rag_retriever.py (use dynamic paths)
  - Add vLLM server configuration to .env.example with localhost defaults
  - Rename 4090modelservice.md to docs/vllm-setup.md with improved docs
  - Add API key generation instructions and remote server setup guide
  - Update README with distributed architecture diagram and vLLM config section

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add vLLM server startup/shutdown scripts with GPU assignments

  - Add start_vllm_servers.sh: tmux-based script to start all 3 vLLM servers sequentially with proper GPU assignments and startup monitoring
  - Add stop_vllm_servers.sh: script to cleanly shut down all vLLM servers
  - Update vllm-setup.md with GPU assignments (CUDA_VISIBLE_DEVICES) and automated script documentation

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Benzhang2004 <zhangjialin04@sina.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Changes
Architecture
- AsyncOpenAI client for chat completions with streaming support

New Scripts
- scripts/start_vllm_servers.sh - starts all 3 vLLM servers in tmux
- scripts/stop_vllm_servers.sh - cleanly shuts down all servers

Configuration
- .env variables: VLLM_CHAT_URL, VLLM_EMBEDDING_URL, VLLM_WHISPER_URL, VLLM_API_KEY
- .env.example with all vLLM configuration options
- docs/vllm-setup.md with complete deployment guide

Test plan
- Run ./scripts/start_vllm_servers.sh

🤖 Generated with Claude Code
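The AsyncOpenAI streaming support listed under Architecture boils down to iterating over delta chunks and assembling them. The sketch below shows that consumption loop against a fake stream so it runs standalone; the dataclasses and helper are illustrative stand-ins, not the project's chat_service.py. With the real SDK, the stream would come from `await client.chat.completions.create(..., stream=True)`.

```python
import asyncio
from dataclasses import dataclass

# Minimal stand-ins for the chunk objects the OpenAI SDK yields when stream=True.
@dataclass
class Delta:
    content: str

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: list

async def collect_stream(stream):
    """Assemble streamed deltas into the full reply, the way a chat
    service would before (or while) forwarding them to the client."""
    parts = []
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no content (e.g. role or finish markers)
            parts.append(delta)
    return "".join(parts)

async def fake_stream():
    """Stand-in for the async iterator returned by the SDK."""
    for piece in ["Hel", "lo, ", "vLLM"]:
        yield Chunk(choices=[Choice(delta=Delta(content=piece))])

print(asyncio.run(collect_stream(fake_stream())))  # prints "Hello, vLLM"
```

Because vLLM exposes an OpenAI-compatible endpoint, the same loop works unchanged whether the base URL points at api.openai.com or at a local vLLM server.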