
fix(tests): mock rl.agent_optimizer in feedback coverage tests #1342

Closed
anchapin wants to merge 15 commits into main from fix/1341-test-api-feedback-coverage-import-mock

Conversation

anchapin (Owner) commented May 7, 2026

Summary

  • Add patch.dict("sys.modules", {"rl.agent_optimizer": None}) to 3 import error tests in test_api_feedback_coverage.py
  • Tests were returning 200 instead of 503 because they didn't properly mock the module

Fixes #1341
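As a minimal illustration of the mechanism behind the fix (not the project's test code): setting a module's sys.modules entry to None makes any later import of that name raise ImportError, which is what pushes the feedback endpoints down their 503 fallback path. Here the stdlib json module stands in for rl.agent_optimizer so the sketch is runnable anywhere.

```python
from unittest.mock import patch


def try_import(name: str) -> bool:
    """Return True if `name` imports cleanly, False on ImportError."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


# Inside the patch, sys.modules["json"] is None, so the import machinery
# raises ImportError ("import of json halted; None in sys.modules").
with patch.dict("sys.modules", {"json": None}):
    blocked = try_import("json")

# patch.dict restores the original sys.modules contents on exit.
restored = try_import("json")

print(blocked, restored)  # → False True
```

This is why the patch must wrap the request itself: without it, the module resolves normally (as it did in CI) and the endpoint returns 200.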

- Add FewShotEnhancerAgent class (ai-engine/agents/fewshot_enhancer_agent.py)
  - Uses OpenRouter frontier models via premium_client.py
  - Provides fast, cost-effective first pass in conversion pipeline
  - 3 few-shot examples for blocks, items, entities
  - Cost: ~$0.006/conversion, ~40s, Quality: 6-7/10

- Integrate into crew_integration.py
  - Initialize FewShotEnhancerAgent in _initialize_agents()
  - Register fewshot_enhancer executor in _register_agents()
  - Create _create_fewshot_enhancer_executor() for hybrid workflow

- Hybrid approach benefits:
  - Few-shot handles basic mapping (reduces crew workload ~40%)
  - Crew focuses on refinement and validation
  - Cut full crew cost by ~40%, time by ~30%
  - Quality: 8.5/10 vs 9/10 full crew
…enRouter

- Add ai_engine/mmsd/premium_client.py with PortKitPremium client
  - Supports DeepSeek V4 Pro, Kimi K2, GLM-5, DeepSeek V4 Flash
  - 3 curated few-shot examples for blocks, items, entity spawning
  - Auto-fallback and retry with backoff for rate limits
  - Cost estimation (~$0.006 per conversion)
  - CLI interface for quick conversions

- Add backend/src/api/premium_conversion.py API endpoints
  - POST /api/v1/premium/convert - Premium conversion
  - GET /api/v1/premium/models - List available models
  - POST /api/v1/premium/estimate - Estimate conversion cost

- Update frontend API service (frontend/src/services/api.ts)
  - Add premiumConvert, listPremiumModels, estimatePremiumCost

- Update subscriptionTier.ts with premium_conversion feature
  - Studio tier required for premium conversion access

- Update documentation
  - ai-engine/README.md - Add BYOK section
  - docs/api-reference.md - Add Premium Conversion section

- Add unit tests (ai_engine/tests/test_premium_client.py)
…ration

- Add ai-engine/tests/test_fewshot_enhancer_agent.py (16 tests)
  - TestFewShotEnhancerAgent: initialization, enhance, batch, cost, quality
  - TestFewShotEnhancerTools: enhance_tool, get_tools
  - TestEnhancementResult: creation with quality scores

- Update ai-engine/tests/unit/test_crew_integration.py (5 new tests)
  - test_fewshot_enhancer_executor: verifies executor returns proper dict
  - test_fewshot_enhancer_executor_no_source: error handling
  - test_fewshot_enhancer_executor_handles_failure: API failure handling
  - test_fewshot_enhancer_initialized_when_available: agent init check
  - test_initialization_with_fewshot_agent: full initialization

Note: ai_engine/mmsd/synthesis_pairs_recovered.jsonl shows as modified
due to git-LFS pointer issues - should not affect test results.
- Add MojmapMappingValidator to detect SRG/MCP patterns (func_N, field_N, net_minecraft_)
- Integrate validator into run_validation.py pipeline
- Add comprehensive unit tests for the validator
- Document mapping standard in ai_engine/mmsd/README.md

Fixes #1321
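A rough sketch of the SRG/MCP pattern detection described above. The real MojmapMappingValidator lives in ai_engine/mmsd/validators/mojmap_validator.py; these regexes are assumptions inferred from the patterns named in the summary (func_N, field_N, net_minecraft_), not the actual implementation.

```python
import re

# Assumed SRG/MCP-style identifier patterns (illustrative, not the project's list).
SRG_PATTERNS = [
    re.compile(r"\bfunc_\d+\w*"),    # SRG method names, e.g. func_70071_h_
    re.compile(r"\bfield_\d+\w*"),   # SRG field names, e.g. field_70170_p
    re.compile(r"\bnet_minecraft_\w+"),
]


def looks_like_srg(source: str) -> bool:
    """Return True if the Java source appears to use SRG/MCP names
    rather than Mojang official (Mojmap) names."""
    return any(p.search(source) for p in SRG_PATTERNS)


print(looks_like_srg("entity.func_70071_h_();"))  # → True
print(looks_like_srg("entity.tick();"))           # → False
```

Sources flagged this way are what run_validation.py would skip as non-Mojmap.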
The CeleryIntegration in sentry_sdk no longer supports the monitor_all_tasks parameter;
it was replaced by monitor_beat_tasks in newer versions.
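For context, the replacement flag is set like this (a configuration sketch; the DSN is a placeholder and the rest of this project's sentry_config.py is not shown):

```python
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

sentry_sdk.init(
    dsn="https://publicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[
        # monitor_beat_tasks replaces the removed monitor_all_tasks parameter
        CeleryIntegration(monitor_beat_tasks=True),
    ],
)
```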
…tuning

Supplement training data with ~12% general Java/JS code from
m-a-p/CodeFeedback-Filtered-Instruction to prevent the model from
overwriting general code reasoning with Minecraft-specific patterns.

Changes:
- Add GENERAL_CODE_DATASET, MIX_RATIO (0.12), and sample size config
- Add format_general_code() for general code example formatting
- Add load_general_code_dataset() to download/filter/sample 200 Java/JS pairs
- Add count_tokens() and mix_datasets() for ratio-aware mixing
- Update main() to load and mix datasets before training
- Add general_code_mix info to training summary JSON
- Document mixing procedure in TRAINING_REPORT.md Section 4

Issue: #1324
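The ratio-aware mixing step can be sketched as follows. This is an illustrative stand-in for the script's mix_datasets(): the function name matches the bullet above, but the sampling details are assumptions.

```python
import random

MIX_RATIO = 0.12  # ~12% general Java/JS code in the final training set


def mix_datasets(domain, general, ratio=MIX_RATIO, seed=42):
    """Sample enough general-code examples that they make up `ratio`
    of the combined dataset, then shuffle deterministically."""
    # Solve n / (len(domain) + n) = ratio for n:
    n_general = round(len(domain) * ratio / (1 - ratio))
    rng = random.Random(seed)
    sampled = rng.sample(general, min(n_general, len(general)))
    mixed = domain + sampled
    rng.shuffle(mixed)
    return mixed


mixed = mix_datasets([{"src": "mc"}] * 880, [{"src": "gen"}] * 500)
share = sum(1 for ex in mixed if ex["src"] == "gen") / len(mixed)
print(len(mixed), round(share, 2))  # → 1000 0.12
```

Mixing by example count is the simple case; the script's count_tokens() suggests the real ratio is enforced on tokens rather than examples.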
…uantization floor enforcement

- Add runpod_server.py: FastAPI server with OpenAI-compatible /v1/chat/completions
  backed by vLLM (bfloat16, 8192 ctx), loads alexchapin/portkit-7b
- Add Dockerfile.runpod: slim Python image with vLLM + FastAPI, model downloaded
  at container cold-start via runpod_entrypoint.sh
- Add modal_inference.py: Modal deployment (refactored to single PortkitInference
  class with both generate() and chat_completions() methods)
- Add quantization floor enforcement to SelfHostedInferenceClient:
  - check_quantization_floor() validates Q5_K_M minimum for GGUF, 4-bit/AWQ floor
  - InferenceConfig.validate_quantization() called on init with startup warning
  - Quantization type tracked in InferenceConfig.model_quant_type
- Add INFERENCE_DEPLOYMENT.md: Quantization standards docs (floor table, rationale,
  env vars, AWQ vs GGUF vs vLLM tradeoffs)
- Update .env with RUNPOD_ENDPOINT_ID, RUNPOD_ENDPOINT, INFERENCE_MODE,
  INFERENCE_PROVIDER, LLM_MODEL (local only, not committed)
- RunPod serverless endpoint deployed: vuvrsoij8po6oa (ADA_24/RTX 4090 pool,
  scale-to-zero with 0 warm workers)
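A hedged sketch of the quantization floor check. The GGUF type ordering and function shape below are assumptions modeled on the summary (Q5_K_M minimum), not the actual SelfHostedInferenceClient.check_quantization_floor() implementation.

```python
# Assumed ordering of common GGUF quantization types, lowest to highest fidelity.
GGUF_QUANT_ORDER = ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]
GGUF_FLOOR = "Q5_K_M"


def check_quantization_floor(quant_type: str) -> bool:
    """Return True if quant_type meets or exceeds the Q5_K_M floor."""
    if quant_type not in GGUF_QUANT_ORDER:
        raise ValueError(f"Unknown GGUF quantization type: {quant_type}")
    return GGUF_QUANT_ORDER.index(quant_type) >= GGUF_QUANT_ORDER.index(GGUF_FLOOR)


print(check_quantization_floor("Q4_K_M"))  # → False (below floor, warn at startup)
print(check_quantization_floor("Q6_K"))    # → True
```

Per the summary, a failing check produces a startup warning via InferenceConfig.validate_quantization() rather than a hard error.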
Ruff F401: remove unused imports (json, dataclasses.field, httpx, ConversionResult)
- Add @patch for api.conversions.cache to mock CacheService
- Add @patch for api.conversions.get_celery_monitor to mock CeleryQueueMonitor
- Both were attempting real Redis connections at localhost:6379

Fixes #1338
- Add mocks for cache.set_job_status, cache.set_progress, cache.get_job_status
- Add mock for get_celery_monitor.check_queue_health

Ref: #1338
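The mocking pattern from the two commits above can be sketched as follows. Since api.conversions itself is not importable outside the backend, this demo patches a stand-in module created on the fly; the attribute names (set_job_status, check_queue_health) come from the bullets above, while the rest is illustrative.

```python
import sys
from types import ModuleType
from unittest.mock import patch

# Stand-in for the real api.conversions module, so the patch targets resolve.
fake = ModuleType("api_conversions_demo")
fake.cache = object()               # would be the real CacheService
fake.get_celery_monitor = object()  # would return a CeleryQueueMonitor
sys.modules["api_conversions_demo"] = fake

# Patching both prevents any real Redis connection to localhost:6379.
with patch("api_conversions_demo.cache") as mock_cache, \
        patch("api_conversions_demo.get_celery_monitor") as mock_get_monitor:
    mock_get_monitor.return_value.check_queue_health.return_value = {"ok": True}
    # Code under test would call these through api.conversions:
    mock_cache.set_job_status("job-1", "queued")
    health = mock_get_monitor().check_queue_health()

print(mock_cache.set_job_status.called, health)  # → True {'ok': True}
```

In the real tests these are applied as @patch decorators on the test methods rather than a with block.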
The import_error tests were returning 200 instead of 503 because
they didn't properly mock the rl.agent_optimizer module. In CI,
the dynamic path resolution allowed the import to succeed.

Fixes: #1341
Copilot AI review requested due to automatic review settings May 7, 2026 07:29

sourcery-ai (bot) left a comment


Sorry @anchapin, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery


anchapin commented May 7, 2026

Duplicate of #1343; closing in favor of a cleaner branch with just the fix.

@anchapin anchapin closed this May 7, 2026

Copilot AI left a comment


Pull request overview

This pull request primarily aims to make Feedback API “import error” tests deterministic by reliably simulating rl.agent_optimizer being unavailable, but it also bundles several unrelated backend/AI-engine/infrastructure changes (new inference deployment scripts, MMSD validation tooling, CI tweaks, etc.).

Changes:

  • Fix feedback coverage tests by patching sys.modules["rl.agent_optimizer"] to force import failure paths (503s).
  • Expand/adjust backend and integration tests (conversion endpoint mocking, performance test state reset, real-service defaults).
  • Add MMSD Mojmap validation tooling + docs, and introduce RunPod/Modal inference server deployment artifacts; tweak CI workflow step gating.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| backend/src/tests/unit/test_api_feedback_coverage.py | Wraps 3 agent-performance “import error” tests with patch.dict(sys.modules, ...) to force 503 paths. |
| backend/src/tests/unit/test_api_conversions_targeted.py | Mocks additional dependencies (cache, get_celery_monitor) used by conversion creation. |
| backend/src/tests/integration/test_performance_integration.py | Ensures performance integration tests start from a clean in-memory mock state each test. |
| backend/src/tests/integration/conftest.py | Changes default “real services” Postgres/Redis ports/URLs. |
| backend/src/services/sentry_config.py | Adjusts Sentry Celery integration configuration flags. |
| backend/src/main.py | Reformats Sentry-related imports into a multi-line import block. |
| backend/src/api/premium_conversion.py | Minor formatting/docstring adjustments and small return formatting cleanup. |
| ai-engine/tests/unit/test_mojmap_validator.py | Adds unit tests for Mojmap validator (but currently imports from ai_engine...). |
| ai-engine/services/runpod_server.py | Adds a FastAPI vLLM-backed OpenAI-compatible inference server. |
| ai-engine/services/runpod_entrypoint.sh | Adds RunPod entrypoint that downloads HF model weights before starting the server. |
| ai-engine/services/modal_inference.py | Adds a Modal deployment implementation for vLLM inference. |
| ai-engine/services/Dockerfile.runpod | Adds a Dockerfile intended for RunPod serverless deployment. |
| ai_engine/mmsd/validators/mojmap_validator.py | Introduces Mojmap vs SRG/MCP pattern validator. |
| ai_engine/mmsd/run_validation.py | Extends MMSD validation pipeline to skip non-Mojmap Java sources. |
| ai_engine/mmsd/README.md | Documents the Mojmap requirement and validation pipeline for MMSD data. |
| .github/workflows/ci.yml | Gates frontend bundle analysis/report steps on needs.changes.outputs.frontend. |

Comment on lines 272 to +277

```diff
 def test_get_agent_performance_import_error(self):
     app = _make_app()
     client = TestClient(app)
-    resp = client.get("/ai/performance/agents")
-    assert resp.status_code == 503
+    with patch.dict("sys.modules", {"rl.agent_optimizer": None}):
+        resp = client.get("/ai/performance/agents")
+        assert resp.status_code == 503
```
Comment on lines +2 to +4

```python
from ai_engine.mmsd.validators.mojmap_validator import MojmapMappingValidator
```
Comment on lines +5 to +7

```sh
MODEL_REPO="alexchapin/portkit-7b"
MODEL_REVISION="${MODEL_REVISION:-main}"
MODEL_DIR="/model_cache/portkit_7b"
```
Comment on lines +22 to +37

```python
MODEL_REPO = os.getenv("MODEL_REPO", "alexchapin/portkit-7b")
MODEL_REVISION = os.getenv("MODEL_REVISION", "main")
MODEL_DIR = os.getenv("MODEL_DIR", "/model_cache/portkit_7b")
MAX_MODEL_LEN = int(os.getenv("MAX_MODEL_LEN", "8192"))

llm: LLM | None = None
tokenizer: AutoTokenizer | None = None


def load_model() -> tuple[LLM, AutoTokenizer]:
    """Load vLLM model and tokenizer."""
    print(f"[server] Loading vLLM (bfloat16, max_model_len={MAX_MODEL_LEN})...")
    llm = LLM(
        model=MODEL_DIR,
        dtype="bfloat16",
        max_model_len=MAX_MODEL_LEN,
```
Comment on lines +22 to +41

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    git \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Install vLLM, transformers, huggingface libs
# (version specifiers must be quoted so the shell does not treat ">=" as a redirect)
RUN pip install --no-cache-dir \
    "vllm>=0.6.0" \
    "transformers>=4.44.0" \
    "huggingface_hub>=0.24.0" \
    "hf-transfer>=0.1.0" \
    "accelerate>=0.34.0" \
    fastapi \
    uvicorn \
    httpx
```
Comment on lines 15 to 20

```diff
 USE_REAL_SERVICES = os.getenv("USE_REAL_SERVICES", "0") == "1"
 REAL_DB_URL = os.getenv(
-    "TEST_DATABASE_URL", "postgresql+asyncpg://postgres:password@localhost:5436/modporter_test"
+    "TEST_DATABASE_URL", "postgresql+asyncpg://postgres:password@localhost:5434/modporter_test"
 )
-REAL_REDIS_URL = os.getenv("TEST_REDIS_URL", "redis://localhost:6381/0")
+REAL_REDIS_URL = os.getenv("TEST_REDIS_URL", "redis://localhost:6380/0")
 REAL_AI_ENGINE_URL = os.getenv("TEST_AI_ENGINE_URL", "http://localhost:8080")
```
Comment on lines +78 to +86

```python
# Note: was an f-string with no placeholders and said "fp16" despite dtype="bfloat16".
print("Loading vLLM (bfloat16, max_model_len=8192)...")
self.llm = LLM(
    model=model_path,
    dtype="bfloat16",
    max_model_len=8192,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.85,
    trust_remote_code=True,
)
```

Development

Successfully merging this pull request may close these issues.

test_api_feedback_coverage.py tests fail - returning 200 instead of 503