Fix CI test order - install util-genai before dependent packages #31
Conversation
The test for opentelemetry-util-genai-emitters-splunk fails with:

ModuleNotFoundError: No module named 'opentelemetry.util.genai.emitters.spec'

This happens because emitters-splunk depends on opentelemetry-util-genai being installed first (which contains the emitters.spec module). Reordered the test steps to install and test the base util-genai package before the packages that depend on it (emitters-splunk, evals, etc.). (cherry picked from commit e8bfae2)
Some tests have cross-dependencies between packages:
- util-genai tests import from util-genai-evals
- emitters-splunk tests import from util-genai

Changed strategy to install all packages first (with --no-deps), then run each package's tests separately. This ensures all inter-package dependencies are available during testing. (cherry picked from commit c8a3bdd)
Changed all 'logger' references to '_LOGGER' (lines 239 and 245) to match the module's logger variable name. This fixes the NameError that was occurring during test execution. (cherry picked from commit d5b7953)
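A minimal sketch of the class of bug being fixed, with illustrative names (this is not the actual module): the module defines its logger as `_LOGGER`, so any code path that refers to a bare `logger` raises NameError the first time that path executes.

```python
import logging

# The module-level logger; referring to an undefined name `logger`
# inside the except block below is what raised the NameError.
_LOGGER = logging.getLogger(__name__)

def evaluate_safely():
    try:
        raise ValueError("boom")
    except ValueError:
        _LOGGER.warning("evaluation failed")  # fixed: was `logger.warning(...)`
        return None

evaluate_safely()
```

Because the bad name only executes inside the error path, the bug stays hidden until a test actually drives that path.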
Tests were trying to mock 'handler._load_completion_callbacks' but the function is actually 'utils.load_completion_callbacks' (imported from utils module). Updated the mock patch paths to point to the correct location. (cherry picked from commit ccc7665)
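The underlying rule is the standard "patch where the name is looked up" guidance for unittest.mock. A hypothetical sketch (the `utils` module and function here are stand-ins, not the real package): if the handler calls the function through the `utils` module at runtime, the mock must target the attribute on `utils`, not a non-existent attribute on the handler.

```python
import types
from unittest import mock

# Stand-in for the real utils module: handler code calls
# utils.load_completion_callbacks() through the module object.
utils = types.ModuleType("utils")
utils.load_completion_callbacks = lambda: ["real callback"]

def handler_run():
    # Looked up through the module at call time, so patching the
    # module attribute is what takes effect.
    return utils.load_completion_callbacks()

with mock.patch.object(utils, "load_completion_callbacks", return_value=["stub"]):
    assert handler_run() == ["stub"]   # patch is visible through the module

assert handler_run() == ["real callback"]  # restored after the context exits
```

Patching `handler._load_completion_callbacks` fails precisely because no such attribute exists on the handler; the name lives on the utils module.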
…ctories, update tests to match current implementation (cherry picked from commit a1910cf)
(cherry picked from commit d2a5711)
…vals
- Fix 4 instances of handler._load_completion_callbacks -> utils.load_completion_callbacks in test_evaluators.py
- Fix test_evaluation_dynamic_aggregation.py to set _aggregate_results to None instead of False, to enable dynamic environment variable reading as per the actual implementation

(cherry picked from commit a20742f)
- Add [test] extra to pyproject.toml with langchain-core, langchain-openai, pytest-recording, vcrpy, pyyaml, flaky
- Update CI workflow to install [instruments,test] dependencies
- Fixes CI failure: ModuleNotFoundError: No module named 'langchain_openai'
Fixes ImportError: cannot import name 'call_runtest_hook' from 'flaky.flaky_pytest_plugin'

The flaky 3.7.0 version is incompatible with pytest 7.4.4 used in CI. Upgrading to flaky>=3.8.1 resolves the compatibility issue.
- Fix incorrect patch targets for _instantiate_metrics and _run_deepeval
  * These are module-level functions, not class methods
  * Change from patch.object(class, method) to patch(module.function)
  * Update function signatures to match actual implementations
- Fix _build_llm_test_case lambda signatures
  * Function takes only 1 argument (invocation), not 2
- Fix test assertions for bias metric labels
  * 'Not Biased' for success=True (not 'pass')
  * 'Biased' for success=False (not 'fail')
- Fix default metrics test expectation
  * Remove 'faithfulness' from expected defaults
  * Actual defaults: bias, toxicity, answer_relevancy, hallucination, sentiment

All deepeval tests now pass (15 passed, 2 warnings)
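The corrected label expectations can be captured in a tiny sketch (the mapping below restates the commit message; the function name is illustrative, not part of the real codebase): the bias metric reports human-readable labels rather than 'pass'/'fail'.

```python
# Hypothetical helper mirroring the fixed test assertions:
# success=True maps to 'Not Biased', success=False to 'Biased'.
def bias_label(success: bool) -> str:
    return "Not Biased" if success else "Biased"

assert bias_label(True) == "Not Biased"   # was wrongly asserted as 'pass'
assert bias_label(False) == "Biased"      # was wrongly asserted as 'fail'
```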
- Fixed LangchainCallbackHandler initialization to use telemetry_handler parameter
- Fixed _resolve_agent_name to not return 'agent' tag as agent name
- Added missing workflow methods to _StubTelemetryHandler (start_workflow, stop_workflow, fail_workflow, fail_by_run_id)
- Filter gen_ai.tool.* metadata from ToolCall attributes (stored in dedicated fields)
- Process invocation_params in on_chat_model_start:
  * Extract model_name from invocation_params with higher priority
  * Add invocation params with request_ prefix to attributes
  * Move ls_* metadata to langchain_legacy sub-dict
  * Set provider from ls_provider metadata
  * Add callback.name and callback.id from serialized data
- Configure pytest-recording (VCR) for cassette playback
- Fix vcr fixture scopes and cassette directory configuration

All 7 callback handler agent tests now pass.
…er layers

Root cause: The opentelemetry.instrumentation.utils.unwrap() function only unwraps ONE layer of wrapt wrappers. When tests call uninstrument() and then re-instrument(), the second wrapping creates nested wrappers. The unwrap only removed the outer layer, leaving the old wrapper still active.

Fix: Modified _uninstrument() to unwrap ALL layers by looping while __wrapped__ exists, ensuring complete cleanup before re-instrumentation. Also added _callback_handler reference to instrumentor for test access.

Result: ALL 9 langchain tests now passing (was 8 passed, 1 skipped)
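The unwrap-all-layers idea can be shown with a stdlib-only sketch (the real code uses wrapt and the OpenTelemetry unwrap helper; here `functools.wraps` stands in, since it also sets `__wrapped__` on the wrapper):

```python
import functools

def instrument(func):
    # functools.wraps records the wrapped callable as wrapper.__wrapped__,
    # the same attribute the fix loops on.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def target():
    return "ok"

# Instrumenting twice nests wrappers, like re-instrument() after a partial
# uninstrument(); peeling a single layer would leave the old wrapper active.
doubly_wrapped = instrument(instrument(target))

# Sketch of the fix: unwrap ALL layers by looping while __wrapped__ exists.
func = doubly_wrapped
while hasattr(func, "__wrapped__"):
    func = func.__wrapped__

assert func is target  # fully unwrapped back to the original function
```

A single `func = func.__wrapped__` step on `doubly_wrapped` would instead land on the inner wrapper, which is exactly the stale-wrapper bug described above.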
- Extract response_model_name from generation.message.response_metadata
- Fixes test_langchain_llm.py and test_langchain_llm_util.py
- Ensures gen_ai.response.model attribute is set in spans
- CacheConfig is only available in deepeval >= 3.7.0
- Use try-except import to support older versions
- Conditionally add cache_config to eval_kwargs
- Fixes CI ImportError on older deepeval versions
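A sketch of the version-tolerant import described above, assuming CacheConfig exists only in deepeval >= 3.7.0; the import path shown is illustrative, not necessarily the real one:

```python
# Tolerate older deepeval versions that don't export CacheConfig.
try:
    from deepeval import CacheConfig  # illustrative path; newer deepeval only
except ImportError:                   # also catches ModuleNotFoundError
    CacheConfig = None

eval_kwargs = {}
if CacheConfig is not None:
    # Only pass cache_config when the installed version supports it.
    eval_kwargs["cache_config"] = CacheConfig()
```

On older versions the keyword is simply omitted, so the same call site works across the supported range.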
- Update pyproject.toml: deepeval>=0.21.0 -> deepeval>=3.7.0
- Revert deepeval_runner.py to original simple implementation
- CacheConfig is required in deepeval >= 3.7.0
- Removes compatibility code, cleaner solution
- Add CacheConfig class to stub modules in test files
- Update evaluate() stub signature to accept cache_config parameter
- Fixes CI import errors when using deepeval stubs
- Both test_deepeval_evaluator.py and test_deepeval_sentiment_metric.py updated
- Add check in run_evaluation to detect if deepeval is patched to None
- Raises ImportError when sys.modules['deepeval'] is None
- Fixes test_dependency_missing which patches deepeval to None
- Ensures proper error handling when dependency is missing

Previously, CacheConfig wasn't in the stubs, so import would fail. Now that we added CacheConfig to stubs (for newer deepeval >= 3.7.0), we need a runtime check to handle the dependency_missing test case.
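A self-contained sketch of the check (using a hypothetical module name so it runs anywhere): patching `sys.modules["<name>"] = None` is a common way for tests to simulate a missing dependency, and a `None` entry must be treated the same as the module being absent.

```python
import sys

def check_dependency(name: str) -> None:
    # A sys.modules entry patched to None means "pretend this is missing".
    if sys.modules.get(name) is None and name in sys.modules:
        raise ImportError(f"{name} is not available")

# What test_dependency_missing does, with a stand-in module name:
sys.modules["fake_dep"] = None
try:
    check_dependency("fake_dep")
    raised = False
except ImportError:
    raised = True
finally:
    del sys.modules["fake_dep"]

assert raised
```

Note the extra `name in sys.modules` guard in this sketch: a plain `sys.modules.get(name) is None` is also true when the module simply hasn't been imported yet, which is the concern the reviewer raises below about changing production code to accommodate tests.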
- pyproject.toml: Format include list to single line
- deepeval_runner.py: Add blank lines for PEP 8 compliance
- test files: Format function parameters to multi-line style

No functional changes, formatting only.
Resolved conflicts:
- callback_handler.py: Keep blank line before attributes section
- test_callback_handler_agent.py: Use new telemetry_handler parameter
- Remove unused imports (MagicMock, TracerProvider)
- Sort imports in test_splunk_emitters.py
- Remove trailing whitespace
    python -m pytest util/opentelemetry-util-genai-evals-deepeval/tests/ -v

- name: Run tests - opentelemetry-util-genai
  run: |
    python -m pytest util/opentelemetry-util-genai/tests/ -v --cov=opentelemetry.util.genai --cov-report=term-missing
nitpick, why do we add these options for opentelemetry-util-genai testing only?
Testing it separately ensures core functionality works independently before testing dependent packages
# Check if deepeval module is actually available (not patched to None in tests)
import sys

if sys.modules.get("deepeval") is None:
We should not change the actual code to accommodate tests.
- CacheConfig parameter has default value in deepeval.evaluate()
- No need to import or pass it explicitly
- Works with all deepeval versions (>=0.21.0)
- Simpler code without conditional imports
    "presence_penalty",
):
    if key in invocation_params:
        attrs[f"request_{key}"] = invocation_params[key]
I think this code changes the logic of the instrumentation. I am a bit worried that it will pollute the data types with extra parameters, like the invocation attributes.
I just reverted the changes in __init__.py, callback_handler.py and deepeval_runner.py
- Remove CacheConfig from deepeval_runner.py (use default parameter)
- Remove sys.modules runtime check (no longer needed)
- Remove CacheConfig stubs from test files
- Remove cache_config parameter from stub evaluate() functions
- Fix flaky timestamp test: use >= instead of > for end_time (Windows CI can have identical timestamps in same nanosecond)
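The timestamp fix in the last bullet is easy to demonstrate: on Windows CI the clock resolution is coarse enough that two consecutive reads can return the same value, so a strict `>` comparison is flaky while `>=` is safe.

```python
import time

# Two timestamps taken back-to-back; on a coarse clock these can be equal.
start = time.time_ns()
end = time.time_ns()

# The fixed assertion allows equality; `end > start` fails whenever both
# reads land in the same clock tick.
assert end >= start
```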
Revert to main branch versions:
- deepeval_runner.py (keep CacheConfig import/usage)
- langchain/callback_handler.py
- langchain/__init__.py

These changes will be submitted in a separate MR. Test file improvements are kept in this branch.