
Conversation

Contributor

@shuningc shuningc commented Nov 5, 2025

The test for opentelemetry-util-genai-emitters-splunk fails with: ModuleNotFoundError: No module named 'opentelemetry.util.genai.emitters.spec'

This happens because emitters-splunk depends on opentelemetry-util-genai being installed first (which contains the emitters.spec module).

Reordered the test steps to install and test the base util-genai package before the packages that depend on it (emitters-splunk, evals, etc.).

@shuningc shuningc requested review from a team as code owners November 5, 2025 00:22
The test for opentelemetry-util-genai-emitters-splunk fails with:
ModuleNotFoundError: No module named 'opentelemetry.util.genai.emitters.spec'

This happens because emitters-splunk depends on opentelemetry-util-genai
being installed first (which contains the emitters.spec module).

Reordered the test steps to install and test the base util-genai package
before the packages that depend on it (emitters-splunk, evals, etc.).

(cherry picked from commit e8bfae2)
Some tests have cross-dependencies between packages:
- util-genai tests import from util-genai-evals
- emitters-splunk tests import from util-genai

Changed strategy to install all packages first (with --no-deps),
then run each package's tests separately. This ensures all
inter-package dependencies are available during testing.

(cherry picked from commit c8a3bdd)
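The install-first strategy described in this commit might look roughly like the following in the GitHub Actions workflow. This is a hedged sketch: the step names and the exact package list are illustrative, not the actual workflow contents.

```yaml
# Install every util-genai package up front so cross-package imports
# resolve, then run each package's test suite separately.
- name: Install all util-genai packages (no deps)
  run: |
    pip install --no-deps -e util/opentelemetry-util-genai
    pip install --no-deps -e util/opentelemetry-util-genai-evals
    pip install --no-deps -e util/opentelemetry-util-genai-emitters-splunk

- name: Run tests - opentelemetry-util-genai
  run: python -m pytest util/opentelemetry-util-genai/tests/ -v

- name: Run tests - opentelemetry-util-genai-emitters-splunk
  run: python -m pytest util/opentelemetry-util-genai-emitters-splunk/tests/ -v
```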
Changed all 'logger' references to '_LOGGER' to match the module's
logger variable name (lines 239 and 245). This fixes the NameError
raised during test execution.

(cherry picked from commit d5b7953)
Tests were trying to mock 'handler._load_completion_callbacks' but
the function is actually 'utils.load_completion_callbacks' (imported
from utils module). Updated the mock patch paths to point to the
correct location.

(cherry picked from commit ccc7665)
…ctories, update tests to match current implementation

(cherry picked from commit a1910cf)
…vals

- Fix 4 instances of handler._load_completion_callbacks -> utils.load_completion_callbacks in test_evaluators.py
- Fix test_evaluation_dynamic_aggregation.py to set _aggregate_results to None instead of False to enable dynamic environment variable reading as per actual implementation

(cherry picked from commit a20742f)
- Add [test] extra to pyproject.toml with langchain-core, langchain-openai, pytest-recording, vcrpy, pyyaml, flaky
- Update CI workflow to install [instruments,test] dependencies
- Fixes CI failure: ModuleNotFoundError: No module named 'langchain_openai'
- Fixes ImportError: cannot import name 'call_runtest_hook' from 'flaky.flaky_pytest_plugin'

The flaky 3.7.0 version is incompatible with pytest 7.4.4 used in CI.
Upgrading to flaky>=3.8.1 resolves the compatibility issue.
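A hedged sketch of what the `[test]` extra and the flaky pin might look like in `pyproject.toml`; any version constraints beyond those named above are assumptions:

```toml
[project.optional-dependencies]
test = [
    "langchain-core",
    "langchain-openai",
    "pytest-recording",
    "vcrpy",
    "pyyaml",
    "flaky>=3.8.1",  # flaky 3.7.0 is incompatible with pytest 7.4.4 used in CI
]
```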
- Fix incorrect patch targets for _instantiate_metrics and _run_deepeval
  * These are module-level functions, not class methods
  * Change from patch.object(class, method) to patch(module.function)
  * Update function signatures to match actual implementations

- Fix _build_llm_test_case lambda signatures
  * Function takes only 1 argument (invocation), not 2

- Fix test assertions for bias metric labels
  * 'Not Biased' for success=True (not 'pass')
  * 'Biased' for success=False (not 'fail')

- Fix default metrics test expectation
  * Remove 'faithfulness' from expected defaults
  * Actual defaults: bias, toxicity, answer_relevancy, hallucination, sentiment

All deepeval tests now pass (15 passed, 2 warnings)
- Fixed LangchainCallbackHandler initialization to use telemetry_handler parameter
- Fixed _resolve_agent_name to not return 'agent' tag as agent name
- Added missing workflow methods to _StubTelemetryHandler (start_workflow, stop_workflow, fail_workflow, fail_by_run_id)
- Filter gen_ai.tool.* metadata from ToolCall attributes (stored in dedicated fields)
- Process invocation_params in on_chat_model_start:
  * Extract model_name from invocation_params with higher priority
  * Add invocation params with request_ prefix to attributes
  * Move ls_* metadata to langchain_legacy sub-dict
  * Set provider from ls_provider metadata
  * Add callback.name and callback.id from serialized data
- Configure pytest-recording (VCR) for cassette playback
- Fix vcr fixture scopes and cassette directory configuration

All 7 callback handler agent tests now pass.
…er layers

Root cause: The opentelemetry.instrumentation.utils.unwrap() function only
unwraps ONE layer of wrapt wrappers. When tests call uninstrument() and then
re-instrument(), the second wrapping creates nested wrappers. The unwrap only
removed the outer layer, leaving the old wrapper still active.

Fix: Modified _uninstrument() to unwrap ALL layers by looping while
__wrapped__ exists, ensuring complete cleanup before re-instrumentation.

Also added _callback_handler reference to instrumentor for test access.

Result: ALL 9 langchain tests now passing (was 8 passed, 1 skipped)
- Extract response_model_name from generation.message.response_metadata
- Fixes test_langchain_llm.py and test_langchain_llm_util.py
- Ensures gen_ai.response.model attribute is set in spans
- CacheConfig is only available in deepeval >= 3.7.0
- Use try-except import to support older versions
- Conditionally add cache_config to eval_kwargs
- Fixes CI ImportError on older deepeval versions
- Update pyproject.toml: deepeval>=0.21.0 -> deepeval>=3.7.0
- Revert deepeval_runner.py to original simple implementation
- CacheConfig is required in deepeval >= 3.7.0
- Removes compatibility code, cleaner solution
- Add CacheConfig class to stub modules in test files
- Update evaluate() stub signature to accept cache_config parameter
- Fixes CI import errors when using deepeval stubs
- Both test_deepeval_evaluator.py and test_deepeval_sentiment_metric.py updated
- Add check in run_evaluation to detect if deepeval is patched to None
- Raises ImportError when sys.modules['deepeval'] is None
- Fixes test_dependency_missing which patches deepeval to None
- Ensures proper error handling when dependency is missing

Previously, CacheConfig wasn't in the stubs, so import would fail.
Now that we added CacheConfig to stubs (for newer deepeval >= 3.7.0),
we need runtime check to handle the dependency_missing test case.
Contributor Author

shuningc commented Nov 7, 2025

  1. callback_handler.py
    Response Model Extraction: Added logic to extract model_name from LangChain's generation.message.response_metadata and set it as inv.response_model_name. This ensures the gen_ai.response.model span attribute is properly populated.
    Invocation Parameters Processing: Enhanced on_chat_model_start to extract model name from invocation_params, add request parameters with request_ prefix to attributes, organize LangSmith metadata into langchain_legacy sub-dict, and capture callback metadata.
    Tool Metadata Filtering: Filter out gen_ai.tool.* metadata from ToolCall attributes since these are stored in dedicated fields.
    Merge Conflict Cleanup: Resolved leftover merge conflict markers from previous cherry-picks.
  2. __init__.py (LangChainInstrumentor)
    Test Isolation Fix: Added unwrap_all() function to remove ALL layers of wrapt wrappers, not just the outermost one. The default unwrap() only removes one layer, causing nested wrappers to accumulate during test re-instrumentation cycles.
    Complete Cleanup: Modified _uninstrument() to loop while __wrapped__ exists, ensuring complete cleanup before re-instrumentation.
    Test Access: Added _callback_handler reference to instrumentor for test verification.
  3. deepeval_runner.py
    Version Upgrade: Simplified implementation by requiring deepeval >= 3.7.0, removing backward compatibility code for older versions.
    Runtime Dependency Check: Added check at function start to detect when sys.modules['deepeval'] is None (patched in tests) and raise ImportError. This ensures the test_dependency_missing test properly validates error handling.
    Clean Implementation: Reverted to original straightforward code without conditional import logic.
  4. pyproject.toml
    Dependency Update: Upgraded minimum deepeval version from >=0.21.0 to >=3.7.0 to ensure CacheConfig is always available.
  5. Test Stubs (test_deepeval_evaluator.py, test_deepeval_sentiment_metric.py)
    CacheConfig Support: Added CacheConfig class to test stub modules to match the API of deepeval >= 3.7.0.
    Signature Update: Updated stub evaluate() function to accept cache_config parameter.

- pyproject.toml: Format include list to single line
- deepeval_runner.py: Add blank lines for PEP 8 compliance
- test files: Format function parameters to multi-line style

No functional changes, formatting only.
Resolved conflicts:
- callback_handler.py: Keep blank line before attributes section
- test_callback_handler_agent.py: Use new telemetry_handler parameter
- Remove unused imports (MagicMock, TracerProvider)
- Sort imports in test_splunk_emitters.py
- Remove trailing whitespace
python -m pytest util/opentelemetry-util-genai-evals-deepeval/tests/ -v
- name: Run tests - opentelemetry-util-genai
  run: |
    python -m pytest util/opentelemetry-util-genai/tests/ -v --cov=opentelemetry.util.genai --cov-report=term-missing
Contributor

nitpick, why do we add these options for opentelemetry-util-genai testing only?

Contributor Author

Testing it separately ensures core functionality works independently before testing dependent packages.

# Check if deepeval module is actually available (not patched to None in tests)
import sys

if sys.modules.get("deepeval") is None:
Contributor

We should not change the actual code to accommodate tests.

- CacheConfig parameter has default value in deepeval.evaluate()
- No need to import or pass it explicitly
- Works with all deepeval versions (>=0.21.0)
- Simpler code without conditional imports
"presence_penalty",
):
if key in invocation_params:
attrs[f"request_{key}"] = invocation_params[key]
Contributor

I think this code changes the logic of the instrumentation. I am a bit worried that it will pollute the data types with extra parameters, like attributes invocation.

Contributor Author

I just reverted the changes in __init__.py, callback_handler.py, and deepeval_runner.py.

shuningc and others added 4 commits November 7, 2025 13:31
- Remove CacheConfig from deepeval_runner.py (use default parameter)
- Remove sys.modules runtime check (no longer needed)
- Remove CacheConfig stubs from test files
- Remove cache_config parameter from stub evaluate() functions
- Fix flaky timestamp test: use >= instead of > for end_time
  (Windows CI can have identical timestamps in same nanosecond)
Revert to main branch versions:
- deepeval_runner.py (keep CacheConfig import/usage)
- langchain/callback_handler.py
- langchain/__init__.py

These changes will be submitted in a separate MR.
Test file improvements are kept in this branch.
@zhirafovod zhirafovod merged commit 5edff3d into main Nov 10, 2025
1 of 14 checks passed
@zhirafovod zhirafovod deleted the unitTestFix branch November 10, 2025 07:19
@github-actions github-actions bot locked and limited conversation to collaborators Nov 10, 2025
