An example of validating LLM responses through observability. #68
Conversation
Pull Request Overview
This pull request introduces an example implementation for validating LLM responses through observability, leveraging OpenLIT and OpenTelemetry. Key changes include:
- Converting the faulthandler timeout configuration to an integer in pyproject.toml.
- Adding helper and test functions for integration tests that validate LLM responses.
- Introducing an OpenTelemetry Collector configuration with transformation rules and updating the README documentation.
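For the pyproject.toml change, the relevant setting is pytest's faulthandler_timeout option; the PR summary does not show the value, so the number below is an assumption:

```toml
[tool.pytest.ini_options]
faulthandler_timeout = 30  # value is illustrative; previously set as a quoted string
```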
Reviewed Changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pyproject.toml | Update to set faulthandler_timeout as an integer rather than a string. |
| integrations/opentelemetry/tests/helpers.py | New helper functions for loading JSON fixtures for tests. |
| integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py | Added integration tests for validating LLM responses using Anthropic and OpenAI clients. |
| integrations/opentelemetry/src/config.yaml | New configuration for OpenTelemetry Collector with detailed developer validations. |
| integrations/opentelemetry/readme.md | Updated documentation to reflect new setup and execution instructions. |
| integrations/opentelemetry/conftest.py | Added test configuration to load environment variables and setup sys.path. |
Files not reviewed (3)
- integrations/opentelemetry/.env.example: Language not supported
- integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
- integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported
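The fixture helpers listed for integrations/opentelemetry/tests/helpers.py are not shown in this summary. A minimal sketch of what a JSON fixture loader might look like, assuming a single loader keyed by file name (function and directory names are illustrative, not the PR's actual code):

```python
import json
from pathlib import Path

# Fixtures such as example_output.json and output_schema.json sit next to the tests.
FIXTURES_DIR = Path(__file__).parent / "fixtures"


def load_fixture(name: str) -> dict:
    """Load and parse a JSON fixture by file name."""
    with open(FIXTURES_DIR / name, encoding="utf-8") as f:
        return json.load(f)
```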
Comments suppressed due to low confidence (1)
integrations/opentelemetry/src/config.yaml:35
- The repeated condition 'and cache["no_developer_name_is_hallucinated"] == true' in the validation logic may cause unintended behavior if a valid name is found later; please review the ordering and necessity of this condition.
- set(cache["no_developer_name_is_hallucinated"], true) where IsMatch(cache["1_name"], "Sam\sThomas|Drew\sAnderson|Alex\sWilson|Alex\sJohnson") and cache["1_name"] != nil and cache["no_developer_name_is_hallucinated"] == true
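For context, statements like the one quoted above live under the OpenTelemetry Collector's transform processor. A minimal sketch of that surrounding structure, with the flagged condition omitted for brevity; the cache-population step and the exported attribute key are assumptions, not the PR's actual config.yaml:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Assumption: earlier statements extract the first suggested developer
          # name from the recorded LLM response into cache["1_name"].
          - set(cache["no_developer_name_is_hallucinated"], true) where IsMatch(cache["1_name"], "Sam\sThomas|Drew\sAnderson|Alex\sWilson|Alex\sJohnson") and cache["1_name"] != nil
          # Assumption: the computed flag is copied onto a span attribute, which
          # is what the "validations" map in the example output suggests.
          - set(attributes["validations.no_developer_name_is_hallucinated"], cache["no_developer_name_is_hallucinated"])
```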
73a60cf to 105b1ed
Pull Request Overview
This PR provides an example demonstrating how to validate LLM responses via observability using OpenLIT and OpenTelemetry. The changes include configuration and test updates to auto-instrument LLM API calls (a sketch follows this list), integrate JSON fixtures for tests, and add documentation on running the example.
- Updated pyproject.toml to use a numeric value for the faulthandler_timeout configuration
- Added test helpers and two integration tests for Anthropic and OpenAI client responses
- Introduced an OpenTelemetry Collector configuration with trace transformation logic and updated documentation
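To give a sense of what auto-instrumenting LLM API calls looks like in practice, here is a minimal sketch; the endpoint, model name, and prompt are assumptions, and the PR's tests may wire this up differently:

```python
import openlit
from openai import OpenAI

# OpenLIT patches supported LLM clients and exports spans over OTLP to the
# local collector started via docker (port 4318 is the OTLP/HTTP port).
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI()  # reads OPENAI_API_KEY from the environment (.env)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Which developer should pick up this ticket?"}],
)
# The call above is traced automatically; the collector's transform processor
# then runs its validation statements against the recorded span.
print(response.choices[0].message.content)
```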
Reviewed Changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Changed faulthandler_timeout value type to integer |
| integrations/opentelemetry/tests/helpers.py | Added JSON fixture loader helper functions |
| integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py | Added integration tests for LLM response validation |
| integrations/opentelemetry/src/config.yaml | Added OpenTelemetry Collector configuration for trace processing and validations |
| integrations/opentelemetry/readme.md | Added documentation for setup and running the observability example |
| integrations/opentelemetry/conftest.py | Introduced conftest.py to load environment variables and amend sys.path |
Files not reviewed (3)
- integrations/opentelemetry/.env.example: Language not supported
- integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
- integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported
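The conftest.py row above describes loading environment variables and amending sys.path. A minimal sketch of such a file, assuming python-dotenv and a src/ layout (both assumptions, not the PR's actual code):

```python
import sys
from pathlib import Path

from dotenv import load_dotenv

# Pull API keys and collector settings from .env (see .env.example).
load_dotenv()

# Make the example's source directory importable from the tests.
sys.path.insert(0, str(Path(__file__).parent / "src"))
```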
paulz left a comment
Looks great, thank you!
105b1ed to 0b5480f
I'm going to close this PR so that I can stop seeing it in my open PR lists. I'll keep the branch around for posterity. The work is also available in https://github.com/thisisartium/continuous-alignment-in-observability.
Important
A new repo was created for this example since there is no actual sharing of code between it and this repo. Check out Continuous Alignment in Observability.
Purpose
Overview
Running the example
Setup (if needed)
Install package manager
`brew install uv`
Install dependencies
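The install command itself is not shown on this page; with uv it is presumably something like:

```sh
uv sync
```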
Set up environment
Running OpenTelemetry Collector
`docker run -p 4317:4317 -p 4318:4318 -v $(pwd)/integrations/opentelemetry/src/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector-contrib:latest --config /etc/otelcol/config.yaml`
Executing LLM calls using a test
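The exact command is not shown here; one plausible invocation, assuming the integration test listed above is run through uv and pytest, is:

```sh
uv run pytest integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py
```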
See the results
validations: Map({"correct_developer_suggested":true,"no_developer_name_is_hallucinated":true,"not_empty_response":true})