Conversation

@tkersey
Contributor

@tkersey tkersey commented Mar 29, 2025

Important

A new repo was created for this example since there is no actual sharing of code between it and this repo. Check out Continuous Alignment in Observability.

Purpose

To provide an example of validating LLM responses through observability. This example does not actually use any of the CAT Python library's functionality; it is more an example of how one might do continuous alignment through observability. Perhaps we could call that CAO: Continuous Alignment in Observability.

Overview

This example uses OpenLIT to auto-instrument calls to an LLM's API, providing OpenTelemetry-native observability. The OpenTelemetry Collector processes the traces and attaches validations to them on the fly, ready for downstream visualization.

This is a simple, hardcoded example to prove out the possibility. There are a number of places for automation and further development that might be quite time consuming.
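
To make the mechanism concrete, here is a minimal sketch of the instrumentation side, assuming OpenLIT's init call and an OpenAI client; the endpoint, model name, and prompt below are illustrative rather than taken from this repo's tests.

import openlit
from openai import OpenAI

# Point OpenLIT at the locally running OpenTelemetry Collector (OTLP over HTTP on port 4318).
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

# Once initialized, OpenLIT auto-instruments the client, so this call is exported
# as a trace span (prompt, completion, token usage, latency) with no extra code.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Which developer should pick up this ticket?"}],
)
print(response.choices[0].message.content)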

Running the example

Setup (if needed)

Clone this repo locally

git clone https://github.com/thisisartium/continuous-alignment-testing

Install dependencies

Install package manager

  • Install uv, the Python package manager
    • brew install uv

Install dependencies

uv pip install openlit
uv sync

Setup environment

Populate your new .env file with the required values:

cp .env.example .env
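
The exact variable names live in .env.example; given that the tests exercise both the Anthropic and OpenAI clients, the file will likely resemble the hypothetical sketch below (key names and values are assumptions, not copied from the repo).

# Hypothetical .env contents; check .env.example for the actual variable names.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...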


Running OpenTelemetry Collector

Run the following command

docker run -p 4317:4317 -p 4318:4318 -v $(pwd)/integrations/opentelemetry/src/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector-contrib:latest --config /etc/otelcol/config.yaml
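
The interesting part happens in integrations/opentelemetry/src/config.yaml, where a transform processor runs OTTL statements over each span and writes validation results back onto it (the validations map shown in the output below). A stripped-down sketch of that shape follows; the statement is illustrative rather than the repo's actual rules, and the gen_ai.completion attribute name is an assumption about what OpenLIT emits.

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Illustrative validation: flag spans whose completion is non-empty.
          - set(attributes["not_empty_response"], true) where attributes["gen_ai.completion"] != ""

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform]
      exporters: [debug]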

Executing LLM calls using a test

Run one of the tests found in `integrations/opentelemetry/tests/test_responses_available_in_opentelemetry.py`
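
For example, from the repo root, assuming the tests are driven by pytest through uv (the exact invocation may differ):

uv run pytest integrations/opentelemetry/tests/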

See the results

Look at the logs from the OpenTelemetry Collector. At the end you'll see a line similar to the following.

validations: Map({"correct_developer_suggested":true,"no_developer_name_is_hallucinated":true,"not_empty_response":true})

@tkersey tkersey requested review from Copilot and jwilger March 29, 2025 17:06
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request introduces an example implementation for validating LLM responses through observability, leveraging OpenLIT and OpenTelemetry. Key changes include:

  • Converting the faulthandler timeout configuration to an integer in pyproject.toml.
  • Adding helper and test functions for integration tests that validate LLM responses.
  • Introducing an OpenTelemetry Collector configuration with transformation rules and updating the README documentation.

Reviewed Changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 1 comment.

Summary of changes per file:

  • pyproject.toml: Update to set faulthandler_timeout as an integer rather than a string.
  • integrations/opentelemetry/tests/helpers.py: New helper functions for loading JSON fixtures for tests.
  • integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py: Added integration tests for validating LLM responses using Anthropic and OpenAI clients.
  • integrations/opentelemetry/src/config.yaml: New configuration for the OpenTelemetry Collector with detailed developer validations.
  • integrations/opentelemetry/readme.md: Updated documentation to reflect new setup and execution instructions.
  • integrations/opentelemetry/conftest.py: Added test configuration to load environment variables and set up sys.path.
Files not reviewed (3)
  • integrations/opentelemetry/.env.example: Language not supported
  • integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
  • integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported
Comments suppressed due to low confidence (1)

integrations/opentelemetry/src/config.yaml:35

  • The repeated condition 'and cache["no_developer_name_is_hallucinated"] == true' in the validation logic may cause unintended behavior if a valid name is found later; please review the ordering and necessity of this condition.
- set(cache["no_developer_name_is_hallucinated"], true) where IsMatch(cache["1_name"], "Sam\sThomas|Drew\sAnderson|Alex\sWilson|Alex\sJohnson") and cache["1_name"] != nil and cache["no_developer_name_is_hallucinated"] == true

Contributor

Copilot AI left a comment


Pull Request Overview

This PR provides an example demonstrating how to validate LLM responses via observability using OpenLIT and OpenTelemetry. The changes include configuration and test updates to auto instrument LLM API calls, integrate JSON fixtures for tests, and add documentation on running the example.

  • Updated pyproject.toml to use a numeric value for the faulthandler_timeout configuration
  • Added test helpers and two integration tests for Anthropic and OpenAI client responses
  • Introduced an OpenTelemetry Collector configuration with trace transformation logic and updated documentation

Reviewed Changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 2 comments.

Summary of changes per file:

  • pyproject.toml: Changed faulthandler_timeout value type to integer.
  • integrations/opentelemetry/tests/helpers.py: Added JSON fixture loader helper functions.
  • integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py: Added integration tests for LLM response validation.
  • integrations/opentelemetry/src/config.yaml: Added OpenTelemetry Collector configuration for trace processing and validations.
  • integrations/opentelemetry/readme.md: Added documentation for setup and running the observability example.
  • integrations/opentelemetry/conftest.py: Introduced conftest.py to load environment variables and amend sys.path.
Files not reviewed (3)
  • integrations/opentelemetry/.env.example: Language not supported
  • integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
  • integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported

Contributor

@paulz paulz left a comment


Looks great, thank you!

@tkersey tkersey force-pushed the explore-opentelemetry-integration branch from 105b1ed to 0b5480f on April 7, 2025 20:46
@tkersey
Contributor Author

tkersey commented Apr 11, 2025

I'm going to close this PR so that I can stop seeing it in my open PR lists. I'll keep the branch around for posterity. The work is also available in https://github.com/thisisartium/continuous-alignment-in-observability.

@tkersey tkersey closed this Apr 11, 2025