An example of validating LLM responses through observability. #68
Conversation
Pull Request Overview
This pull request introduces an example implementation for validating LLM responses through observability, leveraging OpenLIT and OpenTelemetry. Key changes include:
- Converting the faulthandler timeout configuration to an integer in pyproject.toml.
- Adding helper and test functions for integration tests that validate LLM responses.
- Introducing an OpenTelemetry Collector configuration with transformation rules and updating the README documentation.
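For the pyproject.toml change, the relevant setting is pytest's faulthandler_timeout option; the PR summary does not show the value, so the number below is an assumption:

```toml
[tool.pytest.ini_options]
faulthandler_timeout = 30  # value is illustrative; previously set as a quoted string
```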
Reviewed Changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pyproject.toml | Update to set faulthandler_timeout as an integer rather than a string. |
| integrations/opentelemetry/tests/helpers.py | New helper functions for loading JSON fixtures for tests. |
| integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py | Added integration tests for validating LLM responses using Anthropic and OpenAI clients. |
| integrations/opentelemetry/src/config.yaml | New configuration for OpenTelemetry Collector with detailed developer validations. |
| integrations/opentelemetry/readme.md | Updated documentation to reflect new setup and execution instructions. |
| integrations/opentelemetry/conftest.py | Added test configuration to load environment variables and setup sys.path. |
Files not reviewed (3)
- integrations/opentelemetry/.env.example: Language not supported
- integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
- integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported
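The fixture helpers listed for integrations/opentelemetry/tests/helpers.py are not shown in this summary. A minimal sketch of what a JSON fixture loader might look like, assuming a single loader keyed by file name (function and directory names are illustrative, not the PR's actual code):

```python
import json
from pathlib import Path

# Fixtures such as example_output.json and output_schema.json sit next to the tests.
FIXTURES_DIR = Path(__file__).parent / "fixtures"


def load_fixture(name: str) -> dict:
    """Load and parse a JSON fixture by file name."""
    with open(FIXTURES_DIR / name, encoding="utf-8") as f:
        return json.load(f)
```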
Comments suppressed due to low confidence (1)
integrations/opentelemetry/src/config.yaml:35
- The repeated condition 'and cache["no_developer_name_is_hallucinated"] == true' in the validation logic may cause unintended behavior if a valid name is found later; please review the ordering and necessity of this condition.
- set(cache["no_developer_name_is_hallucinated"], true) where IsMatch(cache["1_name"], "Sam\sThomas|Drew\sAnderson|Alex\sWilson|Alex\sJohnson") and cache["1_name"] != nil and cache["no_developer_name_is_hallucinated"] == true
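For context, statements like the one quoted above live under the OpenTelemetry Collector's transform processor. A minimal sketch of that surrounding structure, with the flagged condition omitted for brevity; the cache-population step and the exported attribute key are assumptions, not the PR's actual config.yaml:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Assumption: earlier statements extract the first suggested developer
          # name from the recorded LLM response into cache["1_name"].
          - set(cache["no_developer_name_is_hallucinated"], true) where IsMatch(cache["1_name"], "Sam\sThomas|Drew\sAnderson|Alex\sWilson|Alex\sJohnson") and cache["1_name"] != nil
          # Assumption: the computed flag is copied onto a span attribute, which
          # is what the "validations" map in the example output suggests.
          - set(attributes["validations.no_developer_name_is_hallucinated"], cache["no_developer_name_is_hallucinated"])
```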
73a60cf to 105b1ed
Pull Request Overview
This PR provides an example demonstrating how to validate LLM responses via observability using OpenLIT and OpenTelemetry. The changes include configuration and test updates to auto-instrument LLM API calls (a sketch follows this list), integrate JSON fixtures for tests, and add documentation on running the example.
- Updated pyproject.toml to use a numeric value for the faulthandler_timeout configuration
- Added test helpers and two integration tests for Anthropic and OpenAI client responses
- Introduced an OpenTelemetry Collector configuration with trace transformation logic and updated documentation
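To give a sense of what auto-instrumenting LLM API calls looks like in practice, here is a minimal sketch; the endpoint, model name, and prompt are assumptions, and the PR's tests may wire this up differently:

```python
import openlit
from openai import OpenAI

# OpenLIT patches supported LLM clients and exports spans over OTLP to the
# local collector started via docker (port 4318 is the OTLP/HTTP port).
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI()  # reads OPENAI_API_KEY from the environment (.env)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Which developer should pick up this ticket?"}],
)
# The call above is traced automatically; the collector's transform processor
# then runs its validation statements against the recorded span.
print(response.choices[0].message.content)
```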
Reviewed Changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Changed faulthandler_timeout value type to integer |
| integrations/opentelemetry/tests/helpers.py | Added JSON fixture loader helper functions |
| integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py | Added integration tests for LLM response validation |
| integrations/opentelemetry/src/config.yaml | Added OpenTelemetry Collector configuration for trace processing and validations |
| integrations/opentelemetry/readme.md | Added documentation for setup and running the observability example |
| integrations/opentelemetry/conftest.py | Introduced conftest.py to load environment variables and amend sys.path |
Files not reviewed (3)
- integrations/opentelemetry/.env.example: Language not supported
- integrations/opentelemetry/tests/fixtures/example_output.json: Language not supported
- integrations/opentelemetry/tests/fixtures/output_schema.json: Language not supported
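The conftest.py row above describes loading environment variables and amending sys.path. A minimal sketch of such a file, assuming python-dotenv and a src/ layout (both assumptions, not the PR's actual code):

```python
import sys
from pathlib import Path

from dotenv import load_dotenv

# Pull API keys and collector settings from .env (see .env.example).
load_dotenv()

# Make the example's source directory importable from the tests.
sys.path.insert(0, str(Path(__file__).parent / "src"))
```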
paulz left a comment
Looks great, thank you!
105b1ed to 0b5480f
I'm going to close this PR so that I can stop seeing it in my open PR lists. I'll keep the branch around for posterity. The work is also available in https://github.com/thisisartium/continuous-alignment-in-observability.
Important
A new repo was created for this example since there is no actual sharing of code between it and this repo. Check out Continuous Alignment in Observability.
Purpose
Overview
Running the example
Setup (if needed)
Install package manager
`brew install uv`
Install dependencies
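The install command itself is not shown on this page; with uv it is presumably something like:

```sh
uv sync
```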
Set up environment
Running OpenTelemetry Collector
`docker run -p 4317:4317 -p 4318:4318 -v $(pwd)/integrations/opentelemetry/src/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector-contrib:latest --config /etc/otelcol/config.yaml`
Executing LLM calls using a test
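The exact command is not shown here; one plausible invocation, assuming the integration test listed above is run through uv and pytest, is:

```sh
uv run pytest integrations/opentelemetry/tests/example_1_opentelemetry_integration/test_responses_available_in_opentelemetry.py
```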
See the results
validations: Map({"correct_developer_suggested":true,"no_developer_name_is_hallucinated":true,"not_empty_response":true})