feat(evals): enable Azure OpenAI executor for CI content assertions by Kunall7890 · Pull Request #211 · AzureCosmosDB/cosmosdb-agent-kit

Kunall7890 · 2026-06-19T05:32:55Z

Closes #186

What this does

Adds .github/workflows/eval-ci.yaml, a new CI job that runs Waza evals using the copilot-sdk executor routed through Azure OpenAI (bypasses the GITHUB_TOKEN S2S rejection issue entirely)
Updates eval.yaml: adds a text grader for domain-term assertions and sets trials_per_task: 3 to handle non-determinism; the mock executor is preserved as the default for local dev
Adds 3 content-assertion tasks covering partition key guidance, the SDK singleton pattern, and a CI-enforcement baseline that proves CI actually blocks incorrect guidance

Acceptance criteria met

Real model called in CI via Azure OpenAI endpoint
Domain-specific term grader checks response content
CI fails when incorrect guidance is produced (baseline task)
trials_per_task: 3 handles non-deterministic responses
No secrets hardcoded — uses repo secrets
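
As a rough illustration of why several trials help with non-deterministic responses, the sketch below grades each trial response with a simple term check and passes the task only when a majority of trials pass. The majority rule, the function names, and the sample responses are all assumptions made for illustration; this PR does not state how Waza aggregates trials, and it may do so differently.

```python
from typing import Callable, Sequence


def majority_passes(responses: Sequence[str], grader: Callable[[str], bool]) -> bool:
    """Return True when more than half of the trial responses pass the grader."""
    passed = sum(1 for response in responses if grader(response))
    return passed * 2 > len(responses)


def mentions_partition_key(response: str) -> bool:
    # Hypothetical grader: case-insensitive substring check for one domain term.
    return "partition key" in response.lower()


# Three made-up trial responses standing in for trials_per_task: 3.
trials = [
    "Choose a partition key with high cardinality.",
    "Pick a Partition Key that spreads load evenly.",
    "Use any field you like.",
]

print(majority_passes(trials, mentions_partition_key))  # True (2 of 3 trials pass)
```

Under this assumed policy a single off-target response does not fail the task, while a consistently wrong answer still does.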

Maintainer action required

Add 3 repo secrets before merging:

AZURE_OPENAI_ENDPOINT
AZURE_OPENAI_API_KEY
AZURE_OPENAI_MODEL
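
A fail-fast check for these three settings could look like the sketch below. It only assumes that the repo secrets are exposed to the job as environment variables with the same names; the helper name `missing_secrets` is hypothetical and is not part of this PR or of Waza.

```python
from typing import List, Mapping

# The three names listed above; assumed to be exported as environment variables.
REQUIRED_SECRETS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_MODEL",
)


def missing_secrets(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Example with a placeholder endpoint and the other two values absent.
example_env = {"AZURE_OPENAI_ENDPOINT": "https://example.invalid"}
print(missing_secrets(example_env))
# ['AZURE_OPENAI_API_KEY', 'AZURE_OPENAI_MODEL']
```

In a real job the function would be called with `os.environ`, and the step would exit non-zero when the returned list is not empty.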

Adds a new eval-ci.yaml workflow that runs Waza evals against a real Azure OpenAI endpoint, bypassing the GITHUB_TOKEN S2S rejection that blocks the GitHub Copilot models API. Updates eval.yaml with trials_per_task: 3 for non-determinism handling and a text grader for domain-term assertions. Adds three content-assertion tasks covering partition key guidance, SDK singleton pattern, and a CI-enforcement baseline.

Closes AzureCosmosDB#186

Signed-off-by: Kunal Jaiswal <140198382+Kunall7890@users.noreply.github.com>

Copilot

⚠️ Not ready to approve

Multiple added “text” graders use unsupported Waza config keys (match_any/match_all), which will likely break or nullify the intended CI assertions.

Pull request overview

This PR adds a GitHub Actions workflow to run the cosmosdb-best-practices Waza eval suite against a real model via Azure OpenAI, and extends the eval suite with content-assertion tasks and a domain-terms text grader to enforce response quality in CI.

Changes:

Add .github/workflows/eval-ci.yaml to run Waza evals in CI using copilot-sdk routed through Azure OpenAI.
Update evals/cosmosdb-best-practices/eval.yaml to increase trials_per_task to 3 and add a domain-term text grader.
Add three new eval tasks that assert specific Cosmos DB guidance content.

File summaries

| File | Description |
| --- | --- |
| `.github/workflows/eval-ci.yaml` | New CI workflow to run Waza evals with Azure OpenAI + upload results artifact. |
| `evals/cosmosdb-best-practices/eval.yaml` | Increase trials to reduce flakiness; add a global domain-term text grader. |
| `evals/cosmosdb-best-practices/tasks/sdk-singleton-content.yaml` | New task to assert singleton/reuse guidance for CosmosClient. |
| `evals/cosmosdb-best-practices/tasks/partition-key-content.yaml` | New task to assert actionable partition key guidance (incl. cardinality). |
| `evals/cosmosdb-best-practices/tasks/incorrect-guidance-baseline.yaml` | New baseline task intended to prove CI blocks incorrect guidance. |

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 8

+    config:
+      match_any:
+        - "singleton"
+        - "single instance"
+        - "reuse"
+        - "module-level"
+        - "global"
+  - type: text


+    config:
+      match_any:
+        - "overhead"
+        - "connection pool"
+        - "performance"
+        - "per request"
+        - "each request"


+    config:
+      match_all:
+        - "partition key"
+        - "tenantId"
+  - type: text


+    config:
+      match_any:
+        - "cardinality"
+        - "high cardinality"
+        - "evenly"
+        - "distribution"
+        - "hot partition"


+    config:
+      match_any:
+        - "avoid"
+        - "not recommended"
+        - "performance"
+        - "cost"
+        - "RU"
+        - "point read"
+        - "when possible"


+      - name: Install Waza
+        run: curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
+


+  eval:
+    name: Run evals with real model
+    runs-on: ubuntu-latest
+


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kunal jaiswal <140198382+Kunall7890@users.noreply.github.com>

Kunall7890 · 2026-06-19T05:40:16Z

@sajeetharan addressed all Copilot review comments — replaced unsupported match_any/match_all with regex_match/contains per Waza's text grader schema, pinned Waza installer to v0.37.0 for reproducibility, and added fork PR guard to prevent secret-missing failures. Ready for your review!
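
For readers following the grader change, the sketch below shows one way an any-of term list can be collapsed into a single case-insensitive regex, and an all-of list into a series of substring checks. It illustrates the matching semantics only. The helper names are hypothetical, the term lists are copied from the review hunks in this PR, and whether Waza's regex engine accepts the inline `(?i)` flag is an assumption to verify against its text grader schema.

```python
import re
from typing import Iterable


def any_of_pattern(terms: Iterable[str]) -> str:
    """Build one case-insensitive regex that matches if any term appears."""
    return "(?i)(" + "|".join(re.escape(term) for term in terms) + ")"


def contains_all(response: str, terms: Iterable[str]) -> bool:
    """Return True when every term occurs in response, ignoring case."""
    lowered = response.lower()
    return all(term.lower() in lowered for term in terms)


# Any-of list from the singleton task hunk.
singleton_pattern = any_of_pattern(
    ["singleton", "single instance", "reuse", "module-level", "global"]
)
response = "Create one CosmosClient and reuse it for the lifetime of the app."
print(bool(re.search(singleton_pattern, response)))  # True

# All-of list from the partition key task hunk.
print(contains_all("Use tenantId as the partition key.", ["partition key", "tenantId"]))  # True
```

Escaping each term with `re.escape` keeps characters such as the hyphen in "module-level" literal, so the regex behaves like the original plain-string list.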

jaydestro · 2026-06-23T15:48:53Z

@Kunall7890 There are some ongoing changes to the structure being evaluated that could require this to be modified. You'll definitely get notice when it's time to make any changes, to avoid merge conflicts.

Copilot AI review requested due to automatic review settings June 19, 2026 05:32

Kunall7890 requested review from TheovanKraay, jaydestro and sajeetharan as code owners June 19, 2026 05:32

Copilot started reviewing on behalf of Kunall7890 June 19, 2026 05:33

Copilot AI reviewed Jun 19, 2026

Potential fix for pull request finding

d8b7655


Kunall7890 wants to merge 2 commits into AzureCosmosDB:main from Kunall7890:feat/azure-openai-ci-executor
