vitest evals#1232

Merged
cpinn merged 22 commits into main from caitlin/vitest
Feb 26, 2026

Conversation

@cpinn
Contributor

@cpinn cpinn commented Jan 6, 2026

Adds the ability to create experiments by writing tests with vitest. Users can:

- Use existing Braintrust datasets or pass their own data to tests
- Use existing scorers or pass their own custom scorers to tests

Each describe block creates an experiment, and each test inside becomes its own span in the experiment.

See golden-ts-vitest-experiment-v* projects for some examples

Example:

import { initDataset } from "braintrust";

bt.describe("My experiment", async () => {
  const evalData = initDataset({
    project: "llm-evals",
    dataset: "qa-benchmark",
  }).fetchedData();
  // test.each accepts the fetched dataset array (expanded per record), or you can pass your own data array
  bt.test.each(await evalData)(
    "Q&A evaluation",
    {
      scorers: [
        Factuality, // from autoevals
        ({ output, expected }) => ({ // custom scorer
          name: "conciseness",
          score: output.length <= expected.length * 1.2 ? 1 : 0.7,
        }),
      ],
    },
    async (record) => {
      const { input, expected } = record;
      const answer = await llm.answer(input.question);
      return answer; 
    },
  );
});
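The "pass their own data" variant could look roughly like the following. To keep the snippet self-contained, the `bt` object here is a minimal local stand-in for the wrapped vitest API; all names are illustrative, not the actual SDK.

```typescript
// Minimal stand-in for bt.test.each to illustrate passing inline data and a
// custom scorer; this is a sketch, not the real braintrust/vitest wrapper.
type EvalRecord = { input: string; expected: string };
type Scorer = (args: { output: string; expected: string }) => {
  name: string;
  score: number;
};

const results: Array<{ name: string; score: number }> = [];

const bt = {
  test: {
    each:
      (records: EvalRecord[]) =>
      async (
        name: string,
        opts: { scorers: Scorer[] },
        fn: (record: EvalRecord) => Promise<string>,
      ) => {
        // Each record becomes its own test case (span), scored after it runs.
        for (const record of records) {
          const output = await fn(record);
          for (const scorer of opts.scorers) {
            results.push(scorer({ output, expected: record.expected }));
          }
        }
      },
  },
};

await bt.test.each([
  { input: "2 + 2", expected: "4" },
  { input: "3 + 3", expected: "6" },
])(
  "inline data",
  {
    scorers: [
      ({ output, expected }) => ({
        name: "exact",
        score: output === expected ? 1 : 0,
      }),
    ],
  },
  async ({ input }) => (input === "2 + 2" ? "4" : "6"),
);

console.log(results.map((r) => r.score)); // [ 1, 1 ]
```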

Collaborator

@ibolmo ibolmo left a comment

various questions


// indicate the project name the tests will be sent to
const bt = wrapVitest(
{ test, expect, describe, afterAll },
Collaborator

I'd recommend for users to:

import * as vitest from 'vitest';

const { describe, expect, test, ... } = wrapVitest(vitest);

simpler, less prone to error, and future proof if we add more functions to our support

metadata: { category: "math" },
tags: ["arithmetic"],
},
async ({ input, expected }) => {
Collaborator

TIL... so the context is re-inserted as an argument.

@@ -0,0 +1,46 @@
import tsconfigPaths from "vite-tsconfig-paths";
Collaborator

didn't expect to see this file in here 🤔

return originalDescribe(suiteName, () => {
// Lazily initialize experiment context on first access
let context: ExperimentContext | null = null;
const getOrCreateContext = (): ExperimentContext => {
Collaborator

Seems like we want to extract and reuse the same getOrCreateContext inside of test() calls.
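The extraction suggested here could be a small shared lazy initializer, so the describe() and test() wrappers resolve the same context instance. A minimal sketch with hypothetical names:

```typescript
// Hypothetical sketch: one lazy initializer shared by the describe() and
// test() wrappers, so both resolve the same ExperimentContext instance.
type ExperimentContext = {
  experimentName: string;
  datasetExamples: Map<string, string>; // test name -> example id
};

function lazy<T>(create: () => T): () => T {
  let value: T | undefined;
  // create() runs at most once; every later call returns the cached value.
  return () => (value ??= create());
}

const getOrCreateContext = lazy<ExperimentContext>(() => ({
  experimentName: "My experiment",
  datasetExamples: new Map(),
}));

// Both call sites get the same instance.
console.log(getOrCreateContext() === getOrCreateContext()); // true
```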

"Braintrust: vitestMethods.describe is required. Please pass in the describe function from vitest.",
);
}
if (!vitestMethods.expect) {
Collaborator

i didn't see a wrapExpect in wrapper.ts. I wonder if each expect is a scorer?

});

// If test function returns a value, log it as output
if (testResult !== undefined) {
Collaborator

I think if you traced(maybeFn || configOrFn, ...) you may have gotten this automatically?

scores: {
pass: 0,
},
metadata: {
Collaborator

you should probably just throw again. the traced() call should handle the error.
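The rethrow pattern suggested here could look like the following. `traced` below is a local stand-in for a tracing wrapper, not the actual braintrust API; the point is that the wrapper records the failure and the rethrow keeps the test runner's failure semantics.

```typescript
// traced() here is a local stand-in for a tracing wrapper; the real braintrust
// API differs. Log the failure on the span, then rethrow so the test runner
// still marks the test as failed.
type Span = { error?: string; scores?: { pass: number } };

async function traced<T>(span: Span, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    span.error = String(err); // record the failure on the span...
    span.scores = { pass: 0 };
    throw err; // ...then rethrow instead of swallowing and hand-logging scores
  }
}

const span: Span = {};
let rethrown = false;
try {
  await traced(span, async () => {
    throw new Error("boom");
  });
} catch {
  rethrown = true;
}
console.log(rethrown, span.scores?.pass); // true 0
```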

datasetExamples: Map<string, string>; // test name -> example id
}

// Global context holder (one per describe block)
Collaborator

what happens with concurrent calls i.e.

it.concurrent(
 describe(..., () => {
 })
);

it.concurrent(
 describe(..., () => {
 })
);

did you give currentExperiment a try?

Contributor Author

I modified how the context is created for experiments. I added some additional tests for the concurrent experiments.

@cpinn cpinn marked this pull request as ready for review February 11, 2026 22:48
@cpinn cpinn changed the title [WIP] vitest evals vitest evals Feb 18, 2026
@cpinn cpinn requested a review from ibolmo February 20, 2026 19:29
Collaborator

@ibolmo ibolmo left a comment

Small suggestions, not blocking.

async ({ input, expected }) => {
const result = 4;
logOutputs({ answer: result });
expect(result).toBe(expected);
Collaborator

are the expects scorers?

Collaborator

ah yes, could we try multiple expect() calls? I wonder how that would look in the bt logs.

what does the custom message look like in braintrust? i.e.

expect(result, 'equality').toBe(expected)

return {
test: wrappedTest,
it: wrappedTest,
expect: vitestMethods.expect,
Collaborator

ah so we don't wrap expects.

Like I mentioned earlier, it would be interesting to use expect(..., 'message'), where the message is the key for the output; or maybe we keep a counter/stack that each expect(output) is pushed onto. Then the output might be something like

expect(0).toBe(0);
expect('something', 'message').toBe('something');
expect('foo').toBe('bar');

then the event could be

{
  ...
  output: {
    0: output,
    'message': output,
    1: output,
  },
  scores: {
    0: 1,
    'message': 1,
    1: 0,
  },
}
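A self-contained sketch of that counter/key scheme (all names illustrative, not the SDK):

```typescript
// Hypothetical sketch: each expect() records its output under the optional
// message key, otherwise under a running numeric index.
type EventLog = {
  output: Record<string | number, unknown>;
  scores: Record<string | number, number>;
};

function makeRecordingExpect(event: EventLog) {
  let counter = 0;
  return (actual: unknown, message?: string) => ({
    toBe(expected: unknown) {
      const key = message ?? counter++;
      event.output[key] = actual;
      event.scores[key] = actual === expected ? 1 : 0;
    },
  });
}

const event: EventLog = { output: {}, scores: {} };
const expect = makeRecordingExpect(event);
expect(0).toBe(0);
expect("something", "message").toBe("something");
expect("foo").toBe("bar");
console.log(JSON.stringify(event.scores)); // {"0":1,"1":0,"message":1}
```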

Contributor Author

Ah yes, I didn't wrap the expect method, just the test methods. I was thinking people cared about getting the scoring, which does add its output to individual logs. Thinking about this flow a bit.

Contributor Author

I made a compromise and log named outputs automatically. Users can still log additional outputs as necessary. We'll see what the feedback is like.

pnpm dlx tsx openai.ts
```

### Vitest Golden Tests
Collaborator

great coverage

* expected: 'hola',
* metadata: { language: 'spanish' },
* },
* async ({ input, expected }) => {
Collaborator

i wonder if we need the trace() method 🤔

*
* bt.describe('Translation Tests', () => {
* bt.afterAll(async () => {
* await bt.flushExperiment(); // Flushes and displays experiment summary
Collaborator

I'd expect this to be done automatically 🤔. Looking at their docs, it doesn't seem like they require it to be explicit.

Contributor Author

@cpinn cpinn Feb 26, 2026

This is done automatically. The method is exposed for access but doesn't need to be called. I thought I removed it from the examples and tests; I might have missed this one.
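The auto-registration described here can be sketched as follows; `afterAll` and `flushExperiment` below are local stand-ins rather than the real vitest/braintrust APIs.

```typescript
// Illustrative sketch: the describe wrapper registers an afterAll hook
// itself, so users never need to call flushExperiment() explicitly.
const hooks: Array<() => Promise<void>> = [];
const afterAll = (fn: () => Promise<void>) => {
  hooks.push(fn);
};

let flushed = false;
async function flushExperiment(): Promise<void> {
  // Real version would flush spans and print the experiment summary.
  flushed = true;
}

function wrappedDescribe(name: string, body: () => void): void {
  body();
  afterAll(flushExperiment); // registered automatically on the user's behalf
}

wrappedDescribe("Translation Tests", () => {
  /* tests would register here */
});

// Simulate vitest running the suite's afterAll hooks:
for (const hook of hooks) await hook();
console.log(flushed); // true
```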

@cpinn cpinn merged commit 2111f87 into main Feb 26, 2026
43 of 44 checks passed
@cpinn cpinn deleted the caitlin/vitest branch February 26, 2026 22:34
AbhiPrasad added a commit that referenced this pull request Feb 27, 2026