add package manager option #75

ibolmo · 2025-07-07T18:12:48Z

No description provided.

github-actions · 2025-07-07T18:13:00Z

Braintrust eval report

Say Hi Bot Python (add-package-manager-1751934197)

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751934197.14s (+0s)	-	-
End	1751934197.15s (+0s)	-	-
Duration	0s (+0s)	1 🟢	1 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-07T18:13:01Z

Braintrust eval report

Say Hi Bot (add-package-manager-1751934202)

Score	Average	Improvements	Regressions
Levenshtein	85.3% (+1pp)	7 🟢	6 🔴
Start	1751934202.27s	-	-
End	1751934203.27s	-	-
Duration	1s (0s)	7 🟢	2 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-07T18:13:04Z

Braintrust eval report

Console logging (add-package-manager-1751934204)

Score	Average	Improvements	Regressions
Levenshtein	82.6% (+1pp)	5 🟢	4 🔴
Start	1751934204.32s	-	-
End	1751934204.33s	-	-
Duration	0s (+0s)	2 🟢	13 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

My Evaluation (add-package-manager-1751934204)

Score	Average	Improvements	Regressions
Exact match	100% (+0pp)	-	-
Start	1751934204.37s	-	-
End	1751934204.38s	-	-
Duration	0.02s (-0.51s)	1 🟢	-
Prompt_tokens	10tok (+0tok)	-	-
Completion_tokens	2tok (+0tok)	-	-
Total_tokens	12tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

Say Hi Bot (add-package-manager-1751934204)

Score	Average	Improvements	Regressions
Levenshtein	83.8% (-2pp)	4 🟢	7 🔴
Start	1751934204.36s	-	-
End	1751934205.36s	-	-
Duration	1s (0s)	19 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-07T18:13:10Z

Braintrust eval report

Say Hi Bot (add-package-manager-1751934209)

Score	Average	Improvements	Regressions
Levenshtein	86.5% (+3pp)	9 🟢	6 🔴
Start	1751934209.51s	-	-
End	1751934210.51s	-	-
Duration	1s (+0s)	-	19 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-07T18:13:37Z

Braintrust eval report

Say Hi Bot Python (add-package-manager-1751934196)

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751934195.87s (+0s)	-	-
End	1751934195.88s (+0s)	-	-
Duration	0s (+0s)	1 🟢	1 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

daviddkkim · 2025-07-07T19:47:47Z

action.yml

+      "The package manager to use for evals. Valid values: npm, pnpm, yarn, pip,
+      or uv depending on the runtime."
+    required: false
+    default: ""


Should we define a default? It looks like if package_manager is "", it goes into an empty case statement.

Yeah, it's a bit odd, but it's for backward compatibility. I could try without the default '', but ultimately the code will fall back to '' anyway because of the zod parsing.

ahh gotchu! makes sense
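
A minimal sketch of the fallback described in this thread, assuming the input is optional and an unset value should end up as the empty string. The `normalizePackageManager` helper is hypothetical and hand-written; the real code relies on zod parsing, which is not reproduced here.

```typescript
// Hypothetical stand-in for the schema default: an unset input becomes "".
function normalizePackageManager(raw: string | undefined): string {
  return raw ?? "";
}

// An unset input and an explicit value, printed with quotes for visibility.
console.log(JSON.stringify(normalizePackageManager(undefined)));
console.log(JSON.stringify(normalizePackageManager("pnpm")));
```

Either way the downstream code only ever sees a string, so the empty string can act as the "not specified" marker.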

daviddkkim · 2025-07-07T19:48:28Z

eval/src/braintrust.ts

+    switch (args.runtime.toLowerCase().trim()) {
+      case "node":
+        switch (args.package_manager) {
+          case "":


Wondering if a default should be defined and this should be removed -- what happens in this empty case?

Yeah, for backward compatibility it'll be whatever we were doing before: for node it'll be npx and for python it'll be pip. Is that what you had in mind? Right now the empty case just falls through to the npm case.

yeet i forgot about the case statement behavior on no returns -- this makes sense
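
The fall-through behavior discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the code in eval/src/braintrust.ts: the `resolveLauncher` name, the returned strings (used here only as labels), and the error handling are hypothetical. Only the empty case sharing a body with npm, and pip being the Python fallback, come from the discussion above.

```typescript
// Hypothetical helper: maps a runtime and package manager to a launcher label.
function resolveLauncher(runtime: string, packageManager: string): string {
  switch (runtime.toLowerCase().trim()) {
    case "node":
      switch (packageManager) {
        case "": // unset: no body, so control falls through to the npm case
        case "npm":
          return "npx";
        case "pnpm":
          return "pnpm";
        default:
          throw new Error(`Unsupported package manager for node: ${packageManager}`);
      }
    case "python":
      switch (packageManager) {
        case "": // unset: falls through to pip, the previous behavior
        case "pip":
          return "pip";
        case "uv":
          return "uv";
        default:
          throw new Error(`Unsupported package manager for python: ${packageManager}`);
      }
    default:
      throw new Error(`Unsupported runtime: ${runtime}`);
  }
}

console.log(resolveLauncher("node", ""), resolveLauncher("python", ""));
```

Because a `case` label with no statements shares the body of the next label, the empty string needs no branch of its own, which keeps existing workflows that never set the option on their old path.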

daviddkkim

Looks good -- just came across an error with yarn.
plz fix when you are ready!

github-actions · 2025-07-08T00:57:39Z

Braintrust eval report

Say Hi Bot Python (main-1751936261)

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936261.27s (+0s)	-	-
End	1751936261.27s (+0s)	-	-
Duration	0s (0s)	1 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T00:57:39Z

Braintrust eval report

Say Hi Bot Python (main-1751936260)

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936260.48s (+0s)	-	-
End	1751936260.49s (+0s)	-	-
Duration	0s (+0s)	-	2 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T00:57:43Z

Braintrust eval report

Console logging (main-1751936265)

Score	Average	Improvements	Regressions
Levenshtein	82.1% (-1pp)	3 🟢	4 🔴
Start	1751936265.13s	-	-
End	1751936265.14s	-	-
Duration	0s (0s)	4 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

My Evaluation (main-1751936265)

Score	Average	Improvements	Regressions
Exact match	100% (+0pp)	-	-
Start	1751936265.22s	-	-
End	1751936265.92s	-	-
Duration	0.7s (+0.69s)	-	1 🔴
Prompt_tokens	10tok (+0tok)	-	-
Completion_tokens	2tok (+0tok)	-	-
Total_tokens	12tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

Say Hi Bot (main-1751936265)

Score	Average	Improvements	Regressions
Levenshtein	82.1% (-4pp)	2 🟢	8 🔴
Start	1751936265.21s	-	-
End	1751936266.21s	-	-
Duration	1s (0s)	19 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T00:57:43Z

Braintrust eval report

Say Hi Bot (main-1751936265-955e18da)

Score	Average	Improvements	Regressions
Levenshtein	82.8% (+1pp)	8 🟢	6 🔴
Start	1751936265.37s	-	-
End	1751936266.37s	-	-
Duration	1s (+0s)	-	20 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T00:57:51Z

Braintrust eval report

Say Hi Bot (main-1751936273)

Score	Average	Improvements	Regressions
Levenshtein	81.4% (-1pp)	4 🟢	7 🔴
Start	1751936273.03s	-	-
End	1751936274.03s	-	-
Duration	1s (+0s)	1 🟢	11 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:48Z

Braintrust eval report

Say Hi Bot Python ([email protected])

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936449.88s (+0s)	-	-
End	1751936449.89s (+0s)	-	-
Duration	0s (0s)	2 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:48Z

Braintrust eval report

Say Hi Bot Python ([email protected])

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936450.34s (+0s)	-	-
End	1751936450.35s (+0s)	-	-
Duration	0s (+0s)	-	2 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:51Z

Braintrust eval report

Say Hi Bot Python ([email protected])

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936453.59s (+0s)	-	-
End	1751936453.6s (+0s)	-	-
Duration	0s (+0s)	-	1 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:55Z

Braintrust eval report

Say Hi Bot Python ([email protected])

Score	Average	Improvements	Regressions
Levenshtein	77.8% (+0pp)	-	-
Start	1751936457.19s (+0s)	-	-
End	1751936457.2s (+0s)	-	-
Duration	0s (0s)	1 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:58Z

Braintrust eval report

Console logging (HEAD-1751936460)

Score	Average	Improvements	Regressions
Levenshtein	83.9% (+2pp)	8 🟢	7 🔴
Start	1751936460.19s	-	-
End	1751936460.2s	-	-
Duration	0s (0s)	4 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

My Evaluation (HEAD-1751936460)

Score	Average	Improvements	Regressions
Exact match	100% (+0pp)	-	-
Start	1751936460.2s	-	-
End	1751936460.59s	-	-
Duration	0.38s (-0.32s)	1 🟢	-
Prompt_tokens	10tok (+0tok)	-	-
Completion_tokens	2tok (+0tok)	-	-
Total_tokens	12tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

Say Hi Bot (HEAD-1751936460)

Score	Average	Improvements	Regressions
Levenshtein	83.2% (+2pp)	6 🟢	6 🔴
Start	1751936460.25s	-	-
End	1751936461.25s	-	-
Duration	1s (0s)	20 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:00:58Z

Braintrust eval report

Say Hi Bot (HEAD-1751936460-18369f47)

Score	Average	Improvements	Regressions
Levenshtein	83.1% (0pp)	7 🟢	6 🔴
Start	1751936460.3s	-	-
End	1751936461.3s	-	-
Duration	1s (0s)	6 🟢	4 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:01:01Z

Braintrust eval report

Say Hi Bot (HEAD-1751936463)

Score	Average	Improvements	Regressions
Levenshtein	81.3% (-2pp)	4 🟢	8 🔴
Start	1751936463.16s	-	-
End	1751936464.16s	-	-
Duration	1s (+0s)	-	20 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:01:06Z

Braintrust eval report

Console logging (HEAD-1751936469)

Score	Average	Improvements	Regressions
Levenshtein	83.8% (0pp)	6 🟢	5 🔴
Start	1751936468.67s	-	-
End	1751936468.69s	-	-
Duration	0s (+0s)	-	18 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

My Evaluation (HEAD-1751936469)

Score	Average	Improvements	Regressions
Exact match	100% (+0pp)	-	-
Start	1751936468.74s	-	-
End	1751936469.3s	-	-
Duration	0.56s (+0.17s)	-	1 🔴
Prompt_tokens	10tok (+0tok)	-	-
Completion_tokens	2tok (+0tok)	-	-
Total_tokens	12tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

Say Hi Bot (HEAD-1751936469)

Score	Average	Improvements	Regressions
Levenshtein	81% (0pp)	7 🟢	7 🔴
Start	1751936468.77s	-	-
End	1751936469.77s	-	-
Duration	1s (0s)	17 🟢	-
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:01:06Z

Braintrust eval report

Say Hi Bot (HEAD-1751936469-91b995a0)

Score	Average	Improvements	Regressions
Levenshtein	82.9% (+2pp)	6 🟢	4 🔴
Start	1751936468.88s	-	-
End	1751936469.88s	-	-
Duration	1s (+0s)	-	15 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

github-actions · 2025-07-08T01:01:08Z

Braintrust eval report

Say Hi Bot (HEAD-1751936470)

Score	Average	Improvements	Regressions
Levenshtein	82.6% (0pp)	6 🟢	6 🔴
Start	1751936470.21s	-	-
End	1751936471.22s	-	-
Duration	1s (+0s)	1 🟢	14 🔴
Prompt_tokens	0tok (+0tok)	-	-
Completion_tokens	0tok (+0tok)	-	-
Total_tokens	0tok (+0tok)	-	-
Prompt_cached_tokens	0tok (+0tok)	-	-
Prompt_cache_creation_tokens	0tok (+0tok)	-	-

ibolmo self-assigned this Jul 7, 2025

ibolmo force-pushed the add-package-manager branch from 34ecc62 to bb949ca Compare July 7, 2025 18:13

graphite-app bot reviewed Jul 7, 2025

eval/src/braintrust.ts (outdated)

examples/python/uv.yml

graphite-app bot reviewed Jul 7, 2025

eval/src/braintrust.ts (outdated)

ibolmo force-pushed the add-package-manager branch 7 times, most recently from c77bfc9 to 472d62a Compare July 7, 2025 18:35

add package manager option

5d9a09d

ibolmo force-pushed the add-package-manager branch from 472d62a to 5d9a09d Compare July 7, 2025 18:36

ibolmo added 2 commits July 7, 2025 13:39

commit build

f649a1e

add debug

a1e00ad

daviddkkim reviewed Jul 7, 2025

remove type assertions

49abc7f

daviddkkim self-requested a review July 7, 2025 23:59

daviddkkim approved these changes Jul 7, 2025

daviddkkim and others added 2 commits July 7, 2025 17:03

prettier -- build issue

0b980bd

drop yarn

8c90476

ibolmo merged commit a0d327a into main Jul 8, 2025
7 checks passed

ibolmo deleted the add-package-manager branch July 8, 2025 00:57