Commit

fix: evaluator
hanxiao committed Feb 22, 2025
1 parent c8cd9bc commit c73a673
Showing 1 changed file with 13 additions and 5 deletions.
18 changes: 13 additions & 5 deletions src/tools/evaluator.ts
@@ -302,41 +302,49 @@ Question: "fam PLEASE help me calculate the eigenvalues of this 4x4 matrix ASAP!
Evaluation: {
"needsFreshness": false,
"needsPlurality": true,
-"think": "Multiple eigenvalues needed but no time-sensitive information required",
+"think": "I see the user needs help with eigenvalues - that's a calculation task. Since it's a 4x4 matrix, there will be multiple eigenvalues to find. The user's language is very informal with 'fam', 'ASAP', and emojis, suggesting panicked student speech with math terms mixed in.",
"languageStyle": "panicked student English with math jargon"
}
Question: "Can someone explain how tf did Ferrari mess up their pit stop strategy AGAIN?! 🤦‍♂️ #MonacoGP"
Evaluation: {
"needsFreshness": true,
"needsPlurality": true,
-"think": "Refers to recent race event and requires analysis of multiple strategic decisions",
+"think": "The user is asking about a specific F1 race incident. The 'AGAIN' and MonacoGP hashtag tell me this is about a recent event. They want analysis of several strategic decisions. Their tone shows clear frustration with informal 'tf' and facepalm emoji - classic angry F1 fan speak.",
"languageStyle": "frustrated fan English with F1 terminology"
}
Question: "肖老师您好,请您介绍一下最近量子计算领域的三个重大突破,特别是它们在密码学领域的应用价值吗?🤔"
Evaluation: {
"needsFreshness": true,
"needsPlurality": true,
-"think": "Asks for recent breakthroughs (freshness) and specifically requests three examples (plurality)",
+"think": "The user wants three recent quantum computing breakthroughs - the '最近' (recent) indicates freshness needed. They use formal address '老师您好' and technical terms, suggesting academic Chinese. The structure asks for multiple examples with cryptography applications.",
"languageStyle": "formal technical Chinese with academic undertones"
}
Question: "Bruder krass, kannst du mir erklären warum meine neural network training loss komplett durchdreht? Hab schon alles probiert 😤"
Evaluation: {
"needsFreshness": false,
"needsPlurality": true,
-"think": "Requires comprehensive debugging analysis of multiple potential issues",
+"think": "The user has a technical ML problem but explains it very casually. They've 'tried everything' so I'll need to cover multiple debugging angles. Their mix of German slang ('Bruder krass') with English ML terms shows frustrated tech-casual speech.",
"languageStyle": "frustrated German-English tech slang"
}
Question: "Does anyone have insights into the sociopolitical implications of GPT-4's emergence in the Global South, particularly regarding indigenous knowledge systems and linguistic diversity? Looking for a nuanced analysis."
Evaluation: {
"needsFreshness": true,
"needsPlurality": true,
-"think": "Requires analysis of current impacts (freshness) across multiple dimensions: sociopolitical, cultural, and linguistic (plurality)",
+"think": "The user asks about current GPT-4 impacts, so freshness matters. They specify multiple aspects (sociopolitical, indigenous knowledge, linguistics) and explicitly request nuanced analysis. Their formal academic vocabulary and structure signals scholarly discourse.",
"languageStyle": "formal academic English with sociological terminology"
}
+Question: "what's 7 * 9? need to check something real quick"
+Evaluation: {
+"needsFreshness": false,
+"needsPlurality": false,
+"think": "The user wants a single multiplication result - that's all. No need for recent info since math is constant, and no need for multiple examples since it's just one calculation. Their casual phrasing suggests quick mental math check.",
+"languageStyle": "casual English"
+}
</examples>
Now evaluate this question:
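Every few-shot example in this hunk uses the same four-field output: `needsFreshness`, `needsPlurality`, `think`, and `languageStyle`. Below is a minimal TypeScript sketch of that shape, assuming the model replies with plain JSON. The names `QuestionEvaluation`, `isQuestionEvaluation`, `parseEvaluation`, and `formatExample` are hypothetical illustrations, not code from `src/tools/evaluator.ts`. The diff does not show how that file actually defines or validates its schema.

```typescript
// Hypothetical shape of one evaluation; field names come from the
// few-shot examples in this hunk, not from the evaluator's real schema.
interface QuestionEvaluation {
  needsFreshness: boolean;
  needsPlurality: boolean;
  think: string;
  languageStyle: string;
}

// Hypothetical runtime type guard: checks that every expected field
// is present with the expected primitive type.
function isQuestionEvaluation(value: unknown): value is QuestionEvaluation {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.needsFreshness === "boolean" &&
    typeof v.needsPlurality === "boolean" &&
    typeof v.think === "string" &&
    typeof v.languageStyle === "string"
  );
}

// Parse a raw JSON reply and throw if it does not match the shape above.
function parseEvaluation(raw: string): QuestionEvaluation {
  const parsed: unknown = JSON.parse(raw);
  if (!isQuestionEvaluation(parsed)) {
    throw new Error("Response does not match the expected evaluation shape");
  }
  return parsed;
}

// Render one example in roughly the "Question: ... / Evaluation: {...}"
// layout used by the prompt (indentation here is JSON.stringify's, 2 spaces).
function formatExample(question: string, evaluation: QuestionEvaluation): string {
  return `Question: "${question}"\nEvaluation: ${JSON.stringify(evaluation, null, 2)}`;
}

// Usage: round-trip the trivial arithmetic case from the examples.
const sample = parseEvaluation(
  JSON.stringify({
    needsFreshness: false,
    needsPlurality: false,
    think: "The user wants a single multiplication result - that's all.",
    languageStyle: "casual English",
  }),
);
console.log(formatExample("what's 7 * 9? need to check something real quick", sample));
```

A hand-written type guard keeps the sketch dependency-free. A schema library such as zod would be a common alternative, but this diff does not show whether the evaluator uses one.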