[FEATURE]: Evaluate prompt responses using another model #26

@colbytimm

Description

Problem Statement

  • Be able to evaluate prompt responses using another LLM

Proposed Solution

  • Define a new evaluator that sends the response to a second LLM, along with an evaluator prompt, and uses the returned score as the test result.

Example

Input Prompt

You are a helpful assistant. Answer the user's question clearly and politely.

What is the capital of France?

Example Response

The capital of France is Paris.

Evaluator Prompt

You are an evaluator checking the quality of a model's response to a user's question.

Rate the model's response from 1 to 5 based on these criteria:
- Accuracy: Is the answer factually correct?
- Friendliness: Is the tone polite and helpful?

Scoring:
- 5: Fully accurate and very friendly
- 3: Mostly accurate or somewhat friendly
- 1: Inaccurate or unfriendly

Question: {{input}}
Response: {{response}}

Only return the numeric score (1–5).
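A minimal sketch of how such an evaluator could work: fill the `{{input}}`/`{{response}}` template, send it to the judge model, and parse the numeric score from the reply. The `call_model` hook and the parsing details are illustrative assumptions, not part of the proposal.

```python
import re

EVALUATOR_PROMPT = """You are an evaluator checking the quality of a model's response to a user's question.

Rate the model's response from 1 to 5 based on these criteria:
- Accuracy: Is the answer factually correct?
- Friendliness: Is the tone polite and helpful?

Scoring:
- 5: Fully accurate and very friendly
- 3: Mostly accurate or somewhat friendly
- 1: Inaccurate or unfriendly

Question: {input}
Response: {response}

Only return the numeric score (1-5).
"""


def evaluate(question: str, response: str, call_model) -> int:
    """Fill the evaluator template, ask the judge model, and parse its 1-5 score."""
    prompt = EVALUATOR_PROMPT.format(input=question, response=response)
    raw = call_model(prompt)
    # Judge models don't always return a bare number, so grab the first digit in range.
    match = re.search(r"[1-5]", raw)
    if match is None:
        raise ValueError(f"Evaluator returned no score: {raw!r}")
    return int(match.group())


# Stand-in judge for demonstration; in practice this would call a real LLM API.
def fake_judge(prompt: str) -> str:
    return "Score: 5"


score = evaluate(
    "What is the capital of France?",
    "The capital of France is Paris.",
    fake_judge,
)
print(score)  # 5
```

Parsing defensively (rather than trusting the judge to emit only a digit) matters in practice, since even with "Only return the numeric score" in the prompt, judge models often wrap the score in extra text.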

    Labels

    enhancement (New feature or request)
