Docsite search: Create a larger generated test set for retrieved documents #187

hanna-paasivirta (Contributor) opened this issue on Mar 17, 2025
To evaluate Docsite Search, generate a test set that can be run quickly to measure the effect of changes to the pipeline.

This will be a low-quality dataset of generated user questions for comparing pipeline versions. The aim is not to reach 100% or to reflect real-world accuracy, only to indicate whether a change improves on the baseline or the previous version.

Generation & usage:

  • Include tagged sections to track different types of questions (e.g. coding, general, workflow)
  • Add true/false labels indicating whether the documentation should be consulted at all, and whether adaptor-specific or general documentation should be consulted
  • Evaluate primarily with an LLM to get a score plus summaries of failures/successes
  • Iterate on this dataset later based on questions collected for the qualitative test sets below (i.e. improve the dataset-generation prompt)
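As a rough illustration of the points above, here is a minimal sketch of what a labelled test-set entry and a scoring loop could look like. All field names, the example questions, and the `evaluate` helper are assumptions for illustration, not part of the actual pipeline:

```python
# Hypothetical sketch: generated test-set entries with tags and
# true/false labels, plus a minimal scoring loop. Everything here
# (field names, questions, evaluate()) is illustrative only.

# Each generated question carries a tag (coding / general / workflow)
# and labels saying whether docs should be consulted, and whether
# adaptor-specific or general docs apply.
test_set = [
    {
        "question": "How do I map fields in a job expression?",
        "tag": "coding",
        "needs_docs": True,
        "needs_adaptor_docs": True,
        "needs_general_docs": False,
    },
    {
        "question": "Hello, can you help me?",
        "tag": "general",
        "needs_docs": False,
        "needs_adaptor_docs": False,
        "needs_general_docs": False,
    },
]

def evaluate(entries, decides_to_consult):
    """Score consult/no-consult decisions against the labels.

    `decides_to_consult` is a stand-in for the real pipeline: it
    takes a question and returns True if docs would be retrieved.
    """
    correct = sum(
        1 for e in entries
        if decides_to_consult(e["question"]) == e["needs_docs"]
    )
    return correct / len(entries)

# Toy stand-in pipeline: consult docs for coding-flavoured questions.
score = evaluate(test_set, lambda q: "job" in q)
print(f"baseline score: {score:.2f}")
```

In practice an LLM judge would replace the toy lambda and also produce per-tag summaries of failures and successes, so that scores between the baseline and a new pipeline version can be compared per question type.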
hanna-paasivirta self-assigned this on Mar 17, 2025