To evaluate Docsite Search, generate a test set that can be quickly used to evaluate changes to the pipeline.
This will be a low-quality dataset of generated user questions, used to evaluate the effects of changes. The aim is not to reach 100% or to reflect real-world accuracy, only to indicate whether a change improves on the baseline or the previous version.
Generation & usage:
Include different tagged sections to track different types of questions (e.g. coding, general, workflow)
Add true/false labels to show whether the documentation should be consulted at all and, if so, whether the adaptor documentation or the general documentation specifically should be consulted
Evaluate primarily with an LLM to get a score and summaries of failures and successes
This dataset should later be iterated on using questions collected for the qualitative test sets below (i.e. by improving the dataset generation prompt).
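As a rough illustration of the structure described above, the sketch below shows one possible shape for a test-set record and a per-tag comparison against a baseline. Everything in it is an assumption made for illustration: the `TestQuestion` fields, the `score_by_tag` and `compare_runs` helpers, and the sample questions are all hypothetical, and exact label matching stands in for the LLM-based scoring that the issue actually proposes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TestQuestion:
    """One generated question (hypothetical schema, not the project's)."""
    question: str
    tag: str                  # question type, e.g. "coding", "general", "workflow"
    needs_docs: bool          # should the documentation be consulted at all?
    needs_adaptor_docs: bool  # should adaptor documentation be consulted?
    needs_general_docs: bool  # should general documentation be consulted?


def expected_labels(item):
    """Return the three true/false labels of a question as a tuple."""
    return (item.needs_docs, item.needs_adaptor_docs, item.needs_general_docs)


def score_by_tag(test_set, predictions):
    """Fraction of questions per tag whose predicted labels match exactly.

    `predictions` is a list of 3-tuples aligned with `test_set`.
    Exact matching is a stand-in for LLM-based scoring.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(test_set, predictions):
        total[item.tag] += 1
        if tuple(predicted) == expected_labels(item):
            correct[item.tag] += 1
    return {tag: correct[tag] / total[tag] for tag in total}


def compare_runs(baseline, candidate):
    """Per-tag score change of a candidate run relative to the baseline."""
    return {tag: candidate[tag] - baseline.get(tag, 0.0) for tag in candidate}


# Illustrative, made-up questions; a real set would be LLM-generated.
test_set = [
    TestQuestion("How do I write a job that posts data to an API?",
                 "coding", True, True, False),
    TestQuestion("What is a workflow?", "general", True, False, True),
    TestQuestion("Hello there", "general", False, False, False),
    TestQuestion("How do I run a workflow on a schedule?",
                 "workflow", True, False, True),
]

# Made-up pipeline outputs for two versions of the pipeline.
baseline_predictions = [
    (True, True, False), (True, False, True),
    (True, False, True), (True, True, False),
]
candidate_predictions = [
    (True, True, False), (True, False, True),
    (False, False, False), (True, True, False),
]

baseline_scores = score_by_tag(test_set, baseline_predictions)
candidate_scores = score_by_tag(test_set, candidate_predictions)
deltas = compare_runs(baseline_scores, candidate_scores)

for tag, delta in sorted(deltas.items()):
    print(f"{tag}: {baseline_scores[tag]:.2f} -> {candidate_scores[tag]:.2f} ({delta:+.2f})")
```

Reporting a per-tag delta rather than a single overall number keeps the focus on relative change from the baseline, which matches the stated aim of the dataset.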