2 changes: 2 additions & 0 deletions cookbook/README.md
@@ -14,3 +14,5 @@ The following notebooks exemplify workflow steps, features, and possible uses of
## Evaluation

1. [Test Recommendations with a Prompt Dataset](./test_recommendations.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/responsible-prompting-api/blob/develop/cookbook/test_recommendations.ipynb)
2. [Evaluate Embedding Model](./evaluate_embedding_model.ipynb) - Intrinsic embedding quality metrics (inter-cluster distance, misclassification rate, intra-cluster K-means distance).
3. [Embedding Model Comparison: Red Team Evaluation](./embeddings_comparison_red_team.ipynb) - Extrinsic task-level evaluation comparing how different embedding models affect recommendation quality using the red team dataset. Computes accuracy, precision, recall, and F1-score for add and remove recommendations.
Comment on lines +17 to +18
Copilot AI Feb 28, 2026

The two new notebook entries in the Evaluation section lack the “Open In Colab” badge link that the other cookbook notebooks use. For consistency and easier access, consider adding the same Colab badge link to both entries.

Suggested change
2. [Evaluate Embedding Model](./evaluate_embedding_model.ipynb) - Intrinsic embedding quality metrics (inter-cluster distance, misclassification rate, intra-cluster K-means distance).
3. [Embedding Model Comparison: Red Team Evaluation](./embeddings_comparison_red_team.ipynb) - Extrinsic task-level evaluation comparing how different embedding models affect recommendation quality using the red team dataset. Computes accuracy, precision, recall, and F1-score for add and remove recommendations.
2. [Evaluate Embedding Model](./evaluate_embedding_model.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/responsible-prompting-api/blob/develop/cookbook/evaluate_embedding_model.ipynb) - Intrinsic embedding quality metrics (inter-cluster distance, misclassification rate, intra-cluster K-means distance).
3. [Embedding Model Comparison: Red Team Evaluation](./embeddings_comparison_red_team.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/responsible-prompting-api/blob/develop/cookbook/embeddings_comparison_red_team.ipynb) - Extrinsic task-level evaluation comparing how different embedding models affect recommendation quality using the red team dataset. Computes accuracy, precision, recall, and F1-score for add and remove recommendations.
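
To keep badge links consistent as more notebooks are added, the entries could be generated rather than typed by hand. The following is only a minimal sketch: `colab_badge` and `readme_entry` are hypothetical helper names that do not exist in the repository, and the URL pattern is simply copied from the existing badge on the first Evaluation entry.

```python
# Hypothetical helpers (not part of the repository). The URL pattern below is
# copied from the existing "Open In Colab" badge in cookbook/README.md.
COLAB_BADGE = "https://colab.research.google.com/assets/colab-badge.svg"
COLAB_BASE = (
    "https://colab.research.google.com/github/"
    "IBM/responsible-prompting-api/blob/develop/cookbook"
)


def colab_badge(notebook: str) -> str:
    """Return the Markdown badge link for a notebook file in cookbook/."""
    return f"[![Open In Colab]({COLAB_BADGE})]({COLAB_BASE}/{notebook})"


def readme_entry(index: int, title: str, notebook: str, description: str = "") -> str:
    """Build one numbered README list entry, with an optional description."""
    entry = f"{index}. [{title}](./{notebook}) {colab_badge(notebook)}"
    return f"{entry} - {description}" if description else entry


if __name__ == "__main__":
    print(readme_entry(2, "Evaluate Embedding Model", "evaluate_embedding_model.ipynb"))
```

Assuming the branch and directory stay as `develop` and `cookbook`, calling `readme_entry` with a notebook's title and filename reproduces the entry format used in the suggestion above.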
