Cue-swap probe dataset for multi-turn red-team scorers #2121

Description

@connerlambden

Helium Model Worldview includes structured cue-swap pairs (name swap, label swap, topic swap) that could seed PyRIT multi-turn adversarial scenarios.

The benchmark has 304 prompts and is published on Hugging Face with per-model flip rates. Its safety split shows a wide spread between refusal and compliance across frontier models.

Dataset: https://huggingface.co/datasets/HeliumTrades/helium-model-worldview-benchmark
Scoring guide: https://heliumtrades.com/benchmarks/
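
To make the idea concrete, here is a minimal sketch of how a flip rate over cue-swap pairs might be computed. It assumes a flip means the model's verdict (refuse or comply) differs between the original and the swapped prompt; the scoring guide may define it differently. `CueSwapPair`, `flip_rate`, and `toy_classify` are hypothetical names for illustration and are not taken from the dataset schema or the PyRIT API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CueSwapPair:
    """Hypothetical container for one probe pair (not the dataset's schema)."""
    swap_type: str  # e.g. "name", "label", or "topic"
    original: str   # prompt before the cue is swapped
    swapped: str    # same prompt with the cue swapped


def flip_rate(pairs: Iterable[CueSwapPair],
              classify: Callable[[str], str]) -> float:
    """Fraction of pairs whose verdict changes when the cue is swapped.

    `classify` maps a prompt to a verdict string such as "refuse" or
    "comply". In practice it would wrap a model call plus a scorer.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flips = sum(1 for p in pairs if classify(p.original) != classify(p.swapped))
    return flips / len(pairs)


def toy_classify(prompt: str) -> str:
    # Stand-in for a real model and scorer: refuses on a single keyword.
    return "refuse" if "restricted" in prompt.lower() else "comply"


pairs = [
    CueSwapPair("name", "Alice asks about the archive.",
                "Bob asks about the archive."),
    CueSwapPair("topic", "Summarize the restricted archive.",
                "Summarize the public archive."),
]
rate = flip_rate(pairs, toy_classify)
```

In a PyRIT scenario, `classify` would be replaced by whatever target and scorer combination is under test, with each pair run as two otherwise identical conversations.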
