[data] fix KeyError on unmapped role tag in openai converter main loop by he-yufeng · Pull Request #10606 · hiyouga/LlamaFactory

he-yufeng · 2026-06-24T18:57:37Z

What does this PR do?

OpenAIDatasetConverter.__call__ builds aligned_messages like this:

aligned_messages.append(
    {
        "role": tag_mapping[role],
        "content": content,
    }
)

role is never checked against tag_mapping before this lookup, so a single
message whose role tag isn't one of the configured tags (user_tag,
assistant_tag, observation_tag, function_tag, system_tag) raises a
KeyError and takes down the whole datasets.map conversion. The validity
loop right below only runs on aligned_messages after they're built, so it
never gets the chance to flag the row.
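
The failure can be reproduced without LlamaFactory. The following is a minimal sketch, not the converter's actual code: `tag_mapping`, its tag values, and `convert_unguarded` are hypothetical stand-ins that only mimic the unguarded lookup described above.

```python
# Hypothetical tag configuration: keys are dataset role tags, values are
# the internal role names. The real mapping is built from dataset_attr.
tag_mapping = {
    "human": "user",
    "gpt": "assistant",
    "system": "system",
}


def convert_unguarded(messages):
    """Mimic the unguarded main loop: look up each role directly."""
    aligned_messages = []
    for message in messages:
        role = message["from"]
        # Raises KeyError as soon as a role is missing from tag_mapping.
        aligned_messages.append(
            {"role": tag_mapping[role], "content": message["value"]}
        )
    # Any validity check placed here runs only if the loop above finished,
    # so it never sees a row with an unmapped role.
    return aligned_messages


messages = [
    {"from": "human", "value": "Solve the math problem.\n3 + 4"},
    {"from": "not_a_role", "value": "The answer is 7."},
]

try:
    convert_unguarded(messages)
    raised_key = None
except KeyError as exc:
    raised_key = exc.args[0]  # the unmapped role tag
```
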

The SharegptDatasetConverter main loop already guards this exact case and
just skips the malformed example with a warning:

if message[self.dataset_attr.role_tag] not in accept_tags[turn_idx % 2]:
    logger.warning_rank0(f"Invalid role tag in {messages}.")
    broken_data = True
    break

This adds the matching guard to the openai converter so one bad row gets
skipped with a warning instead of aborting the run.
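
As a rough illustration of the guard, here is a self-contained sketch under stated assumptions: `TAG_MAPPING` and `convert_guarded` are hypothetical names, the standard `logging` module stands in for LlamaFactory's `logger.warning_rank0`, and returning `None` for a skipped row is a choice made for this sketch only, not a description of what the real converter returns.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical tag configuration for illustration only.
TAG_MAPPING = {
    "human": "user",
    "gpt": "assistant",
    "system": "system",
}


def convert_guarded(messages, tag_mapping=TAG_MAPPING):
    """Check the role before the lookup and skip the row if it is unmapped."""
    aligned_messages = []
    broken_data = False
    for message in messages:
        role = message.get("from")
        if role not in tag_mapping:
            logger.warning("Invalid role tag in %s.", messages)
            broken_data = True
            break
        aligned_messages.append(
            {"role": tag_mapping[role], "content": message["value"]}
        )
    # Assumption of this sketch: a broken row is signalled with None.
    return None if broken_data else aligned_messages


bad_row = [
    {"from": "human", "value": "Solve the math problem.\n3 + 4"},
    {"from": "not_a_role", "value": "The answer is 7."},
]
good_row = [
    {"from": "human", "value": "Solve the math problem.\n3 + 4"},
    {"from": "gpt", "value": "The answer is 7."},
]

skipped = convert_guarded(bad_row)  # warning logged, no exception raised
kept = convert_guarded(good_row)
```

With the check in front of the lookup, the bad row produces a warning and is dropped, while the well-formed row converts as before.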

Repro on main (the new test fails with KeyError: 'not_a_role' before the
fix, passes after):

example = {
    "conversations": [
        {"from": "human", "value": "Solve the math problem.\n3 + 4"},
        {"from": "not_a_role", "value": "The answer is 7."},
    ]
}
get_dataset_converter("openai", dataset_attr, data_args)(example)

This is a sibling of #10601, which fixes the same class of tag_mapping
KeyError on the ranking/pairwise chosen/rejected path. This PR covers the
regular (non-ranking) main message loop, which is a different code path, so
the diffs don't overlap.

Before submitting

Did you read the contributor guideline?
Did you write any new necessary tests? (tests/data/test_converter.py::test_openai_converter_skips_invalid_role)

The OpenAIDatasetConverter builds aligned_messages with tag_mapping[role] but never checks role against tag_mapping first, so a single message whose role tag is not in the mapping raises KeyError and aborts the whole dataset conversion. The later validity loop runs on aligned_messages, which is too late to catch it. The SharegptDatasetConverter main loop already guards this case and skips the example, so this just brings the openai path in line.

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request introduces validation in the dataset converter to skip messages with invalid role tags, preventing potential KeyError crashes, and adds a corresponding unit test. The review feedback suggests explicitly checking if the role is None to ensure robust validation and prevent unexpected behavior when certain tags are unconfigured.

gemini-code-assist · 2026-06-24T18:59:20Z

+            if role not in tag_mapping:
+                logger.warning_rank0(f"Invalid role tag in {messages}.")
+                broken_data = True
+                break


If role is None (e.g., due to a malformed message) and any of the dataset attributes used as keys in tag_mapping (such as system_tag or observation_tag) is also None (unconfigured), then None is itself a key and role not in tag_mapping evaluates to False. The guard then lets the message through, and it is mapped to whichever role is stored under the None key.

We should explicitly guard against role being None to ensure robust validation.
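
The None case can be checked in isolation. This is a minimal sketch assuming the mapping uses the configured tags as dictionary keys, as the lookup `tag_mapping[role]` implies; the tag values and the unset `system_tag` and `observation_tag` are hypothetical.

```python
# Hypothetical configuration: two optional tags are left unconfigured.
user_tag = "human"
assistant_tag = "gpt"
observation_tag = None
system_tag = None

# With unset tags, None becomes a key. Duplicate None keys collapse into
# one entry, and the last value written ("system") wins.
tag_mapping = {
    user_tag: "user",
    assistant_tag: "assistant",
    observation_tag: "observation",
    system_tag: "system",
}

role = None  # e.g. a message with a missing role field

# Membership check alone: None is a key, so the message is NOT flagged.
flagged_by_membership = role not in tag_mapping

# Membership check with an explicit None guard: the message IS flagged.
flagged_with_none_guard = role is None or role not in tag_mapping

# What the unflagged message would be mapped to.
silently_mapped_role = tag_mapping[role]
```
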

Suggested change

-            if role not in tag_mapping:
+            if role is None or role not in tag_mapping:
                 logger.warning_rank0(f"Invalid role tag in {messages}.")
                 broken_data = True
                 break

he-yufeng wants to merge 1 commit into hiyouga:main from he-yufeng:fix/openai-converter-invalid-role-keyerror
