@grimoire grimoire commented Sep 24, 2025

requirements

Enable different TP (tensor parallelism) for Attention/MLP/MoE.

@grimoire grimoire marked this pull request as ready for review September 25, 2025 04:52
@grimoire grimoire mentioned this pull request Sep 25, 2025
# Prefill
prefill_request_dict = copy.deepcopy(request_dict)
prefill_request_dict['max_tokens'] = 1
prefill_request_dict['max_completion_tokens'] = 1
Collaborator

What is max_completion_tokens used for? What is the difference between prefill_request_dict['max_completion_tokens'] and prefill_request_dict['max_tokens']?
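
For context on the two fields: in the OpenAI Chat Completions API, max_completion_tokens is the newer name for the generation limit and max_tokens is the older, deprecated one. Whether this server treats them identically is not stated in this thread. The sketch below shows what the diff appears to do, assuming request_dict is an OpenAI-style payload; the helper name build_prefill_request and the sample payload are hypothetical and are not part of the PR.

```python
import copy


def build_prefill_request(request_dict: dict) -> dict:
    """Hypothetical helper: copy a request and cap generation at one token."""
    # Deep copy so nested fields (e.g. messages) are not shared with the original.
    prefill_request_dict = copy.deepcopy(request_dict)
    # Set both limit fields so the cap applies whichever one the server reads.
    prefill_request_dict['max_tokens'] = 1
    prefill_request_dict['max_completion_tokens'] = 1
    return prefill_request_dict


# Illustrative payload only; the model name is a placeholder.
request_dict = {
    'model': 'example-model',
    'messages': [{'role': 'user', 'content': 'Hello'}],
    'max_tokens': 256,
}
prefill_request_dict = build_prefill_request(request_dict)
```

The original request_dict is left untouched, so it can still be forwarded with its full token limit afterwards.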
