feat: add support of internlm2.5-1.8B-chat model #515
base: v2
Conversation
Walkthrough

Adds InternLM2.5 model support, including build configuration, an example application, and a complete model stack. Introduces model configuration parameters and a C++ implementation featuring attention mechanisms, MLP layers, rotary embeddings, KV caching, and an interactive tokenizer with BPE support for text-to-token conversion and generation.
Sequence Diagram

sequenceDiagram
participant User
participant CLI as main.cpp
participant Tokenizer as InternLM2Tokenizer
participant Model as InternLM2ForCausalLM
participant Attention as InternLM2Attention
participant Cache as KV Cache
User->>CLI: Input prompt
CLI->>Tokenizer: convertMessage(prompt)
Tokenizer->>Tokenizer: Tokenize with BPE
Tokenizer-->>CLI: Token sequence (Tensor)
CLI->>Model: forward(tokens, position_ids)
Model->>Model: Embed tokens + RoPE
loop Each Decoder Block
Model->>Attention: forward(hidden, KV cache)
Attention->>Attention: Apply rotary embeddings
Attention->>Attention: Compute attention with causal mask
Attention->>Cache: Store/retrieve KV
Attention-->>Model: Attended output
Model->>Model: MLP + residuals
end
Model-->>CLI: Logits
CLI->>CLI: Sample next token
loop Generate Tokens
CLI->>Model: forward(next_token, updated_position_ids)
Model->>Cache: Use cached KV states
Model-->>CLI: Logits
CLI->>Tokenizer: Detokenize token
Tokenizer-->>CLI: Text
CLI->>User: Output text token
end
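
To make the prefill/decode split in the diagram concrete, here is a toy, self-contained sketch. The stub `toy::Model`, its token ids, and the termination rule are invented purely for illustration; the actual PR wires this flow through mllm's `InternLM2ForCausalLM`, `InternLM2Tokenizer`, and KV cache.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

namespace toy {

constexpr int64_t kEosId = 2;

struct Model {
  std::vector<int64_t> kv_cache;  // stands in for the per-layer K/V tensors

  // Prefill or single-token decode; returns the "argmax" token directly.
  int64_t forward(const std::vector<int64_t>& tokens) {
    kv_cache.insert(kv_cache.end(), tokens.begin(), tokens.end());  // cache grows each step
    // Pretend the model counts up for a few steps and then emits EOS.
    return kv_cache.size() < 6 ? kv_cache.back() + 1 : kEosId;
  }
};

}  // namespace toy

int main() {
  toy::Model model;
  std::vector<int64_t> prompt_tokens = {1, 10, 11};  // pretend-tokenized prompt

  int64_t next = model.forward(prompt_tokens);       // prefill: whole prompt, fills the cache
  while (next != toy::kEosId) {                      // decode: one token per step
    std::cout << next << ' ' << std::flush;          // a real CLI would detokenize here
    next = model.forward({next});                    // single token, reuses cached K/V states
  }
  std::cout << '\n';
  return 0;
}
```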
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- examples/CMakeLists.txt (1 hunks)
- examples/internlm2_5/CMakeLists.txt (1 hunks)
- examples/internlm2_5/config_1.8B.json (1 hunks)
- examples/internlm2_5/main.cpp (1 hunks)
- mllm/models/internlm2/configuration_internlm2.hpp (1 hunks)
- mllm/models/internlm2/modeling_internlm2.hpp (1 hunks)
- mllm/models/internlm2/tokenization_internlm2.hpp (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
examples/internlm2_5/main.cpp
[error] 1-1: 'iostream' file not found
(clang-diagnostic-error)
[error] 9-9: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 9-9: invalid case style for variable 'MLLM_MAIN'
(readability-identifier-naming,-warnings-as-errors)
mllm/models/internlm2/tokenization_internlm2.hpp
[error] 6-6: 'regex' file not found
(clang-diagnostic-error)
[error] 18-18: constructor does not initialize these fields: prompt
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 20-20: variable 'message_template' is non-const and globally accessible, consider making it const
(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)
[error] 27-27: constructor does not initialize these fields: bpe_, bytes_2_unicode_dict_, bytes_2_unicode_dict_inverse_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 30-30: variable 'bytes_2_unicode_dict_' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 30-30: invalid case style for variable 'bytes_2_unicode_dict_'
(readability-identifier-naming,-warnings-as-errors)
[error] 82-82: variable 'ret' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 95-95: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 106-106: variable 'sequence' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 123-123: variable 'normalized' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 124-124: variable 'space_char' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 125-125: variable 'underline_char' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 127-127: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 141-141: variable 'processed' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 149-149: redundant access specifier has the same accessibility as the previous access specifier
(readability-redundant-access-specifiers,-warnings-as-errors)
mllm/models/internlm2/modeling_internlm2.hpp
[error] 4-4: 'cmath' file not found
(clang-diagnostic-error)
[error] 18-18: 3 adjacent parameters of 'makeRoPEInvFreq' of convertible types are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 71-71: constructor does not initialize these fields: w1_, w3_, w2_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 80-80: constructor does not initialize these fields: w1_, w3_, w2_, silu_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 80-80: 2 adjacent parameters of 'InternLM2MLP' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 88-88: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 90-90: variable name 'y' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 97-97: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 114-114: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 114-114: 2 adjacent parameters of 'InternLM2Attention' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 119-119: 'num_key_value_groups_' should be initialized in a member initializer of the constructor
(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)
[error] 131-131: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 132-132: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 139-139: variable 'B' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 139-139: variable name 'B' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 139-139: invalid case style for variable 'B'
(readability-identifier-naming,-warnings-as-errors)
[error] 140-140: variable 'S' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 140-140: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 140-140: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 175-175: member variable 'layer_idx_' has public visibility
(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)
[error] 178-178: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 185-185: redundant access specifier has the same accessibility as the previous access specifier
(readability-redundant-access-specifiers,-warnings-as-errors)
[error] 188-188: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 188-188: 2 adjacent parameters of 'InternLM2Decoder' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 195-195: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 200-200: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 210-210: constructor does not initialize these fields: embedding_, layers_, norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 218-218: constructor does not initialize these fields: embedding_, layers_, norm_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 218-218: 2 adjacent parameters of 'InternLM2Model' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 225-225: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 226-226: variable name 'x' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 240-240: constructor does not initialize these fields: cfg_, lm_head_, kv_cache_, rope_scaling_type_
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 265-265: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 286-286: variable 'inv_freq' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 300-300: variable name 'S' is too short, expected at least 3 characters
(readability-identifier-length,-warnings-as-errors)
[error] 300-300: invalid case style for variable 'S'
(readability-identifier-naming,-warnings-as-errors)
[error] 320-320: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 321-321: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
mllm/models/internlm2/configuration_internlm2.hpp
[error] 6-6: 'string' file not found
(clang-diagnostic-error)
[error] 13-13: constructor does not initialize these fields: rope_scaling_type
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 16-16: constructor does not initialize these fields: rope_scaling_type
(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)
[error] 24-24: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 65-65: 1e-6f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 65-65: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 66-66: 10000.0f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 66-66: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 67-67: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
[error] 70-70: 0.02f is a magic number; consider replacing it with a named constant
(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)
[error] 70-70: floating point literal has suffix 'f', which is not uppercase
(readability-uppercase-literal-suffix,-warnings-as-errors)
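
Most of these diagnostics repeat a few patterns: members missing from constructor initializer lists, locals declared without initializers, inline magic numbers, and lowercase floating-point suffixes. A minimal sketch of the shape the checks expect, using invented names rather than the actual mllm classes:

```cpp
#include <cmath>
#include <cstdint>

namespace sketch {

// Named constants instead of inline magic numbers, with uppercase literal suffixes.
constexpr float kRmsNormEps = 1e-6F;
constexpr float kRopeTheta = 10000.0F;

// Locals initialized at the point of declaration.
inline float invFreq(int32_t dim_index, int32_t head_dim) {
  float exponent = static_cast<float>(dim_index) / static_cast<float>(head_dim);
  return 1.0F / std::pow(kRopeTheta, exponent);
}

class ExampleMlp {
 public:
  ExampleMlp() = default;
  ExampleMlp(int32_t hidden_size, int32_t intermediate_size)
      : hidden_size_(hidden_size), intermediate_size_(intermediate_size) {}  // member initializer list

 private:
  // In-class defaults so no constructor path leaves a field indeterminate.
  int32_t hidden_size_ = 0;
  int32_t intermediate_size_ = 0;
};

}  // namespace sketch
```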
fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
fmt::print("Enter 'exit' or 'quit' to end the session\n\n");
Fix the banner text
The banner still says “InternLM2.5 1.5B CLI”, which is misleading now that this runner targets the 1.8B chat model. Please update the label to match the actual model size.
Apply this diff:
- fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
+ fmt::print("\n{:*^60}\n", " InternLM2.5 1.8B CLI ");
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 38 to 39, the banner currently
reads "InternLM2.5 1.5B CLI" but the runner targets the 1.8B chat model; update
the banner text to "InternLM2.5 1.8B CLI" by changing the formatted string
passed to fmt::print so the displayed label reflects the correct model size.
std::string prompt_text;
fmt::print("💬 Prompt text (or 'exit/quit'): ");
std::getline(std::cin, prompt_text);
if (!(prompt_text == "exit" || prompt_text == "quit")) {
  try {
    fmt::print("🔄 Processing...\n");
    mllm::models::internlm2::InternLM2Message prompt{prompt_text};
    auto inputs = tokenizer.convertMessage(prompt);

    fmt::print("\n🤖 Response: ");
    for (auto& step : model.chat(inputs)) {
      auto token = tokenizer.detokenize(step.cur_token_id);
      std::wcout << token << std::flush;
    }
    fmt::print("\n{}\n", std::string(60, '-'));
  } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
  model.perfSummary();
}
Loop the interactive chat
The CLI prints “Enter 'exit' or 'quit' to end the session”, but it only processes a single prompt and then exits. Users can’t hold a multi-turn conversation, which defeats the whole purpose of exposing a chat runner. Wrap the prompt/response block in a loop so exit/quit truly controls termination.
Apply this diff to address the issue:
- std::string prompt_text;
- fmt::print("💬 Prompt text (or 'exit/quit'): ");
- std::getline(std::cin, prompt_text);
- if (!(prompt_text == "exit" || prompt_text == "quit")) {
- try {
- fmt::print("🔄 Processing...\n");
- mllm::models::internlm2::InternLM2Message prompt{prompt_text};
- auto inputs = tokenizer.convertMessage(prompt);
-
- fmt::print("\n🤖 Response: ");
- for (auto& step : model.chat(inputs)) {
- auto token = tokenizer.detokenize(step.cur_token_id);
- std::wcout << token << std::flush;
- }
- fmt::print("\n{}\n", std::string(60, '-'));
- } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
- model.perfSummary();
- }
+ for (std::string prompt_text;;) {
+ fmt::print("💬 Prompt text (or 'exit/quit'): ");
+ if (!std::getline(std::cin, prompt_text)) { break; }
+ if (prompt_text == "exit" || prompt_text == "quit") { break; }
+ try {
+ fmt::print("🔄 Processing...\n");
+ mllm::models::internlm2::InternLM2Message prompt{prompt_text};
+ auto inputs = tokenizer.convertMessage(prompt);
+
+ fmt::print("\n🤖 Response: ");
+ for (auto& step : model.chat(inputs)) {
+ auto token = tokenizer.detokenize(step.cur_token_id);
+ std::wcout << token << std::flush;
+ }
+ fmt::print("\n{}\n", std::string(60, '-'));
+ } catch (const std::exception& e) {
+ fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-'));
+ }
+ model.perfSummary();
+ }
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 41-58, the interactive prompt is
executed only once; wrap the prompt/response block in a loop so users can hold a
multi-turn conversation. Replace the single read+if with a for/while loop that
repeatedly prints the prompt, uses std::getline to read into prompt_text
(breaking the loop on EOF), and breaks when the input equals "exit" or "quit";
keep the existing try/catch/response printing and call model.perfSummary() after
each iteration.
head_dim = hidden_size / num_attention_heads;
max_cache_length = max_position_embeddings;
end_of_text_token_id = static_cast<int32_t>(eos_token_id);
}
Validate head dimension before use
head_dim is computed with plain integer division. If hidden_size isn’t divisible by num_attention_heads (or if the value is zero), this silently truncates and breaks attention math later on. The reference InternLM2 implementations explicitly guard against this with a runtime check. Please add the same validation before assigning head_dim so that bad configs fail fast instead of producing corrupted tensors. (huggingface.co)
Use this diff as a starting point:
- head_dim = hidden_size / num_attention_heads;
+ if (num_attention_heads <= 0 || hidden_size % num_attention_heads != 0) {
+ throw std::invalid_argument(
+ fmt::format("hidden_size ({}) must be divisible by num_attention_heads ({})",
+ hidden_size, num_attention_heads));
+ }
+ head_dim = hidden_size / num_attention_heads;
🤖 Prompt for AI Agents
In mllm/models/internlm2/configuration_internlm2.hpp around lines 50 to 53,
validate that num_attention_heads is > 0 and that hidden_size is divisible by
num_attention_heads before computing head_dim; if either check fails, throw or
log a clear runtime error (include hidden_size and num_attention_heads in the
message) so misconfigured models fail fast rather than producing truncated
head_dim, then compute head_dim = hidden_size / num_attention_heads and proceed
with max_cache_length and end_of_text_token_id assignments.
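
A standalone illustration of the truncation this guard prevents; the numbers are invented for the example and do not come from the shipped config_1.8B.json:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // A config that is *not* evenly divisible: 2050 / 16 == 128 with remainder 2.
  int32_t hidden_size = 2050;
  int32_t num_attention_heads = 16;

  int32_t head_dim = hidden_size / num_attention_heads;  // silently truncates to 128
  // 16 heads * 128 == 2048 != 2050, so downstream projections and reshapes disagree.
  std::printf("head_dim = %d, heads * head_dim = %d\n", head_dim, num_attention_heads * head_dim);
  return 0;
}
```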
Tensor position_ids;
if (input.count("position_ids")) {
  position_ids = input.at("position_ids");
  if (seq_len == 1) {
    auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
    position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
    *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
  }
} else {
  position_ids = Tensor::empty({batch_size, seq_len}, kInt64, kCPU).alloc();
  auto position_ids_ptr = position_ids.ptr<int64_t>();
  for (int b = 0; b < batch_size; ++b) {
    for (int s = 0; s < seq_len; ++s) { position_ids_ptr[b * seq_len + s] = s; }
  }
position_ids increment only updates batch 0.
In the single-token decode path, we allocate a new {batch_size, 1} tensor but write last_pos + 1 only for row 0. Every other batch element keeps undefined data, so downstream RoPE receives incorrect (often zero) positions and generation breaks for batched decoding. Iterate over every batch entry when populating the new tensor.
- if (seq_len == 1) {
- auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
- position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
- *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
+ if (seq_len == 1) {
+ auto provided = position_ids;
+ auto last_col = provided.shape()[1] - 1;
+ position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
+ auto new_ptr = position_ids.ptr<int64_t>();
+ auto provided_ptr = provided.ptr<int64_t>();
+ auto stride = provided.shape()[1];
+ for (int b = 0; b < batch_size; ++b) {
+ new_ptr[b] = provided_ptr[b * stride + last_col] + 1;
+ }
🧰 Tools
🪛 Clang (14.0.6)
[error] 265-265: variable 'position_ids' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/models/internlm2/modeling_internlm2.hpp around lines 265-278, the
single-token decode path creates a {batch_size,1} position_ids tensor but only
writes last_pos + 1 to row 0, leaving other rows undefined; update the allocation
path to fill every batch row with that row's own previous position + 1 (e.g., loop b
from 0 to batch_size - 1 and write the incremented position into position_ids at
{b, 0} using the same offsettedPtr or pointer arithmetic) so all batch elements have
correct position ids for RoPE.
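
For reference, a standalone sketch of the per-row bookkeeping (plain vectors instead of the mllm Tensor API, values invented):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  const int batch_size = 3;
  const int prev_seq_len = 4;
  // Flattened {batch_size, prev_seq_len} position ids from the previous step.
  std::vector<int64_t> prev_positions = {0, 1, 2, 3,
                                         0, 1, 2, 3,
                                         0, 1, 2, 3};

  // New {batch_size, 1} positions for the single-token decode step:
  // every row gets its own last position + 1, not just row 0.
  std::vector<int64_t> position_ids(batch_size);
  for (int b = 0; b < batch_size; ++b) {
    position_ids[b] = prev_positions[b * prev_seq_len + (prev_seq_len - 1)] + 1;
  }

  for (int b = 0; b < batch_size; ++b) {
    std::printf("row %d -> %lld\n", b, static_cast<long long>(position_ids[b]));
  }
  return 0;
}
```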
ARGenerationOutputPast convertMessage(const InternLM2Message& prompt) {
  auto applied_string = InternLM2Message::message_template;
  size_t pos = applied_string.find("{{{prompt}}}");
  applied_string.replace(pos, 12, prompt.prompt);
  auto tokens = tokenize(prompt.prompt);

  if (!prompt.add_bos && !tokens.empty() && tokens.front() == L"<s>") { tokens.erase(tokens.begin()); }
  if (prompt.add_eos && (tokens.empty() || tokens.back() != L"</s>")) { tokens.emplace_back(L"</s>"); }

  std::vector<int64_t> ids;
  ids.reserve(tokens.size());
  for (const auto& token : tokens) { ids.emplace_back(bpe_._lookup_vocab(token)); }
Message template is never applied in convertMessage.
applied_string builds the chat-format prompt (with <|im_start|> / <|im_end|> markers), but the code still tokenizes prompt.prompt, so those special tokens never reach the model. As a result, generated conversations miss the role delimiters the model expects, breaking adherence to InternLM2's chat format. Please tokenize the formatted string instead.
- auto tokens = tokenize(prompt.prompt);
+ auto tokens = tokenize(applied_string);
🧰 Tools
🪛 Clang (14.0.6)
[error] 95-95: variable 'pos' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🤖 Prompt for AI Agents
In mllm/models/internlm2/tokenization_internlm2.hpp around lines 93 to 105, the
function builds applied_string with the chat-format template but still calls
tokenize(prompt.prompt), so the special chat markers are never tokenized; change
the tokenize call to use applied_string (i.e., tokenize(applied_string)) and
then proceed with the existing BOS/EOS and vocab lookup logic on the resulting
tokens so the model receives the formatted prompt with role delimiters.
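
A small standalone demonstration of the intended substitution; the template string below is an assumption modeled on the ChatML-style markers the comment mentions, not a copy of the real InternLM2Message::message_template:

```cpp
#include <iostream>
#include <string>

int main() {
  // Hypothetical template carrying the role delimiters the review refers to.
  std::string message_template =
      "<|im_start|>user\n{{{prompt}}}<|im_end|>\n<|im_start|>assistant\n";

  std::string applied_string = message_template;
  const std::string placeholder = "{{{prompt}}}";  // 12 characters, matching replace(pos, 12, ...)
  auto pos = applied_string.find(placeholder);
  if (pos != std::string::npos) {
    applied_string.replace(pos, placeholder.size(), "What is RoPE?");
  }

  // It is this formatted string, not the raw user text, that should be tokenized.
  std::cout << applied_string << std::endl;
  return 0;
}
```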