
Conversation

@Sp0tless commented Nov 12, 2025

Summary by CodeRabbit

  • New Features
    • Added support for InternLM2 model architecture with configuration and tokenization
    • Added interactive CLI example demonstrating InternLM2 model inference and chat functionality
    • Included pre-configured setup for InternLM2 1.8B variant


coderabbitai bot commented Nov 12, 2025

Walkthrough

Adds InternLM2.5 model support, including build configuration, an example application, and a complete model stack. Introduces model configuration parameters; a C++ implementation featuring attention mechanisms, gated MLP layers, rotary embeddings, and KV caching; and a BPE tokenizer for text-to-token conversion and interactive generation.
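
The gated MLP mentioned here follows the common SwiGLU pattern (the lint findings below reference `w1_`, `w3_`, `w2_`, and `silu_` members). A minimal sketch of the math only, with plain vectors standing in for the PR's Tensor type:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// SwiGLU-style gating: out = w2( silu(w1(x)) * w3(x) ).
// `gate` and `up` stand in for the w1(x) and w3(x) projections.
static float silu(float v) { return v / (1.0F + std::exp(-v)); }

std::vector<float> gatedMlp(const std::vector<float>& gate,
                            const std::vector<float>& up) {
  std::vector<float> hidden(gate.size());
  for (std::size_t i = 0; i < gate.size(); ++i) {
    hidden[i] = silu(gate[i]) * up[i];  // element-wise gate
  }
  return hidden;  // the result then goes through the w2 down-projection
}
```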

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`examples/CMakeLists.txt`, `examples/internlm2_5/CMakeLists.txt` | Adds the `internlm2_5` subdirectory to the build system and creates the executable target `mllm-internlm2-5-chat-runner` with runtime and CPU backend dependencies. |
| **Example Resources**<br>`examples/internlm2_5/config_1.8B.json` | Model configuration file specifying `InternLM2ForCausalLM` hyperparameters (hidden size, attention heads, layer count, rope scaling, token IDs, dtype). |
| **Example Application**<br>`examples/internlm2_5/main.cpp` | Interactive CLI chat runner with Argparse flags, model initialization, tokenization pipeline, token-by-token generation, performance timing (Perfetto), and memory reporting. |
| **Model Configuration**<br>`mllm/models/internlm2/configuration_internlm2.hpp` | Struct `InternLM2Config` parsing JSON hyperparameters, computing derived fields (`head_dim`, `max_cache_length`), and supporting optional `rope_scaling` and linear-implementation-type mapping. |
| **Model Implementation**<br>`mllm/models/internlm2/modeling_internlm2.hpp` | Complete model stack: utility functions for RoPE frequencies and rotary embeddings, `InternLM2MLP` with gating, `InternLM2Attention` with causal masking and KV caching, `InternLM2Decoder` with residual connections, `InternLM2Model` container, and `InternLM2ForCausalLM` causal-LM wrapper with dynamic rope scaling and position-ID handling. |
| **Tokenizer**<br>`mllm/models/internlm2/tokenization_internlm2.hpp` | Struct `InternLM2Message` for prompt templating; class `InternLM2Tokenizer` deriving from `AutoTokenizer` with BPE vocabulary loading, bytes-to-unicode mapping (sketched below), special-token registration, tokenize/detokenize methods, and `convertMessage` for sequence-tensor generation. |
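The bytes-to-unicode mapping in the tokenizer row above is, in GPT-2-style BPE stacks, a fixed reversible table; a minimal sketch assuming the PR follows that convention (the actual member is named `bytes_2_unicode_dict_`):

```cpp
#include <cstdint>
#include <map>

// GPT-2 style bytes-to-unicode table: printable/latin-1 bytes keep their code
// point; the remaining byte values are remapped to 256, 257, ... so that every
// byte becomes a visible, reversible character for BPE.
std::map<uint8_t, wchar_t> bytesToUnicode() {
  std::map<uint8_t, wchar_t> table;
  int shifted = 0;
  for (int b = 0; b < 256; ++b) {
    const bool kept = (b >= 0x21 && b <= 0x7E) ||  // '!'..'~'
                      (b >= 0xA1 && b <= 0xAC) ||  // '¡'..'¬'
                      (b >= 0xAE && b <= 0xFF);    // '®'..'ÿ'
    table[static_cast<uint8_t>(b)] =
        kept ? static_cast<wchar_t>(b) : static_cast<wchar_t>(256 + shifted++);
  }
  return table;
}
```

The inverse table (`bytes_2_unicode_dict_inverse_` in the lint output below) is the flipped map, used during detokenization.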

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as main.cpp
    participant Tokenizer as InternLM2Tokenizer
    participant Model as InternLM2ForCausalLM
    participant Attention as InternLM2Attention
    participant Cache as KV Cache

    User->>CLI: Input prompt
    CLI->>Tokenizer: convertMessage(prompt)
    Tokenizer->>Tokenizer: Tokenize with BPE
    Tokenizer-->>CLI: Token sequence (Tensor)

    CLI->>Model: forward(tokens, position_ids)
    Model->>Model: Embed tokens + RoPE
    loop Each Decoder Block
        Model->>Attention: forward(hidden, KV cache)
        Attention->>Attention: Apply rotary embeddings
        Attention->>Attention: Compute attention with causal mask
        Attention->>Cache: Store/retrieve KV
        Attention-->>Model: Attended output
        Model->>Model: MLP + residuals
    end
    Model-->>CLI: Logits

    CLI->>CLI: Sample next token
    loop Generate Tokens
        CLI->>Model: forward(next_token, updated_position_ids)
        Model->>Cache: Use cached KV states
        Model-->>CLI: Logits
        CLI->>Tokenizer: Detokenize token
        Tokenizer-->>CLI: Text
        CLI->>User: Output text token
    end
```
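In code form, the diagram above boils down to a prefill pass followed by single-token decode steps. A hypothetical sketch with `std::function` stand-ins rather than the runner's real `model.chat` API:

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Prefill + decode loop mirroring the diagram. `prefill`, `decodeStep`, and
// `detok` are illustrative stand-ins, not the actual mllm interfaces.
void generate(const std::function<int64_t(const std::vector<int64_t>&)>& prefill,
              const std::function<int64_t(int64_t)>& decodeStep,
              const std::function<std::string(int64_t)>& detok,
              std::vector<int64_t> ids, int max_new_tokens, int64_t eos_id) {
  int64_t next = prefill(ids);  // full-prompt pass populates the KV cache
  for (int i = 0; i < max_new_tokens && next != eos_id; ++i) {
    std::cout << detok(next) << std::flush;  // stream the decoded token
    ids.push_back(next);
    next = decodeStep(next);  // one-token pass reuses the cached K/V states
  }
}
```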

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • High-density mathematical logic: RoPE computation, attention mechanisms with dynamic scaling, and causal masking require careful verification
  • Complex state management: KV cache integration, buffer registration for inv_freq_base, and layer-index tracking across decoder blocks
  • Tokenizer implementation: BPE vocabulary handling, bytes-to-unicode mapping, special token trie registration, and message template processing
  • Multiple interconnected public interfaces: Six new exported classes/structs with non-trivial forward signatures and initialization paths
  • Key areas requiring extra attention:
    • modeling_internlm2.hpp: RoPE frequency scaling logic (dynamic vs. linear modes; see the sketch after this list), attention mask computation, and cache state transitions
    • tokenization_internlm2.hpp: BPE token lookup correctness and special token handling edge cases
    • configuration_internlm2.hpp: JSON parsing and derived field calculations (head_dim, max_cache_length)
    • main.cpp: Model loading error paths and CLI argument validation
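
For the RoPE scaling check flagged above, the usual reference behavior (HF convention) is: linear mode divides the position ids by the scaling factor, while dynamic NTK regrows the base once the sequence passes the trained context. A minimal sketch of those standard formulas, assuming the PR mirrors that convention (its actual `makeRoPEInvFreq` signature may differ):

```cpp
#include <cmath>
#include <vector>

// Standard RoPE inverse frequencies: inv_freq[i] = base^(-2i / head_dim).
std::vector<float> makeInvFreq(int head_dim, float base) {
  std::vector<float> inv_freq(head_dim / 2);
  for (int i = 0; i < head_dim / 2; ++i) {
    inv_freq[i] = 1.0F / std::pow(base, (2.0F * i) / head_dim);
  }
  return inv_freq;
}

// Dynamic-NTK variant (HF convention): once seq_len exceeds the trained
// context, enlarge the base so the frequencies stretch smoothly. Linear
// scaling would instead divide the position ids by `factor`.
std::vector<float> dynamicNtkInvFreq(int head_dim, float base, float factor,
                                     int max_pos, int seq_len) {
  if (seq_len > max_pos) {
    base *= std::pow(factor * seq_len / max_pos - (factor - 1.0F),
                     head_dim / static_cast<float>(head_dim - 2));
  }
  return makeInvFreq(head_dim, base);
}
```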

Suggested reviewers

  • yirongjie
  • oreomaker

Poem

🐰 A rabbit hops through embeddings grand,
With RoPE and attention hand-in-hand,
InternLM2 now joins the fray,
Tokenizing tokens the BPE way!
KV caches cached, let generation bloom—
Chat runners dancing out of the room! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title accurately describes the main change: adding support for the internlm2.5-1.8B-chat model. The summary confirms new model configuration, tokenizer, and inference implementation files for this specific model. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@chenghuaWang (Collaborator)

@coderabbitai review


coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a16b55 and 190c7e7.

📒 Files selected for processing (7)
  • examples/CMakeLists.txt (1 hunks)
  • examples/internlm2_5/CMakeLists.txt (1 hunks)
  • examples/internlm2_5/config_1.8B.json (1 hunks)
  • examples/internlm2_5/main.cpp (1 hunks)
  • mllm/models/internlm2/configuration_internlm2.hpp (1 hunks)
  • mllm/models/internlm2/modeling_internlm2.hpp (1 hunks)
  • mllm/models/internlm2/tokenization_internlm2.hpp (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
examples/internlm2_5/main.cpp

[error] 1-1: 'iostream' file not found

(clang-diagnostic-error)


[error] 9-9: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 9-9: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/models/internlm2/tokenization_internlm2.hpp

[error] 6-6: 'regex' file not found

(clang-diagnostic-error)


[error] 18-18: constructor does not initialize these fields: prompt

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 20-20: variable 'message_template' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 27-27: constructor does not initialize these fields: bpe_, bytes_2_unicode_dict_, bytes_2_unicode_dict_inverse_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 30-30: variable 'bytes_2_unicode_dict_' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 30-30: invalid case style for variable 'bytes_2_unicode_dict_'

(readability-identifier-naming,-warnings-as-errors)


[error] 82-82: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 106-106: variable 'sequence' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 123-123: variable 'normalized' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 124-124: variable 'space_char' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 125-125: variable 'underline_char' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 127-127: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 141-141: variable 'processed' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 149-149: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)

mllm/models/internlm2/modeling_internlm2.hpp

[error] 4-4: 'cmath' file not found

(clang-diagnostic-error)


[error] 18-18: 3 adjacent parameters of 'makeRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 71-71: constructor does not initialize these fields: w1_, w3_, w2_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: constructor does not initialize these fields: w1_, w3_, w2_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: 2 adjacent parameters of 'InternLM2MLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 88-88: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 97-97: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 114-114: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'InternLM2Attention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 119-119: 'num_key_value_groups_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 131-131: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 132-132: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 139-139: variable 'B' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 139-139: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 139-139: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 140-140: variable 'S' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 140-140: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 140-140: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 175-175: member variable 'layer_idx_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 178-178: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 185-185: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)


[error] 188-188: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 188-188: 2 adjacent parameters of 'InternLM2Decoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 195-195: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 200-200: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 210-210: constructor does not initialize these fields: embedding_, layers_, norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 218-218: constructor does not initialize these fields: embedding_, layers_, norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 218-218: 2 adjacent parameters of 'InternLM2Model' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 225-225: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 226-226: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 240-240: constructor does not initialize these fields: cfg_, lm_head_, kv_cache_, rope_scaling_type_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 265-265: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 286-286: variable 'inv_freq' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 300-300: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 300-300: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 320-320: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 321-321: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

mllm/models/internlm2/configuration_internlm2.hpp

[error] 6-6: 'string' file not found

(clang-diagnostic-error)


[error] 13-13: constructor does not initialize these fields: rope_scaling_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 16-16: constructor does not initialize these fields: rope_scaling_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 24-24: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 65-65: 1e-6f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 65-65: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 66-66: 10000.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 66-66: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 67-67: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 70-70: 0.02f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 70-70: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

Comment on lines +38 to +39
fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
fmt::print("Enter 'exit' or 'quit' to end the session\n\n");

⚠️ Potential issue | 🟡 Minor

Fix the banner text
The banner still says “InternLM2.5 1.5B CLI”, which is misleading now that this runner targets the 1.8 B chat model. Please update the label to match the actual model size.

Apply this diff:

```diff
-  fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
+  fmt::print("\n{:*^60}\n", " InternLM2.5 1.8B CLI ");
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  fmt::print("\n{:*^60}\n", " InternLM2.5 1.8B CLI ");
  fmt::print("Enter 'exit' or 'quit' to end the session\n\n");
```
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 38 to 39, the banner currently
reads "InternLM2.5 1.5B CLI" but the runner targets the 1.8B chat model; update
the banner text to "InternLM2.5 1.8B CLI" by changing the formatted string
passed to fmt::print so the displayed label reflects the correct model size.

Comment on lines +41 to +58
```cpp
  std::string prompt_text;
  fmt::print("💬 Prompt text (or 'exit/quit'): ");
  std::getline(std::cin, prompt_text);
  if (!(prompt_text == "exit" || prompt_text == "quit")) {
    try {
      fmt::print("🔄 Processing...\n");
      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
      auto inputs = tokenizer.convertMessage(prompt);

      fmt::print("\n🤖 Response: ");
      for (auto& step : model.chat(inputs)) {
        auto token = tokenizer.detokenize(step.cur_token_id);
        std::wcout << token << std::flush;
      }
      fmt::print("\n{}\n", std::string(60, '-'));
    } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
    model.perfSummary();
  }
```

⚠️ Potential issue | 🟠 Major

Loop the interactive chat
The CLI prints “Enter 'exit' or 'quit' to end the session”, but it only processes a single prompt and then exits. Users can’t hold a multi-turn conversation, which defeats the whole purpose of exposing a chat runner. Wrap the prompt/response block in a loop so exit/quit truly controls termination.

Apply this diff to address the issue:

```diff
-  std::string prompt_text;
-  fmt::print("💬 Prompt text (or 'exit/quit'): ");
-  std::getline(std::cin, prompt_text);
-  if (!(prompt_text == "exit" || prompt_text == "quit")) {
-    try {
-      fmt::print("🔄 Processing...\n");
-      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
-      auto inputs = tokenizer.convertMessage(prompt);
-
-      fmt::print("\n🤖 Response: ");
-      for (auto& step : model.chat(inputs)) {
-        auto token = tokenizer.detokenize(step.cur_token_id);
-        std::wcout << token << std::flush;
-      }
-      fmt::print("\n{}\n", std::string(60, '-'));
-    } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
-    model.perfSummary();
-  }
+  for (std::string prompt_text;;) {
+    fmt::print("💬 Prompt text (or 'exit/quit'): ");
+    if (!std::getline(std::cin, prompt_text)) { break; }
+    if (prompt_text == "exit" || prompt_text == "quit") { break; }
+    try {
+      fmt::print("🔄 Processing...\n");
+      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
+      auto inputs = tokenizer.convertMessage(prompt);
+
+      fmt::print("\n🤖 Response: ");
+      for (auto& step : model.chat(inputs)) {
+        auto token = tokenizer.detokenize(step.cur_token_id);
+        std::wcout << token << std::flush;
+      }
+      fmt::print("\n{}\n", std::string(60, '-'));
+    } catch (const std::exception& e) {
+      fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-'));
+    }
+    model.perfSummary();
+  }
```

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  for (std::string prompt_text;;) {
    fmt::print("💬 Prompt text (or 'exit/quit'): ");
    if (!std::getline(std::cin, prompt_text)) { break; }
    if (prompt_text == "exit" || prompt_text == "quit") { break; }
    try {
      fmt::print("🔄 Processing...\n");
      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
      auto inputs = tokenizer.convertMessage(prompt);

      fmt::print("\n🤖 Response: ");
      for (auto& step : model.chat(inputs)) {
        auto token = tokenizer.detokenize(step.cur_token_id);
        std::wcout << token << std::flush;
      }
      fmt::print("\n{}\n", std::string(60, '-'));
    } catch (const std::exception& e) {
      fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-'));
    }
    model.perfSummary();
  }
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 41-58, the interactive prompt is
executed only once; wrap the prompt/response block in a loop so users can hold a
multi-turn conversation. Replace the single read+if with a for/while loop that
repeatedly prints the prompt, uses std::getline to read into prompt_text
(breaking the loop on EOF), and breaks when the input equals "exit" or "quit";
keep the existing try/catch/response printing and call model.perfSummary() after
each iteration.

Comment on lines +50 to +53
```cpp
    head_dim = hidden_size / num_attention_heads;
    max_cache_length = max_position_embeddings;
    end_of_text_token_id = static_cast<int32_t>(eos_token_id);
  }
```

⚠️ Potential issue | 🟠 Major

Validate head dimension before use
head_dim is computed with plain integer division. If hidden_size isn’t divisible by num_attention_heads (or if the value is zero), this silently truncates and breaks attention math later on. The reference InternLM2 implementations explicitly guard against this with a runtime check. Please add the same validation before assigning head_dim so that bad configs fail fast instead of producing corrupted tensors. (huggingface.co)

Use this diff as a starting point:

```diff
-    head_dim = hidden_size / num_attention_heads;
+    if (num_attention_heads <= 0 || hidden_size % num_attention_heads != 0) {
+      throw std::invalid_argument(
+          fmt::format("hidden_size ({}) must be divisible by num_attention_heads ({})",
+                      hidden_size, num_attention_heads));
+    }
+    head_dim = hidden_size / num_attention_heads;
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    if (num_attention_heads <= 0 || hidden_size % num_attention_heads != 0) {
      throw std::invalid_argument(
          fmt::format("hidden_size ({}) must be divisible by num_attention_heads ({})",
                      hidden_size, num_attention_heads));
    }
    head_dim = hidden_size / num_attention_heads;
    max_cache_length = max_position_embeddings;
    end_of_text_token_id = static_cast<int32_t>(eos_token_id);
  }
```
🤖 Prompt for AI Agents
In mllm/models/internlm2/configuration_internlm2.hpp around lines 50 to 53,
validate that num_attention_heads is > 0 and that hidden_size is divisible by
num_attention_heads before computing head_dim; if either check fails, throw or
log a clear runtime error (include hidden_size and num_attention_heads in the
message) so misconfigured models fail fast rather than producing truncated
head_dim, then compute head_dim = hidden_size / num_attention_heads and proceed
with max_cache_length and end_of_text_token_id assignments.

Comment on lines +265 to +278
```cpp
    Tensor position_ids;
    if (input.count("position_ids")) {
      position_ids = input.at("position_ids");
      if (seq_len == 1) {
        auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
        *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
      }
    } else {
      position_ids = Tensor::empty({batch_size, seq_len}, kInt64, kCPU).alloc();
      auto position_ids_ptr = position_ids.ptr<int64_t>();
      for (int b = 0; b < batch_size; ++b) {
        for (int s = 0; s < seq_len; ++s) { position_ids_ptr[b * seq_len + s] = s; }
      }
```

⚠️ Potential issue | 🟠 Major

position_ids increment only updates batch 0.
In the single-token decode path, we allocate a new {batch_size, 1} tensor but write last_pos + 1 only for row 0. Every other batch element keeps undefined data, so downstream RoPE receives incorrect (often zero) positions and generation breaks for batched decoding. Iterate over every batch entry when populating the new tensor.

```diff
-      if (seq_len == 1) {
-        auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
-        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
-        *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
+      if (seq_len == 1) {
+        auto provided = position_ids;
+        auto last_col = provided.shape()[1] - 1;
+        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
+        auto new_ptr = position_ids.ptr<int64_t>();
+        auto provided_ptr = provided.ptr<int64_t>();
+        auto stride = provided.shape()[1];
+        for (int b = 0; b < batch_size; ++b) {
+          new_ptr[b] = provided_ptr[b * stride + last_col] + 1;
+        }
```
🧰 Tools
🪛 Clang (14.0.6)

[error] 265-265: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/models/internlm2/modeling_internlm2.hpp around lines 265-278, the
single-token decode path creates a {batch_size,1} position_ids tensor but only
writes last_pos+1 to row 0, leaving other rows undefined; update the allocation
path to fill every batch row with last_pos+1 (e.g., loop b from 0 to
batch_size-1 and write last_pos+1 into position_ids at {b,0} using the same
offsettedPtr or pointer arithmetic) so all batch elements have correct position
ids for RoPE.

Comment on lines +93 to +105
```cpp
  ARGenerationOutputPast convertMessage(const InternLM2Message& prompt) {
    auto applied_string = InternLM2Message::message_template;
    size_t pos = applied_string.find("{{{prompt}}}");
    applied_string.replace(pos, 12, prompt.prompt);
    auto tokens = tokenize(prompt.prompt);

    if (!prompt.add_bos && !tokens.empty() && tokens.front() == L"<s>") { tokens.erase(tokens.begin()); }
    if (prompt.add_eos && (tokens.empty() || tokens.back() != L"</s>")) { tokens.emplace_back(L"</s>"); }

    std::vector<int64_t> ids;
    ids.reserve(tokens.size());
    for (const auto& token : tokens) { ids.emplace_back(bpe_._lookup_vocab(token)); }
```

⚠️ Potential issue | 🟠 Major

Message template is never applied in convertMessage.
applied_string builds the chat-format prompt (with <|im_start|> / <|im_end|> markers), but the code still tokenizes prompt.prompt, so those special tokens never reach the model. As a result, generated conversations miss the role delimiters the model expects, breaking adherence to InternLM2's chat format. Please tokenize the formatted string instead.

```diff
-    auto tokens = tokenize(prompt.prompt);
+    auto tokens = tokenize(applied_string);
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  ARGenerationOutputPast convertMessage(const InternLM2Message& prompt) {
    auto applied_string = InternLM2Message::message_template;
    size_t pos = applied_string.find("{{{prompt}}}");
    applied_string.replace(pos, 12, prompt.prompt);
    auto tokens = tokenize(applied_string);

    if (!prompt.add_bos && !tokens.empty() && tokens.front() == L"<s>") { tokens.erase(tokens.begin()); }
    if (prompt.add_eos && (tokens.empty() || tokens.back() != L"</s>")) { tokens.emplace_back(L"</s>"); }

    std::vector<int64_t> ids;
    ids.reserve(tokens.size());
    for (const auto& token : tokens) { ids.emplace_back(bpe_._lookup_vocab(token)); }
```
🧰 Tools
🪛 Clang (14.0.6)

[error] 95-95: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/models/internlm2/tokenization_internlm2.hpp around lines 93 to 105, the
function builds applied_string with the chat-format template but still calls
tokenize(prompt.prompt), so the special chat markers are never tokenized; change
the tokenize call to use applied_string (i.e., tokenize(applied_string)) and
then proceed with the existing BOS/EOS and vocab lookup logic on the resulting
tokens so the model receives the formatted prompt with role delimiters.
