
Conversation

@Sp0tless commented Nov 12, 2025

Summary by CodeRabbit

  • New Features
    • Added support for InternLM2 model architecture with configuration and tokenization
    • Added interactive CLI example demonstrating InternLM2 model inference and chat functionality
    • Included pre-configured setup for InternLM2 1.8B variant


coderabbitai bot commented Nov 12, 2025

Walkthrough

Adds InternLM2.5 model support, including build configuration, an example application, and a complete model stack. Introduces model configuration parameters; a C++ implementation featuring attention mechanisms, gated MLP layers, rotary embeddings, and KV caching; and a BPE tokenizer for text-to-token conversion and interactive generation.
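
The gated MLP mentioned here follows the common SwiGLU pattern (the lint findings below reference `w1_`, `w3_`, `w2_`, and `silu_` members). A minimal sketch of the math only, with plain vectors standing in for the PR's Tensor type:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// SwiGLU-style gating: out = w2( silu(w1(x)) * w3(x) ).
// `gate` and `up` stand in for the w1(x) and w3(x) projections.
static float silu(float v) { return v / (1.0F + std::exp(-v)); }

std::vector<float> gatedMlp(const std::vector<float>& gate,
                            const std::vector<float>& up) {
  std::vector<float> hidden(gate.size());
  for (std::size_t i = 0; i < gate.size(); ++i) {
    hidden[i] = silu(gate[i]) * up[i];  // element-wise gate
  }
  return hidden;  // the result then goes through the w2 down-projection
}
```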

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`examples/CMakeLists.txt`, `examples/internlm2_5/CMakeLists.txt` | Adds the `internlm2_5` subdirectory to the build system and creates the executable target `mllm-internlm2-5-chat-runner` with runtime and CPU backend dependencies. |
| **Example Resources**<br>`examples/internlm2_5/config_1.8B.json` | Model configuration file specifying `InternLM2ForCausalLM` hyperparameters (hidden size, attention heads, layer count, rope scaling, token IDs, dtype). |
| **Example Application**<br>`examples/internlm2_5/main.cpp` | Interactive CLI chat runner with Argparse flags, model initialization, tokenization pipeline, token-by-token generation, performance timing (Perfetto), and memory reporting. |
| **Model Configuration**<br>`mllm/models/internlm2/configuration_internlm2.hpp` | Struct `InternLM2Config` parsing JSON hyperparameters, computing derived fields (`head_dim`, `max_cache_length`), and supporting optional `rope_scaling` and linear-implementation-type mapping. |
| **Model Implementation**<br>`mllm/models/internlm2/modeling_internlm2.hpp` | Complete model stack: utility functions for RoPE frequencies and rotary embeddings, `InternLM2MLP` with gating, `InternLM2Attention` with causal masking and KV caching, `InternLM2Decoder` with residual connections, `InternLM2Model` container, and `InternLM2ForCausalLM` causal-LM wrapper with dynamic rope scaling and position-ID handling. |
| **Tokenizer**<br>`mllm/models/internlm2/tokenization_internlm2.hpp` | Struct `InternLM2Message` for prompt templating; class `InternLM2Tokenizer` deriving from `AutoTokenizer` with BPE vocabulary loading, bytes-to-unicode mapping (sketched below), special-token registration, tokenize/detokenize methods, and `convertMessage` for sequence-tensor generation. |
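The bytes-to-unicode mapping in the tokenizer row above is, in GPT-2-style BPE stacks, a fixed reversible table; a minimal sketch assuming the PR follows that convention (the actual member is named `bytes_2_unicode_dict_`):

```cpp
#include <cstdint>
#include <map>

// GPT-2 style bytes-to-unicode table: printable/latin-1 bytes keep their code
// point; the remaining byte values are remapped to 256, 257, ... so that every
// byte becomes a visible, reversible character for BPE.
std::map<uint8_t, wchar_t> bytesToUnicode() {
  std::map<uint8_t, wchar_t> table;
  int shifted = 0;
  for (int b = 0; b < 256; ++b) {
    const bool kept = (b >= 0x21 && b <= 0x7E) ||  // '!'..'~'
                      (b >= 0xA1 && b <= 0xAC) ||  // '¡'..'¬'
                      (b >= 0xAE && b <= 0xFF);    // '®'..'ÿ'
    table[static_cast<uint8_t>(b)] =
        kept ? static_cast<wchar_t>(b) : static_cast<wchar_t>(256 + shifted++);
  }
  return table;
}
```

The inverse table (`bytes_2_unicode_dict_inverse_` in the lint output below) is the flipped map, used during detokenization.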

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as main.cpp
    participant Tokenizer as InternLM2Tokenizer
    participant Model as InternLM2ForCausalLM
    participant Attention as InternLM2Attention
    participant Cache as KV Cache

    User->>CLI: Input prompt
    CLI->>Tokenizer: convertMessage(prompt)
    Tokenizer->>Tokenizer: Tokenize with BPE
    Tokenizer-->>CLI: Token sequence (Tensor)

    CLI->>Model: forward(tokens, position_ids)
    Model->>Model: Embed tokens + RoPE
    loop Each Decoder Block
        Model->>Attention: forward(hidden, KV cache)
        Attention->>Attention: Apply rotary embeddings
        Attention->>Attention: Compute attention with causal mask
        Attention->>Cache: Store/retrieve KV
        Attention-->>Model: Attended output
        Model->>Model: MLP + residuals
    end
    Model-->>CLI: Logits

    CLI->>CLI: Sample next token
    loop Generate Tokens
        CLI->>Model: forward(next_token, updated_position_ids)
        Model->>Cache: Use cached KV states
        Model-->>CLI: Logits
        CLI->>Tokenizer: Detokenize token
        Tokenizer-->>CLI: Text
        CLI->>User: Output text token
    end
```
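In code form, the diagram above boils down to a prefill pass followed by single-token decode steps. A hypothetical sketch with `std::function` stand-ins rather than the runner's real `model.chat` API:

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Prefill + decode loop mirroring the diagram. `prefill`, `decodeStep`, and
// `detok` are illustrative stand-ins, not the actual mllm interfaces.
void generate(const std::function<int64_t(const std::vector<int64_t>&)>& prefill,
              const std::function<int64_t(int64_t)>& decodeStep,
              const std::function<std::string(int64_t)>& detok,
              std::vector<int64_t> ids, int max_new_tokens, int64_t eos_id) {
  int64_t next = prefill(ids);  // full-prompt pass populates the KV cache
  for (int i = 0; i < max_new_tokens && next != eos_id; ++i) {
    std::cout << detok(next) << std::flush;  // stream the decoded token
    ids.push_back(next);
    next = decodeStep(next);  // one-token pass reuses the cached K/V states
  }
}
```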

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • High-density mathematical logic: RoPE computation, attention mechanisms with dynamic scaling, and causal masking require careful verification
  • Complex state management: KV cache integration, buffer registration for inv_freq_base, and layer-index tracking across decoder blocks
  • Tokenizer implementation: BPE vocabulary handling, bytes-to-unicode mapping, special token trie registration, and message template processing
  • Multiple interconnected public interfaces: Six new exported classes/structs with non-trivial forward signatures and initialization paths
  • Key areas requiring extra attention:
    • modeling_internlm2.hpp: RoPE frequency scaling logic (dynamic vs. linear modes; see the sketch after this list), attention mask computation, and cache state transitions
    • tokenization_internlm2.hpp: BPE token lookup correctness and special token handling edge cases
    • configuration_internlm2.hpp: JSON parsing and derived field calculations (head_dim, max_cache_length)
    • main.cpp: Model loading error paths and CLI argument validation
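
For the RoPE scaling check flagged above, the usual reference behavior (HF convention) is: linear mode divides the position ids by the scaling factor, while dynamic NTK regrows the base once the sequence passes the trained context. A minimal sketch of those standard formulas, assuming the PR mirrors that convention (its actual `makeRoPEInvFreq` signature may differ):

```cpp
#include <cmath>
#include <vector>

// Standard RoPE inverse frequencies: inv_freq[i] = base^(-2i / head_dim).
std::vector<float> makeInvFreq(int head_dim, float base) {
  std::vector<float> inv_freq(head_dim / 2);
  for (int i = 0; i < head_dim / 2; ++i) {
    inv_freq[i] = 1.0F / std::pow(base, (2.0F * i) / head_dim);
  }
  return inv_freq;
}

// Dynamic-NTK variant (HF convention): once seq_len exceeds the trained
// context, enlarge the base so the frequencies stretch smoothly. Linear
// scaling would instead divide the position ids by `factor`.
std::vector<float> dynamicNtkInvFreq(int head_dim, float base, float factor,
                                     int max_pos, int seq_len) {
  if (seq_len > max_pos) {
    base *= std::pow(factor * seq_len / max_pos - (factor - 1.0F),
                     head_dim / static_cast<float>(head_dim - 2));
  }
  return makeInvFreq(head_dim, base);
}
```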

Suggested reviewers

  • yirongjie
  • oreomaker

Poem

🐰 A rabbit hops through embeddings grand,
With RoPE and attention hand-in-hand,
InternLM2 now joins the fray,
Tokenizing tokens the BPE way!
KV caches cached, let generation bloom—
Chat runners dancing out of the room! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped; CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title accurately describes the main change: adding support for the internlm2.5-1.8B-chat model. The summary confirms new model configuration, tokenizer, and inference implementation files for this specific model. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@chenghuaWang (Collaborator)

@coderabbitai review


coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a16b55 and 190c7e7.

📒 Files selected for processing (7)
  • examples/CMakeLists.txt (1 hunks)
  • examples/internlm2_5/CMakeLists.txt (1 hunks)
  • examples/internlm2_5/config_1.8B.json (1 hunks)
  • examples/internlm2_5/main.cpp (1 hunks)
  • mllm/models/internlm2/configuration_internlm2.hpp (1 hunks)
  • mllm/models/internlm2/modeling_internlm2.hpp (1 hunks)
  • mllm/models/internlm2/tokenization_internlm2.hpp (1 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
examples/internlm2_5/main.cpp

[error] 1-1: 'iostream' file not found

(clang-diagnostic-error)


[error] 9-9: variable 'MLLM_MAIN' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 9-9: invalid case style for variable 'MLLM_MAIN'

(readability-identifier-naming,-warnings-as-errors)

mllm/models/internlm2/tokenization_internlm2.hpp

[error] 6-6: 'regex' file not found

(clang-diagnostic-error)


[error] 18-18: constructor does not initialize these fields: prompt

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 20-20: variable 'message_template' is non-const and globally accessible, consider making it const

(cppcoreguidelines-avoid-non-const-global-variables,-warnings-as-errors)


[error] 27-27: constructor does not initialize these fields: bpe_, bytes_2_unicode_dict_, bytes_2_unicode_dict_inverse_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 30-30: variable 'bytes_2_unicode_dict_' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 30-30: invalid case style for variable 'bytes_2_unicode_dict_'

(readability-identifier-naming,-warnings-as-errors)


[error] 82-82: variable 'ret' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 95-95: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 106-106: variable 'sequence' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 123-123: variable 'normalized' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 124-124: variable 'space_char' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 125-125: variable 'underline_char' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 127-127: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 141-141: variable 'processed' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 149-149: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)

mllm/models/internlm2/modeling_internlm2.hpp

[error] 4-4: 'cmath' file not found

(clang-diagnostic-error)


[error] 18-18: 3 adjacent parameters of 'makeRoPEInvFreq' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 71-71: constructor does not initialize these fields: w1_, w3_, w2_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: constructor does not initialize these fields: w1_, w3_, w2_, silu_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 80-80: 2 adjacent parameters of 'InternLM2MLP' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 88-88: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 90-90: variable name 'y' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 97-97: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 114-114: constructor does not initialize these fields: wqkv_, o_proj_, q_rope_, k_rope_, mask_, softmax_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 114-114: 2 adjacent parameters of 'InternLM2Attention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 119-119: 'num_key_value_groups_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 131-131: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 132-132: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 139-139: variable 'B' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 139-139: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 139-139: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 140-140: variable 'S' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 140-140: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 140-140: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 175-175: member variable 'layer_idx_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 178-178: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 185-185: redundant access specifier has the same accessibility as the previous access specifier

(readability-redundant-access-specifiers,-warnings-as-errors)


[error] 188-188: constructor does not initialize these fields: input_layer_norm_, post_attention_layer_norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 188-188: 2 adjacent parameters of 'InternLM2Decoder' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 195-195: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 200-200: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 210-210: constructor does not initialize these fields: embedding_, layers_, norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 218-218: constructor does not initialize these fields: embedding_, layers_, norm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 218-218: 2 adjacent parameters of 'InternLM2Model' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 225-225: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 226-226: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 240-240: constructor does not initialize these fields: cfg_, lm_head_, kv_cache_, rope_scaling_type_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 265-265: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 286-286: variable 'inv_freq' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 300-300: variable name 'S' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 300-300: invalid case style for variable 'S'

(readability-identifier-naming,-warnings-as-errors)


[error] 320-320: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 321-321: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

mllm/models/internlm2/configuration_internlm2.hpp

[error] 6-6: 'string' file not found

(clang-diagnostic-error)


[error] 13-13: constructor does not initialize these fields: rope_scaling_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 16-16: constructor does not initialize these fields: rope_scaling_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 24-24: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 65-65: 1e-6f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 65-65: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 66-66: 10000.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 66-66: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 67-67: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 70-70: 0.02f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 70-70: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

Comment on lines +38 to +39
fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
fmt::print("Enter 'exit' or 'quit' to end the session\n\n");

⚠️ Potential issue | 🟡 Minor

Fix the banner text
The banner still says “InternLM2.5 1.5B CLI”, which is misleading now that this runner targets the 1.8 B chat model. Please update the label to match the actual model size.

Apply this diff:

```diff
-  fmt::print("\n{:*^60}\n", " InternLM2.5 1.5B CLI ");
+  fmt::print("\n{:*^60}\n", " InternLM2.5 1.8B CLI ");
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  fmt::print("\n{:*^60}\n", " InternLM2.5 1.8B CLI ");
  fmt::print("Enter 'exit' or 'quit' to end the session\n\n");
```
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 38 to 39, the banner currently
reads "InternLM2.5 1.5B CLI" but the runner targets the 1.8B chat model; update
the banner text to "InternLM2.5 1.8B CLI" by changing the formatted string
passed to fmt::print so the displayed label reflects the correct model size.

Comment on lines +41 to +58
```cpp
  std::string prompt_text;
  fmt::print("💬 Prompt text (or 'exit/quit'): ");
  std::getline(std::cin, prompt_text);
  if (!(prompt_text == "exit" || prompt_text == "quit")) {
    try {
      fmt::print("🔄 Processing...\n");
      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
      auto inputs = tokenizer.convertMessage(prompt);

      fmt::print("\n🤖 Response: ");
      for (auto& step : model.chat(inputs)) {
        auto token = tokenizer.detokenize(step.cur_token_id);
        std::wcout << token << std::flush;
      }
      fmt::print("\n{}\n", std::string(60, '-'));
    } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
    model.perfSummary();
  }
```

⚠️ Potential issue | 🟠 Major

Loop the interactive chat
The CLI prints “Enter 'exit' or 'quit' to end the session”, but it only processes a single prompt and then exits. Users can’t hold a multi-turn conversation, which defeats the whole purpose of exposing a chat runner. Wrap the prompt/response block in a loop so exit/quit truly controls termination.

Apply this diff to address the issue:

```diff
-  std::string prompt_text;
-  fmt::print("💬 Prompt text (or 'exit/quit'): ");
-  std::getline(std::cin, prompt_text);
-  if (!(prompt_text == "exit" || prompt_text == "quit")) {
-    try {
-      fmt::print("🔄 Processing...\n");
-      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
-      auto inputs = tokenizer.convertMessage(prompt);
-
-      fmt::print("\n🤖 Response: ");
-      for (auto& step : model.chat(inputs)) {
-        auto token = tokenizer.detokenize(step.cur_token_id);
-        std::wcout << token << std::flush;
-      }
-      fmt::print("\n{}\n", std::string(60, '-'));
-    } catch (const std::exception& e) { fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-')); }
-    model.perfSummary();
-  }
+  for (std::string prompt_text;;) {
+    fmt::print("💬 Prompt text (or 'exit/quit'): ");
+    if (!std::getline(std::cin, prompt_text)) { break; }
+    if (prompt_text == "exit" || prompt_text == "quit") { break; }
+    try {
+      fmt::print("🔄 Processing...\n");
+      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
+      auto inputs = tokenizer.convertMessage(prompt);
+
+      fmt::print("\n🤖 Response: ");
+      for (auto& step : model.chat(inputs)) {
+        auto token = tokenizer.detokenize(step.cur_token_id);
+        std::wcout << token << std::flush;
+      }
+      fmt::print("\n{}\n", std::string(60, '-'));
+    } catch (const std::exception& e) {
+      fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-'));
+    }
+    model.perfSummary();
+  }
```

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  for (std::string prompt_text;;) {
    fmt::print("💬 Prompt text (or 'exit/quit'): ");
    if (!std::getline(std::cin, prompt_text)) { break; }
    if (prompt_text == "exit" || prompt_text == "quit") { break; }
    try {
      fmt::print("🔄 Processing...\n");
      mllm::models::internlm2::InternLM2Message prompt{prompt_text};
      auto inputs = tokenizer.convertMessage(prompt);

      fmt::print("\n🤖 Response: ");
      for (auto& step : model.chat(inputs)) {
        auto token = tokenizer.detokenize(step.cur_token_id);
        std::wcout << token << std::flush;
      }
      fmt::print("\n{}\n", std::string(60, '-'));
    } catch (const std::exception& e) {
      fmt::print("\n❌ Error: {}\n{}\n", e.what(), std::string(60, '-'));
    }
    model.perfSummary();
  }
🤖 Prompt for AI Agents
In examples/internlm2_5/main.cpp around lines 41-58, the interactive prompt is
executed only once; wrap the prompt/response block in a loop so users can hold a
multi-turn conversation. Replace the single read+if with a for/while loop that
repeatedly prints the prompt, uses std::getline to read into prompt_text
(breaking the loop on EOF), and breaks when the input equals "exit" or "quit";
keep the existing try/catch/response printing and call model.perfSummary() after
each iteration.

Comment on lines +50 to +53
```cpp
    head_dim = hidden_size / num_attention_heads;
    max_cache_length = max_position_embeddings;
    end_of_text_token_id = static_cast<int32_t>(eos_token_id);
  }
```

⚠️ Potential issue | 🟠 Major

Validate head dimension before use
head_dim is computed with plain integer division. If hidden_size isn’t divisible by num_attention_heads (or if the value is zero), this silently truncates and breaks attention math later on. The reference InternLM2 implementations explicitly guard against this with a runtime check. Please add the same validation before assigning head_dim so that bad configs fail fast instead of producing corrupted tensors. (huggingface.co)

Use this diff as a starting point:

```diff
-    head_dim = hidden_size / num_attention_heads;
+    if (num_attention_heads <= 0 || hidden_size % num_attention_heads != 0) {
+      throw std::invalid_argument(
+          fmt::format("hidden_size ({}) must be divisible by num_attention_heads ({})",
+                      hidden_size, num_attention_heads));
+    }
+    head_dim = hidden_size / num_attention_heads;
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    if (num_attention_heads <= 0 || hidden_size % num_attention_heads != 0) {
      throw std::invalid_argument(
          fmt::format("hidden_size ({}) must be divisible by num_attention_heads ({})",
                      hidden_size, num_attention_heads));
    }
    head_dim = hidden_size / num_attention_heads;
    max_cache_length = max_position_embeddings;
    end_of_text_token_id = static_cast<int32_t>(eos_token_id);
  }
```
🤖 Prompt for AI Agents
In mllm/models/internlm2/configuration_internlm2.hpp around lines 50 to 53,
validate that num_attention_heads is > 0 and that hidden_size is divisible by
num_attention_heads before computing head_dim; if either check fails, throw or
log a clear runtime error (include hidden_size and num_attention_heads in the
message) so misconfigured models fail fast rather than producing truncated
head_dim, then compute head_dim = hidden_size / num_attention_heads and proceed
with max_cache_length and end_of_text_token_id assignments.

Comment on lines +265 to +278
```cpp
    Tensor position_ids;
    if (input.count("position_ids")) {
      position_ids = input.at("position_ids");
      if (seq_len == 1) {
        auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
        *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
      }
    } else {
      position_ids = Tensor::empty({batch_size, seq_len}, kInt64, kCPU).alloc();
      auto position_ids_ptr = position_ids.ptr<int64_t>();
      for (int b = 0; b < batch_size; ++b) {
        for (int s = 0; s < seq_len; ++s) { position_ids_ptr[b * seq_len + s] = s; }
      }
```

⚠️ Potential issue | 🟠 Major

position_ids increment only updates batch 0.
In the single-token decode path, we allocate a new {batch_size, 1} tensor but write last_pos + 1 only for row 0. Every other batch element keeps undefined data, so downstream RoPE receives incorrect (often zero) positions and generation breaks for batched decoding. Iterate over every batch entry when populating the new tensor.

```diff
-      if (seq_len == 1) {
-        auto last_pos = *position_ids.offsettedPtr<int64_t>({0, position_ids.shape()[1] - 1});
-        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
-        *position_ids.offsettedPtr<int64_t>({0, 0}) = last_pos + 1;
+      if (seq_len == 1) {
+        auto provided = position_ids;
+        auto last_col = provided.shape()[1] - 1;
+        position_ids = Tensor::empty({batch_size, 1}, kInt64, kCPU).alloc();
+        auto new_ptr = position_ids.ptr<int64_t>();
+        auto provided_ptr = provided.ptr<int64_t>();
+        auto stride = provided.shape()[1];
+        for (int b = 0; b < batch_size; ++b) {
+          new_ptr[b] = provided_ptr[b * stride + last_col] + 1;
+        }
```
🧰 Tools
🪛 Clang (14.0.6)

[error] 265-265: variable 'position_ids' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/models/internlm2/modeling_internlm2.hpp around lines 265-278, the
single-token decode path creates a {batch_size,1} position_ids tensor but only
writes last_pos+1 to row 0, leaving other rows undefined; update the allocation
path to fill every batch row with last_pos+1 (e.g., loop b from 0 to
batch_size-1 and write last_pos+1 into position_ids at {b,0} using the same
offsettedPtr or pointer arithmetic) so all batch elements have correct position
ids for RoPE.

Comment on lines +93 to +105
```cpp
  ARGenerationOutputPast convertMessage(const InternLM2Message& prompt) {
    auto applied_string = InternLM2Message::message_template;
    size_t pos = applied_string.find("{{{prompt}}}");
    applied_string.replace(pos, 12, prompt.prompt);
    auto tokens = tokenize(prompt.prompt);

    if (!prompt.add_bos && !tokens.empty() && tokens.front() == L"<s>") { tokens.erase(tokens.begin()); }
    if (prompt.add_eos && (tokens.empty() || tokens.back() != L"</s>")) { tokens.emplace_back(L"</s>"); }

    std::vector<int64_t> ids;
    ids.reserve(tokens.size());
    for (const auto& token : tokens) { ids.emplace_back(bpe_._lookup_vocab(token)); }
```

⚠️ Potential issue | 🟠 Major

Message template is never applied in convertMessage.
applied_string builds the chat-format prompt (with <|im_start|> / <|im_end|> markers), but the code still tokenizes prompt.prompt, so those special tokens never reach the model. As a result, generated conversations miss the role delimiters the model expects, breaking adherence to InternLM2's chat format. Please tokenize the formatted string instead.

```diff
-    auto tokens = tokenize(prompt.prompt);
+    auto tokens = tokenize(applied_string);
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  ARGenerationOutputPast convertMessage(const InternLM2Message& prompt) {
    auto applied_string = InternLM2Message::message_template;
    size_t pos = applied_string.find("{{{prompt}}}");
    applied_string.replace(pos, 12, prompt.prompt);
    auto tokens = tokenize(applied_string);

    if (!prompt.add_bos && !tokens.empty() && tokens.front() == L"<s>") { tokens.erase(tokens.begin()); }
    if (prompt.add_eos && (tokens.empty() || tokens.back() != L"</s>")) { tokens.emplace_back(L"</s>"); }

    std::vector<int64_t> ids;
    ids.reserve(tokens.size());
    for (const auto& token : tokens) { ids.emplace_back(bpe_._lookup_vocab(token)); }
```
🧰 Tools
🪛 Clang (14.0.6)

[error] 95-95: variable 'pos' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/models/internlm2/tokenization_internlm2.hpp around lines 93 to 105, the
function builds applied_string with the chat-format template but still calls
tokenize(prompt.prompt), so the special chat markers are never tokenized; change
the tokenize call to use applied_string (i.e., tokenize(applied_string)) and
then proceed with the existing BOS/EOS and vocab lookup logic on the resulting
tokens so the model receives the formatted prompt with role delimiters.
