Update llama.py - Fix embedding generation error #5


Open
MikeLP wants to merge 1 commit into main
Conversation


@MikeLP commented on Jul 12, 2025

Replace llama_kv_cache_clear -> llama_kv_self_clear.
This reverts to llama_kv_self_clear until the llama_kv_cache_clear function is fixed.

Replace llama_kv_cache_clear -> llama_kv_self_clear
Copilot AI review requested due to automatic review settings, July 12, 2025 11:57

Copilot AI left a comment


Pull Request Overview

This PR fixes an embedding generation error by replacing calls to the wrong cache-clear function with the correct one.

  • Calls to llama_kv_cache_clear have been updated to llama_kv_self_clear in the embedding flow.
  • Ensures the context cache is properly cleared before and after decoding batches (a sketch of the patched helper follows below).
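
For orientation, here is a minimal sketch of the patched helper. Only the decode_batch name and the llama_kv_self_clear call reflect this PR; the llama_cpp low-level import, the explicit ctx/batch parameters (the real code uses a closure inside embed), and the decode/reset wrapper methods are assumptions about the surrounding llama-cpp-python internals, not code taken from the diff.

    from typing import List

    import llama_cpp  # low-level bindings; import path is an assumption


    def decode_batch(ctx, batch, seq_sizes: List[int]) -> None:
        # Clear the self-attention KV cache before decoding the next batch so
        # that state left over from the previous batch cannot leak into the
        # new embeddings.
        llama_cpp.llama_kv_self_clear(ctx.ctx)  # was: llama_kv_cache_clear
        ctx.decode(batch)  # run the forward pass over the queued tokens (assumed wrapper method)
        batch.reset()      # empty the batch for the next group of sequences (assumed wrapper method)
        # ...the caller then copies the per-sequence embeddings into `data`
        # and clears the cache once more after the last batch.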
Comments suppressed due to low confidence (1)

llama_cpp/llama.py:982

  • Add or update unit tests for the embed function to verify that embeddings are generated correctly with llama_kv_self_clear and that the cache is fully cleared before and after decoding.
        data: Union[List[List[float]], List[List[List[float]]]] = []
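
Following up on that test suggestion, here is a hedged sketch of what such a test could look like. Nothing below is part of this PR: LLAMA_TEST_MODEL is a hypothetical environment variable pointing at any small pooled-embedding GGUF model, and the tolerance and shape checks are illustrative.

    import os

    import pytest
    from llama_cpp import Llama

    MODEL_PATH = os.environ.get("LLAMA_TEST_MODEL")  # hypothetical; not defined by this repo


    @pytest.mark.skipif(MODEL_PATH is None, reason="no local GGUF embedding model configured")
    def test_embed_is_stable_across_batches():
        llm = Llama(model_path=MODEL_PATH, embedding=True, n_ctx=512, verbose=False)

        # Embedding the same text twice should yield the same vector. If the KV
        # cache were not cleared between decode_batch calls, state from the first
        # call could leak into the second and the vectors would drift.
        first = llm.embed("hello world")
        second = llm.embed("hello world")

        # Assumes a pooled-embedding model, so embed() returns a flat list of floats.
        assert len(first) == len(second) > 0
        assert all(abs(a - b) < 1e-5 for a, b in zip(first, second))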

@@ -982,7 +982,7 @@ def embed(
        data: Union[List[List[float]], List[List[List[float]]]] = []

        def decode_batch(seq_sizes: List[int]):

Copilot AI Jul 12, 2025


[nitpick] Consider adding a brief comment explaining why llama_kv_self_clear is used here instead of the previous llama_kv_cache_clear, to clarify the intended cache-clearing behavior for future maintainers.

Suggested change
-        def decode_batch(seq_sizes: List[int]):
+        def decode_batch(seq_sizes: List[int]):
+            # Clear the self-attention key-value cache to prepare for decoding the next batch.
+            # `llama_kv_self_clear` is used here instead of `llama_kv_cache_clear` because it specifically
+            # clears the cache for self-attention mechanisms, which is required for accurate embedding generation.


@MikeLP (Author) commented on Jul 13, 2025

We can close this PR once the llama_kv_cache_clear error is fixed.
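
In the meantime, a compatibility shim along the following lines could tolerate either symbol. This is only a sketch of the idea, not code proposed in this PR, and it assumes the low-level bindings are importable as llama_cpp:

    import llama_cpp


    def clear_kv_cache(raw_ctx) -> None:
        # Prefer the newer llama_kv_self_clear and fall back to llama_kv_cache_clear,
        # so the same code works whichever symbol the installed bindings expose.
        clear_fn = getattr(llama_cpp, "llama_kv_self_clear", None) or getattr(
            llama_cpp, "llama_kv_cache_clear", None
        )
        if clear_fn is None:
            raise RuntimeError("no KV cache clear function found in llama_cpp bindings")
        clear_fn(raw_ctx)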
