Update llama.py - Fix embedding generation error #5


Open
MikeLP wants to merge 1 commit into main
Conversation


@MikeLP commented on Jul 12, 2025

Replace llama_kv_cache_clear -> llama_kv_self_clear.
This reverts to llama_kv_self_clear until the llama_kv_cache_clear function is fixed.

Replace llama_kv_cache_clear -> llama_kv_self_clear
Copilot AI review requested due to automatic review settings, July 12, 2025 11:57

Copilot AI left a comment


Pull Request Overview

This PR fixes an embedding generation error by replacing calls to the wrong cache-clear function with the correct one.

  • Calls to llama_kv_cache_clear have been updated to llama_kv_self_clear in the embedding flow.
  • Ensures the context cache is properly cleared before and after decoding batches (a sketch of the patched helper follows below).
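
For orientation, here is a minimal sketch of the patched helper. Only the decode_batch name and the llama_kv_self_clear call reflect this PR; the llama_cpp low-level import, the explicit ctx/batch parameters (the real code uses a closure inside embed), and the decode/reset wrapper methods are assumptions about the surrounding llama-cpp-python internals, not code taken from the diff.

    from typing import List

    import llama_cpp  # low-level bindings; import path is an assumption


    def decode_batch(ctx, batch, seq_sizes: List[int]) -> None:
        # Clear the self-attention KV cache before decoding the next batch so
        # that state left over from the previous batch cannot leak into the
        # new embeddings.
        llama_cpp.llama_kv_self_clear(ctx.ctx)  # was: llama_kv_cache_clear
        ctx.decode(batch)  # run the forward pass over the queued tokens (assumed wrapper method)
        batch.reset()      # empty the batch for the next group of sequences (assumed wrapper method)
        # ...the caller then copies the per-sequence embeddings into `data`
        # and clears the cache once more after the last batch.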
Comments suppressed due to low confidence (1)

llama_cpp/llama.py:982

  • Add or update unit tests for the embed function to verify that embeddings are generated correctly with llama_kv_self_clear and that the cache is fully cleared before and after decoding.
        data: Union[List[List[float]], List[List[List[float]]]] = []
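
Following up on that test suggestion, here is a hedged sketch of what such a test could look like. Nothing below is part of this PR: LLAMA_TEST_MODEL is a hypothetical environment variable pointing at any small pooled-embedding GGUF model, and the tolerance and shape checks are illustrative.

    import os

    import pytest
    from llama_cpp import Llama

    MODEL_PATH = os.environ.get("LLAMA_TEST_MODEL")  # hypothetical; not defined by this repo


    @pytest.mark.skipif(MODEL_PATH is None, reason="no local GGUF embedding model configured")
    def test_embed_is_stable_across_batches():
        llm = Llama(model_path=MODEL_PATH, embedding=True, n_ctx=512, verbose=False)

        # Embedding the same text twice should yield the same vector. If the KV
        # cache were not cleared between decode_batch calls, state from the first
        # call could leak into the second and the vectors would drift.
        first = llm.embed("hello world")
        second = llm.embed("hello world")

        # Assumes a pooled-embedding model, so embed() returns a flat list of floats.
        assert len(first) == len(second) > 0
        assert all(abs(a - b) < 1e-5 for a, b in zip(first, second))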

@@ -982,7 +982,7 @@ def embed(
        data: Union[List[List[float]], List[List[List[float]]]] = []

        def decode_batch(seq_sizes: List[int]):

Copilot AI Jul 12, 2025


[nitpick] Consider adding a brief comment explaining why llama_kv_self_clear is used here instead of the previous llama_kv_cache_clear, to clarify the intended cache-clearing behavior for future maintainers.

Suggested change
-        def decode_batch(seq_sizes: List[int]):
+        def decode_batch(seq_sizes: List[int]):
+            # Clear the self-attention key-value cache to prepare for decoding the next batch.
+            # `llama_kv_self_clear` is used here instead of `llama_kv_cache_clear` because it specifically
+            # clears the cache for self-attention mechanisms, which is required for accurate embedding generation.


@MikeLP (Author) commented on Jul 13, 2025

We can close this PR once the llama_kv_cache_clear error is fixed.
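
In the meantime, a compatibility shim along the following lines could tolerate either symbol. This is only a sketch of the idea, not code proposed in this PR, and it assumes the low-level bindings are importable as llama_cpp:

    import llama_cpp


    def clear_kv_cache(raw_ctx) -> None:
        # Prefer the newer llama_kv_self_clear and fall back to llama_kv_cache_clear,
        # so the same code works whichever symbol the installed bindings expose.
        clear_fn = getattr(llama_cpp, "llama_kv_self_clear", None) or getattr(
            llama_cpp, "llama_kv_cache_clear", None
        )
        if clear_fn is None:
            raise RuntimeError("no KV cache clear function found in llama_cpp bindings")
        clear_fn(raw_ctx)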
