Bug: Memory search fails to match words with punctuation in extractWords #569

@zesty-clawd

Bug: Memory search fails to match words with punctuation

Summary

The extractWords function in memory/inmemory.go uses space-only splitting and doesn't strip punctuation, causing memory search to miss relevant results when text contains punctuation or non-space whitespace.

Reproduction

Scenario: Add a session with content containing punctuation, then search for a word without punctuation.

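// Add a session whose content ends with punctuation.
// The variable is named sess so it does not shadow the session package;
// makeSession takes t, matching its use in the test case below.
sess := makeSession(t, "app1", "user1", "sess1", []*session.Event{
    {
        LLMResponse: model.LLMResponse{
            Content: genai.NewContentFromText("The agent works great!", genai.RoleModel),
        },
    },
})
memSvc.AddSession(ctx, sess)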

// Search for "great" (without punctuation)
resp, _ := memSvc.Search(ctx, &memory.SearchRequest{
    AppName: "app1",
    UserID:  "user1",
    Query:   "great",
})

// Expected: 1 memory found
// Actual: 0 memories (because stored token is "great!" not "great")

Root Cause

File: memory/inmemory.go, line ~155

func extractWords(text string) map[string]struct{} {
	res := make(map[string]struct{})

	for s := range strings.SplitSeq(text, " ") {  // ← Only splits on space
		if s == "" {
			continue
		}
		res[strings.ToLower(s)] = struct{}{}  // ← Doesn't strip punctuation
	}

	return res
}

Issues:

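  1. Space-only splitting: strings.SplitSeq(text, " ") doesn't split on tabs or newlines, so words separated by them fuse into one token (runs of spaces are fine only because empty strings are skipped)
  2. No punctuation normalization: "great!" is stored as-is and won't match "great"
  3. Lowercasing is the only normalization: tokens are lowercased, but any attached punctuation is kept, so case-insensitive matching alone is not enough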
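
The tokenization behavior described above can be reproduced in isolation. The following is a minimal, self-contained sketch: extractWordsCurrent and sortedKeys are hypothetical names, and strings.Split is substituted for strings.SplitSeq (an assumption made so the example builds on older toolchains; both split on the literal " " separator only).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// extractWordsCurrent mirrors the extractWords logic quoted in this issue.
// Assumption: strings.Split stands in for strings.SplitSeq here.
func extractWordsCurrent(text string) map[string]struct{} {
	res := make(map[string]struct{})
	for _, s := range strings.Split(text, " ") {
		if s == "" {
			continue
		}
		res[strings.ToLower(s)] = struct{}{}
	}
	return res
}

// sortedKeys returns the tokens in a stable order for printing.
func sortedKeys(m map[string]struct{}) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Trailing punctuation stays attached to the token.
	fmt.Printf("%q\n", sortedKeys(extractWordsCurrent("The agent works great!")))
	// prints ["agent" "great!" "the" "works"]

	// A newline is not a separator, so two words fuse into one token.
	fmt.Printf("%q\n", sortedKeys(extractWordsCurrent("first line\nsecond line")))
	// prints ["first" "line" "line\nsecond"]
}
```

A query for "great" or "second" would find no matching token in either set, which is consistent with the empty search result in the reproduction.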

Impact

  • Search accuracy degraded: Users searching for "error" won't find memories containing "error." or "error," or "error!"
  • Common patterns affected:
    • Sentences ending with punctuation (., !, ?)
    • Comma-separated lists
    • Quoted text
    • Multi-line responses with \n or \t

Proposed Fix

Replace space-only splitting with proper whitespace tokenization and strip punctuation:

func extractWords(text string) map[string]struct{} {
	res := make(map[string]struct{})

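	for _, word := range strings.Fields(text) {  // Splits on any run of whitespace
		// Strip leading and trailing non-alphanumeric runes (requires importing "unicode").
		// Punctuation inside a word, such as "a,b", is left untouched.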
		cleaned := strings.TrimFunc(word, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsNumber(r)
		})
		if cleaned == "" {
			continue
		}
		res[strings.ToLower(cleaned)] = struct{}{}
	}

	return res
}
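
As a quick sanity check, the same logic can be run as a standalone program. This is only a sketch: extractWordsFixed and has are hypothetical names chosen to avoid clashing with the real function. It also illustrates one limitation: strings.TrimFunc removes only leading and trailing runes, so punctuation inside a whitespace-delimited field is kept.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// extractWordsFixed is a standalone copy of the proposed fix.
func extractWordsFixed(text string) map[string]struct{} {
	res := make(map[string]struct{})
	for _, word := range strings.Fields(text) {
		// Trim anything that is not a letter or number from both ends.
		cleaned := strings.TrimFunc(word, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsNumber(r)
		})
		if cleaned == "" {
			continue
		}
		res[strings.ToLower(cleaned)] = struct{}{}
	}
	return res
}

// has reports whether w is present in the token set.
func has(m map[string]struct{}, w string) bool {
	_, ok := m[w]
	return ok
}

func main() {
	words := extractWordsFixed("Error: connection timeout!\tPlease retry.")
	fmt.Println(has(words, "error"), has(words, "timeout"), has(words, "retry"))
	// prints: true true true

	// No whitespace between items, so the whole list stays one token.
	list := extractWordsFixed("red,green,blue")
	fmt.Println(has(list, "green"), has(list, "red,green,blue"))
	// prints: false true
}
```

Comma-separated lists written with a space after each comma are handled; lists without spaces would still need a different splitting rule.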

Alternative: Use a proper tokenizer/stemmer for production-grade search, but the above fix would resolve the immediate issue.
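
Short of a full tokenizer, one intermediate option is to split on every rune that is not a letter or number. The sketch below is a variation and not the fix proposed above; it uses strings.FieldsFunc, and the name extractWordsSplitAll is hypothetical. The trade-off is that it also breaks apart tokens such as "don't" or "v1.2", which may or may not be desirable for memory search.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// extractWordsSplitAll treats every non-letter, non-number rune as a
// separator, so interior punctuation splits tokens as well.
func extractWordsSplitAll(text string) map[string]struct{} {
	res := make(map[string]struct{})
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	}
	// FieldsFunc never yields empty strings, so no empty check is needed.
	for _, word := range strings.FieldsFunc(text, isSeparator) {
		res[strings.ToLower(word)] = struct{}{}
	}
	return res
}

func main() {
	words := extractWordsSplitAll("red,green,blue\n\"Quoted\" text")
	for _, w := range []string{"green", "quoted", "text"} {
		_, ok := words[w]
		fmt.Println(w, ok)
	}
	// prints:
	// green true
	// quoted true
	// text true
}
```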

Test Case to Add

{
    name: "match words with punctuation",
    initSessions: []session.Session{
        makeSession(t, "app1", "user1", "sess1", []*session.Event{
            {
                LLMResponse: model.LLMResponse{
                    Content: genai.NewContentFromText("Error: connection timeout! Please retry.", genai.RoleModel),
                },
            },
        }),
    },
    req: &memory.SearchRequest{
        AppName: "app1",
        UserID:  "user1",
        Query:   "error timeout retry",  // No punctuation
    },
    wantResp: &memory.SearchResponse{
        Memories: []memory.Entry{/* should find the memory */},
    },
},

Environment

  • Version: main branch (commit: latest as of 2026-02-16)
  • Go version: 1.22+ (note: strings.SplitSeq, used by the current code, requires Go 1.24 or later)

Additional Context

This is particularly problematic for AI agent memory since LLM responses naturally contain punctuation. The current implementation significantly reduces search recall in real-world usage.

Labels

bug, help wanted
