Summary
The current Auto Detect search in STTM-WEB is not reliably finding expected results when users:
- type a full or partial panktee
- type common Roman spellings (e.g. jo mange thakur)
- type with misspellings / phonetic variations (e.g. jooo menge thakoor)
- type what they remember “loosely” instead of the exact spelling
This creates a poor user experience because users often search from memory, and the search should still find the expected shabad/panktee.
We currently use Meilisearch. Meilisearch already supports typo tolerance and configurable relevancy (ranking rules + searchable attribute order), so we should improve our indexing + query strategy to better support transliteration and fuzzy matching. ([meilisearch.com]1)
Problem Statement
Current issues observed
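- Full panktee search is weak
  - Searching a full line / near-full line does not reliably return the correct shabad/panktee.
- Roman transliteration queries often fail
  - Example: jo mange thakur, tu data datar
  - Users expect the known shabad to appear, but results are missing or ranked poorly.
- Mistyped / phonetic queries fail
  - Example: jooo menge thakoor
  - Search should still return the intended shabad.
- Search seems overly dependent on “first letter of each shabad” style matching
  - This is useful as a fallback, but should not be the primary behavior for most user searches.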
Goal
Make Auto Detect search behave more like how humans search:
- If user types roman transliteration (even imperfectly), return the intended shabad.
- If user types Gurmukhi, return strong Gurbani matches.
- If user types acrostic/first-letter style, still support that as fallback.
- If user types meaning/English recall phrase, support that as lower-priority fallback.
Desired Ranking Priority (Auto Detect)
For Roman queries (Latin script)
- Strong transliteration match (exact / prefix / fuzzy) ✅ highest priority
- Gurmukhi direct match (if relevant)
- First-letter/acrostic Gurbani match
- Meaning/translation match (lower priority fallback)
For Gurmukhi queries
- Direct Gurbani match ✅ highest priority
- First-letter/acrostic Gurbani match
- Transliteration match (if useful)
- Meaning/translation match
Proposed Solution (Implementation Direction)
1) Expand indexed search fields (Meilisearch documents)
For each searchable unit (shabad / panktee / line), index multiple searchable fields, not just one.
Suggested fields (example names):
- gurbani (original Gurmukhi text)
- gurbani_first_letters (existing/derived acrostic)
- transliteration (current transliteration)
- transliteration_normalized (normalized roman text for fuzzy matching)
- meaning / translation (English meaning / gloss)
- (optional) transliteration_aliases (common spellings if available)
Why: Meilisearch relevancy depends heavily on which attributes are searchable and their order. Earlier searchable attributes are treated as more relevant. ([meilisearch.com]2)
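As a rough sketch of what such a document and the matching index setting could look like (Python is used here purely for illustration). The field names are the example names from the list above. `build_document`, the `id` key, the placeholder values, and the attribute order are assumptions, not the existing STTM schema:

```python
import json

# Attribute order matters: Meilisearch's "attribute" ranking rule favours
# matches in attributes listed earlier in searchableAttributes.
# This particular order is an assumption for illustration only.
SEARCHABLE_ATTRIBUTES = [
    "transliteration",
    "transliteration_normalized",
    "transliteration_aliases",
    "gurbani",
    "gurbani_first_letters",
    "meaning",
]


def build_document(line_id, gurbani, first_letters, transliteration,
                   meaning, aliases=None):
    """Build one index document for a single panktee (hypothetical shape)."""
    return {
        "id": line_id,
        "gurbani": gurbani,
        "gurbani_first_letters": first_letters,
        "transliteration": transliteration,
        # Placeholder normalization: lowercase and collapse whitespace only.
        "transliteration_normalized": " ".join(transliteration.lower().split()),
        "meaning": meaning,
        "transliteration_aliases": list(aliases or []),
    }


# Body for the index settings update. searchableAttributes is a real
# Meilisearch setting; how it is sent is left to the existing client code.
settings_payload = {"searchableAttributes": SEARCHABLE_ATTRIBUTES}

if __name__ == "__main__":
    doc = build_document(1, "<Gurmukhi line>", "<first letters>",
                         "Jo  Mange Thakur", "<English meaning>")
    print(json.dumps(doc, ensure_ascii=False, indent=2))
    print(json.dumps(settings_payload, indent=2))
```

With multi-pass search the global order matters less, since each pass can restrict itself to specific fields.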
2) Add query normalization (app-side) before sending to Meilisearch
Roman-input search quality will improve significantly if we normalize user input before querying.
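Examples of normalization:
- trim extra spaces
- lowercase
- collapse repeated characters: jooo → joo / jo (configurable)
- normalize common phonetic variants: thakoor → thakur, menge → mange (if rule-based mapping is safe)
- remove punctuation/noise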
This can be done in a conservative way (don’t over-normalize).
Note: Meilisearch typo tolerance helps, but by default it depends on word length: words shorter than 5 characters (e.g. jo, tu) get no typo tolerance, one typo is allowed from 5 characters, and two typos from 9. App-side normalization is therefore important for roman transliteration use cases. ([meilisearch.com]1)
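A minimal sketch of conservative normalization for Roman queries, shown in Python for illustration. The function name, the `max_repeat` parameter, and the tiny variant map are assumptions; a real map would need to be curated and reviewed. It strips non-ASCII characters, so it should only be applied to queries already detected as Roman:

```python
import re

# Hypothetical, hand-curated variant map; real entries would need review
# so that distinct words are never merged by accident.
PHONETIC_VARIANTS = {
    "thakoor": "thakur",
    "menge": "mange",
}


def normalize_roman_query(query, max_repeat=2, variants=PHONETIC_VARIANTS):
    """Conservatively normalize a Latin-script query before searching."""
    text = query.lower()
    # Replace anything that is not an ASCII letter, digit or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Runs longer than max_repeat are cut down to max_repeat characters.
    run = re.compile(r"(.)\1{%d,}" % max_repeat)
    tokens = []
    for token in text.split():  # split() also trims and collapses spaces
        token = variants.get(token, token)
        token = run.sub(lambda m: m.group(1) * max_repeat, token)
        tokens.append(token)
    return " ".join(tokens)


print(normalize_roman_query("  Jooo   menge, thakoor!! "))
print(normalize_roman_query("jooo menge thakoor", max_repeat=1))
```

The default `max_repeat=2` turns jooo into joo, and `max_repeat=1` turns it into jo. The more aggressive setting also shortens legitimate doubled letters, so it is worth checking against the benchmark queries before enabling it.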
3) Detect script type (Roman vs Gurmukhi) and run ranked search strategy
Add lightweight query classification:
- Gurmukhi query
- Roman query
- Mixed query (edge case)
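The classification can probably stay very small. A sketch in Python, assuming it is enough to check for characters in the Unicode Gurmukhi block (U+0A00 to U+0A7F) and for ASCII letters; the function name and the fallback for empty or digits-only input are assumptions:

```python
def classify_script(query):
    """Classify a query as 'gurmukhi', 'roman' or 'mixed'."""
    # U+0A00..U+0A7F is the Unicode Gurmukhi block.
    has_gurmukhi = any("\u0a00" <= ch <= "\u0a7f" for ch in query)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in query)
    if has_gurmukhi and has_latin:
        return "mixed"
    if has_gurmukhi:
        return "gurmukhi"
    # Assumption: digits-only or empty input falls through to 'roman'.
    return "roman"
```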
Then run search in priority order (multi-pass or weighted merge):
Option A (recommended): Multi-pass search + merge in app layer
Run multiple queries (or searches against different fields) and merge results with explicit priority buckets.
Example for Roman query:
- Pass 1: transliteration + transliteration_normalized
- Pass 2: gurbani_first_letters
- Pass 3: meaning
- Merge + dedupe + preserve priority
This gives us deterministic behavior and avoids fighting global index settings.
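A sketch of Option A in Python, for illustration only. It assumes the deployed Meilisearch version supports the `attributesToSearchOn` search parameter (v1.3 or later) and that each hit carries an `id`. The helper names, the `_bucket` marker, and the choice to treat mixed queries like Roman ones are assumptions:

```python
# Fields searched per pass, in priority order. The roman plan follows the
# example passes above; the gurmukhi plan follows the desired ranking list.
PASS_PLAN = {
    "roman": [
        ["transliteration", "transliteration_normalized"],
        ["gurbani_first_letters"],
        ["meaning"],
    ],
    "gurmukhi": [
        ["gurbani"],
        ["gurbani_first_letters"],
        ["transliteration"],
        ["meaning"],
    ],
}


def build_pass_requests(query, script, limit=20):
    """Return one Meilisearch search body per pass for the given script."""
    # Assumption: mixed/unknown queries reuse the roman plan.
    plan = PASS_PLAN.get(script, PASS_PLAN["roman"])
    return [
        {"q": query, "attributesToSearchOn": fields, "limit": limit}
        for fields in plan
    ]


def merge_passes(pass_hits, limit=20, id_key="id"):
    """Merge hit lists in pass order, keeping the first copy of each id."""
    seen = set()
    merged = []
    for bucket, hits in enumerate(pass_hits, start=1):
        for hit in hits:
            if hit[id_key] in seen:
                continue
            seen.add(hit[id_key])
            # _bucket records which pass surfaced the hit (1 = highest).
            merged.append({**hit, "_bucket": bucket})
    return merged[:limit]
```

Because earlier passes always win on duplicates, a transliteration match can never be outranked by a meaning match for the same line, whatever the per-pass scores are.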
Option B: Single-pass search with tuned searchable attribute order
Possible, but harder to make behave differently for Roman vs Gurmukhi queries.
4) Tune Meilisearch typo tolerance and relevancy settings
Meilisearch supports:
- typo tolerance settings
- ranking rules
- searchable attribute ordering ([meilisearch.com]1)
We should evaluate:
- enabling/tuning typo tolerance on transliteration fields
- adjusting typo thresholds (minWordSizeForTypos) if needed
- ensuring transliteration fields participate in search and ranking correctly
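For reference, a possible shape of the typo tolerance settings, written as a Python dict that serializes to the settings JSON. The keys are real Meilisearch settings, but the threshold values and the idea of disabling typos on the Gurmukhi fields are assumptions to be validated against the benchmark queries:

```python
import json

# Illustrative values only; the right thresholds need to be found by testing.
typo_settings = {
    "typoTolerance": {
        "enabled": True,
        # Meilisearch defaults are oneTypo=5 and twoTypos=9.
        "minWordSizeForTypos": {"oneTypo": 4, "twoTypos": 8},
        # Assumption: exact matching is preferred for these two fields.
        "disableOnAttributes": ["gurbani", "gurbani_first_letters"],
    }
}

print(json.dumps(typo_settings, indent=2))
```

Typo tolerance can only be switched off per attribute, not on, so limiting it to the transliteration fields means listing every other field under `disableOnAttributes`.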
5) Add observability for search quality (debug mode / logging)
For QA and tuning:
- log query
- detected script
- normalized query
- search passes run
- top N results returned
- which field matched (if available)
- ranking score (optional during dev/QA)
This will make it easier to iterate quickly and compare improvements.
(Meilisearch can return ranking scores when configured in search parameters, useful for debugging relevance tuning.) ([meilisearch.com]3)
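A sketch of a debug log entry covering the items listed above, in Python for illustration. `showRankingScore` and `showMatchesPosition` are Meilisearch search parameters that add `_rankingScore` and `_matchesPosition` to each hit. The function names and the log shape are assumptions:

```python
import json


def build_debug_search_body(query, fields, limit=10):
    """Search body that also asks Meilisearch for scores and match positions."""
    return {
        "q": query,
        "attributesToSearchOn": fields,
        "limit": limit,
        "showRankingScore": True,      # hits gain a _rankingScore value
        "showMatchesPosition": True,   # hits gain _matchesPosition per field
    }


def build_search_log(raw_query, script, normalized, passes, hits, top_n=5):
    """Summarize one Auto Detect search as a JSON-serializable log entry."""
    return {
        "query": raw_query,
        "detected_script": script,
        "normalized_query": normalized,
        "passes": passes,
        "top_results": [
            {
                "id": hit.get("id"),
                "score": hit.get("_rankingScore"),
                # Keys of _matchesPosition are the attributes that matched.
                "matched_fields": sorted(hit.get("_matchesPosition", {})),
            }
            for hit in hits[:top_n]
        ],
    }
```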
Acceptance Criteria
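Functional
- jo mange thakur returns the expected shabad in top results (ideally top 1–3)
- tu data datar returns the expected shabad in top results
- jooo menge thakoor still returns the intended shabad in results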
Ranking behavior
Quality / Regression
Suggested Test Queries (Initial QA Set)
Roman exact/common spellings
- jo mange thakur
- tu data datar
- har har naam nidhan hai
Roman fuzzy/mistyped
- jooo menge thakoor
- jo maange thakur
- too data datar
Gurmukhi
- (full panktee exact)
- (partial panktee)
- (first-letter style query)
Meaning fallback
- those who ask from You
- giver of gifts (or other known meaning phrases)
Out of Scope (for this ticket)
- Full semantic search / embeddings
- ML-based phonetic transliteration correction
- Personalized search ranking
- Cross-language intent understanding beyond current indexed fields
Implementation Notes / Hints
- Start with a small controlled dataset (a few known shabads) for tuning.
- Compare before/after relevance using fixed benchmark queries.
- Prefer incremental rollout:
  - Add fields + indexing
  - Add query normalization
  - Add script-aware ranking / multi-pass merge
  - Tune typo tolerance + thresholds
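The before/after comparison on fixed benchmark queries could be scored with something as small as the sketch below (Python, for illustration). `search_fn` stands in for whatever wraps the real search and returns ranked ids. The metric name and the `(query, expected_id)` case format are assumptions:

```python
def hit_rate_at_k(search_fn, cases, k=3):
    """Fraction of benchmark queries whose expected id is in the top k."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected_id in cases:
        top_ids = search_fn(query)[:k]
        if expected_id in top_ids:
            hits += 1
    return hits / len(cases)
```

Running the same cases before and after each rollout step gives one number per step, which makes regressions easy to spot.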
Why this matters
Users often remember:
- a few words,
- approximate roman spelling,
- a sound-alike version,
- or a meaning snippet.
Auto Detect should feel forgiving and intuitive — especially for Sangat searching from memory.