Unable to use character folding #41

@richin13

Description

First of all, thanks for maintaining this useful library 👍🏼

I was trying to follow the docs on character folding but so far I've been unable to get it to work. The error I'm getting is this:

Traceback (most recent call last):
  File "/home/ricardo/src/sandbox/padron-parser/repro.py", line 12, in <module>
    writer.add_document(name=u"René Descartes")
  File "/home/ricardo/src/sandbox/padron-parser/.venv/lib/python3.11/site-packages/whoosh/writing.py", line 750, in add_document
    for tbytes, freq, weight, vbytes in items:
  File "/home/ricardo/src/sandbox/padron-parser/.venv/lib/python3.11/site-packages/whoosh/fields.py", line 164, in index
    for tstring, freq, wt, vbytes in word_values(value, ana, **kwargs):
  File "/home/ricardo/src/sandbox/padron-parser/.venv/lib/python3.11/site-packages/whoosh/formats.py", line 223, in word_values
    for t in tokens(value, analyzer, kwargs):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/src/sandbox/padron-parser/.venv/lib/python3.11/site-packages/whoosh/formats.py", line 125, in tokens
    gen = analyzer(value, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: CharsetFilter.__call__() got an unexpected keyword argument 'mode'

Here's the minimal reproducible example:

from whoosh import analysis, fields, index
from whoosh.support.charset import accent_map

analyzer = analysis.CharsetFilter(accent_map)
index_path = "my_index"
schema = fields.Schema(
    name=fields.TEXT(analyzer=analyzer, stored=True),
)

ix = index.create_in(index_path, schema)
writer = ix.writer()
writer.add_document(name=u"René Descartes")
writer.add_document(name=u"Ñame Frito")
writer.commit()

I tried manually removing the mode item from the kwargs dict passed at formats.py:125, but then got a similar error, this time KeyError: positions
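
One possible reading of the traceback, not confirmed here: CharsetFilter is a filter that takes a token stream, while the field calls its analyzer with the raw text plus indexing options such as mode. The sketch below reproduces that calling convention in plain Python. Every name in it (tokenizer, FoldingFilter, Pipeline, ACCENTS) is a hypothetical stand-in and not Whoosh's real code; it only illustrates why a bare filter rejects the mode keyword and why placing a tokenizer in front of it avoids the problem.

```python
# Hypothetical stand-ins that mimic the calling convention only.
# These are NOT Whoosh's real classes.

# Tiny illustrative folding table: é -> e, Ñ -> N, ñ -> n.
ACCENTS = str.maketrans("\u00e9\u00d1\u00f1", "eNn")


def tokenizer(value, mode="", positions=False, **kwargs):
    """Accept raw text plus indexing options and yield tokens."""
    for word in value.split():
        yield word


class FoldingFilter:
    """Accept only a token stream, like a filter in a pipeline."""

    def __call__(self, tokens):
        for token in tokens:
            yield token.translate(ACCENTS).lower()


class Pipeline:
    """Pass the options to the tokenizer, then chain the filters."""

    def __init__(self, tokenizer, *filters):
        self.tokenizer = tokenizer
        self.filters = filters

    def __call__(self, value, **kwargs):
        stream = self.tokenizer(value, **kwargs)
        for token_filter in self.filters:
            stream = token_filter(stream)
        return stream


# Using the bare filter as the analyzer fails the same way as the traceback:
# the filter's __call__ has no 'mode' parameter.
try:
    FoldingFilter()("Ren\u00e9 Descartes", mode="index")
    bare_filter_error = None
except TypeError as exc:
    bare_filter_error = str(exc)

# Putting a tokenizer in front of the filter accepts the options.
analyzer = Pipeline(tokenizer, FoldingFilter())
folded = list(analyzer("Ren\u00e9 Descartes", mode="index"))
```

If this reading is right, the Whoosh equivalent would be to compose the filter with a full analyzer, as the character folding docs do with StemmingAnalyzer() | CharsetFilter(accent_map), rather than passing CharsetFilter(accent_map) as the analyzer on its own.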

Env details

  • Whoosh version: 2.7.4
  • Python version: 3.11.2
  • GNU/Linux (ArchLinux x86_64)
