Use Unicode normalization? #22

@pnorman

Unicode defines normalization forms (NFC, NFD, NFKC, NFKD) for strings and assigns every character a general category.

It might work to normalize strings to NFKD and then remove any characters with general category Mn (Nonspacing_Mark) (see table 12).
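
As a rough sketch of that idea in Python, using only the standard `unicodedata` module: the function name `strip_marks` is hypothetical and not part of this project, and whether NFKD is the right form here is still an open question.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFKD, then drop every nonspacing mark (category Mn)."""
    # NFKD splits precomposed characters such as "é" into a base letter
    # followed by a combining mark, and expands compatibility characters
    # such as the "ﬁ" ligature into plain letters.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except characters whose general category is Mn.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_marks("café"))    # cafe
print(strip_marks("Zürich"))  # Zurich
```

Characters without a decomposition pass through unchanged, so this alone would not cover every case.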

It might be necessary to handle conversions like ß to ss specially, since ß has no decomposition and NFKD leaves it unchanged.
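
One possible way to handle such cases is an explicit replacement table applied before normalizing. This is only a sketch: the names `SPECIAL_CASES` and `fold` are made up for illustration, and the table contents are an assumption that would need to be extended for whatever characters actually matter.

```python
import unicodedata

# Hypothetical table of characters that NFKD does not decompose.
SPECIAL_CASES = str.maketrans({"ß": "ss", "ẞ": "SS"})


def fold(text: str) -> str:
    """Apply the special-case table, then NFKD, then drop Mn characters."""
    replaced = text.translate(SPECIAL_CASES)
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(fold("Straße"))  # Strasse
print(fold("Gießen"))  # Giessen
```

If case does not need to be preserved, Python's built-in `str.casefold()` also maps ß to ss, which might make the table unnecessary for that particular character.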

See also this Python Stack Overflow answer.
