Skip to content
/ hustem Public

Create a better Snowball Hungarian stemmer .sbl config file via TDD

License

Notifications You must be signed in to change notification settings

damesek/hustem

Folders and files

NameName
Last commit message
Last commit date
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023
Oct 28, 2023

Repository files navigation

HuStem: the hungarian stemmer

WIP

HuStem was created to work with Snowball's .sbl files. The goal is to update the quality of the Hungarian stemmer config file. To achieve this, I first try to identify the issues and then proceed with tests, following a TDD (Test-Driven Development) workflow.

Error rate:

{:hunspell 32,91%, :snowball 47,41%, :hunspell-mdb 32,91%}

This means Hunspell is 27% more accurate than Snowball. I tested two different dic/aff sources, but there was no difference in efficiency.

Snowball cli basics

Test the new Hungarian Snowball SBL file from Snowball root folder (src/resources/snowball-master)

make && echo "baglyokat" | ./stemwords -l hungarian

The hungarian words dict to check the results

grep "teremt" ../magyar-szavak.txt

Compile the Java sources

Todo: add as prep-task

clj -T:build clean
clj -T:build compile-java

License

Copyright © 2023 FIXME

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.

About

Create a better Snowball Hungarian stemmer .sbl config file via TDD

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published