{"id":27056496,"url":"https://github.com/damesek/hustem","last_synced_at":"2026-05-01T13:32:00.195Z","repository":{"id":204159011,"uuid":"711232951","full_name":"damesek/hustem","owner":"damesek","description":"Create a better Snowball Hungarian stemmer .sbl config file via TDD","archived":false,"fork":false,"pushed_at":"2023-10-28T19:38:45.000Z","size":14044,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T10:17:44.547Z","etag":null,"topics":["clojure","error-rate","hunspell","snowball","stem"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/damesek.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-28T15:59:17.000Z","updated_at":"2023-10-28T19:39:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"7cddd01f-66f7-4730-95bd-eeee9c881d7d","html_url":"https://github.com/damesek/hustem","commit_stats":null,"previous_names":["damesek/hustem"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/damesek/hustem","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damesek%2Fhustem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damesek%2Fhustem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damesek%2Fhustem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damesek%2Fhustem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/damesek","download_url":"https://codeload.github.com/damesek/hus
tem/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/damesek%2Fhustem/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32499681,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","error-rate","hunspell","snowball","stem"],"created_at":"2025-04-05T10:17:51.625Z","updated_at":"2026-05-01T13:32:00.159Z","avatar_url":"https://github.com/damesek.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HuStem: the Hungarian stemmer\n\nWIP\n\n\n\nHuStem was created to work with Snowball's .sbl files.\nThe goal is to improve the quality of the Hungarian stemmer config file.\nTo achieve this, I first try to identify the issues and then proceed with tests, following a TDD (Test-Driven Development) workflow.\n\n## Error rate\n\n```clojure\n{:hunspell 32,91%, :snowball 47,41%, :hunspell-mdb 32,91%}\n```\nThis means Hunspell is about 27% more accurate than Snowball in relative terms (67.09% vs. 52.59% accuracy).\nI tested two different dic/aff sources, but the error rate was the same for both.\n\n## Snowball CLI basics\n\nTest the new Hungarian Snowball .sbl file from the Snowball root folder\n(src/resources/snowball-master):\n\n```bash\nmake \u0026\u0026 echo \"baglyokat\" | ./stemwords -l hungarian\n```\n\nThe 
Hungarian word list can be used to check the results:\n```bash\ngrep \"teremt\" ../magyar-szavak.txt\n```\n\n## Compile the Java sources\n\nTODO: add as a prep task\n\n```bash\nclj -T:build clean\nclj -T:build compile-java\n```\n\n\n\n\n## License\n\nCopyright © 2023 FIXME\n\nThis program and the accompanying materials are made available under the\nterms of the Eclipse Public License 2.0 which is available at\nhttp://www.eclipse.org/legal/epl-2.0.\n\nThis Source Code may also be made available under the following Secondary\nLicenses when the conditions for such availability set forth in the Eclipse\nPublic License, v. 2.0 are satisfied: GNU General Public License as published by\nthe Free Software Foundation, either version 2 of the License, or (at your\noption) any later version, with the GNU Classpath Exception which is available\nat https://www.gnu.org/software/classpath/license.html.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdamesek%2Fhustem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdamesek%2Fhustem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdamesek%2Fhustem/lists"}