{"id":19316588,"url":"https://github.com/tokenmill/snowball","last_synced_at":"2026-03-01T12:33:23.305Z","repository":{"id":34084033,"uuid":"37906625","full_name":"tokenmill/snowball","owner":"tokenmill","description":"Snowball version of the Porter stemmer for the Lithuanian language.","archived":false,"fork":false,"pushed_at":"2019-09-04T10:59:45.000Z","size":33,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-24T04:42:07.951Z","etag":null,"topics":["lithuanian-language","nlp","porter-stemmer","snowball","stemmer"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tokenmill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-23T08:27:37.000Z","updated_at":"2023-09-24T14:02:49.000Z","dependencies_parsed_at":"2022-07-18T01:10:52.870Z","dependency_job_id":null,"html_url":"https://github.com/tokenmill/snowball","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tokenmill/snowball","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fsnowball","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fsnowball/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fsnowball/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fsnowball/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tokenmill","download_url":"https://codeload.github.com/tokenmill/snowbal
l/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fsnowball/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T11:43:06.159Z","status":"ssl_error","status_checked_at":"2026-03-01T11:43:03.887Z","response_time":124,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lithuanian-language","nlp","porter-stemmer","snowball","stemmer"],"created_at":"2024-11-10T01:11:57.516Z","updated_at":"2026-03-01T12:33:23.251Z","avatar_url":"https://github.com/tokenmill.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca href=\"http://www.tokenmill.lt\"\u003e\n      \u003cimg src=\".github/tokenmill-logo.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\n\u003c/a\u003e\n\n# snowball\n\nThe old Snowball version of the Porter stemmer for the Lithuanian language is in the file `lithuanian.sbl`.\n\nThe new version is in the file `conservative.sbl`.\n\nThe new version is less aggressive than the old one, so it should overstem fewer words.\n\nThe new stemmer was created with search applications in mind. Therefore, nouns are considered more important than adjectives, verbs, etc. 
This means that some suffixes, such as -ut- in 'kalakutas', are left untouched during stemming. On the other hand, this leaves some adjectives understemmed, e.g. 'sveikutis -\u003e sveikut'. There will always be trade-offs.\n\n\nNOTE:\n\nThe current stemmer version uses the length of the string to prevent overstemming. A stemmer generated with the `snowball`* program extends the `org.tartarus.snowball.SnowballProgram` class and gets the length of the `current` string by calling Java's `current.length()`.\n\nLucene 4.10.1, however, implements `SnowballProgram` with `current` as a private attribute, so `current.length()` does not compile against Lucene. The workaround is to replace `current.length()` with `getCurrent().length()` on line 589.\n\n* The `snowball` program was downloaded from [here](http://snowball.tartarus.org/dist/snowball_code.tgz).\n\n## License\n\nCopyright \u0026copy; 2019 [TokenMill UAB](http://www.tokenmill.lt).\n\nDistributed under the Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenmill%2Fsnowball","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftokenmill%2Fsnowball","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenmill%2Fsnowball/lists"}