{"id":21690786,"url":"https://github.com/reiver/go-porterstemmer","last_synced_at":"2025-10-18T18:17:24.453Z","repository":{"id":8757867,"uuid":"10439470","full_name":"reiver/go-porterstemmer","owner":"reiver","description":"A native Go clean room implementation of the Porter Stemming algorithm.","archived":false,"fork":false,"pushed_at":"2021-06-23T19:23:11.000Z","size":31,"stargazers_count":190,"open_issues_count":5,"forks_count":45,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-02T07:08:52.434Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://godoc.org/github.com/reiver/go-porterstemmer","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/reiver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-06-02T17:18:12.000Z","updated_at":"2024-02-15T07:04:21.000Z","dependencies_parsed_at":"2022-09-26T17:41:36.622Z","dependency_job_id":null,"html_url":"https://github.com/reiver/go-porterstemmer","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-porterstemmer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-porterstemmer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-porterstemmer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-porterstemmer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/reiver","download_url":"https://codeload.github.com/reiver/go-porterstemmer/tar.gz/refs/he
ads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248008630,"owners_count":21032556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T17:33:45.187Z","updated_at":"2025-10-18T18:17:19.434Z","avatar_url":"https://github.com/reiver.png","language":"Go","funding_links":[],"categories":["Go","Natural Language Processing","[](https://github.com/josephmisiti/awesome-machine-learning/blob/master/README.md#go)Go"],"sub_categories":["Tools","[Tools](#tools-1)","Speech Recognition"],"readme":"# Go Porter Stemmer\n\nA native Go clean room implementation of the Porter Stemming Algorithm.\n\nThis algorithm is of interest to people doing Machine Learning or\nNatural Language Processing (NLP).\n\nThis is NOT a port. This is a native Go implementation written from the human-readable\ndescription of the algorithm.\n\nI've tried to make it (more) efficient by NOT using strings internally, but\ninstead using []rune slices internally, and by reusing the same underlying array buffer of\nthe []rune slice (and its sub-slices) at all steps of the algorithm.\n\nFor the Porter Stemmer algorithm, see:\n\nhttp://tartarus.org/martin/PorterStemmer/def.txt      (URL #1)\n\nhttp://tartarus.org/martin/PorterStemmer/             (URL #2)\n\n## Departures\n\nWhen I initially implemented it, it failed the tests at...\n\nhttp://tartarus.org/martin/PorterStemmer/voc.txt      (URL #3)\n\nhttp://tartarus.org/martin/PorterStemmer/output.txt   (URL #4)\n\n... 
after reading the human-readable text over and over again to try to figure out\nwhat error I had made (and doing all sorts of things to debug it), I came to the\nconclusion that some of these tests were wrong according to the human-readable\ndescription of the algorithm.\n\nThis led me to wonder whether other people's code that was passing these tests had\nrules that were not in the human-readable description, which in turn led me to look at the source\ncode here...\n\nhttp://tartarus.org/martin/PorterStemmer/c.txt        (URL #5)\n\n... When I looked there, I noticed that some items are marked as a \"DEPARTURE\",\nmeaning they differ from the original algorithm. (There are 2 of these.)\n\nI implemented these departures, and the tests at URL #3 and URL #4 all passed.\n\n## Usage\n\nTo use this Golang library, write something like:\n\n    package main\n    \n    import (\n      \"fmt\"\n      \"github.com/reiver/go-porterstemmer\"\n    )\n    \n    func main() {\n      \n      word := \"Waxes\"\n      \n      stem := porterstemmer.StemString(word)\n      \n      fmt.Printf(\"The word [%s] has the stem [%s].\\n\", word, stem)\n    }\n\nAlternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:\n\n    package main\n    \n    import (\n      \"fmt\"\n      \"github.com/reiver/go-porterstemmer\"\n    )\n    \n    func main() {\n      \n      word := []rune(\"Waxes\")\n      \n      stem := porterstemmer.Stem(word)\n      \n      fmt.Printf(\"The word [%s] has the stem [%s].\\n\", string(word), string(stem))\n    }\n\nNOTE, however, that the above code may modify the original slice (named \"word\" in the example) as a side\neffect, for efficiency reasons.
Also, the slice named \"stem\" in the example above may be a\nsub-slice of the slice named \"word\".\n\nAlternatively, if you already know that your word is lowercase (and you don't need\nthis library to lowercase it for you), you can instead use code like:\n\n    package main\n    \n    import (\n      \"fmt\"\n      \"github.com/reiver/go-porterstemmer\"\n    )\n    \n    func main() {\n      \n      word := []rune(\"waxes\")\n      \n      stem := porterstemmer.StemWithoutLowerCasing(word)\n      \n      fmt.Printf(\"The word [%s] has the stem [%s].\\n\", string(word), string(stem))\n    }\n\nAgain, NOTE (as with the previous example) that the above code may modify the original slice (named\n\"word\" in the example) as a side effect, for efficiency reasons, and that the slice named \"stem\"\nin the example above may be a sub-slice of the slice named \"word\".\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freiver%2Fgo-porterstemmer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freiver%2Fgo-porterstemmer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freiver%2Fgo-porterstemmer/lists"}