https://github.com/a2800276/porter
porter stemmer
- Host: GitHub
- URL: https://github.com/a2800276/porter
- Owner: a2800276
- License: mit
- Created: 2013-09-17T11:10:16.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2013-10-03T11:10:18.000Z (over 11 years ago)
- Last Synced: 2024-07-31T20:52:34.010Z (9 months ago)
- Language: Go
- Size: 383 KB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go - porter - This is a fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm. (Natural Language Processing / Morphological Analyzers)
- zero-alloc-awesome-go - porter - This is a fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm. (Natural Language Processing / Morphological Analyzers)
- awesome-cobol - porter - This is a fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm. (Natural Language Processing / Middlewares)
- awesome-go - porter - porter stemmer - ★ 8 (Natural Language Processing)
- awesome-go-extra - porter - 09-17T11:10:16Z|2013-10-03T11:10:18Z| (Bot Building / Morphological Analyzers)
- awesome-go-zh - porter
README
Porter Stemmer for Go
=====================

This is a fairly straightforward port of Martin Porter's C implementation
of the Porter stemming algorithm. The C version this port is based on is
available for download here:
[http://tartarus.org/~martin/PorterStemmer/c_thread_safe.txt](http://tartarus.org/~martin/PorterStemmer/c_thread_safe.txt)

The original algorithm is described in the paper:

M.F. Porter, 1980, An algorithm for suffix stripping, Program, 14(3) pp
130-137.

While the internal implementation and interface are nearly identical to
the original implementation, the Go interface is much simplified. The
stemmer can be called as follows:

    import "porter"
    ...
    stemmed := porter.Stem(word_to_stem)

Installing
----------

    go get github.com/a2800276/porter

To use the stemmer when installed with `go get`, import:

    import "github.com/a2800276/porter"
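As a rough sketch of how the call shown above might fit into a program, the helper below splits a sentence into words and hands each one to a stemming function. `stemWords` and `placeholder` are hypothetical names introduced here and are not part of the package. The stemmer is passed in as a `func(string) string` value (the shape of `porter.Stem` as used above) so that the sketch compiles without the package installed. Lowercasing each word first is an assumption about what the stemmer expects, not something this README states.

```go
package main

import (
	"fmt"
	"strings"
)

// stemWords splits text on whitespace, lowercases each word, and passes it
// to stem. stem is any func(string) string; in a real program this would be
// porter.Stem from github.com/a2800276/porter.
func stemWords(text string, stem func(string) string) []string {
	words := strings.Fields(text)
	out := make([]string, 0, len(words))
	for _, w := range words {
		out = append(out, stem(strings.ToLower(w)))
	}
	return out
}

func main() {
	// Placeholder (hypothetical) so the sketch compiles on its own;
	// it returns its input unchanged. Replace with porter.Stem in real use.
	placeholder := func(word string) string { return word }

	fmt.Println(stemWords("Stemming reduces related words", placeholder))
}
```

In a real program, `porter.Stem` would be passed in place of `placeholder` after importing `github.com/a2800276/porter`.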
Limitations
-----------

While the implementation is fairly robust, this is a work in progress.
In particular, a new interface will likely be provided to prevent
excessive conversions between `string`s and `[]byte`. Currently, on
calling `Stem`, the string argument is converted to a byte slice, which
the algorithm works on, and the result is converted back into a string
before returning.

Also, the implementation is not particularly robust at handling Unicode
input; currently, bytes with the high bit set are simply ignored. It's up
to the caller to make sure the string contains only ASCII characters.
Since the algorithm itself operates on English words only, this doesn't
restrict the functionality, but it is a nuisance.

TODO:
-----
* byte slice API to avoid round-tripping to string and back
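
Since the Limitations section leaves the ASCII check to the caller, one possible guard is sketched below. `isASCII` and `stemASCII` are hypothetical helpers introduced for illustration and are not part of the package. As elsewhere, the stemmer is passed in as a `func(string) string` so the sketch compiles on its own; returning non-ASCII words unchanged is just one policy a caller might choose.

```go
package main

import "fmt"

// isASCII reports whether every byte of s has the high bit clear.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 {
			return false
		}
	}
	return true
}

// stemASCII calls stem only when word is pure ASCII. For any other input it
// returns word unchanged and ok == false, leaving the decision to the caller.
func stemASCII(word string, stem func(string) string) (result string, ok bool) {
	if !isASCII(word) {
		return word, false
	}
	return stem(word), true
}

func main() {
	// Placeholder (hypothetical) standing in for porter.Stem;
	// it returns its input unchanged.
	placeholder := func(word string) string { return word }

	for _, w := range []string{"running", "naïve"} {
		stemmed, ok := stemASCII(w, placeholder)
		fmt.Printf("%q -> %q (passed to stemmer: %t)\n", w, stemmed, ok)
	}
}
```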