https://github.com/Goodsign/snowball
Cgo binding for Snowball C library
- Host: GitHub
- URL: https://github.com/Goodsign/snowball
- Owner: goodsign
- License: other
- Created: 2012-12-11T12:42:19.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2017-06-27T08:13:41.000Z (over 7 years ago)
- Last Synced: 2024-07-31T01:28:15.349Z (4 months ago)
- Language: C
- Size: 219 KB
- Stars: 38
- Watchers: 2
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Description
====

Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality. For more detailed info see http://snowball.tartarus.org/
Installing
====

```
go get github.com/goodsign/snowball
go test github.com/goodsign/snowball   # must PASS
```

Done! Use it in your Go files (`import "github.com/goodsign/snowball"`).
Usage
====

```go
stemmer, err := NewWordStemmer(algorithm, encoding)
if err != nil {
	/*...handle error...*/
}
defer stemmer.Close()

wordStem, err := stemmer.Stem(word)
if err != nil {
	/*...handle error...*/
}

/* Use wordStem */
```
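The call sequence above (create a stemmer, stem words, close it) can be sketched end to end without cgo. In the sketch below, the `Stemmer` interface, its string-based signature, `toyStemmer`, and `StemAll` are all hypothetical names introduced for illustration; `toyStemmer` only strips a trailing "s" and is not the Snowball algorithm. With the real binding, the stemmer would come from `NewWordStemmer(algorithm, encoding)` instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Stemmer is a hypothetical interface mirroring the two calls used in the
// snippet above. The string-based signature is an assumption of this sketch.
type Stemmer interface {
	Stem(word string) (string, error)
	Close()
}

// toyStemmer is a stand-in so the sketch runs without cgo. It only strips a
// trailing "s" and is NOT the Snowball algorithm.
type toyStemmer struct{ closed bool }

func (t *toyStemmer) Stem(word string) (string, error) {
	if t.closed {
		return "", fmt.Errorf("stemmer is closed")
	}
	return strings.TrimSuffix(word, "s"), nil
}

func (t *toyStemmer) Close() { t.closed = true }

// StemAll reuses one stemmer for a whole batch of words instead of creating
// a new stemmer per word.
func StemAll(s Stemmer, words []string) ([]string, error) {
	stems := make([]string, 0, len(words))
	for _, w := range words {
		stem, err := s.Stem(w)
		if err != nil {
			return nil, fmt.Errorf("stemming %q: %w", w, err)
		}
		stems = append(stems, stem)
	}
	return stems, nil
}

func main() {
	stemmer := &toyStemmer{} // stand-in for the stemmer returned by the binding
	defer stemmer.Close()

	stems, err := StemAll(stemmer, []string{"cats", "dogs", "fish"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stems)
}
```

Hiding the stemmer behind a small interface like this is one possible design choice; it lets calling code be tested without the C library present.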
Usage notes
-----------

According to the Snowball documentation:

```
Creating a stemmer is a relatively expensive operation - the expected
usage pattern is that a new stemmer is created when needed, used
to stem many words, and deleted after some time.
```

Algorithms & encodings
----

File **modules.txt** contains all the main algorithms for each language, in UTF-8, and also with
the most commonly used encoding.

```
Language    Encodings           Algorithms
danish      UTF_8,ISO_8859_1    danish,da,dan
dutch       UTF_8,ISO_8859_1    dutch,nl,dut,nld
english     UTF_8,ISO_8859_1    english,en,eng
finnish     UTF_8,ISO_8859_1    finnish,fi,fin
french      UTF_8,ISO_8859_1    french,fr,fre,fra
german      UTF_8,ISO_8859_1    german,de,ger,deu
hungarian   UTF_8,ISO_8859_1    hungarian,hu,hun
italian     UTF_8,ISO_8859_1    italian,it,ita
norwegian   UTF_8,ISO_8859_1    norwegian,no,nor
portuguese  UTF_8,ISO_8859_1    portuguese,pt,por
romanian    UTF_8,ISO_8859_2    romanian,ro,rum,ron
russian     UTF_8,KOI8_R        russian,ru,rus
spanish     UTF_8,ISO_8859_1    spanish,es,esl,spa
swedish     UTF_8,ISO_8859_1    swedish,sv,swe
turkish     UTF_8               turkish,tr,tur
```

Thread-safety
====

The original Snowball documentation says:
```
Stemmers are re-entrant, but not threadsafe. In other words, if
you wish to access the same stemmer object from multiple threads,
you must ensure that all access is protected by a mutex or similar
device.
```

This Go wrapper therefore guards each stem operation with a **sync.Mutex**, so it is thread-safe.
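The mutex-per-operation idea can be illustrated with a small self-contained sketch. This is not the wrapper's actual source: `rawStemmer` and `SafeStemmer` are hypothetical names, and `rawStemmer` merely strips a trailing "s" while reusing a shared scratch buffer, so that unsynchronized concurrent calls would race.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// rawStemmer stands in for a re-entrant but not thread-safe stemmer object.
// It is hypothetical: it reuses one scratch buffer across calls and only
// strips a trailing "s" (NOT the Snowball algorithm).
type rawStemmer struct {
	scratch []byte
}

func (r *rawStemmer) stem(word string) string {
	r.scratch = append(r.scratch[:0], word...) // would race if called concurrently
	return strings.TrimSuffix(string(r.scratch), "s")
}

// SafeStemmer serializes access to the underlying stemmer with a sync.Mutex.
type SafeStemmer struct {
	mu  sync.Mutex
	raw rawStemmer
}

// Stem holds the lock for the duration of one stem operation.
func (s *SafeStemmer) Stem(word string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.raw.stem(word)
}

func main() {
	var (
		s  SafeStemmer
		wg sync.WaitGroup
	)
	words := []string{"cats", "dogs", "birds", "trees"}
	stems := make([]string, len(words))

	for i, w := range words {
		wg.Add(1)
		go func(i int, w string) {
			defer wg.Done()
			stems[i] = s.Stem(w) // each goroutine writes only its own slot
		}(i, w)
	}
	wg.Wait()
	fmt.Println(stems)
}
```

Locking inside `Stem` keeps callers simple, at the cost of serializing all stemming on one object; code that needs parallel throughput could instead create one stemmer per goroutine.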
Snowball Licence
==========

The Snowball library is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)
Licence
==========

The goodsign/snowball binding is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)

[LICENCE file](https://github.com/goodsign/libtextcat/blob/master/LICENCE)