https://github.com/llimllib/segment
A golang implementation of Norvig's segmenter
- Host: GitHub
- URL: https://github.com/llimllib/segment
- Owner: llimllib
- License: wtfpl
- Created: 2013-07-27T14:59:37.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2013-07-29T17:42:17.000Z (over 11 years ago)
- Last Synced: 2024-11-30T02:50:51.416Z (about 1 month ago)
- Language: Go
- Size: 965 KB
- Stars: 14
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
A Go implementation of Norvig's word segmenter, described in [this PDF](http://norvig.com/ngrams/ch14.pdf).
Licensed WTFPL; please use this code in any way you would like.
### func MakeWordProb(reader io.Reader) func(string) float64
MakeWordProb makes a word probability function from a reader.
You can supply your own word probability function if you want; MakeWordProb just provides a default implementation. A word probability function should take any word as its argument and return a float64 x with 0 <= x <= 1.
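As an illustration of the contract above, here is a minimal sketch of a custom word probability function built from a table of word counts. The helper name `makeCountProb`, the sample counts, and the penalty applied to unknown words are all assumptions made for this example; none of them come from the segment package.

```go
package main

import (
	"fmt"
	"math"
)

// makeCountProb is a hypothetical helper, not part of the segment package.
// It turns a table of word counts into a function with the signature
// func(string) float64 whose result always lies in [0, 1].
func makeCountProb(counts map[string]int) func(string) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	return func(word string) float64 {
		if total == 0 {
			return 0
		}
		if c, ok := counts[word]; ok {
			// Known word: its relative frequency in the table.
			return float64(c) / float64(total)
		}
		// Unknown word: a small probability that shrinks with length.
		// This penalty is an assumption chosen for the sketch.
		return 1 / (float64(total) * math.Pow(10, float64(len(word))))
	}
}

func main() {
	wordp := makeCountProb(map[string]int{"there": 3, "are": 2, "people": 5})
	fmt.Println(wordp("people")) // a known word
	fmt.Println(wordp("xyz"))    // an unknown word scores far lower
}
```

A function built this way has the type that `Segment` expects for its `wordprob` argument, so it could be passed in place of the result of `MakeWordProb`.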
### func Segment(text string, wordprob func(string) float64) []string
Segment splits a string into words. It returns the highest-scoring segmentation of that string under the word probability function wordprob.
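To make "highest-scoring segmentation" concrete, the sketch below scores a segmentation as the product of its word probabilities and finds the best one with dynamic programming over log probabilities. It is not the package's actual implementation: `segmentSketch`, `toyProb`, and `maxWordLen` are hypothetical names, the scoring rule is an assumption, and the code treats the input as ASCII because it slices by byte.

```go
package main

import (
	"fmt"
	"math"
)

// maxWordLen caps the length of candidate words; an assumption of this sketch.
const maxWordLen = 20

// segmentSketch is a hypothetical illustration, not the segment package's code.
// It returns the split of text that maximizes the product of word probabilities.
func segmentSketch(text string, wordprob func(string) float64) []string {
	n := len(text)
	best := make([]float64, n+1) // best[i]: top log-probability for text[:i]
	prev := make([]int, n+1)     // prev[i]: start index of the last word in it
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(-1)
		start := i - maxWordLen
		if start < 0 {
			start = 0
		}
		for j := start; j < i; j++ {
			score := best[j] + math.Log(wordprob(text[j:i]))
			if score > best[i] {
				best[i], prev[i] = score, j
			}
		}
	}
	// Walk the back-pointers from the end, then reverse into reading order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append(words, text[prev[i]:i])
	}
	for l, r := 0, len(words)-1; l < r; l, r = l+1, r-1 {
		words[l], words[r] = words[r], words[l]
	}
	return words
}

// toyProb is a made-up probability table used only for this example.
func toyProb(word string) float64 {
	known := map[string]float64{"there": 0.4, "are": 0.3, "people": 0.3}
	if p, ok := known[word]; ok {
		return p
	}
	// Unknown words are penalized heavily, and more so the longer they are.
	return math.Pow(10, float64(-3*len(word)))
}

func main() {
	fmt.Println(segmentSketch("therearepeople", toyProb))
	// Output:
	// [there are people]
}
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long inputs, which is why the sketch works in log space.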
# Example
```go
package main

import (
	"fmt"
	"os"
	"segment"
)

// getFile opens filename for reading and exits the program if it cannot.
func getFile(filename string) *os.File {
	f, err := os.Open(filename)
	if err != nil {
		fmt.Println("Unable to read file", filename)
		os.Exit(1)
	}
	return f
}

func main() {
	// Build the word probability function from the contents of mobydick.txt.
	wordp := segment.MakeWordProb(getFile("mobydick.txt"))
	fmt.Println(segment.Segment("thereareshortpeopleeverywhere", wordp))
	// Output:
	// [there are short people everywhere]
}
```