https://github.com/awsong/MMSEGo
Chinese word splitting algorithm MMSEG in GO
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/awsong/MMSEGo
- Owner: awsong
- License: other
- Created: 2012-04-18T04:06:21.000Z (almost 13 years ago)
- Default Branch: darts
- Last Pushed: 2012-04-18T04:18:51.000Z (almost 13 years ago)
- Last Synced: 2024-08-03T23:07:00.521Z (6 months ago)
- Language: Go
- Homepage:
- Size: 683 KB
- Stars: 63
- Watchers: 5
- Forks: 14
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go-zh - MMSEGo
README
MMSEGO
=====
This is a Go implementation of [MMSEG](http://technology.chtsai.org/mmseg/), a Chinese word segmentation algorithm.

TO DO list
----------
* Documentation/comments
* Benchmark

Usage
---------
# Input Dictionary Format
```sh
Key\tFreq
```
Each key occupies one line, followed by a tab and its frequency. The file should be UTF-8 encoded; for details, please refer to [go-darts](https://github.com/awsong/go-darts).

# Code example
```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"time"

	"mmsego"
)

func main() {
	var s = new(mmsego.Segmenter)
	// Load the dictionary. This assumes Init returns an error,
	// as the check below implies.
	err := s.Init("darts.lib")
	if err != nil {
		log.Fatal(err)
	}

	t := time.Now()
	offset := 0

	unifile, err := os.Open("/tmp/a.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer unifile.Close()

	uniLineReader := bufio.NewReaderSize(unifile, 4000)
	line, bufErr := uniLineReader.ReadString('\n')
	for nil == bufErr {
		// Uncomment to print each word as it is found:
		//takeWord := func(off int, length int){ fmt.Printf("%s ", string(line[off-offset:off-offset+length])) }
		takeWord := func(off, length int) {}
		s.Mmseg(line, offset, takeWord, nil, false)
		offset += len(line)
		line, bufErr = uniLineReader.ReadString('\n')
	}
	// Process whatever remains after the last newline and print its words.
	takeWord := func(off int, length int) { fmt.Printf("%s ", string(line[off-offset:off-offset+length])) }
	s.Mmseg(line, offset, takeWord, nil, true)

	fmt.Printf("Duration: %v\n", time.Since(t))
}
```
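The callback in the example above receives an offset into the whole input and a length, so the caller subtracts the running offset to slice the current line. The following self-contained sketch illustrates that bookkeeping without depending on mmsego. The `segment` function is a hypothetical stand-in that uses plain forward maximum matching, which is simpler than MMSEG's chunk-based rules, and it assumes offsets and lengths are counted in bytes. The dictionary and sample text are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// segment is a hypothetical stand-in for a segmenter. It uses forward
// maximum matching, NOT the full MMSEG rules. For each word it calls
// takeWord with a byte offset (base + position in text) and a byte length.
func segment(text string, base int, dict map[string]bool, maxRunes int, takeWord func(off, length int)) {
	if maxRunes < 1 {
		maxRunes = 1
	}
	for pos := 0; pos < len(text); {
		// Record the byte end of each candidate prefix, up to maxRunes runes.
		ends := make([]int, 0, maxRunes)
		for end := pos; end < len(text) && len(ends) < maxRunes; {
			_, size := utf8.DecodeRuneInString(text[end:])
			end += size
			ends = append(ends, end)
		}
		// Default to a single rune, then prefer the longest dictionary match.
		best := ends[0]
		for i := len(ends) - 1; i > 0; i-- {
			if dict[text[pos:ends[i]]] {
				best = ends[i]
				break
			}
		}
		takeWord(base+pos, best-pos)
		pos = best
	}
}

// segmentLines feeds lines to segment one by one and collects the words,
// converting stream-wide offsets back into positions within each line.
func segmentLines(lines []string, dict map[string]bool, maxRunes int) []string {
	var words []string
	offset := 0
	for _, line := range lines {
		base := offset // offset of this line within the whole input
		takeWord := func(off, length int) {
			start := off - base
			words = append(words, line[start:start+length])
		}
		segment(line, base, dict, maxRunes, takeWord)
		offset += len(line)
	}
	return words
}

func main() {
	dict := map[string]bool{"研究": true, "研究生": true, "生命": true, "起源": true}
	lines := []string{"研究生命起源\n", "研究生\n"}
	for _, w := range segmentLines(lines, dict, 3) {
		fmt.Printf("%q ", w)
	}
	fmt.Println()
}
```

With this toy dictionary, greedy matching takes "研究生" first and leaves "命" as a single-character word. Ambiguities of this kind are what MMSEG's additional rules are designed to address.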
LICENSE
-----------
Apache License 2.0