https://github.com/aurelius84/n-gram
A project of N-gram model comparing FMM/BMM
- Host: GitHub
- URL: https://github.com/aurelius84/n-gram
- Owner: Aurelius84
- Created: 2016-11-19T08:59:58.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-10-17T09:32:53.000Z (about 3 years ago)
- Last Synced: 2025-03-24T05:51:59.837Z (7 months ago)
- Topics: bmm, dp, fmm, ngram, segment
- Language: Python
- Size: 2.64 MB
- Stars: 17
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# N-gram
A project of N-gram model comparing FMM/BMM
Documentation: [CocoNLP](https://www.cnblogs.com/CocoML/p/12725988.html)

### Usage

First, download the corpus file `199801.txt` from the Internet and put it in the project directory. Then run:
```
python statistic.py
```
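For reference, forward maximum matching (FMM), one of the baseline segmenters compared in this project, greedily takes the longest dictionary word from the left at each position. The following is a minimal sketch, not the project's actual implementation; the dictionary and maximum word length are illustrative:

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, and at each position
    take the longest dictionary word (falling back to a single character)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Toy example with a hypothetical dictionary.
vocab = {"研究", "研究生", "生命", "命", "的", "起源"}
print(fmm_segment("研究生命的起源", vocab))  # → ['研究生', '命', '的', '起源']
```

Note that FMM greedily picks "研究生" here, while backward maximum matching (BMM), scanning from the right, would produce the intended "研究 / 生命 / 的 / 起源" — exactly the kind of disagreement the project measures.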
You will get output like this:
```
successfully split corpus with train = 0.900000, test = 0.100000
the total number of words is: 53260
the total number of bigrams is: 403121
successfully applied Witten-Bell smoothing! smooth_value: 1.3372788850370981e-05
the total number of punctuation marks is: 47
Recall: 0.962036929819092
Precision: 0.9401303935308096
F-score: 0.950957517059212
```

### Result
| Metric | FMM | BMM | Unigram | Bigram |
|:------:|:------:|:------:|:------:|:------:|
| Precision | 91.54% | 92.13% | 93.20% | 94.01% |
| Recall | 94.66% | 95.07% | 96.14% | 96.20% |
| F1 | 93.07% | 93.58% | 94.64% | 95.10% |
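The precision/recall/F1 numbers above are the standard word-segmentation metrics: a predicted word counts as correct only if its character span matches a gold-standard word exactly. A minimal sketch of how such scores can be computed (function and variable names are illustrative, not taken from `statistic.py`):

```python
def seg_to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def prf(predicted, gold):
    """Precision, recall and F1 over exactly-matching word spans."""
    pred, ref = seg_to_spans(predicted), seg_to_spans(gold)
    correct = len(pred & ref)          # spans present in both segmentations
    precision = correct / len(pred)
    recall = correct / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: only "的" and "起源" match the gold segmentation exactly.
p, r, f = prf(["研究生", "命", "的", "起源"], ["研究", "生命", "的", "起源"])
print(p, r, f)  # → 0.5 0.5 0.5
```

Because both segmentations cover the same text, precision and recall differ only when the predicted and gold word counts differ.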