https://github.com/aurelius84/n-gram
A project of N-gram model comparing FMM/BMM
- Host: GitHub
- URL: https://github.com/aurelius84/n-gram
- Owner: Aurelius84
- Created: 2016-11-19T08:59:58.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2022-10-17T09:32:53.000Z (about 3 years ago)
- Last Synced: 2025-03-24T05:51:59.837Z (7 months ago)
- Topics: bmm, dp, fmm, ngram, segment
- Language: Python
- Size: 2.64 MB
- Stars: 17
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# N-gram
A project of N-gram model comparing FMM/BMM
Documentation: [CocoNLP](https://www.cnblogs.com/CocoML/p/12725988.html)

### Usage

First, download the corpus file `199801.txt` from the Internet and put it in the project directory. Then run:
```
python statistic.py
```
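For reference, forward maximum matching (FMM), one of the baseline segmenters compared in this project, greedily takes the longest dictionary word from the left at each position. The following is a minimal sketch, not the project's actual implementation; the dictionary and maximum word length are illustrative:

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, and at each position
    take the longest dictionary word (falling back to a single character)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Toy example with a hypothetical dictionary.
vocab = {"研究", "研究生", "生命", "命", "的", "起源"}
print(fmm_segment("研究生命的起源", vocab))  # → ['研究生', '命', '的', '起源']
```

Note that FMM greedily picks "研究生" here, while backward maximum matching (BMM), scanning from the right, would produce the intended "研究 / 生命 / 的 / 起源" — exactly the kind of disagreement the project measures.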
You will get output like this:
```
successfully split corpus with train = 0.900000, test = 0.100000
the total number of words is: 53260
the total number of bigrams is: 403121
successfully applied Witten-Bell smoothing! smooth_value: 1.3372788850370981e-05
the total number of punctuation marks is: 47
Recall: 0.962036929819092
Precision: 0.9401303935308096
F-score: 0.950957517059212
```

### Result
| Metric | FMM | BMM | Unigram | Bigram |
|:------:|:------:|:------:|:------:|:------:|
| Precision | 91.54% | 92.13% | 93.20% | 94.01% |
| Recall | 94.66% | 95.07% | 96.14% | 96.20% |
| F1 | 93.07% | 93.58% | 94.64% | 95.10% |
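The precision/recall/F1 numbers above are the standard word-segmentation metrics: a predicted word counts as correct only if its character span matches a gold-standard word exactly. A minimal sketch of how such scores can be computed (function and variable names are illustrative, not taken from `statistic.py`):

```python
def seg_to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def prf(predicted, gold):
    """Precision, recall and F1 over exactly-matching word spans."""
    pred, ref = seg_to_spans(predicted), seg_to_spans(gold)
    correct = len(pred & ref)          # spans present in both segmentations
    precision = correct / len(pred)
    recall = correct / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: only "的" and "起源" match the gold segmentation exactly.
p, r, f = prf(["研究生", "命", "的", "起源"], ["研究", "生命", "的", "起源"])
print(p, r, f)  # → 0.5 0.5 0.5
```

Because both segmentations cover the same text, precision and recall differ only when the predicted and gold word counts differ.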