Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pprzetacznik/nlp-zipf-mandelbrot
Zipf and Mandelbrot statistics for corpus
- Host: GitHub
- URL: https://github.com/pprzetacznik/nlp-zipf-mandelbrot
- Owner: pprzetacznik
- Created: 2015-04-28T23:25:41.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-05-05T00:48:36.000Z (over 9 years ago)
- Last Synced: 2024-10-21T22:52:06.205Z (2 months ago)
- Language: Python
- Homepage:
- Size: 10.9 MB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Natural Language Processing - Create Zipf and Mandelbrot statistics for a corpus
============

Run:
~~~
pip install -r requirements.txt
python -m wordcount test/potop.txt test/odm.txt > stats.txt
~~~

Then run the `chart.r` R script to plot the chart and compute additional statistics.
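The `wordcount` step reduces the input texts to word frequencies, ranked from the most to the least common word. This README does not show the module's source or the format of `stats.txt`. The following is therefore only a minimal sketch under stated assumptions. It treats a word as a maximal run of letters and lowercases it. It also assumes the files are UTF-8 encoded, so adjust `encoding` if they are not. The names `count_words`, `rank_frequency` and `WORD_RE` are hypothetical and are not taken from the repository.

~~~python
import re
import sys
from collections import Counter

# Hypothetical tokenizer (an assumption, not the repository's rule):
# a word is a maximal run of Unicode letters; digits and "_" split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(texts):
    """Count lowercased word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def rank_frequency(counts):
    """Return (rank, word, frequency) rows, most frequent word first.

    Ties are broken alphabetically so the output is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Read every file named on the command line (assumed UTF-8).
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    # One tab-separated row per word type: rank, word, frequency.
    for rank, word, freq in rank_frequency(count_words(texts)):
        print(rank, word, freq, sep="\t")
~~~

If you saved this as a hypothetical `sketch.py`, running `python sketch.py test/potop.txt test/odm.txt` would print a rank-frequency table. Plotting that table on log-log axes gives the kind of curve the Zipf and Mandelbrot laws describe.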
Sample chart:
![Zipf-Mandelbrot chart](/test/Rplot.png?raw=true)
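The chart compares the observed rank-frequency curve with the two laws. This README does not document how `chart.r` parameterizes or fits them. The forms below are the commonly stated textbook versions, and the symbols are assumptions. Here $f(r)$ is the frequency of the word at rank $r$, and $C$, $\alpha$, $P$, $\rho$ and $B$ are constants fitted to the corpus.

~~~latex
% Zipf's law: frequency falls off as a power of rank, with alpha close to 1
f(r) = \frac{C}{r^{\alpha}}, \qquad \alpha \approx 1

% Mandelbrot's generalization adds a rank offset rho
f(r) = \frac{P}{(r + \rho)^{B}}

% Taking logarithms, Mandelbrot's law is linear in log(r + rho)
\log f(r) = \log P - B \log(r + \rho)
~~~

Setting $\rho = 0$ and $B = 1$ recovers Zipf's law with $\alpha = 1$. A positive $\rho$ flattens the curve at the lowest ranks, where the most frequent words sit and where plain Zipf usually fits worst.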
Other stats:
~~~
Number of words which cover 50% of corpus: 145
Number of hapax legomena: 7716
~~~
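Both figures above can be computed from word counts. The README does not define "cover 50% of corpus". This sketch assumes it means the smallest number of most frequent word types whose combined occurrences reach half of all running words (tokens). A hapax legomenon is a word type that occurs exactly once. The input is assumed to be a `collections.Counter` that maps each word to its frequency. The function names are hypothetical and may not match what `chart.r` computes.

~~~python
from collections import Counter


def words_covering(counts, share=0.5):
    """Return the smallest number of most frequent word types whose
    occurrences add up to at least `share` of all tokens."""
    total = sum(counts.values())
    covered = 0
    # Walk frequencies from the most common word downwards.
    for needed, freq in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += freq
        if covered >= share * total:
            return needed
    return 0  # reached only for an empty corpus or share > 1


def hapax_legomena(counts):
    """Return the number of word types that occur exactly once."""
    return sum(1 for freq in counts.values() if freq == 1)


# Toy corpus of six tokens: "to" and "be" occur twice, "or" and "not" once.
counts = Counter("to be or not to be".split())
print(words_covering(counts), hapax_legomena(counts))  # prints "2 2"
~~~

Sorting the frequencies instead of calling `Counter.most_common()` keeps the cumulative sum explicit. Either way, the result depends only on the frequency distribution, not on how ties between equally frequent words are ordered.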