https://github.com/mlang/wikiwordfreq
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/mlang/wikiwordfreq
- Owner: mlang
- License: bsd-3-clause
- Created: 2014-09-29T09:38:07.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-07-10T07:06:47.000Z (almost 7 years ago)
- Last Synced: 2024-08-14T00:30:33.545Z (almost 2 years ago)
- Language: Haskell
- Size: 56.6 MB
- Stars: 7
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wikiwc
A Wikipedia word frequency counter.
This project uses [Wikipedia_Extractor](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor)
to pre-process a full MediaWiki dump into essentially plain text files.
It then splits these files into individual words and counts the number
of occurrences of each word.
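The counting step can be sketched in Haskell roughly as follows. This is a minimal illustration, not the project's actual code: the names `tokenize`, `wordFreq`, and `ranked` are hypothetical, and the tokenization rule (lowercase everything, treat every non-letter as a separator) is an assumption.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (sortOn)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..))

-- | Split text into lowercase words. Every non-letter character is
-- treated as a separator (an assumed rule, not necessarily wikiwc's).
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- | Count how often each word occurs.
wordFreq :: [String] -> Map.Map String Int
wordFreq = Map.fromListWith (+) . map (\w -> (w, 1))

-- | List words from most to least frequent. The sort is stable and
-- Map.toList is ordered by key, so ties come out alphabetically.
ranked :: Map.Map String Int -> [(String, Int)]
ranked = sortOn (Down . snd) . Map.toList
```

For example, `ranked (wordFreq (tokenize text))` yields `(word, count)` pairs for a `String` of extracted plain text. A strict `Map` keeps the counts from building up unevaluated thunks. For a full dump, a streaming `Text`- or `ByteString`-based reader would likely be needed instead of `String`.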
## Usage
By default, wikiwc downloads the German Wikipedia. To process a different language edition, pass `WIKILANG` to `make`, for example for English:
```shell
$ make WIKILANG=en
```