Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ggerganov/ggwords
Generate language n-gram statistics
- Host: GitHub
- URL: https://github.com/ggerganov/ggwords
- Owner: ggerganov
- License: gpl-3.0
- Created: 2022-03-28T17:32:28.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-04-11T17:39:52.000Z (over 2 years ago)
- Last Synced: 2024-10-03T13:52:06.971Z (3 months ago)
- Topics: language, ngrams, statistics
- Language: C++
- Homepage:
- Size: 44.2 MB
- Stars: 17
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ggwords
Generate n-gram statistics by processing the contents of English books/texts.
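
To illustrate what "n-gram statistics" means here, the following is a minimal Python sketch of counting word n-grams in a piece of text. It is not the project's C++ implementation: the function names `tokenize` and `count_ngrams`, the tokenization rule (lowercased runs of letters and apostrophes), and the choice of word-level rather than character-level n-grams are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters/apostrophes.

    Assumption: this simple rule is for illustration only; ggwords may
    tokenize differently.
    """
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(text, n):
    """Count every run of n consecutive words in the text."""
    words = tokenize(text)
    # Slide a window of width n over the word list; an n larger than the
    # number of words yields an empty counter.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    # Print the three most frequent bigrams with their counts.
    for ngram, count in count_ngrams(sample, 2).most_common(3):
        print(" ".join(ngram), count)
```

Running the same counting over a whole book collection would mean summing the per-text counters, which `Counter` supports directly through `+` or `update`.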
## Usage
```bash
git clone https://github.com/ggerganov/ggwords
cd ggwords
mkdir build
cd build
cmake ..
make -j4

./bin/analyze /path/to/metadata/books.txt /path/to/books/text
```

## Sample data
The data in [./data](./data) was generated using https://github.com/pgcorpus/gutenberg