https://github.com/brannondorsey/markov
A small Markov chain generator 📃
- Host: GitHub
- URL: https://github.com/brannondorsey/markov
- Owner: brannondorsey
- License: mit
- Created: 2020-02-18T03:54:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-25T01:53:16.000Z (over 5 years ago)
- Last Synced: 2024-06-20T15:02:11.470Z (over 1 year ago)
- Language: Go
- Homepage:
- Size: 18.6 KB
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
## Markov
A simple Markov chain generator. Originally created for the Language Modeling lecture in the [Introduction to Synthetic Media](https://github.com/runwayml/Intro-Synthetic-Media) class at ITP/NYU.
### Download
Binary releases can be downloaded from the [releases page](https://github.com/brannondorsey/markov/releases/). Unzip the downloaded file and put the `markov` binary somewhere in your `$PATH` (for example, inside `/usr/local/bin/`).
### Usage
```bash
# Download the UCI News Aggregator Dataset (400K news headlines) as a sample corpus
wget https://github.com/brannondorsey/markov/releases/download/v0.1.0/uci-news-aggregator-dataset.txt

# Build and cache an n-gram frequency histogram, then use it to generate text
markov --corpus uci-news-aggregator-dataset.txt --n-gram-length 3 --prompt "For the first time in a decade"
```

Below is a summary of the full usage of the `markov` command.

```
Usage of markov:
  -i, --corpus string        The input corpus to build the n-gram histogram with.
  -h, --help                 Show this screen.
  -l, --lowercase            Convert text to lowercase. Lowers the complexity of the sampling task, and may produce better results depending on the corpus.
  -c, --max-characters int   The maximum number of characters to generate. Fewer characters may be generated if the sequence encounters an n-gram that has no next n-grams in the dataset. (default 1000)
  -n, --n-gram-length int    The number of characters to use for each n-gram. (default 3)
  -p, --prompt string        The prompt to (optional). (default "hello")
```