https://github.com/krypt0nn/markov-chains
Haha funny text generator with markov chains
https://github.com/krypt0nn/markov-chains
Last synced: 19 days ago
JSON representation
Haha funny text generator with markov chains
- Host: GitHub
- URL: https://github.com/krypt0nn/markov-chains
- Owner: krypt0nn
- License: mit
- Created: 2024-04-22T11:01:15.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-04-22T11:18:46.000Z (about 2 years ago)
- Last Synced: 2024-04-22T12:26:24.479Z (about 2 years ago)
- Language: Rust
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Funny stoopid text generation app
We've randomly decided to play with [Markov chains](https://en.wikipedia.org/wiki/Markov_chain) on my discord server, and I decided to release my results in github.
> If you have any questions - refer to `markov-chains --help` or `markov-chains --help`.
## Simple example
1. Build the model from the input text files
> cargo run -- model from-scratch --messages inputs/json/kleden.txt --output outputs/models/kleden1.model
Accepted input files formats are json strings and plain text lines:
Json strings lines:
> "Selamat Pagi"\
> "Minecraft ugh"
Plain text lines:
> Political economy belongs to the category of the social sciences.\
> The basis of the life of society is material production.
2. Load model
> cargo run -- model load --model outputs/models/kleden1.model
There's a bunch of params you can change to play with the model. Most important ones are `--context-window` which configures the "intelligence" of the model, and `--min-length` which can force model to keep generating new text.
## Complex example
1. Generate messages bundle. Those are filtered lists of pre-processed words
> cargo run -- messages parse --path inputs/json/kleden.txt --output outputs/messages/kleden.bundle
>
> cargo run -- messages parse --path inputs/text/political-economy.txt --output outputs/messages/political-economy.bundle
>
> cargo run -- messages parse --path inputs/text/state-and-revolution.txt --output outputs/messages/state-and-revolution.bundle
2. Merge books to one messages set
> cargo run -- messages merge --path outputs/messages/political-economy.bundle --path outputs/messages/state-and-revolution.bundle --output outputs/messages/background.bundle
3. Create tokens from the messages sets
> cargo run -- tokens parse --path outputs/messages/kleden.bundle --output outputs/tokens/kleden.bundle
>
> cargo run -- tokens parse --path outputs/messages/background.bundle --output outputs/tokens/background.bundle
4. Merge tokens to the single bundle
> cargo run -- tokens merge --path outputs/tokens/background.bundle --path outputs/tokens/kleden.bundle --output outputs/tokens/tokens.bundle
5. Tokenize prepared messages bundles
> cargo run -- messages tokenize --messages outputs/messages/background.bundle --tokens outputs/tokens/tokens.bundle --output outputs/tokenized/background.bundle
>
> cargo run -- messages tokenize --messages outputs/messages/kleden.bundle --tokens outputs/tokens/tokens.bundle --output outputs/tokenized/kleden.bundle
6. Create new dataset from the background messages bundle
> cargo run -- dataset create --messages outputs/tokenized/background.bundle --tokens outputs/tokens/tokens.bundle --output outputs/datasets/kleden2.bundle
7. Extend this dataset with the kleden's messages bundle with bigger weight (10)
> cargo run -- dataset add-messages --path outputs/datasets/kleden2.bundle --messages outputs/tokenized/kleden.bundle --weight 10 --output outputs/datasets/kleden2.bundle
8. Build the model
> cargo run -- model build --dataset outputs/datasets/kleden2.bundle --output outputs/models/kleden2.model
9. Load model
> cargo run -- model load --model outputs/models/kleden2.model
Author: [Nikita Podvirnyi](https://github.com/krypt0nn)\
Licensed under [MIT](LICENSE)