Mark V. Shaney Junior Gibberish Generator
==========================================
**Mark V. Shaney Junior** is a minimal implementation of a Markov
gibberish generator inspired by the legendary Mark V. Shaney program
from the 1980s. Mark V. Shaney was a synthetic Usenet user that
posted messages to newsgroups using text generated by a Markov chain
program. See the Wikipedia article [Mark V. Shaney][mvs-wiki] for
more details.
The program [mvs][] available in this project consumes text via
standard input, builds an internal Markov model and then uses the
model to generate gibberish.
**[View Source][mvs]**
[mvs]: mvs
[mvs-wiki]: https://en.wikipedia.org/wiki/Mark_V._Shaney
Contents
--------
* [Source Code](#source-code)
* [Commentary](#commentary)
* [Get Started](#get-started)
* [Command Line Arguments](#command-line-arguments)
* [Gibberish](#gibberish)
  * [Unprompted Gibberish](#unprompted-gibberish)
  * [Prompted Gibberish](#prompted-gibberish)
  * [Personal Gibberish](#personal-gibberish)
* [Licence](#licence)
Source Code
-----------
Here is the complete source code of the gibberish generator
([mvs][]):
```python3
#!/usr/bin/env python3
import random
import sys
type Key = tuple[str, ...]
type Model = dict[Key, list[str]]
def train(text: str, n: int) -> Model:
    words = text.split()
    model: Model = {}
    for i in range(len(words) - n):
        key = tuple(words[i : i + n])
        value = words[i + n]
        model.setdefault(key, []).append(value)
    return model
def generate(model: Model, length: int, prompt: Key) -> str:
    key = prompt if prompt else random.choice(list(model.keys()))
    output = list(key)
    for _ in range(length - len(key)):
        values = model.get(key)
        if not values:
            break
        next_word = random.choice(values)
        output.append(next_word)
        key = *key[1:], next_word
    return " ".join(output)
def main(n: int, length: int, prompt: Key) -> None:
    model = train(sys.stdin.read(), n)
    print(generate(model, length, prompt[:n]))
if __name__ == "__main__":
    main(
        int(sys.argv[1]) if len(sys.argv) > 1 else 2,
        int(sys.argv[2]) if len(sys.argv) > 2 else 100,
        tuple(sys.argv[3].split()) if len(sys.argv) > 3 else (),
    )
```
This program implements a simple Markov text generator. It reads text
from standard input and records every sequence of *n* consecutive
words together with the words that follow it. From this data it
learns which words tend to come next after a given word sequence,
with more frequent follower words being more likely to be chosen
while generating text.
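For instance, assuming the `train` function above has been pasted
into a Python session, a tiny input produces the following model
(output reformatted for readability here). Note how a transition
that occurs more than once produces duplicate entries in its
follower list:

```python3
>>> train("a rose is a rose is a rose", 2)
{('a', 'rose'): ['is', 'is'],
 ('rose', 'is'): ['a', 'a'],
 ('is', 'a'): ['rose', 'rose']}
```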
To generate text, the program starts from a random sequence or a user
provided prompt. It repeatedly selects a possible following word at
random, weighted by how often it appeared in the original text. The
result mimics the local patterns of the source material while drifting
into grammatically plausible but often meaningless gibberish.
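The weighting falls out of the data structure itself: since `train`
appends one entry per occurrence, a follower observed twice appears
twice in its list, and `random.choice` therefore picks it twice as
often. Here is a small demonstration of this effect with a made-up
follower list:

```python3
import random
from collections import Counter

# Followers of some state, observed 3, 1 and 1 times respectively.
followers = ["is", "is", "is", "was", "remains"]

# Sampling repeatedly reproduces the observed frequencies: roughly
# 60% 'is', 20% 'was' and 20% 'remains'.
print(Counter(random.choice(followers) for _ in range(10_000)))
```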
Commentary
----------
This implementation is deliberately simple and inefficient. It keeps
all observed word sequences and their followers in memory, including
duplicates, which makes the model needlessly large. There is plenty
of scope for improvement, such as storing follower-word frequencies
instead of keeping duplicate entries or pruning rarely seen sequences.
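As a rough sketch of the first improvement mentioned above (this is
not part of the project), the follower lists could be replaced with
`collections.Counter` objects and sampled with `random.choices`, so
that each distinct transition is stored only once along with its
count:

```python3
import random
from collections import Counter

type Key = tuple[str, ...]
type FreqModel = dict[Key, Counter[str]]

def train_freq(text: str, n: int) -> FreqModel:
    # Like train() above, but count followers instead of duplicating them.
    words = text.split()
    model: FreqModel = {}
    for i in range(len(words) - n):
        model.setdefault(tuple(words[i : i + n]), Counter())[words[i + n]] += 1
    return model

def pick_next(model: FreqModel, key: Key) -> str | None:
    # Sample one follower, weighted by how often it was observed.
    counter = model.get(key)
    if not counter:
        return None
    return random.choices(list(counter), weights=list(counter.values()))[0]
```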
More sophisticated techniques can be applied to produce more
plausible-sounding text, including higher-order n-grams, careful
handling of sentence boundaries and punctuation, and the addition of
grammatical constraints or probabilistic language models. What is
provided here is intended to serve as a minimum viable Markov text
generator. Any further enhancements are left as an exercise for the
reader.
While this program is small, it is not intended to be an exercise in
code golf. Clarity takes precedence over reducing the number of
lines.
Given the overwhelming popularity of large language models (LLMs) in
2025, it is worth noting that this approach bears little resemblance
to LLMs. LLMs are trained on vast datasets using neural networks to
model language patterns across large spans of text. LLMs capture
global structure and long-range dependencies. By contrast, Markov
text generators rely entirely on local word transition statistics and
have no model of global structure. Despite these limitations, the
Markov text generator shared in this project can serve as a simple
introduction to statistical language modelling. After all, Markov
chains can be thought of as the 'hello, world' of language models.
Get Started
-----------
To get started with the [mvs][] program in this project, clone or
download the repository to a system with Python 3 installed. Then run
the following command:
```sh
python3 mvs < book.txt
```
On most Unix or Linux systems, you can alternatively run:
```sh
./mvs < book.txt
```
This generates arbitrary gibberish based on the model it has built by
consuming the text in the file [book.txt](book.txt).
Command Line Arguments
----------------------
To keep this tool as minimal as possible, it does not come with any
command line options; it does not even have a `--help` option. It
does, however, accept a few positional command line arguments. Since
the tool produces no help output, those arguments are described in
this section.
Here is a synopsis of the command line arguments supported by this tool:
```
./mvs [N [LENGTH [PROMPT]]]
```
Here is a description of each argument:
- `N`

  The order of the Markov model. This value specifies how many
  consecutive words are used as the state when training the model.
  For example, a value of 2 means the model uses two previous words
  to predict the next one, which corresponds to a trigram model in
  standard n-gram terminology. A value of 3 would use three previous
  words (a 4-gram model) and so on. If not specified, this defaults
  to 2.

- `LENGTH`

  The maximum number of words to generate. Generation may stop
  earlier if the model reaches a state for which no continuation
  exists, as the example after this list illustrates. If not
  specified, this defaults to 100.

- `PROMPT`

  An optional starting prompt used to seed text generation. This
  should be a single command line argument containing one or more
  words separated by spaces, so it must be quoted when invoking the
  program. If provided, only the first N words of the prompt are
  used. If omitted, generation starts from a random state in the
  model.
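As an aside on early stopping: the state formed by the final N words
of the input typically has no recorded follower, and a prompt that
never occurs in the input has none either. In the latter case the
program prints just the prompt and stops immediately. A hypothetical
illustration, assuming neither word below appears in book.txt:

```sh
./mvs 2 100 'zyzzyva quux' < book.txt
# Prints only: zyzzyva quux
```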
Here are some usage examples of these command line arguments:

1. Generate gibberish using a trigram model:

   ```sh
   ./mvs 3 < book.txt
   ```

2. Generate gibberish up to 250 words long:

   ```sh
   ./mvs 2 250 < book.txt
   ```

3. Use the words 'There is' to start the gibberish:

   ```sh
   ./mvs 2 100 'There is' < book.txt
   ```
Gibberish
---------
### Unprompted Gibberish
Here is an example of gibberish produced by the program when no
prompt was supplied:
```
$ ./mvs < book.txt
Ghost again stood side by side in the stables; and the bedpost was his
own. The bed was warm, and tender; and the Ghost had entered. It was a
Turkey! He never could have listened to it can cheer and comfort you
in a voice that seldom rose above the warehouse door: Scrooge and
Marley. Sometimes people new to the postboy, who answered that a
bachelor was a genial shadowing forth of all her silken rustlings, and
her rapid flutterings past him, he seized the ruler with such severity
as the figure-head of an old gentleman in a little and
```
### Prompted Gibberish
Here is an example of text generated from the initial prompt 'At
last':
```
$ ./mvs 2 100 'At last' < book.txt
At last she said, amazed, "there is! Nothing is past hope, if such a
man whose name he had an expectation that the singer fled in terror,
leaving the keyhole to regale him with such favour, that he turned his
steps towards his door. "It's humbug still!" said Scrooge. "I am very
happy," said little Bob, the father, who came upon his knee; for in
the fire, and deep red curtains, ready to our calling, we're well
matched. Come into the most extravagant contortions: Scrooge's niece,
indignantly. Bless those women; they never do anything by halves. They
are all indescribable
```
### Personal Gibberish
Finally, I also ran the program on all the posts I have written so
far on my website. Here is what it generated:
```
$ make filter-website && ./mvs < susam.txt
while a query replace operation is approved by the user. The above
variable defines the build job. It can be incredibly useful while
working on assembly language and machine code. In fact, all internal
resources like the result to refine the search prompt changes from
bck-i-search: to fwd-i-search:. Now type C-SPC (i.e. ctrl+space) to
set a mark causes Emacs to use 32-bit registers like EBP, ESP,
etc. Thus the behaviour is undefined. Such code may behave differently
when compiled with the readily available GNU tools like the shape
of 8. Flipping "P" horizontally makes it a proper quine: cat $0
```
Apparently, this is what I would sound like if I ever took up speaking
gibberish!
Licence
-------
This is free and open source software. You can use, copy, modify,
merge, publish, distribute, sublicence and/or sell copies of it, under
the terms of the MIT Licence. See [LICENSE.md][L] for details.
This software is provided "AS IS", WITHOUT WARRANTY OF ANY KIND,
express or implied. See [LICENSE.md][L] for details.
[L]: LICENSE.md