https://github.com/nikvoronin/ngrams4alice
Using a Markov chain to generate readable nonsense in C#
- Host: GitHub
- URL: https://github.com/nikvoronin/ngrams4alice
- Owner: nikvoronin
- License: mit
- Created: 2024-01-14T12:26:56.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-01-30T21:36:46.000Z (over 2 years ago)
- Last Synced: 2025-01-16T11:27:05.307Z (over 1 year ago)
- Topics: csharp, dotnet, generative-text, markov-chain
- Language: C#
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Ngrams For Alice
## Outputs
Source text: Lewis Carroll, *Alice's Adventures in Wonderland*.
```shell
> cat .\alice.txt | .\MarkovNgrams.exe
```
"Just at this moment Alice felt a little more conversation with her face like the look of the shepherd boy--and the sneeze of the creature, but on second thoughts she decided on going into the wood."
"There's no pleasing them! Alice replied in an agony of terror."
"I know I do! said Alice in a trembling voice to a work or any part of this electronic work, you must return the medium with your written explanation."
## Ngram internals
> [!NOTE]
> To print the statistics below, set the `Enable_PrintStatistics` constant to `true`, then rebuild the project.
### Top 10 most-branching n-grams
```text
in the → United (10)
the terms → of (13)
terms of → this (10)
the Project → Gutenberg (22)
out of → the (11)
one of → the (14)
she said → to (17)
the White → Rabbit (10)
* * → * (54)
said to → herself, (10) | the (11)
said the → Caterpillar. (12) | King, (10) | Mock (19) | King. (10)
a minute → or (11)
the March → Hare (14) | Hare. (10)
the Mock → Turtle (28)
```
Q: Why do the same phrases keep coming back?\
A: The next word is chosen at random. The more often a word follows a prefix in the corpus, the more often it appears in the output.
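
The answer above amounts to a frequency-weighted random draw. The following is a minimal sketch of one way to do that, not the repository's actual code: `PickWeighted` is a hypothetical helper, and the sample counts are only the four followers of `said the` shown in the listing (the full model probably holds more).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns a word with probability proportional to its count.
string PickWeighted(IReadOnlyDictionary<string, int> counts, Random random)
{
    int total = counts.Values.Sum();
    int roll = random.Next(total); // uniform integer in [0, total)
    foreach (var (word, count) in counts)
    {
        if (roll < count) return word; // roll landed inside this word's slice
        roll -= count;                 // otherwise skip past the slice
    }
    throw new InvalidOperationException("counts must contain a positive entry");
}

// Assumption: only the followers of "said the" that appear in the listing.
var saidThe = new Dictionary<string, int>
{
    ["Caterpillar."] = 12,
    ["King,"] = 10,
    ["Mock"] = 19,
    ["King."] = 10,
};

// Draw many times and tally the results to see the weighting at work.
var rng = new Random();
var tally = saidThe.Keys.ToDictionary(k => k, _ => 0);
for (int n = 0; n < 10_000; n++)
    tally[PickWeighted(saidThe, rng)]++;

foreach (var (follower, hits) in tally)
    Console.WriteLine($"{follower,-14}{hits}");
```

With these counts, `Mock` should come up most often, roughly 19 times out of every 51 draws.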
### Model statistics
```text
Prefixes (states): 19378
Unique transitions: 26592
Total observations: 29571
Unique words: 5974
Avg transitions / state: 1,37
Avg observations / state: 1,53
```
#### Prefixes (states)
Number of unique n-gram prefixes. Each prefix is one state of the chain, for example:
```text
["I","am"]
["You","are"]
["The","cat"]
```
#### Unique transitions
Number of distinct (prefix → next word) pairs. In the example, one prefix contributes two transitions:
```text
["I","am"] → happy
["I","am"] → tired
```
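
As an illustration of how prefixes and their transitions might be collected and then walked to produce text, here is a minimal sketch. It is not the repository's implementation: it assumes two-word prefixes (as in the examples above), whitespace tokenization, and a tiny made-up corpus. `BuildModel`, `Generate`, and `Separator` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Assumption: the two prefix words are joined into one dictionary key.
const string Separator = "\u0001";

// Slide a three-word window over the text: two words form the prefix,
// the third is recorded as a follower of that prefix.
Dictionary<string, List<string>> BuildModel(string[] tokens)
{
    var result = new Dictionary<string, List<string>>();
    for (int i = 0; i + 2 < tokens.Length; i++)
    {
        string prefix = tokens[i] + Separator + tokens[i + 1];
        if (!result.TryGetValue(prefix, out var followers))
            result[prefix] = followers = new List<string>();
        // Duplicates are kept on purpose: a follower seen five times
        // is five times as likely to be drawn later.
        followers.Add(tokens[i + 2]);
    }
    return result;
}

// Start from a known prefix and keep appending a random follower
// of the last two words until the limit or a dead end is reached.
string Generate(Dictionary<string, List<string>> chain,
                string first, string second, int maxWords, Random random)
{
    var output = new List<string> { first, second };
    while (output.Count < maxWords)
    {
        string prefix = output[^2] + Separator + output[^1];
        if (!chain.TryGetValue(prefix, out var followers)) break; // dead end
        output.Add(followers[random.Next(followers.Count)]);
    }
    return string.Join(" ", output);
}

// Made-up corpus, for illustration only.
string corpus = "the cat sat on the mat and the cat ran off";
string[] corpusWords = corpus.Split(' ', StringSplitOptions.RemoveEmptyEntries);
var model = BuildModel(corpusWords);

Console.WriteLine(Generate(model, corpusWords[0], corpusWords[1], 12, new Random()));
```

In this toy corpus the prefix `the cat` has two followers (`sat` and `ran`), so it is the only state where the output can branch.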
#### Total observations
How many times transitions occur in the corpus, that is, the sum of all transition counts.
Example:
```text
I am happy (5)
I am tired (2)
```
Total = 7
#### Unique words
The size of the model's vocabulary.
#### Avg transitions/state
How "branchy" the model is, i.e. unique transitions divided by prefixes:
- ~1 -- text is almost deterministic
- 2-5 -- typical natural language
- 10+ -- very diverse corpus
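
The figures in this section can all be derived from a prefix → (next word → count) table. The sketch below shows one way to compute them. It is an assumption about the model's shape, not the repository's code, and the counts for `You are` and `The cat` are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical model shape: prefix -> (next word -> number of occurrences).
var model = new Dictionary<string, Dictionary<string, int>>
{
    ["I am"] = new Dictionary<string, int> { ["happy"] = 5, ["tired"] = 2 },
    ["You are"] = new Dictionary<string, int> { ["here"] = 1 }, // invented count
    ["The cat"] = new Dictionary<string, int> { ["sat"] = 3 },  // invented count
};

int prefixes = model.Count;
int uniqueTransitions = model.Values.Sum(f => f.Count);
int totalObservations = model.Values.Sum(f => f.Values.Sum());

// Vocabulary: every word that occurs in a prefix or as a follower.
int uniqueWords = model.Keys
    .SelectMany(p => p.Split(' '))
    .Concat(model.Values.SelectMany(f => f.Keys))
    .Distinct()
    .Count();

double avgTransitions = (double)uniqueTransitions / prefixes;
double avgObservations = (double)totalObservations / prefixes;

// InvariantCulture forces a decimal point regardless of the machine's locale.
string avgT = avgTransitions.ToString("F2", CultureInfo.InvariantCulture);
string avgO = avgObservations.ToString("F2", CultureInfo.InvariantCulture);

Console.WriteLine($"Prefixes (states):        {prefixes}");
Console.WriteLine($"Unique transitions:       {uniqueTransitions}");
Console.WriteLine($"Total observations:       {totalObservations}");
Console.WriteLine($"Unique words:             {uniqueWords}");
Console.WriteLine($"Avg transitions / state:  {avgT}");
Console.WriteLine($"Avg observations / state: {avgO}");
```

The same ratios applied to the reported Alice model give 26592 / 19378 ≈ 1.37 transitions and 29571 / 19378 ≈ 1.53 observations per state, which matches the listing. The decimal comma there suggests the numbers were formatted with the machine's regional settings.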