https://github.com/nikvoronin/ngrams4alice

Using a Markov chain to generate readable nonsense in C#

csharp dotnet generative-text markov-chain

# Ngrams For Alice

## Outputs

Source text: Lewis Carroll, *Alice's Adventures in Wonderland*.

```shell
> cat .\alice.txt | .\MarkovNgrams.exe
```

"Just at this moment Alice felt a little more conversation with her face like the look of the shepherd boy--and the sneeze of the creature, but on second thoughts she decided on going into the wood."

"There's no pleasing them! Alice replied in an agony of terror."

"I know I do! said Alice in a trembling voice to a work or any part of this electronic work, you must return the medium with your written explanation."

## Ngram internals

> [!NOTE]
> To print the statistics shown in this section, set the `Enable_PrintStatistics` constant to `true` and rebuild the project.

### Top 10 most-branching n-grams

```text
in the → United (10)
the terms → of (13)
terms of → this (10)
the Project → Gutenberg (22)
out of → the (11)
one of → the (14)
she said → to (17)
the White → Rabbit (10)
* * → * (54)
said to → herself, (10) | the (11)
said the → Caterpillar. (12) | King, (10) | Mock (19) | King. (10)
a minute → or (11)
the March → Hare (14) | Hare. (10)
the Mock → Turtle (28)
```

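Q: Why do some phrases repeat so often?\
A: The next word is chosen at random from the words that followed the current prefix in the corpus. The more often a word follows a prefix in the source text, the more often it appears in the output.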
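
The frequency-weighted choice described in the answer above can be sketched as follows. This is a minimal illustration, not the project's actual code: `PickWeighted` is a hypothetical helper, and the counts are only the followers of `said the` that appear in the listing, not the full set from the corpus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Follower counts for the prefix "said the", copied from the listing above.
// Assumption: followers not shown in the listing are ignored here.
var followers = new Dictionary<string, int>
{
    ["Caterpillar."] = 12,
    ["King,"] = 10,
    ["Mock"] = 19,
    ["King."] = 10,
};

// Hypothetical helper: returns one key with probability proportional to its count.
static string PickWeighted(IReadOnlyDictionary<string, int> counts, Random random)
{
    int total = counts.Values.Sum();
    int roll = random.Next(total); // uniform integer in [0, total)
    foreach (var entry in counts)
    {
        if (roll < entry.Value) return entry.Key;
        roll -= entry.Value;
    }
    throw new InvalidOperationException("counts must not be empty");
}

// Draw many samples to see that frequent followers dominate.
var rng = new Random(42);
var tally = followers.Keys.ToDictionary(w => w, _ => 0);
for (int i = 0; i < 10_000; i++)
    tally[PickWeighted(followers, rng)]++;

foreach (var pair in tally.OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

With these counts `Mock` has weight 19 out of 51, so it should be drawn roughly 37% of the time, which is why "the Mock Turtle" shows up so often in generated text.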

### Model statistics

```text
Prefixes (states): 19378
Unique transitions: 26592
Total observations: 29571
Unique words: 5974
Avg transitions / state: 1,37
Avg observations / state: 1,53
```

#### Prefixes (states)

Number of unique n-gram prefixes, that is, the distinct word sequences the model can be in. For example:

```text
["I","am"]
["You","are"]
["The","cat"]
```
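
As a rough sketch of where the states come from, the snippet below builds a two-word-prefix model from a toy corpus and counts its prefixes. It assumes whitespace tokenization and joins the two prefix words into a single string key for brevity; the project's own tokenizer and data structures may differ.

```csharp
using System;
using System.Collections.Generic;

// Toy corpus; the real input would be the whole book.
string corpus = "I am happy . I am tired . I am happy";
string[] words = corpus.Split(' ', StringSplitOptions.RemoveEmptyEntries);

// model[prefix][next] = how many times `next` followed `prefix`.
var model = new Dictionary<string, Dictionary<string, int>>();
for (int i = 0; i + 2 < words.Length; i++)
{
    string prefix = $"{words[i]} {words[i + 1]}";
    string next = words[i + 2];

    if (!model.TryGetValue(prefix, out var nextCounts))
    {
        nextCounts = new Dictionary<string, int>();
        model[prefix] = nextCounts;
    }

    nextCounts.TryGetValue(next, out int seen); // seen is 0 when the key is missing
    nextCounts[next] = seen + 1;
}

Console.WriteLine($"Prefixes (states): {model.Count}"); // 6 for this toy corpus
foreach (var state in model)
    Console.WriteLine($"[{state.Key}] has {state.Value.Count} distinct follower(s)");
```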

#### Unique transitions

Number of distinct (prefix → next word) pairs. For example, one prefix with two different followers contributes two transitions:

```text
["I","am"] → happy
["I","am"] → tired
```

#### Total observations

How many times transitions appear in the corpus (sum of frequencies).

Example:

```text
I am happy (5)
I am tired (2)
```

Total = 7
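
To make the difference between unique transitions and total observations concrete, here is a small sketch that computes both for the `I am` example. The nested-dictionary layout is an assumption chosen for illustration, not necessarily how the project stores its model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One state with two followers, using the counts from the example above.
var model = new Dictionary<string, Dictionary<string, int>>
{
    ["I am"] = new Dictionary<string, int>
    {
        ["happy"] = 5,
        ["tired"] = 2,
    },
};

// Unique transitions: each distinct (prefix, next word) pair counts once.
int uniqueTransitions = model.Values.Sum(next => next.Count);

// Total observations: each pair counts as many times as it was seen.
int totalObservations = model.Values.Sum(next => next.Values.Sum());

Console.WriteLine($"Unique transitions: {uniqueTransitions}"); // 2
Console.WriteLine($"Total observations: {totalObservations}"); // 7
```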

#### Unique words

The size of the model's vocabulary.

#### Avg transitions/state

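How "branchy" the model is, measured as the average number of distinct next words per prefix (unique transitions divided by prefixes):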

- ~1 -- text is almost deterministic
- 2-5 -- typical natural language
- 10+ -- very diverse corpus
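
The two averages in the statistics listing can be reproduced from the other reported figures. The sketch below is only arithmetic on those numbers. It formats with `CultureInfo.InvariantCulture`, on the assumption that the decimal comma in `1,37` comes from the culture settings of the machine that produced the listing.

```csharp
using System;
using System.Globalization;

// Figures copied from the "Model statistics" listing.
int states = 19378;
int uniqueTransitions = 26592;
int totalObservations = 29571;

double avgTransitions = (double)uniqueTransitions / states;
double avgObservations = (double)totalObservations / states;

// InvariantCulture always uses a decimal point, regardless of the OS locale.
string avgT = avgTransitions.ToString("F2", CultureInfo.InvariantCulture);
string avgO = avgObservations.ToString("F2", CultureInfo.InvariantCulture);

Console.WriteLine($"Avg transitions / state: {avgT}");  // 1.37
Console.WriteLine($"Avg observations / state: {avgO}"); // 1.53
```

At about 1.37 transitions per state, most two-word prefixes in this corpus have a single follower, so the generator often reproduces runs of the original text before it reaches a prefix where it can branch.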