https://github.com/portasynthinca3/markov

Text generation library for Elixir/Erlang based on Markov chains
context-awareness elixir erlang markov-chain nlp text-generation

Host: GitHub
URL: https://github.com/portasynthinca3/markov
Owner: portasynthinca3
License: wtfpl
Created: 2021-09-06T13:58:21.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2023-10-19T02:59:22.000Z (almost 2 years ago)
Last Synced: 2024-01-27T02:42:50.792Z (over 1 year ago)
Topics: context-awareness, elixir, erlang, markov-chain, nlp, text-generation
Language: Elixir
Homepage:
Size: 1.46 MB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# Markov

Text generation library based on nth-order Markov chains

![Hex.pm](https://img.shields.io/hexpm/v/markov)

![Hex.pm](https://img.shields.io/hexpm/dw/markov)

## Features

  - **Token sanitization** (optional): ignores letter case and punctuation when switching states, but keeps the original tokens in the output
  - **Operation history** (optional): keeps a log of the operations it was instructed to perform, including past training data
  - **Probability shifting** (optional): gives less frequent generation paths a greater chance of being used, which makes the output more original but may produce nonsense
  - **Tagging** (optional): lets you tag your source data and alter the probabilities of tagged generation paths according to your own rules
  - **Prompted generation** (optional): lets the model answer questions, provided the training data consists mostly of Q&A pairs
  - **Managed disk storage**: you don't have to worry about storing and loading the models
  - **Transparent fragmentation**: reduces RAM usage and loading times with huge models
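
To make token sanitization and probability shifting more concrete, here is a minimal sketch of a second-order chain in plain Elixir. It is not this library's implementation: the `ChainSketch` module, its function names, the table layout and the square-root weighting used for shifting are all assumptions made for illustration, and the README does not state the formula the library actually uses.

```elixir
defmodule ChainSketch do
  # Hypothetical illustration only; none of these names come from the library.
  @order 2

  # Markers pass through untouched; words lose their case and punctuation,
  # so "Hello," and "hello" lead to the same state.
  def sanitize(token) when is_atom(token), do: token

  def sanitize(token) do
    token |> String.downcase() |> String.replace(~r/[^\p{L}\p{N}]/u, "")
  end

  # Table shape: %{[sanitized tokens] => %{original_next_token => count}}
  def train(table, text) do
    tokens = List.duplicate(:start, @order) ++ String.split(text) ++ [:end]

    tokens
    |> Enum.chunk_every(@order + 1, 1, :discard)
    |> Enum.reduce(table, fn chunk, acc ->
      {state, [next]} = Enum.split(chunk, @order)
      key = Enum.map(state, &sanitize/1)

      Map.update(acc, key, %{next => 1}, fn counts ->
        Map.update(counts, next, 1, &(&1 + 1))
      end)
    end)
  end

  # Turn counts into probabilities. With shifting enabled, counts are
  # flattened with a square root (an assumed formula), so rare paths gain weight.
  def probabilities(counts, shift?) do
    weights =
      Map.new(counts, fn {token, n} ->
        {token, if(shift?, do: :math.sqrt(n), else: n * 1.0)}
      end)

    total = weights |> Map.values() |> Enum.sum()
    Map.new(weights, fn {token, w} -> {token, w / total} end)
  end

  # Walk the chain from the start state until the :end marker is drawn.
  def generate(table, shift? \\ false) do
    walk(table, List.duplicate(:start, @order), [], shift?)
  end

  defp walk(table, state, acc, shift?) do
    key = Enum.map(state, &sanitize/1)

    case table |> Map.fetch!(key) |> probabilities(shift?) |> pick() do
      :end -> acc |> Enum.reverse() |> Enum.join(" ")
      token -> walk(table, tl(state) ++ [token], [token | acc], shift?)
    end
  end

  # Weighted random choice over a %{token => probability} map.
  defp pick(probs) do
    probs
    |> Enum.reduce_while(:rand.uniform(), fn {token, p}, remaining ->
      if remaining <= p, do: {:halt, {:picked, token}}, else: {:cont, remaining - p}
    end)
    |> case do
      {:picked, token} -> token
      # Fallback for floating-point leftovers
      _ -> probs |> Map.keys() |> List.last()
    end
  end
end

table =
  %{}
  |> ChainSketch.train("hello, world!")
  |> ChainSketch.train("Hello, Elixir!")

# Sanitization lets the two greetings share states, so the output may mix them.
IO.puts(ChainSketch.generate(table, true))
```

In this sketch the state keys are sanitized while the stored next tokens are not, which is one way to ignore case and punctuation when switching states and still emit the original text.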

## Usage

In `mix.exs`:

```elixir
defp deps do
  [{:markov, "~> 4.0"}]
end
```

Unlike Markov 1.x, this version is strongly opinionated about how you create and persist your models, and its approach also differs from 2.x and 3.x.

Example workflow (see the [API reference](https://hexdocs.pm/markov/api-reference.html) for full docs):

```elixir
# The model will be stored under this path
{:ok, model} = Markov.load("./model_path", sanitize_tokens: true, store_log: [:train])

# Train using four strings
:ok = Markov.train(model, "hello, world!")
:ok = Markov.train(model, "example string number two")
:ok = Markov.train(model, "hello, Elixir!")
:ok = Markov.train(model, "fourth string")

# Generate text
{:ok, text} = Markov.generate_text(model)
IO.puts(text)

# Commit all changes and unload
Markov.unload(model)

# These will return errors because the model is unloaded
# Markov.generate_text(model)
# Markov.train(model, "hello, world!")

# Load the model again
{:ok, model} = Markov.load("./model_path")

# Enable probability shifting and generate text
:ok = Markov.configure(model, shift_probabilities: true)
{:ok, text} = Markov.generate_text(model)
IO.puts(text)

# Print the operation log
model |> Markov.read_log() |> IO.inspect()

# Unloading also persists the option we just set
Markov.unload(model)
```

## Credits

  - [The English dictionary in a CSV format](https://www.bragitoff.com/2016/03/english-dictionary-in-csv-format/)
