https://github.com/tadateruki/name-engine

A Rust library for computing Markov chains to generate random names based on pronunciation

Host: GitHub
URL: https://github.com/tadateruki/name-engine
Owner: TadaTeruki
License: mpl-2.0
Created: 2024-01-16T12:59:06.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-03-23T22:50:58.000Z (8 months ago)
Last Synced: 2024-03-23T23:32:36.621Z (8 months ago)
Language: Rust
Homepage:
Size: 89.8 KB
Stars: 7
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# name-engine

*Preview: Generating English Place Names `examples/england_evaluated.rs`*

![placename-generation](https://github.com/TadaTeruki/name-engine/assets/69315285/ce7b6b1c-a8ad-477a-9b10-92a27cb6df1c)

`name-engine` is a basic library for computing Markov chains to generate names based on their pronunciation.

This can be used for various purposes, but primarily for generating place names.

## Algorithm

This library computes Markov chains from a dataset of names. Each name must first be split into units according to user-defined rules, for example by syllable. Each unit is treated as a state of the Markov chain.

A transition is defined as the connection between the pronunciations of two adjacent units, from the last sound of one unit to the first sound of the next. For example:

- `ŋ` -> `w` in `Ringwood /ˈrɪŋwʊd/` [`(Ring /ˈrɪŋ/)` `(wood /wʊd/)`]

- `k` -> `ə` in `Beccles /ˈbɛkəlz/` [`(Becc /ˈbɛk/)` `(les /əlz/)`]

- `k` -> `ə` and `m` -> `s` in `Berkhamsted /ˈbɜːrkəmstɛd/` [`(Berk /ˈbɜːrk/)` `(ham /əm/)` `(sted /stɛd/)`]

With the data above, the model can generate `Berkles` from `(Berk /ˈbɜːrk/)` and `(les /əlz/)` by following the transition `k` -> `ə`.

The probability of the transition is calculated from the frequency of the connection in the dataset.
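
The frequency counting described above can be sketched in plain Rust. This is a minimal illustration that does not use the library's API: the `Unit` alias and the functions `sample_names`, `count_transitions`, and `probability` are hypothetical names introduced here, and the sketch assumes each sound is a single character once stress marks are skipped (so an affricate such as `dʒ` would be reduced to its last character).

```rust
use std::collections::HashMap;

/// A separated unit as (spelling, pronunciation).
/// Hypothetical representation for this sketch, not the library's own type.
type Unit = (&'static str, &'static str);

/// The three names from the examples above, already separated by hand.
fn sample_names() -> Vec<Vec<Unit>> {
    vec![
        vec![("Ring", "ˈrɪŋ"), ("wood", "wʊd")],
        vec![("Becc", "ˈbɛk"), ("les", "əlz")],
        vec![("Berk", "ˈbɜːrk"), ("ham", "əm"), ("sted", "stɛd")],
    ]
}

/// IPA primary and secondary stress marks are not sounds, so they are skipped.
fn is_stress_mark(c: char) -> bool {
    matches!(c, 'ˈ' | 'ˌ')
}

/// First sound of a pronunciation (simplification: one character per sound).
fn first_sound(p: &str) -> Option<char> {
    p.chars().find(|c| !is_stress_mark(*c))
}

/// Last sound of a pronunciation (same simplification).
fn last_sound(p: &str) -> Option<char> {
    p.chars().rev().find(|c| !is_stress_mark(*c))
}

/// Count how often each (last sound -> first sound) connection occurs
/// between adjacent units in the dataset.
fn count_transitions(names: &[Vec<Unit>]) -> HashMap<(char, char), usize> {
    let mut counts = HashMap::new();
    for name in names {
        for pair in name.windows(2) {
            if let (Some(from), Some(to)) = (last_sound(pair[0].1), first_sound(pair[1].1)) {
                *counts.entry((from, to)).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Relative frequency of `from -> to` among all transitions leaving `from`.
fn probability(counts: &HashMap<(char, char), usize>, from: char, to: char) -> f64 {
    let total: usize = counts
        .iter()
        .filter(|((f, _), _)| *f == from)
        .map(|(_, n)| *n)
        .sum();
    if total == 0 {
        return 0.0;
    }
    *counts.get(&(from, to)).unwrap_or(&0) as f64 / total as f64
}

fn main() {
    let counts = count_transitions(&sample_names());
    // `k` -> `ə` occurs in both Beccles and Berkhamsted.
    println!("count(k -> ə) = {}", counts[&('k', 'ə')]);
    println!("P(ə | k) = {}", probability(&counts, 'k', 'ə'));
}
```

Because `k` -> `ə` is a recorded transition, `(Berk /ˈbɜːrk/)` followed by `(les /əlz/)` is an allowed continuation in this sketch, which is how a name like `Berkles` can arise.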

## Features
This library does:
- **Create a name generator** from a dataset of separated names.
- Generate names using Markov chains.

This library DOES NOT:
- Read and parse data from a file.
- **Automatically separate original names according to specific rules, such as syllables.** You must prepare the dataset yourself.
- **Evaluate names.** If you want to generate better names, you must implement the evaluation function and the filtering process yourself.
- **Combine other parameters.** If you need this, `NameGenerator::generate_verbose` is useful for implementing it yourself.

This library only does the minimal processing necessary to generate names. To create a more practical name generator, additional processing such as the items listed above is required.
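
As an illustration of the evaluation and filtering step that is left to the user, here is a minimal sketch in plain Rust. Everything in it is an assumption made for the example: `score` and `filter_candidates` are hypothetical names, the length-based score is an arbitrary stand-in for a real evaluation function, and the candidates are hard-coded strings rather than output from this library.

```rust
use std::collections::HashSet;

/// Hypothetical evaluation function: prefers names of about 8 letters.
/// A real evaluator would use whatever criteria suit the dataset.
fn score(name: &str) -> f64 {
    let len = name.chars().count() as f64;
    1.0 / (1.0 + (len - 8.0).abs())
}

/// Keep candidates that are not already in the dataset, reach `min_score`,
/// and have not been seen before. The original order is preserved.
fn filter_candidates<'a>(
    candidates: &[&'a str],
    known: &HashSet<&str>,
    min_score: f64,
) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    candidates
        .iter()
        .copied()
        .filter(|c| !known.contains(*c))
        .filter(|c| score(c) >= min_score)
        .filter(|c| seen.insert(*c))
        .collect()
}

fn main() {
    // Hard-coded stand-ins for generated names.
    let candidates = ["Downbury", "Ringwood", "Thatchingworth", "Downbury", "Farhead"];
    // Names that already exist in the source dataset.
    let known: HashSet<&str> = ["Ringwood"].into_iter().collect();
    for name in filter_candidates(&candidates, &known, 0.5) {
        println!("{name}");
    }
}
```

Separating scoring from filtering keeps the evaluation criteria easy to swap without touching the generation step.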

## Documentation

Run `cargo doc --open` to see the documentation.

If you want to try it out, see the examples in `examples/`. `examples/japanese.rs` is a good place to start reading.

## Installation

```toml
[dependencies]
name-engine = "0.1.0"
```

## Examples

#### Generate 100 place names of Hokkaido

```sh
$ cargo run --example hokkaido
```

```
中富 nakatomi
初威冠 shoikappu
上沢 kamizawa
```

#### Generate 100 place names of England

```sh
$ cargo run --example england
```

```
Stoneon /ˈstəʊnən/
Thatchingworth /ˈθætʃɪŋwɜːθ/
Brentgomley /ˈbrɛntɡʌmli/
```

#### Generate 100 place names of England (filtered for better results)

```sh
$ cargo run --example england_evaluated
```

```
Oltham Abbey /ˈoʊlθəm ˈæbi/
Downbury /ˈdaʊnbəri/
Farhead /ˈfɑːrhɛd/
```

#### Generate 100 place names of the US (filtered for better results)

```sh
$ cargo run --example us_evaluated
```

```
Winfield /ˈwɪnfiːld/
Perton /ˈpɛrtən/
Kinbridge Falls /ˈkɪnbrɪdʒ fɔːlz/
```

### About the English and US place name data for the examples

For English and US place name data, some symbols are added for better results.
- [1] Spaces are replaced by `+` and treated as independent syllables.
- [2] For a syllable that starts with a capital letter, an asterisk `*` is added at the beginning of its pronunciation, so that it is used as the first syllable of the name or as the syllable right after `+`.
- [3] For the syllable right before `+`, an asterisk `*` is added at the end of its pronunciation, so that it is used as the syllable right before `+`.

**Example**
```
Tunbridge Wells /ˈtʌnbrɪdʒ ˈwɛlz/
(Tun, /*ˈtʌn/) (bridge, /brɪdʒ*/) (+, /+/) (Wells, /*ˈwɛlz/)
```
- `(Tun /ˈtʌn/)` -> `(Tun /*ˈtʌn/)` [2]
- `(bridge /brɪdʒ/)` -> `(bridge /brɪdʒ*/)` [3]
- `(+ /+/)` [1]
- `(Wells /ˈwɛlz/)` -> `(Wells /*ˈwɛlz/)` [2]

Moreover, some suffixes are treated as independent syllables, such as `minster` and `bridge`.
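
Rules [1] to [3] can be sketched as a small preprocessing function in plain Rust. This is only an illustration of the marking scheme, not the code used to build the example data: `annotate` is a hypothetical name, and the input is assumed to be already split into words and syllables by hand.

```rust
/// Apply rules [1]-[3] to a name given as words, each a list of
/// (spelling, pronunciation) syllables. Hypothetical helper for this sketch.
fn annotate(words: &[Vec<(&str, &str)>]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (wi, word) in words.iter().enumerate() {
        // [1] A space between words becomes an independent `+` syllable.
        if wi > 0 {
            out.push(("+".to_string(), "+".to_string()));
        }
        for (si, (spelling, pron)) in word.iter().enumerate() {
            let mut marked = String::new();
            // [2] A capitalized syllable gets `*` before its pronunciation.
            if spelling.chars().next().map_or(false, char::is_uppercase) {
                marked.push('*');
            }
            marked.push_str(pron);
            // [3] The syllable right before `+` gets `*` after its pronunciation.
            let is_last_syllable = si + 1 == word.len();
            let has_next_word = wi + 1 < words.len();
            if is_last_syllable && has_next_word {
                marked.push('*');
            }
            out.push((spelling.to_string(), marked));
        }
    }
    out
}

fn main() {
    // Tunbridge Wells /ˈtʌnbrɪdʒ ˈwɛlz/, separated by hand.
    let name = vec![
        vec![("Tun", "ˈtʌn"), ("bridge", "brɪdʒ")],
        vec![("Wells", "ˈwɛlz")],
    ];
    for (spelling, pron) in annotate(&name) {
        println!("({spelling}, /{pron}/)");
    }
}
```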

## Data Source

`examples/assets/hokkaido.csv`: Hokkaido Government Opendata, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.ja).
Modified from the original data.

Source: https://www.pref.hokkaido.lg.jp/link/shichoson/aiueo.html

## License

This project is licensed under the Mozilla Public License v2.0. See the [LICENSE](LICENSE) file for details.

Note that if you use, copy or modify the code in `examples`, you do not need to worry about the copyleft restrictions. Feel free to use it!