https://github.com/mattlianje/loquax

NLP framework for phonology

Host: GitHub
URL: https://github.com/mattlianje/loquax
Owner: mattlianje
License: gpl-3.0
Created: 2023-02-10T04:43:34.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-06-21T03:09:43.000Z (over 1 year ago)
Last Synced: 2025-04-09T19:15:37.722Z (7 months ago)
Topics: digital-humanities, functional-programming, history, linguistics, nlp, nlp-library, nlp-parsing, phonological-features, phonology
Language: Python
Homepage:
Size: 7.61 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# Loquax

Loquax (Latin for "chatty") is an extensible, zero-dependency, FP-style Python library for phonological analysis. Eventually the Python package will be soft-deprecated and this repo will house the compiler for the Loquax DSL. There is also a [Loquax web client](https://nargothrond.xyz/loquax).

## Features

- Define your own languages/accents/dialects and analyze texts

- "Out of the box" Classical Latin support

- Syllabification and tokenization of corpora

- Phoneme analysis

- Scansion (long/short only)

- IPA transliteration

## Languages

| Language/Dialect       | IPA  | Syllabification | Scansion |
|------------------------|------|-----------------|----------|
| **Latin/Classical**    | ✓    | ✓               | ✓        |
| **Greek/Classical**    | X    | X               | X        |

## Quickstart

```shell
pip install loquax
```

```python
from loquax import Document
from loquax.languages import Latin

catilinarian_orations = Document("Quoūsque tandem abutēre, Catilīna, patientiā nostrā?", Latin)

print(catilinarian_orations.to_string(ipa=True, scansion=True))
# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː
#  u   -   u      -   u     u u   -  u     u  u   -  u     u  u  u  u  -      u   -
```

## Syllabification, Tokenization

```python
print(catilinarian_orations.tokens)
# outputs:
# [kʷɔ.uːs.kʷɛ, tan.dɛm, a.bʊ.teː.rɛ, ka.tɪ.liː.na, pa.tɪ.ɛn.tɪ.aː, nɔs.traː]

print(catilinarian_orations.tokens[0].syllables)
# outputs:
# [quo, ūs, que]
```

## Phoneme Analysis

Inspect individual sounds and their roles within words relative to a `Language`:

```python
from loquax.abstractions import Phoneme
from loquax.languages import Latin

r = Phoneme('r', Latin)

print(r.is_consonant and r.is_liquid)  # outputs: True
```

## Morphology

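**The central problem of phonology** is that linguistic units change their features depending on their context and neighbours.

Loquax lets users tackle this by defining their own morphisms. In the example below, a morphism combines a `target` rule, a `transformation` applied to matching units, and a `suffix` rule sequence that constrains the surrounding context.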

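```python
from dataclasses import replace

from loquax.morphisms import Morphism, Rule, RuleSequence
from loquax.syllables import Syllable

# Marks a syllable as long by position.
long_position_morphism = Morphism[Syllable](
    # Target: a syllable with a nucleus and a non-empty coda.
    target=Rule[Syllable](check_fn=lambda s: s.nucleus and s.coda and len(s.coda) >= 1),
    # Transformation: return a copy of the syllable with is_long set.
    transformation=lambda s: replace(s, is_long=True),
    # Suffix context: requires a coda and a non-empty onset.
    suffix=RuleSequence(
        [Rule[Syllable](check_fn=lambda s: s.coda and len(s.onset) >= 1)]
    ),
)
```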
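To make the idea concrete without depending on Loquax internals, here is a minimal, self-contained sketch of a context-sensitive rewrite rule. The `Syl` and `ContextRule` classes are hypothetical stand-ins invented for illustration; they are not Loquax's `Syllable` or `Morphism`, and the real library may evaluate context differently.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Syl:
    """Hypothetical stand-in for a syllable (not Loquax's Syllable)."""
    onset: str
    nucleus: str
    coda: str
    is_long: bool = False


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical rule: transform a target if the next item matches."""
    target: Callable[[Syl], bool]
    transformation: Callable[[Syl], Syl]
    next_check: Optional[Callable[[Syl], bool]] = None

    def apply(self, items: List[Syl]) -> List[Syl]:
        out = []
        for i, item in enumerate(items):
            nxt = items[i + 1] if i + 1 < len(items) else None
            # With a context check, a following item must exist and match.
            context_ok = self.next_check is None or (
                nxt is not None and self.next_check(nxt)
            )
            if self.target(item) and context_ok:
                out.append(self.transformation(item))
            else:
                out.append(item)
        return out


# Same checks as the example above: a closed syllable becomes long when
# the following syllable has both a coda and a non-empty onset.
long_by_position = ContextRule(
    target=lambda s: bool(s.nucleus and s.coda),
    transformation=lambda s: replace(s, is_long=True),
    next_check=lambda s: bool(s.coda) and len(s.onset) >= 1,
)

tandem = [Syl("t", "a", "n"), Syl("d", "e", "m")]
result = long_by_position.apply(tandem)
print([s.is_long for s in result])  # [True, False]
```

Because the dataclasses are frozen and the transformation uses `replace`, the input sequence is never mutated, which matches the FP style of the example above.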

`MorphismStore` lets you organize your morphisms and apply all of them to a given syllable or phoneme sequence:

```python
from loquax.abstractions import MorphismStore

# morphism1..3 and syllable1..3 are placeholders for your own objects.
morphism_store = MorphismStore([morphism1, morphism2, morphism3])

syllables_sequence = [syllable1, syllable2, syllable3]

transformed_sequence = morphism_store.apply_all(syllables_sequence)
```

## IPA

To transliterate text into the International Phonetic Alphabet (IPA), call `to_string` with `ipa=True`:

```python
print(catilinarian_orations.to_string(ipa=True))
# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː
```
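As a rough illustration of how grapheme-to-IPA transliteration can work, the sketch below uses a greedy longest-match lookup. The `TOY_LATIN_IPA` table and `to_toy_ipa` function are hypothetical: the mappings are simply read off the sample output above, and Loquax's actual rules are likely richer (this sketch does no syllabification).

```python
import unicodedata

# Toy grapheme-to-IPA table, inferred from the sample output above.
# An illustrative assumption, not Loquax's actual rule set.
_RAW_TABLE = {
    "qu": "kʷ", "c": "k",
    "a": "a", "e": "ɛ", "i": "ɪ", "o": "ɔ", "u": "ʊ",
    "ā": "aː", "ē": "eː", "ī": "iː", "ū": "uː",
}
# Normalize keys so macron vowels compare equal however they were typed.
TOY_LATIN_IPA = {unicodedata.normalize("NFC", k): v for k, v in _RAW_TABLE.items()}


def to_toy_ipa(word: str) -> str:
    """Greedy longest-match transliteration; unknown letters pass through."""
    text = unicodedata.normalize("NFC", word.lower())
    longest = max(len(key) for key in TOY_LATIN_IPA)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in TOY_LATIN_IPA:
                out.append(TOY_LATIN_IPA[chunk])
                i += len(chunk)
                break
        else:
            # No mapping found: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_toy_ipa("Quoūsque"))  # kʷɔuːskʷɛ
print(to_toy_ipa("Catilīna"))  # katɪliːna
```

Trying the longest key first is what lets the digraph `qu` win over the single letter `u`.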

## Scansion

Scansion is the process of marking the metrical pattern of a line of verse and dividing it into feet.

It's a critical part of the study and enjoyment of classical verse, such as Latin and Ancient Greek poetry, where metre is built on long and short syllables rather than stress.

Loquax makes it easy to integrate scansion into your language analysis pipeline.

Currently, Loquax only distinguishes long (`-`) from short (`u`) syllables:

```python
print(catilinarian_orations.to_string(scansion=True))
# outputs:
# quo.ūs.que    tan.dem    a.bu.tē.re    ca.ti.lī.na    pa.ti.en.ti.ā    nos.trā
#  u  -   u      -   u     u u  -  u     u  u  -  u     u  u  u  u  -     u   -
```
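One ingredient of long/short marking is vowel length "by nature", which the input text encodes with macrons. The sketch below detects macrons through Unicode decomposition and marks syllables accordingly. The function names are hypothetical and this is not how Loquax computes scansion: it ignores length by position, so it would mark `tan` in `tan.dem` as short where the output above shows it as long.

```python
import unicodedata
from typing import List

COMBINING_MACRON = "\u0304"


def has_long_vowel(syllable: str) -> bool:
    """True if any character in the syllable carries a macron."""
    # NFD splits 'ū' into 'u' + combining macron, however it was typed.
    return COMBINING_MACRON in unicodedata.normalize("NFD", syllable)


def scan_by_nature(syllables: List[str]) -> List[str]:
    """Mark each syllable '-' (long vowel) or 'u' (everything else)."""
    return ["-" if has_long_vowel(s) else "u" for s in syllables]


def render(syllables: List[str]) -> str:
    """Return the syllables with one mark placed under each of them."""
    syllables = [unicodedata.normalize("NFC", s) for s in syllables]
    marks = scan_by_nature(syllables)
    top = ".".join(syllables)
    # Pad each mark to its syllable's width so the two rows line up.
    bottom = " ".join(m.center(len(s)) for m, s in zip(marks, syllables))
    return top + "\n" + bottom


print(scan_by_nature(["quo", "ūs", "que"]))  # ['u', '-', 'u']
print(render(["a", "bu", "tē", "re"]))
```

A fuller implementation would combine this check with a positional rule such as the morphism shown in the Morphology section.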

## Extensibility

Loquax is extensible: you can build and customize your own language rules for unique or theoretical languages. Here's an example of how to define custom rules and assemble them into a `Language`:

```python
from loquax.languages import Latin
from loquax.abstractions import (
    PhonemeSyllabificationRuleStore, Language,
    Constants, Tokenizer, MorphismStore,
    Syllable, Morphism, Phoneme
)

# Replace each `...` with your own rules and settings.
syllabification_rules = PhonemeSyllabificationRuleStore(...)
constants = Constants(...)
tokenizer = Tokenizer(...)
syllable_morphisms = MorphismStore[Syllable]([...])
phoneme_morphisms = MorphismStore[Phoneme]([...])

# All arguments are passed positionally: Python does not allow
# positional arguments after keyword arguments.
my_lang = Language(
    'MyLang',  # language name
    'myl',     # ISO 639 code
    constants,
    syllabification_rules,
    syllable_morphisms,
    phoneme_morphisms,
    tokenizer,
)
```
