https://github.com/quadrismegistus/prosodic

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

finnish-language-analysis linguistics metrical-parser nlp poetry rhythm


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prosodic 2\n",
"\n",
"[![Code coverage](https://codecov.io/gh/quadrismegistus/prosodic/branch/develop/graph/badge.svg)](https://codecov.io/gh/quadrismegistus/prosodic)\n",
"\n",
"Prosodic is a metrical-phonological parser written in Python. Currently, it can parse English and Finnish text, but adding additional languages is easy with a pronunciation dictionary or a custom Python function. Prosodic was built by [Ryan Heuser](https://github.com/quadrismegistus), [Josh Falk](https://github.com/jsfalk), and [Arto Anttila](http://web.stanford.edu/~anttila/). Josh also maintains [another repository](https://github.com/jsfalk/prosodic1b), in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. [Sam Bowman](https://github.com/sleepinyourhat) has contributed to the codebase as well, adding several new metrical constraints.\n",
"\n",
"\"Prosodic 2\", in this `develop` branch, is a near-total rewrite of Prosodic.\n",
"\n",
"Supports Python>=3.8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Install python package\n",
"\n",
"For now, pip-install directly from this develop branch:\n",
"\n",
"```\n",
"pip install git+https://github.com/quadrismegistus/prosodic@develop\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Install espeak\n",
"\n",
"Install [espeak](https://espeak.sourceforge.net), free text-to-speech (TTS) software, to ‘sound out’ unknown words.\n",
"\n",
"* *Mac*: `brew install espeak`. (First install [homebrew](https://brew.sh) if not already installed.)\n",
"\n",
"* *Linux*: `apt-get install espeak`\n",
"\n",
"* *Windows*: Download and install from http://espeak.sourceforge.net/download.html."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"### Web app"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:\n",
"\n",
"```\n",
"prosodic\n",
"```\n",
"\n",
"Then navigate to [http://127.0.0.1:5000/](http://127.0.0.1:5000/). It should look like this:\n",
"\n",
"*(Screenshot of the web app: prosodic-gui2)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Read texts"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# import prosodic\n",
"import prosodic\n",
"\n",
"# load a text\n",
"sonnet = prosodic.Text(\"\"\"\n",
"Those hours, that with gentle work did frame\n",
"The lovely gaze where every eye doth dwell,\n",
"Will play the tyrants to the very same\n",
"And that unfair which fairly doth excel;\n",
"For never-resting time leads summer on\n",
"To hideous winter, and confounds him there;\n",
"Sap checked with frost, and lusty leaves quite gone,\n",
"Beauty o’er-snowed and bareness every where:\n",
"Then were not summer’s distillation left,\n",
"A liquid prisoner pent in walls of glass,\n",
"Beauty’s effect with beauty were bereft,\n",
"Nor it, nor no remembrance what it was:\n",
"But flowers distill’d, though they with winter meet,\n",
"Leese but their show; their substance still lives sweet.\n",
"\"\"\")\n",
"\n",
"# can also load by filename\n",
"shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Stanzas, lines, words, syllables, phonemes\n",
"\n",
"Texts in prosodic are organized into a tree structure. The `.children` of a `Text` object is a list of `Stanza` objects, each of whose `.parent` points back to the `Text`. In turn, each stanza's `.children` is a list of `Line` objects, each of whose `.parent` points back to the stanza; and so on down the tree."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Text()\n",
"| Stanza(num=1)\n",
"| Line(num=1, txt='Those hours, that with gentle work did frame')\n",
"| WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)\n",
"| WordType(num=1, txt='Those', lang='en', num_forms=1)\n",
"| WordForm(num=1, txt='Those')\n",
"| Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)\n",
"| Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)\n",
"| WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)\n",
"| WordType(num=1, txt='hours', lang='en', num_forms=2)\n",
"| WordForm(num=1, txt='hours')\n",
"| Syllable(ipa=\"'aʊ\", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)\n",
"| Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)\n",
"| Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)\n",
"| Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)\n",
"| Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)\n",
"| WordForm(num=2, txt='hours')\n",
"| Syllable(ipa=\"'aʊrz\", num=1, txt='hours', is_stressed=True, is_heavy=True)\n",
"| Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)\n",
"| Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)\n",
"| WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)\n",
"| WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)\n",
"| WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)\n",
"| WordType(num=1, txt='that', lang='en', num_forms=3)\n"
]
}
],
"source": [
"# Take a peek at this tree structure \n",
"# and the features particular entities have\n",
"sonnet.show(maxlines=30, incl_phons=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"12\"></th><th>word_num_forms</th><th>syll_is_stressed</th><th>syll_is_heavy</th><th>syll_is_strong</th><th>syll_is_weak</th><th>word_is_punc</th></tr>\n",
"    <tr><th>stanza_num</th><th>line_num</th><th>line_txt</th><th>sent_num</th><th>sentpart_num</th><th>wordtoken_num</th><th>wordtoken_txt</th><th>word_lang</th><th>wordform_num</th><th>syll_num</th><th>syll_txt</th><th>syll_ipa</th><th colspan=\"6\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><th>1</th><th>Those hours, that with gentle work did frame</th><th>1</th><th>1</th><th>1</th><th>Those</th><th>en</th><th>1</th><th>1</th><th>Those</th><th>ðoʊz</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>2</th><th>hours</th><th>en</th><th>1</th><th>1</th><th>ho</th><th>'aʊ</th><td>2</td><td>1</td><td>1</td><td>1</td><td>0</td><td></td></tr>\n",
"    <tr><th colspan=\"9\"></th><th>2</th><th>urs</th><th>ɛːz</th><td>2</td><td>0</td><td>1</td><td>0</td><td>1</td><td></td></tr>\n",
"    <tr><th colspan=\"8\"></th><th>2</th><th>1</th><th>hours</th><th>'aʊrz</th><td>2</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>3</th><th>,</th><th>en</th><th>0</th><th>0</th><th colspan=\"2\"></th><td>0</td><td colspan=\"4\"></td><td>1</td></tr>\n",
"    <tr><th></th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th></th><th>14</th><th>Leese but their show; their substance still lives sweet.</th><th>1</th><th>1</th><th>7</th><th>substance</th><th>en</th><th>1</th><th>2</th><th>tance</th><th>stəns</th><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>8</th><th>still</th><th>en</th><th>1</th><th>1</th><th>still</th><th>'stɪl</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>9</th><th>lives</th><th>en</th><th>1</th><th>1</th><th>lives</th><th>'lɪvz</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>10</th><th>sweet</th><th>en</th><th>1</th><th>1</th><th>sweet</th><th>'swiːt</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"5\"></th><th>11</th><th>.</th><th>en</th><th>0</th><th>0</th><th colspan=\"2\"></th><td>0</td><td colspan=\"4\"></td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>195 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
"Text()"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# take a peek at it in dataframe form\n",
"sonnet.df # by-syllable dataframe representation\n",
"sonnet # ...which will also be shown when text object displayed (in a notebook)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# you can loop over this directly if you want\n",
"for stanza in shaksonnets.stanzas:\n",
"    for line in stanza:\n",
"        for wordtoken in line:\n",
"            for wordtype in wordtoken:\n",
"                for wordform in wordtype:\n",
"                    for syllable in wordform:\n",
"                        for phoneme in syllable:\n",
"                            # ...\n",
"                            pass"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Shakespeare's sonnets have:\n",
" * 154 \"stanzas\" (in this text, each one a sonnet)\n",
" * 2,155 lines\n",
" * 20,317 wordtokens (including punctuation)\n",
" * 20,317 wordtypes (each token has one wordtype object)\n",
" * 17,601 wordforms (a word + IPA pronunciation; no punctuation)\n",
" * 21,915 syllables\n",
" * 63,614 phonemes\n",
"\n"
]
}
],
"source": [
"# or directly access components\n",
"print(f'''\n",
"Shakespeare's sonnets have:\n",
" * {len(shaksonnets.stanzas):,} \"stanzas\" (in this text, each one a sonnet)\n",
" * {len(shaksonnets.lines):,} lines\n",
" * {len(shaksonnets.wordtokens):,} wordtokens (including punctuation)\n",
" * {len(shaksonnets.wordtypes):,} wordtypes (each token has one wordtype object)\n",
" * {len(shaksonnets.wordforms):,} wordforms (a word + IPA pronunciation; no punctuation)\n",
" * {len(shaksonnets.syllables):,} syllables\n",
" * {len(shaksonnets.phonemes):,} phonemes\n",
"''')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"11\"></th><th>word_num_forms</th><th>syll_is_stressed</th><th>syll_is_heavy</th><th>syll_is_strong</th><th>syll_is_weak</th><th>word_is_punc</th></tr>\n",
"    <tr><th>line_num</th><th>line_txt</th><th>sent_num</th><th>sentpart_num</th><th>wordtoken_num</th><th>wordtoken_txt</th><th>word_lang</th><th>wordform_num</th><th>syll_num</th><th>syll_txt</th><th>syll_ipa</th><th colspan=\"6\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><th>Those hours, that with gentle work did frame</th><th>1</th><th>1</th><th>1</th><th>Those</th><th>en</th><th>1</th><th>1</th><th>Those</th><th>ðoʊz</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>2</th><th>hours</th><th>en</th><th>1</th><th>1</th><th>ho</th><th>'aʊ</th><td>2</td><td>1</td><td>1</td><td>1</td><td>0</td><td></td></tr>\n",
"    <tr><th colspan=\"8\"></th><th>2</th><th>urs</th><th>ɛːz</th><td>2</td><td>0</td><td>1</td><td>0</td><td>1</td><td></td></tr>\n",
"    <tr><th colspan=\"7\"></th><th>2</th><th>1</th><th>hours</th><th>'aʊrz</th><td>2</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>3</th><th>,</th><th>en</th><th>0</th><th>0</th><th colspan=\"2\"></th><td>0</td><td colspan=\"4\"></td><td>1</td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>6</th><th>gentle</th><th>en</th><th>1</th><th>2</th><th>tle</th><th>təl</th><td>1</td><td>0</td><td>1</td><td>0</td><td>1</td><td></td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>7</th><th>work</th><th>en</th><th>1</th><th>1</th><th>work</th><th>'wɛːk</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>8</th><th>did</th><th>en</th><th>1</th><th>1</th><th>did</th><th>dɪd</th><td>2</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"7\"></th><th>2</th><th>1</th><th>did</th><th>'dɪd</th><td>2</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"4\"></th><th>9</th><th>frame</th><th>en</th><th>1</th><th>1</th><th>frame</th><th>'freɪm</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>15 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
"Line(num=1, txt='Those hours, that with gentle work did frame')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# access lines\n",
"\n",
"# text.line{num} will return text.lines[num-1]\n",
"assert sonnet.line1 is sonnet.lines[0]\n",
"assert sonnet.line10 is sonnet.lines[9]\n",
"\n",
"# show the line\n",
"sonnet.line1"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mtokenizing\u001b[0m\u001b[36m @ 2023-12-15 14:14:17,991\u001b[0m\n",
"\u001b[34m\u001b[1m⎿ 0 seconds\u001b[0m\u001b[36m @ 2023-12-15 14:14:17,992\u001b[0m\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"10\"></th><th>word_num_forms</th><th>syll_is_stressed</th><th>syll_is_heavy</th><th>word_is_punc</th><th>syll_is_strong</th><th>syll_is_weak</th></tr>\n",
"    <tr><th>line_txt</th><th>sent_num</th><th>sentpart_num</th><th>wordtoken_num</th><th>wordtoken_txt</th><th>word_lang</th><th>wordform_num</th><th>syll_num</th><th>syll_txt</th><th>syll_ipa</th><th colspan=\"6\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>A horse, a horse, my kingdom for a horse!</th><th>1</th><th>1</th><th>1</th><th>A</th><th>en</th><th>1</th><th>1</th><th>A</th><th>eɪ</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>2</th><th>horse</th><th>en</th><th>1</th><th>1</th><th>horse</th><th>'hɔːrs</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>3</th><th>,</th><th>en</th><th>0</th><th>0</th><th colspan=\"2\"></th><td>0</td><td colspan=\"2\"></td><td>1</td><td colspan=\"2\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>4</th><th>a</th><th>en</th><th>1</th><th>1</th><th>a</th><th>eɪ</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>5</th><th>horse</th><th>en</th><th>1</th><th>1</th><th>horse</th><th>'hɔːrs</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>8</th><th>kingdom</th><th>en</th><th>1</th><th>2</th><th>dom</th><th>dəm</th><td>1</td><td>0</td><td>1</td><td></td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>9</th><th>for</th><th>en</th><th>1</th><th>1</th><th>for</th><th>fɔːr</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>10</th><th>a</th><th>en</th><th>1</th><th>1</th><th>a</th><th>eɪ</th><td>1</td><td>0</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>11</th><th>horse</th><th>en</th><th>1</th><th>1</th><th>horse</th><th>'hɔːrs</th><td>1</td><td>1</td><td>1</td><td colspan=\"3\"></td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>12</th><th>!</th><th>en</th><th>0</th><th>0</th><th colspan=\"2\"></th><td>0</td><td colspan=\"2\"></td><td>1</td><td colspan=\"2\"></td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>13 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
"Line(txt='A horse, a horse, my kingdom for a horse!')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# build lines directly\n",
"line_from_richardIII = prosodic.Line('A horse, a horse, my kingdom for a horse!')\n",
"line_from_richardIII"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Metrical parsing\n",
"\n",
"##### Parsing lines"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"5\"></th><th>parse_score</th><th>parse_is_bounded</th><th>meterpos_num_slots</th><th>*w_peak</th><th>*w_stress</th><th>*s_unstress</th><th>*unres_across</th><th>*unres_within</th></tr>\n",
"    <tr><th>line_txt</th><th>parse_rank</th><th>parse_txt</th><th>parse_meter</th><th>parse_stress</th><th colspan=\"8\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>A horse, a horse, my kingdom for a horse!</th><th>1</th><th>a HORSE a HORSE my KING dom FOR a HORSE</th><th>-+-+-+-+-+</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>10</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"ParseList(line_txt='A horse, a horse, my kingdom for a horse!', num_parses=1)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# parse with default options; returns the list of plausible parses\n",
"plausible_parses = line_from_richardIII.parse()\n",
"plausible_parses"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>\n",
".violation { color:#f43838; }\n",
".meter_strong { text-decoration: overline;}\n",
".miniquote { margin-left:0em;margin-top:.5em;font-family:monospace; font-size:.8em;}\n",
".parse {font-family:monospace;}\n",
".stress_strong { text-decoration: underline; text-underline-offset: 3px; }\n",
".stress_strong.meter_strong { text-decoration: underline overline; text-underline-offset: 3px; }\n",
"</style>\n",
"<div class=\"parse\">A horse a horse my kingdom for a horse</div>\n",
"<div class=\"miniquote\">⎿ Parse(rank=1, meter='-+-+-+-+-+', stress='-+-+-+---+', score=1, is_bounded=0)</div>"
],
"text/plain": [
"Parse(line_txt='A horse, a horse, my kingdom for a horse!', txt='a HORSE a HORSE my KING dom FOR a HORSE', rank=1, meter='-+-+-+-+-+', stress='-+-+-+---+', score=1, is_bounded=0)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# see best parse\n",
"line_from_richardIII.best_parse"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"5\"></th><th>parse_score</th><th>parse_is_bounded</th><th>meterpos_num_slots</th><th>*w_peak</th><th>*s_unstress</th></tr>\n",
"    <tr><th>line_txt</th><th>parse_rank</th><th>parse_txt</th><th>parse_meter</th><th>parse_stress</th><th colspan=\"5\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>A horse, a horse, my kingdom for a horse!</th><th>1</th><th>a HORSE a HORSE my KING dom FOR a HORSE</th><th>-+-+-+-+-+</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>10</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>2</th><th>a HORSE a HORSE my KING dom FOR a.horse</th><th>-+-+-+-+--</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>12</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>3</th><th>a HORSE a HORSE my KING dom.for A horse</th><th>-+-+-+--+-</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>12</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>4</th><th>a HORSE a HORSE my KING dom.for A.HORSE</th><th>-+-+-+--++</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>14</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>5</th><th>a HORSE a HORSE my KING.DOM for.a HORSE</th><th>-+-+-++--+</th><th>-+-+-+---+</th><td>1.0</td><td>0.0</td><td>14</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>6</th><th>a HORSE a HORSE my KING dom FOR.A horse</th><th>-+-+-+-++-</th><th>-+-+-+---+</th><td>2.0</td><td>0.0</td><td>12</td><td>0</td><td>2</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"ParseList(line_txt='A horse, a horse, my kingdom for a horse!', num_parses=6)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# parse with different options\n",
"diff_parses = line_from_richardIII.parse(constraints=('w_peak','s_unstress'))\n",
"diff_parses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Parsing texts"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mparsing 14 lines [5x]\u001b[0m\u001b[36m @ 2023-12-15 14:17:43,563\u001b[0m\n",
"\u001b[1;34m│ stanza 01, line 14: LEESE but.their SHOW their SUBS tance STILL lives SWEET: 100%|\u001b[0;36m██████████\u001b[0;36m| 14/14 [00:00<00:00, 45.78it/s]\n",
"\u001b[34m\u001b[1m⎿ 0.3 seconds\u001b[0m\u001b[36m @ 2023-12-15 14:17:43,873\u001b[0m\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"7\"></th><th>parse_score</th><th>parse_is_bounded</th><th>meterpos_num_slots</th><th>*w_peak</th><th>*w_stress</th><th>*s_unstress</th><th>*unres_across</th><th>*unres_within</th></tr>\n",
"    <tr><th>stanza_num</th><th>line_num</th><th>line_txt</th><th>parse_rank</th><th>parse_txt</th><th>parse_meter</th><th>parse_stress</th><th colspan=\"8\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><th>1</th><th>Those hours, that with gentle work did frame</th><th>1</th><th>those HO urs THAT with GEN tle WORK did FRAME</th><th>-+-+-+-+-+</th><th>-+-+-+-+-+</th><td>0.0</td><td>0.0</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>2</th><th>those HOURS that.with GEN tle WORK did FRAME</th><th>-+--+-+-+</th><th>-+--+-+-+</th><td>0.0</td><td>0.0</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>3</th><th>those HOURS that.with GEN tle WORK did FRAME</th><th>-+--+-+-+</th><th>-+--+-+-+</th><td>0.0</td><td>0.0</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th></th><th>2</th><th>The lovely gaze where every eye doth dwell,</th><th>1</th><th>the LO vely GAZE where E very EYE doth DWELL</th><th>-+-+-+-+-+</th><th>-+-+-+-+-+</th><td>0.0</td><td>0.0</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>2</th><th>the LO vely GAZE where E ve.ry EYE doth DWELL</th><th>-+-+-+--+-+</th><th>-+-+-+--+-+</th><td>1.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"    <tr><th></th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th></th><th>13</th><th>But flowers distill'd, though they with winter meet,</th><th>1</th><th>but FLO wers DIS.TILL'D though THEY with WIN ter MEET</th><th>-+-++-+-+-+</th><th>-+--+-+-+-+</th><td>2.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>2</th><th>but FLO wers.dis TILL'D though THEY with WIN ter MEET</th><th>-+--+-+-+-+</th><th>-+--+-+-+-+</th><td>2.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>3</th><th>but FLO.WERS dis TILL'D though THEY with WIN ter MEET</th><th>-++-+-+-+-+</th><th>-+--+-+-+-+</th><td>2.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>4</th><th>but FLO wers DIS till'd THOUGH they.with WIN ter MEET</th><th>-+-+-+--+-+</th><th>-+--+---+-+</th><td>4.0</td><td>0.0</td><td>13</td><td>1</td><td>1</td><td>2</td><td>0</td><td>0</td></tr>\n",
"    <tr><th></th><th>14</th><th>Leese but their show; their substance still lives sweet.</th><th>1</th><th>LEESE but.their SHOW their SUBS tance STILL lives SWEET</th><th>+--+-+-+-+</th><th>+--+-+-+++</th><td>1.0</td><td>0.0</td><td>12</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>37 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
"ParseList(line_num=1, line_txt='Those hours, that with gentle work did frame', num_parses=37)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# small texts\n",
"sonnet.parse()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mparsing 2155 lines [5x]\u001b[0m\u001b[36m @ 2023-12-15 14:17:52,124\u001b[0m\n",
"\u001b[1;34m│ stanza 154, line 14: love's FI re HEATS.WA ter WA ter COOLS not LOVE : 100%|\u001b[0;36m██████████\u001b[0;36m| 2155/2155 [00:56<00:00, 38.03it/s]\n",
"\u001b[34m\u001b[1m⎿ 57.4 seconds\u001b[0m\u001b[36m @ 2023-12-15 14:18:49,496\u001b[0m\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th colspan=\"7\"></th><th>parse_score</th><th>parse_is_bounded</th><th>meterpos_num_slots</th><th>*w_peak</th><th>*w_stress</th><th>*s_unstress</th><th>*unres_across</th><th>*unres_within</th></tr>\n",
"    <tr><th>stanza_num</th><th>line_num</th><th>line_txt</th><th>parse_rank</th><th>parse_txt</th><th>parse_meter</th><th>parse_stress</th><th colspan=\"8\"></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><th>1</th><th>FROM fairest creatures we desire increase,</th><th>1</th><th>from FAI rest CREA tures WE de SIRE in CREASE</th><th>-+-+-+-+-+</th><th>-+-+-+-+-+</th><td>0.0</td><td>0.0</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>2</th><th>from FAI rest CREA tures WE de SI re IN crease</th><th>-+-+-+-+-+-</th><th>-+-+-+-+-++</th><td>1.0</td><td>0.0</td><td>11</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>3</th><th>from FAI rest CREA tures WE de SI re IN.CREASE</th><th>-+-+-+-+-++</th><th>-+-+-+-+-++</th><td>1.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>4</th><th>from FAI rest CREA tures WE de SI re.in CREASE</th><th>-+-+-+-+--+</th><th>-+-+-+-+--+</th><td>2.0</td><td>0.0</td><td>13</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td></tr>\n",
"    <tr><th></th><th>2</th><th>That thereby beauty's rose might never die,</th><th>1</th><th>that THE reby BEA uty's ROSE might NE ver DIE</th><th>-+-+-+-+-+</th><th>-+++-+-+-+</th><td>1.0</td><td>0.0</td><td>10</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>154</th><th>14</th><th>Love's fire heats water, water cools not love.</th><th>2</th><th>love's FI re HEATS wa.ter WA ter COOLS not LOVE</th><th>-+-+--+-+-+</th><th>++-++-+-+-+</th><td>4.0</td><td>0.0</td><td>13</td><td>1</td><td>2</td><td>0</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>3</th><th>love's FI.RE heats WA ter WA ter COOLS not LOVE</th><th>-++-+-+-+-+</th><th>++-++-+-+-+</th><td>4.0</td><td>0.0</td><td>13</td><td>0</td><td>2</td><td>1</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>4</th><th>LOVE'S fire HEATS wa.ter WA ter COOLS not LOVE</th><th>+-+--+-+-+</th><th>++++-+-+-+</th><td>4.0</td><td>0.0</td><td>12</td><td>1</td><td>2</td><td>0</td><td>0</td><td>1</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>5</th><th>LOVE'S.FI re HEATS.WA ter WA ter COOLS not LOVE</th><th>++-++-+-+-+</th><th>++-++-+-+-+</th><td>4.0</td><td>0.0</td><td>15</td><td>0</td><td>0</td><td>0</td><td>4</td><td>0</td></tr>\n",
"    <tr><th colspan=\"3\"></th><th>6</th><th>love's FI re HEATS wa TER wa TER cools NOT love</th><th>-+-+-+-+-+-</th><th>++-++-+-+++</th><td>9.0</td><td>0.0</td><td>11</td><td>2</td><td>5</td><td>2</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>7277 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
"ParseList(line_num=1, line_txt='FROM fairest creatures we desire increase,', num_parses=7277)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# and big texts\n",
"shaksonnets.parse()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
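
A note on reading the parse tables in the README above: each parse reports a `parse_meter` string and a `parse_stress` string, alongside per-constraint violation counts such as `*w_stress` and `*s_unstress`. The sketch below is one plausible reading of how those two counts relate to the two strings. It is not Prosodic's implementation: `count_violations` is a hypothetical helper, and it assumes that `+`/`-` mark strong/weak positions in the meter string and stressed/unstressed syllables in the stress string. It ignores the other constraints (`*w_peak`, `*unres_across`, `*unres_within`).

```python
def count_violations(meter: str, stress: str) -> dict:
    """Count two illustrative violations by comparing slots pairwise.

    Assumption (not taken from Prosodic's source): '+' means a strong
    position in `meter` and a stressed syllable in `stress`; '-' means
    weak / unstressed.
    """
    if len(meter) != len(stress):
        raise ValueError("meter and stress must have the same number of slots")

    violations = {"w_stress": 0, "s_unstress": 0}
    for position, syllable in zip(meter, stress):
        if position == "-" and syllable == "+":
            # stressed syllable sitting in a weak position
            violations["w_stress"] += 1
        elif position == "+" and syllable == "-":
            # unstressed syllable sitting in a strong position
            violations["s_unstress"] += 1
    return violations


# Best parse of "A horse, a horse, my kingdom for a horse!" from the README
print(count_violations("-+-+-+-+-+", "-+-+-+---+"))
# {'w_stress': 0, 's_unstress': 1}
```

Under this reading, the one `s_unstress` violation comes from "for", an unstressed syllable placed in a strong position, which agrees with the `*s_unstress` value of 1 shown for that parse.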