https://github.com/cltk/latin_lexica_perseus

Lexica and lemmata for the Latin language, from various sources

Host: GitHub
URL: https://github.com/cltk/latin_lexica_perseus
Owner: cltk
Created: 2014-10-11T16:52:27.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2016-06-20T19:02:42.000Z (about 10 years ago)
Last Synced: 2025-03-24T10:21:15.124Z (over 1 year ago)
Language: Python
Size: 23 MB
Stars: 5
Watchers: 6
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md

README

#### latin\_english\_lexicon\_old.xml

The file latin\_english\_lexicon\_old.xml comes from Perseus, where it was called 1999.04.0059.xml, and is licensed under the [Mozilla Public License 1.1 (MPL 1.1)](http://www.mozilla.org/MPL/1.1/).

#### Parsing

To parse with Python, the following works:

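```
import os
from io import StringIO

from lxml import etree

# Location of the lexicon inside the local CLTK data directory
old_path = os.path.expanduser('~/cltk_data/latin/lexicon/latin_lexica_perseus/latin_english_lexicon_old.xml')

# Read the whole file as text, then parse it into an element tree
with open(old_path) as f:
    old_xml = f.read()
tree = etree.parse(StringIO(old_xml))

# Each dictionary entry is an <entryFree> element
entries = tree.xpath('/TEI.2/text/body/div0/entryFree')
print(len(entries))  # 51594

# Print the key of each entry, waiting for Enter before showing the next one
for x in entries:
    print(x.get('key'))
    input()
```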
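Once the entries are selected, it can be convenient to index them by their `key` attribute. The following is a minimal sketch of that idea, not part of this repository: it uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies, and the two-entry `SAMPLE` document and the `build_index` helper are hypothetical stand-ins for the real lexicon file.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document that mimics the element path used above.
# The real lexicon entries are assumed to be richer than this.
SAMPLE = """<TEI.2><text><body><div0>
<entryFree key="amo">amo, to love</entryFree>
<entryFree key="bellum">bellum, war</entryFree>
</div0></body></text></TEI.2>"""


def build_index(root):
    """Map each entry's `key` attribute to its whitespace-normalized text."""
    index = {}
    for entry in root.findall("./text/body/div0/entryFree"):
        key = entry.get("key")
        if key is not None:
            # Join all nested text nodes and collapse runs of whitespace
            index[key] = " ".join("".join(entry.itertext()).split())
    return index


root = ET.fromstring(SAMPLE)
index = build_index(root)
print(index["amo"])
```

With the real file, the same function could be given the root element of the parsed lexicon. Entries that share a key would overwrite one another in this sketch.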

#### latin-analyses.json

The file `latin-analyses.json` contains definitions for words present in the Latin text corpus. The definitions are scraped from the [Perseus](http://www.perseus.tufts.edu/hopper/morph) website using `scraper.py`.
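The layout of `latin-analyses.json` is not described here, so a cautious first step is to load it and inspect its top-level shape before relying on any particular structure. The helper below is a hypothetical sketch (the name `summarize_json` is not from this repository) and assumes only that the file is valid UTF-8 JSON.

```python
import json


def summarize_json(path):
    """Return the top-level JSON type name and its length (None for scalars)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Only containers have a meaningful length
    size = len(data) if isinstance(data, (dict, list)) else None
    return type(data).__name__, size
```

Calling `summarize_json('latin-analyses.json')` from the repository directory would report whether the file is a JSON object or array and how many top-level items it holds.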

#### Scraping

```
python3 scraper.py
```

The scraper tries to fetch a definition for the word at the start of each line of the input file.
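The word-selection step can be illustrated with a short sketch. This is not the code of `scraper.py`; it assumes that the word is the first whitespace-delimited token on each line and that blank lines are skipped, and the `first_words` helper and the sample lines are hypothetical.

```python
def first_words(lines):
    """Yield the first whitespace-delimited token of each non-blank line."""
    for line in lines:
        parts = line.split(maxsplit=1)
        if parts:
            yield parts[0]


# Hypothetical input lines; the real input file format is an assumption
sample = ["amo 12", "", "bellum\t7", "  cano"]
print(list(first_words(sample)))
```

Because `first_words` accepts any iterable of strings, an open file object can be passed to it directly.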
