https://github.com/cltk/latin_lexica_perseus
Lexica and lemmata for the Latin language, from various sources
- Host: GitHub
- URL: https://github.com/cltk/latin_lexica_perseus
- Owner: cltk
- Created: 2014-10-11T16:52:27.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2016-06-20T19:02:42.000Z (about 10 years ago)
- Last Synced: 2025-03-24T10:21:15.124Z (over 1 year ago)
- Language: Python
- Size: 23 MB
- Stars: 5
- Watchers: 6
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
#### latin\_english\_lexicon\_old.xml
The file latin\_english\_lexicon\_old.xml comes from Perseus, where it was called 1999.04.0059.xml. It is licensed under the [Mozilla Public License 1.1 (MPL 1.1)](http://www.mozilla.org/MPL/1.1/).
#### Parsing
To parse with Python, the following works:
```
from lxml import etree
import os

# Location of the lexicon inside the local cltk_data directory
old_path = os.path.expanduser('~/cltk_data/latin/lexicon/latin_lexica_perseus/latin_english_lexicon_old.xml')

# Let lxml read the file itself so it honors the XML encoding declaration
tree = etree.parse(old_path)

# Every dictionary entry is an <entryFree> element
entries = tree.xpath('/TEI.2/text/body/div0/entryFree')
print(len(entries))  # 51594

for x in entries:
    print(x.get('key'))  # the entry's `key` attribute
    input()  # wait for Enter before showing the next key
```
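To look entries up by their `key` rather than stepping through them one at a time, they can be collected into a dictionary. The following is only a minimal sketch: it uses the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without extra dependencies, and it works on a made-up two-entry sample that copies only the `/TEI.2/text/body/div0/entryFree` path from above. The helper name `build_key_index`, the sample entries, and the inner `<orth>` tag are assumptions for illustration, not part of the Perseus file's documented structure.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document: only the element path and the `key`
# attribute are taken from the README; the entry contents are invented.
SAMPLE_XML = """<TEI.2><text><body><div0>
<entryFree key="amo"><orth>amo</orth>, to love</entryFree>
<entryFree key="rosa"><orth>rosa</orth>, a rose</entryFree>
</div0></body></text></TEI.2>"""


def build_key_index(root):
    """Map each entryFree `key` attribute to the entry's flattened text."""
    index = {}
    for entry in root.findall('./text/body/div0/entryFree'):
        key = entry.get('key')
        if key is None:
            continue  # skip entries without a key attribute
        # itertext() walks all nested text, so no inner tag names are assumed;
        # split/join collapses runs of whitespace and newlines.
        index[key] = ' '.join(''.join(entry.itertext()).split())
    return index


root = ET.fromstring(SAMPLE_XML)
index = build_key_index(root)
print(index['amo'])
```

For the real file, the same function should work on `ET.parse(old_path).getroot()`, provided the file parses with the standard library parser. If several entries share a key, later ones overwrite earlier ones in this sketch.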
#### latin-analyses.json
The file `latin-analyses.json` contains definitions for words present in the Latin text corpus. The definitions are scraped from the [Perseus](http://www.perseus.tufts.edu/hopper/morph) website using `scraper.py`.
#### Scraping
```
python3 scraper.py
```
The scraper reads the input file line by line and tries to fetch a definition for the word at the start of each line.
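The "word at the start of each line" rule can be illustrated as follows. This is not the code in `scraper.py`, only a sketch of the idea: the function name `first_words`, the sample lines, and the choice of whitespace as the delimiter are all assumptions.

```python
def first_words(lines):
    """Return the first whitespace-delimited token of each non-blank line."""
    words = []
    for line in lines:
        parts = line.split()
        if parts:  # ignore blank lines
            words.append(parts[0])
    return words


# Hypothetical input lines: a word followed by other columns.
sample_lines = ["amo 12\n", "\n", "rosa 7\n", "  bellum\n"]
print(first_words(sample_lines))
```

Each returned word would then be the one looked up on the Perseus morphology page.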