https://github.com/willf/wiktionary_to_wordnik
Converts Wiktionary JSON entries to ones more like Wordnik's
- Host: GitHub
- URL: https://github.com/willf/wiktionary_to_wordnik
- Owner: willf
- License: apache-2.0
- Created: 2016-02-17T17:56:05.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-02-17T18:05:58.000Z (over 10 years ago)
- Last Synced: 2025-03-24T12:52:28.598Z (about 1 year ago)
- Language: Python
- Size: 7.81 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# wiktionary_to_wordnik.py
Converts Wiktionary JSON entries into a format closer to Wordnik's.
Requires:

    import html2text
    import requests
    from pyquery import PyQuery as pq

`pyquery` is really only used for finding cross-references, which could be
done with a regular expression easily enough.
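
As a rough illustration of the regular-expression alternative, here is a minimal sketch. It assumes cross-references show up as `<a href="/wiki/TITLE">` links in the definition HTML; that markup, the `WIKI_LINK` pattern, and the `extract_xrefs` helper are all hypothetical and are not taken from the script itself.

```python
import re
from urllib.parse import unquote

# Assumption: cross-references appear as <a href="/wiki/TITLE"> links in the HTML.
# The capture group stops at a closing quote or a "#fragment".
WIKI_LINK = re.compile(r'<a\s[^>]*href="/wiki/([^"#]+)[^"]*"')

def extract_xrefs(definition_html):
    """Return linked titles in order of first appearance, without duplicates."""
    xrefs = []
    for target in WIKI_LINK.findall(definition_html):
        # Decode percent-escapes and turn URL underscores back into spaces.
        title = unquote(target).replace("_", " ")
        if title not in xrefs:
            xrefs.append(title)
    return xrefs

sample = ('A material that <a href="/wiki/ablates" title="ablates">ablates</a>, '
          '<a href="/wiki/vaporizes#English" title="vaporizes">vaporizes</a>, or '
          '<a href="/wiki/ablates">ablates</a> again.')
print(extract_xrefs(sample))  # ['ablates', 'vaporizes']
```

A regular expression like this is fragile against unusual attribute quoting or nested markup, so it is only a reasonable trade if dropping the `pyquery` dependency matters more than robustness.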
Assuming you have all the requirements installed, run:

    cat words.txt | python3 wiktionary_to_wordnik.py > definitions.jsonl 2> definitions.errors

`definitions.jsonl` will contain one definition per line, in JSON format.

`definitions.errors` will contain the words that could not be retrieved.
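
Because the output holds one JSON document per line, it can be read back a line at a time. The following is a minimal sketch of that; the `read_definitions` helper is hypothetical, and the sample lines stand in for the contents of `definitions.jsonl`.

```python
import json

def read_definitions(lines):
    """Yield one parsed entry per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)

# In-memory stand-in for the lines of definitions.jsonl.
sample_lines = [
    '{"word": "ablator", "df": []}\n',
    '\n',
    '{"word": "ablate", "df": []}\n',
]
for entry in read_definitions(sample_lines):
    print(entry["word"], len(entry["df"]))
```

To read the real file, pass an open file object instead of the list, for example `read_definitions(open("definitions.jsonl", encoding="utf-8"))`.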
Here is an example new definition:

    {
      "word": "ablator",
      "df": [
        {
          "src": "wiktionary",
          "txt": "A material that ablates, vaporizes, wears away, burns off, erodes, or abrades.",
          "pos": {
            "name": "Noun"
          },
          "xref": [
            "ablates",
            "vaporizes"
          ]
        }
      ]
    }
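
To show how the fields nest, here is a small sketch that flattens one entry into rows. The field names come from the example above; the `flatten_entry` helper is hypothetical, and treating `pos` and `xref` as optional is an assumption rather than documented behavior.

```python
def flatten_entry(entry):
    """Return one (word, part of speech, text, xrefs) tuple per definition."""
    rows = []
    for definition in entry.get("df", []):
        # Assumption: "pos" and "xref" may be absent from some definitions.
        pos_name = definition.get("pos", {}).get("name")
        rows.append((
            entry["word"],
            pos_name,
            definition.get("txt"),
            definition.get("xref", []),
        ))
    return rows

example = {
    "word": "ablator",
    "df": [{
        "src": "wiktionary",
        "txt": "A material that ablates, vaporizes, wears away, burns off, erodes, or abrades.",
        "pos": {"name": "Noun"},
        "xref": ["ablates", "vaporizes"],
    }],
}
for word, pos, txt, xrefs in flatten_entry(example):
    print(word, pos, xrefs)
```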