https://github.com/dhui/extract_words
Extracts words and generates a word list with the given part of speech
- Host: GitHub
- URL: https://github.com/dhui/extract_words
- Owner: dhui
- License: apache-2.0
- Created: 2017-12-28T07:58:46.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-07-06T19:44:53.000Z (over 3 years ago)
- Last Synced: 2025-02-10T09:12:42.003Z (11 months ago)
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# extract_words
Extracts words and generates a word list with the given part of speech
# Data Sources
## Brown
Use `brown.py` to extract Brown corpus data. You can find the Brown corpus data
[here](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip).
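
As a rough illustration of extracting words by part of speech from Brown-style text, here is a minimal sketch. It assumes the corpus files hold whitespace-separated `word/tag` tokens (for example `jury/nn`) and that matching on a tag prefix is acceptable. The helper `count_tagged_words` is hypothetical and is not taken from `brown.py`, which may work differently.

```python
from collections import Counter


def count_tagged_words(lines, tag_prefix):
    """Count lowercased words whose tag starts with tag_prefix.

    Hypothetical helper for illustration; not the logic used by brown.py.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            # Split on the last slash so words that contain "/" stay intact.
            word, sep, tag = token.rpartition("/")
            if sep and word and tag.lower().startswith(tag_prefix):
                counts[word.lower()] += 1
    return counts


# Sample line in the assumed word/tag layout.
sample = ["The/at jury/nn said/vbd the/at election/nn was/bedz fair/jj ./."]
noun_counts = count_tagged_words(sample, "nn")
```

Matching on a prefix means `nn` would also pick up tags such as `nns` or `nn-tl`; use an exact comparison instead if only the bare tag is wanted.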
## MASC
Use `masc.py` to extract MASC data. You can find MASC data
[here](http://www.anc.org/data/masc/downloads/data-download/). Both v1 and v3 of the data are supported.
# Output
A list of words and their counts (separated by spaces), ordered by descending number of occurrences
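
The output format described above could be produced along these lines. This is a sketch under assumptions: one `word count` pair per line and alphabetical tie-breaking are choices made here for illustration, and `format_word_list` is a hypothetical name, not a function from this repository.

```python
from collections import Counter


def format_word_list(counts):
    """Return 'word count' lines, most frequent first.

    Hypothetical helper; one pair per line is an assumption.
    """
    # Sort by descending count; break ties alphabetically so output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{word} {count}" for word, count in ordered]


counts = Counter(["time", "year", "time", "man", "time", "year"])
lines = format_word_list(counts)
print("\n".join(lines))
```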