https://github.com/dhui/extract_words

# extract_words
Extracts words from part-of-speech-tagged corpora and generates a word list for a given part of speech.
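
The core step can be illustrated with a minimal Python sketch, assuming the corpus has already been read into `(word, tag)` pairs. The function name `words_with_pos` is hypothetical and not taken from this repository. It keeps the words whose tag equals the requested one and counts them with `collections.Counter`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def words_with_pos(tagged: Iterable[Tuple[str, str]], pos: str) -> List[Tuple[str, int]]:
    """Count the words tagged with `pos`, most frequent first.

    Hypothetical helper: `tagged` is any iterable of (word, tag) pairs;
    reading those pairs from a corpus is left to a corpus-specific parser.
    """
    # Keep only words whose tag matches exactly, then count occurrences.
    counts = Counter(word for word, tag in tagged if tag == pos)
    # most_common() with no argument returns every word, highest count first;
    # words with equal counts stay in the order they were first seen.
    return counts.most_common()
```

For example, `words_with_pos([("dog", "nn"), ("runs", "vbz"), ("dog", "nn")], "nn")` returns `[("dog", 2)]`. Words are counted exactly as they appear; lowercasing them first would merge forms such as "The" and "the".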

# Data Sources

## Brown
Use `brown.py` to extract words from the Brown corpus. The Brown corpus data is available from the
[NLTK data repository](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip).

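
A minimal sketch of reading the Brown data, assuming the plain-text files in `brown.zip` use the `word/tag` token format (for example `The/at Fulton/np-tl`). The name `parse_brown_line` is hypothetical; this is not necessarily how `brown.py` parses the files.

```python
from typing import Iterator, Tuple


def parse_brown_line(line: str) -> Iterator[Tuple[str, str]]:
    """Yield (word, tag) pairs from one line of Brown corpus text.

    Hypothetical helper; assumes whitespace-separated `word/tag` tokens,
    e.g. "The/at Fulton/np-tl County/nn-tl said/vbd".
    """
    for token in line.split():
        # Split on the last "/" so a slash inside the word itself survives.
        word, sep, tag = token.rpartition("/")
        # Skip tokens with no "/" or with an empty word part.
        if sep and word:
            yield word, tag
```

Brown tags can carry modifiers such as `-tl` (word in a title) and `-hl` (word in a headline), so `County/nn-tl` does not match a plain `nn` filter unless those suffixes are stripped first.
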
## MASC
Use `masc.py` to extract words from MASC (the Manually Annotated Sub-Corpus). The MASC data is available from the
[MASC download page](http://www.anc.org/data/masc/downloads/data-download/). Versions 1 and 3 of the data are both supported.

# Output
A list of words and their occurrence counts (separated by spaces), ordered by descending number of occurrences.
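
A sketch of producing and reading this output in Python, assuming one `word count` pair per line. Both function names are hypothetical and not part of the repository.

```python
from typing import Iterable, List, Tuple


def format_word_counts(counts: Iterable[Tuple[str, int]]) -> str:
    """Render (word, count) pairs as "word count" lines, most frequent first."""
    # sorted() is stable even with reverse=True, so ties keep input order.
    ordered = sorted(counts, key=lambda pair: pair[1], reverse=True)
    return "\n".join(f"{word} {count}" for word, count in ordered)


def parse_word_counts(text: str) -> List[Tuple[str, int]]:
    """Read "word count" lines back into (word, count) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last whitespace run so the count is the final field.
        word, count = line.rsplit(maxsplit=1)
        pairs.append((word, int(count)))
    return pairs
```

For example, `format_word_counts([("cat", 1), ("dog", 3)])` produces the two lines `dog 3` and `cat 1`, and `parse_word_counts` turns them back into `[("dog", 3), ("cat", 1)]`.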