https://github.com/urschrei/lovecraft

A basic NLTK demo, using the collected works of H. P. Lovecraft as a corpus
classification corpus frequency-count lovecraft matplotlib nltk

# Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo showing basic NLTK functionality (tokenizing, classifying, frequency counting), using [The Collected Works of H.P. Lovecraft](http://gutenberg.net.au/ebooks06/0600031h.html) as a corpus.
The code should be fairly self-explanatory, but note the following:

- On its first run, the script writes a file, `results.pickle`, to your current working directory, because classification is quite slow. Caching the results lets you tune the [tag set](http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) used for frequency counting without waiting for re-classification each time.
- There's a Jupyter notebook for interactive exploration.
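
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `load_or_tag` and `count_by_tags` are hypothetical, the slow tagging step is passed in as a `tagger` callable, and the standard library's `Counter` stands in for whatever frequency counter the script uses.

```python
import pickle
from collections import Counter
from pathlib import Path


def load_or_tag(text, tagger, cache_path="results.pickle"):
    """Return (word, tag) pairs, running the slow tagger only if no cache exists.

    `tagger` is any callable mapping a string to a list of (word, tag) tuples.
    (Hypothetical helper, named for illustration only.)
    """
    cache = Path(cache_path)
    if cache.exists():
        # Later runs: reuse the pickled results instead of re-tagging.
        with cache.open("rb") as f:
            return pickle.load(f)
    tagged = tagger(text)  # the slow step, done once
    with cache.open("wb") as f:
        pickle.dump(tagged, f)
    return tagged


def count_by_tags(tagged, tag_set):
    """Count lower-cased words whose Penn Treebank tag is in `tag_set`."""
    return Counter(word.lower() for word, tag in tagged if tag in tag_set)
```

With NLTK installed and its tokenizer and tagger data downloaded, `tagger` could be something like `lambda t: nltk.pos_tag(nltk.word_tokenize(t))`. Changing `tag_set` (for example from `{"JJ"}` to `{"NN", "NNS"}`) then only repeats the cheap counting step; deleting `results.pickle` forces re-tagging.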

## Requirements

- Requests
- BeautifulSoup4
- NLTK
- Matplotlib >= 1.5.x

For the notebook:

- Pandas
- Jupyter

## License

MIT, copyright Stephan Hügel 2013

![Fhtagn!](fhtagn.png "Graph your terror!")