https://github.com/urschrei/lovecraft

A basic NLTK demo, using the collected works of H. P. Lovecraft as a corpus

classification corpus frequency-count lovecraft matplotlib nltk

Host: GitHub
URL: https://github.com/urschrei/lovecraft
Owner: urschrei
Created: 2013-07-22T15:00:12.000Z (almost 12 years ago)
Default Branch: master
Last Pushed: 2017-10-13T18:35:01.000Z (over 7 years ago)
Last Synced: 2025-03-27T12:23:46.780Z (3 months ago)
Topics: classification, corpus, frequency-count, lovecraft, matplotlib, nltk
Language: Jupyter Notebook
Size: 6.27 MB
Stars: 14
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo showing basic NLTK functionality (tokenizing, classifying, frequency counting), using [The Collected Works of H.P. Lovecraft](http://gutenberg.net.au/ebooks06/0600031h.html) as a corpus.
The code should be fairly self-explanatory, but note the following:

- On its first run, the script writes a file, `results.pickle`, to your current working directory, because classification is quite slow. This lets you tune the [tag set](http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) used for frequency counting without waiting for re-classification each time.
- There's a Jupyter notebook for interactive exploration.
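
The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `load_or_tag` and `count_by_tags` are hypothetical, and the `tagger` argument stands in for whatever slow classification step is used (for example, NLTK's `pos_tag`). Only the `results.pickle` filename comes from this README.

```python
import pickle
from collections import Counter
from pathlib import Path


def load_or_tag(tokens, tagger, cache_path="results.pickle"):
    """Return (word, tag) pairs, reusing the pickled result if it exists.

    `tagger` is any callable that maps a list of tokens to a list of
    (word, tag) pairs. It is a placeholder for the slow classification step.
    """
    cache = Path(cache_path)
    if cache.exists():
        # Later runs: skip classification and load the saved result.
        with cache.open("rb") as f:
            return pickle.load(f)
    # First run: classify, then save the result for next time.
    tagged = tagger(tokens)
    with cache.open("wb") as f:
        pickle.dump(tagged, f)
    return tagged


def count_by_tags(tagged, tag_set):
    """Count lower-cased words whose tag is in `tag_set`."""
    return Counter(word.lower() for word, tag in tagged if tag in tag_set)
```

With the tagged pairs cached, changing `tag_set` (say, from `{"JJ"}` to `{"JJ", "NN"}`) only re-runs the cheap counting step. Deleting `results.pickle` forces a fresh classification.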

## Requirements

- Requests
- BeautifulSoup4
- NLTK
- Matplotlib >= 1.5.x

And for the Notebook:

- Pandas
- Jupyter

## License

![Fhtagn!](fhtagn.png "Graph your terror!")
