Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ltfschoen/aind-nlp


https://github.com/ltfschoen/aind-nlp

artificial-intelligence beautifulsoup feature-extraction machine-learning modelling nanodegree natural-language-processing natural-language-toolkit nlp nltk text-processing udacity web-scraping

Last synced: 21 days ago
JSON representation

Awesome Lists containing this project

README

        

# AIND: Natural Language Processing

Coding exercises for the Natural Language Processing concentration, part of Udacity's Artificial Intelligence Nanodegree program.

## Setup

You need Python 3.6+, and the packages mentioned in `requirements.txt`. You can install them using:

```bash
pip install -r requirements.txt
```

## Data

Data files for exercises are included under `data/`, but some of the NLP libraries require additional data for performing tasks like
PoS tagging, lemmatization, etc. Specifically, `nltk` will throw an error if the required data is not installed. You can use the
following Python statement to open the NLTK downloader and select the desired package(s) to install:

```python
import nltk
nltk.download()
```

This opens a GUI. DO NOT download everthing. Required files include:

* Models > punkt (13MB)
* Corpora > stopwords (11kB)
* All Packages > averaged_perceptron_taggers (2.4MB)
* All Packages > maxent_ne_chunkers" (12.8MB)
* Corpora > Words (740kB)
* Corpora > wordnet (10.3MB)

For each of the above, select it and click "Download" ([explained here](https://stackoverflow.com/questions/26693736/nltk-and-stopwords-fail-lookuperror))

You can also download all available NLTK data packages, which includes a number of sample corpora as well, but that may take a while
(10+GB).

Note: Install GhostScript: `brew install ghostscript` to avoid error `NLTK was unable to find the gs file!` (reference: https://stackoverflow.com/questions/36942270/nltk-was-unable-to-find-the-gs-file)

## Run

To run any script file, use:

```bash
python
```

To open a notebook, use:

```bash
jupyter notebook
```

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Please refer to [Udacity Terms of Service](https://www.udacity.com/legal) for further information.