- Host: GitHub
- URL: https://github.com/ltfschoen/aind-nlp
- Owner: ltfschoen
- Created: 2017-07-01T13:21:27.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-07-02T23:51:54.000Z (over 7 years ago)
- Last Synced: 2024-11-09T13:26:02.716Z (3 months ago)
- Topics: artificial-intelligence, beautifulsoup, feature-extraction, machine-learning, modelling, nanodegree, natural-language-processing, natural-language-toolkit, nlp, nltk, text-processing, udacity, web-scraping
- Language: Jupyter Notebook
- Size: 620 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AIND: Natural Language Processing
Coding exercises for the Natural Language Processing concentration, part of Udacity's Artificial Intelligence Nanodegree program.
## Setup
You need Python 3.6+ and the packages listed in `requirements.txt`. You can install them with:
```bash
pip install -r requirements.txt
```

## Data
Data files for the exercises are included under `data/`, but some NLP libraries require additional data for tasks such as
PoS tagging and lemmatization. In particular, `nltk` throws an error if the required data is not installed. Use the
following Python statements to open the NLTK downloader and select the desired package(s) to install:

```python
import nltk
nltk.download()
```

This opens a GUI. DO NOT download everything. The required files are:
* Models > punkt (13MB)
* Corpora > stopwords (11kB)
* All Packages > averaged_perceptron_tagger (2.4MB)
* All Packages > maxent_ne_chunker (12.8MB)
* Corpora > words (740kB)
* Corpora > wordnet (10.3MB)

For each of the above, select it and click "Download" ([explained here](https://stackoverflow.com/questions/26693736/nltk-and-stopwords-fail-lookuperror)).
You can also download all available NLTK data packages, which include a number of sample corpora as well, but that may take a while
(10+ GB).

Note: install Ghostscript (`brew install ghostscript` on macOS) to avoid the error `NLTK was unable to find the gs file!` (reference: https://stackoverflow.com/questions/36942270/nltk-was-unable-to-find-the-gs-file).
## Run
To run a script file, use:

```bash
python <script_file.py>
```

To open a notebook, use:
```bash
jupyter notebook
```
## License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Please refer to [Udacity Terms of Service](https://www.udacity.com/legal) for further information.