https://github.com/devdhera/guide-to-nlp-with-python
A Simple NLP starter with the help of beautiful Python
natural-language-processing nltk python sklearn
- Host: GitHub
- URL: https://github.com/devdhera/guide-to-nlp-with-python
- Owner: DevDHera
- License: mit
- Created: 2019-02-05T07:23:01.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-07T18:02:17.000Z (over 7 years ago)
- Last Synced: 2025-02-06T04:43:54.426Z (over 1 year ago)
- Topics: natural-language-processing, nltk, python, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 248 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: News-Classifier.ipynb
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Guide to NLP using Python :sunglasses:
This repository focuses on **Natural Language Processing (NLP)** using the Python language.
## Contents
The `News-Classifier` notebook focuses on categorizing news articles by processing the body of each article.
You may be able to improve prediction accuracy by tweaking the code to process the Title together with the Body.
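One possible way to combine the two columns is sketched below on a tiny in-memory frame. The `TITLE` and `BODY` column names follow the ones used when the data is loaded; the `TEXT` column name and the sample rows are assumptions made only for illustration.

```python
import pandas as pd

# Toy rows standing in for the real dataset (made up for illustration).
news = pd.DataFrame({
    "CLASS": ["sports", "business"],
    "TITLE": ["Local team wins final", None],
    "BODY": ["The match ended 2-1 after extra time.", "Shares rose sharply on Monday."],
})

# Join title and body into one text column.
# fillna("") guards against missing values; strip() removes the stray space they leave.
news["TEXT"] = (news["TITLE"].fillna("") + " " + news["BODY"].fillna("")).str.strip()
```

The classifier could then be trained on `news["TEXT"]` instead of the body alone. Whether this actually helps accuracy has to be measured on the real data.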
## Steps to Run
First clone the repository.
```sh
git clone https://github.com/DevDHera/Guide-to-NLP-with-Python.git
```
Now open the Jupyter notebook and classify news articles of your choice.
All the data sets are included inside the `dataset` directory.
## Tech Stack
The following are some of the packages we use to build our classifier.
* nltk - Stopwords, Stemming
* pandas - To read TSV, create dataframes
* matplotlib, seaborn - Data visualizations
* string - To remove punctuation
* sklearn - CountVectorizer, TfidfTransformer, MultinomialNB, Pipeline
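To give a feel for how the `string` and stopword steps fit together, here is a minimal sketch of a text-cleaning function. It is not the notebook's own `text_process`. The small `STOPWORDS` set is an assumption that stands in for nltk's stopword list so the snippet runs without extra downloads, and stemming is left out.

```python
import string

# Tiny stand-in stopword set (assumption); the notebook relies on nltk's stopwords instead.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def text_process(text):
    """Strip punctuation, lowercase, and drop stopwords; returns a list of tokens."""
    # Remove every character listed in string.punctuation.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase, split on whitespace, and keep only non-stopwords.
    return [word for word in no_punct.lower().split() if word not in STOPWORDS]


tokens = text_process("The markets are up, and traders cheer!")
```

A function of this shape, taking a raw string and returning a list of tokens, is what `CountVectorizer` expects when a callable is passed as `analyzer`.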
## Code Samples
We import the data into the notebook as shown below.
``` python
import pandas as pd
news = pd.read_csv('dataset/trainset.txt', sep='\t', names=['CLASS', 'TITLE', 'DATE', 'BODY'])
```
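If you want to try the same call without the data set at hand, the sketch below reads two made-up tab-separated rows from memory. The row contents and the date format are assumptions; only the column names come from the call above.

```python
import io

import pandas as pd

# Two made-up tab-separated rows standing in for dataset/trainset.txt.
sample = io.StringIO(
    "sports\tLocal team wins final\t2019-02-01\tThe match ended 2-1.\n"
    "business\tShares rise\t2019-02-02\tMarkets closed higher on Monday.\n"
)

# The file has no header row, so the column names are supplied via `names`.
news = pd.read_csv(sample, sep="\t", names=["CLASS", "TITLE", "DATE", "BODY"])

# Count articles per class to get a quick look at class balance.
class_counts = news["CLASS"].value_counts()
```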
Also, we use pipelines to make our life easier :sleeping:.
``` python
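from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# text_process is the tokenizing function defined in the notebook.
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # raw text -> token counts
    ('tfidf', TfidfTransformer()),                    # counts -> TF-IDF weights
    ('classifier', MultinomialNB())                   # Naive Bayes classifier
])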
```
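As a rough illustration of how such a pipeline is trained and used, here is a small self-contained sketch. The training sentences and labels are made up, and `CountVectorizer` runs with its default analyzer rather than the notebook's `text_process`, so treat it as a toy example and not as the notebook's actual training code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training data; the real notebook trains on the article bodies.
texts = [
    "the team won the match",
    "striker scores twice in the final",
    "shares fell as markets closed",
    "the bank raised interest rates",
]
labels = ["sports", "sports", "business", "business"]

pipeline = Pipeline([
    ("bow", CountVectorizer()),       # default word analyzer in place of text_process
    ("tfidf", TfidfTransformer()),    # reweight counts by TF-IDF
    ("classifier", MultinomialNB()),  # multinomial Naive Bayes
])

# fit() runs every step in order; predict() applies the same transforms to new text.
pipeline.fit(texts, labels)
predicted = pipeline.predict(["markets and shares rallied"])
```

Because the vectorizer, the TF-IDF step, and the classifier are wrapped in one object, a single `fit` and `predict` call covers the whole chain, and new text is never transformed differently from the training text.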
Improve the classifier and :heart: share the knowledge :bouquet: :blush: