https://github.com/ninadpatil09/nlp-notebooks

Explore NLP tasks with Python using NLTK, SpaCy & scikit-learn: Tokenization, Normalization, NER, POS tagging, Encoding, Word embedding.

natural-language-processing nlp nlp-machine-learning nltk python spacy

# NLP-Notebooks

This repository contains notebooks showcasing various Natural Language Processing (NLP) tasks implemented in Python with popular NLP libraries such as NLTK, SpaCy, and scikit-learn. The notebooks cover tokenization, normalization (stemming and lemmatization), stopword removal, named entity recognition (NER), part-of-speech (POS) tagging, encoding techniques (one-hot encoding, bag of words, and TF-IDF, i.e. Term Frequency-Inverse Document Frequency), and word embeddings (Word2Vec, average Word2Vec, GloVe, and FastText).

## Notebooks

- [Tokenization](Tokenization.ipynb) : Demonstrated tokenization techniques using NLTK and SpaCy in Python.
- [Stemming](Stemming.ipynb) : Implemented stemming techniques with NLTK and SpaCy in Python.
- [Lemmatization](Lemmatization.ipynb) : Explored lemmatization methods in Python using NLTK and SpaCy.
- [Named Entity Recognition](NER.ipynb) : Performed Named Entity Recognition (NER) using NLTK and SpaCy in Python. Shows how to identify and extract named entities such as people, organizations, and locations.
- [Part-of-Speech Tagging](POS_Tagging.ipynb) : Implemented POS tagging techniques with NLTK and SpaCy in Python. Shows how to assign grammatical categories such as noun, verb, and adjective to the words in a text corpus.
- [Stopwords](Stopwords.ipynb) : Demonstrated stopword removal using NLTK and SpaCy in Python. Shows how to filter out common words (such as "the" or "is") that carry little meaning for text analysis tasks.
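
The notebooks above rely on NLTK and SpaCy for preprocessing. Below is a minimal, library-free sketch of what tokenization and stopword removal do. It uses a simple regular-expression tokenizer and a small hypothetical stopword set (`STOPWORDS`). Both are simplifying assumptions for illustration and are much cruder than NLTK's `word_tokenize` or SpaCy's tokenizer and built-in stopword lists.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK and SpaCy ship much larger curated lists.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

# A token is either a run of word characters (optionally followed by an
# apostrophe part, as in "it's") or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into lowercased word and punctuation tokens."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(text)]


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop stopwords and tokens that contain no letters or digits (punctuation)."""
    return [t for t in tokens if t not in STOPWORDS and any(c.isalnum() for c in t)]


if __name__ == "__main__":
    text = "The cat sat on the mat, and it's happy."
    tokens = tokenize(text)
    print(tokens)
    print(remove_stopwords(tokens))
```

For the sample sentence, `tokenize` yields `['the', 'cat', 'sat', 'on', 'the', 'mat', ',', 'and', "it's", 'happy', '.']` and `remove_stopwords` keeps `['cat', 'sat', 'mat', "it's", 'happy']`. Real tokenizers handle contractions differently. NLTK, for example, splits "it's" into two tokens.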

### Encoding Techniques

- [One-Hot Encoding](OneHotEncoding.ipynb) : Performed one-hot encoding (OHE), converting text documents into binary vectors, using NLTK and SpaCy in Python.
- [Bag of Words](BagofWords.ipynb) : Represented text documents as vectors of word counts, using NLTK and SpaCy in Python.
- [TF-IDF](TF_IDF.ipynb) : Scored each word in a document by its frequency within that document (term frequency) and its rarity across the corpus (inverse document frequency), using NLTK and SpaCy in Python.
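
The notebooks build these representations with library tooling. Below is a minimal, library-free sketch that makes the arithmetic explicit. It runs on a hypothetical three-sentence corpus of already-tokenized documents. It uses the plain textbook TF-IDF variant, tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)). This variant is an assumption for illustration. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its scores will differ.

```python
from __future__ import annotations

import math
from collections import Counter


def build_vocab(docs: list[list[str]]) -> list[str]:
    """Return the sorted set of unique tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})


def one_hot_encode(doc: list[str], vocab: list[str]) -> list[list[int]]:
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tok in doc:
        vec = [0] * len(vocab)
        vec[index[tok]] = 1
        vectors.append(vec)
    return vectors


def bag_of_words(doc: list[str], vocab: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


def tf_idf(docs: list[list[str]], vocab: list[str]) -> list[list[float]]:
    """Textbook TF-IDF: tf = count / doc length, idf = ln(N / df); no smoothing or normalization.

    vocab is assumed to come from build_vocab(docs), so every df is at least 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {word: sum(1 for doc in docs if word in doc) for word in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against division by zero for an empty document
        rows.append([(counts[w] / length) * math.log(n_docs / df[w]) for w in vocab])
    return rows


if __name__ == "__main__":
    docs = [
        ["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"],
    ]
    vocab = build_vocab(docs)
    print(vocab)
    print(one_hot_encode(docs[0], vocab))
    print(bag_of_words(docs[0], vocab))
    for row in tf_idf(docs, vocab):
        print([round(x, 3) for x in row])
```

A word that appears in every document (here "the") gets an IDF of ln(3/3) = 0, so its TF-IDF score is 0 everywhere. Rarer words such as "dog" score higher than words shared across documents such as "sat".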

### Word Embedding

- [Word2Vec](Word2Vec.ipynb) : Implemented Word2Vec in Python, using both a pretrained model and a model trained from scratch.
- [Avg Word2Vec](AvgWord2Vec.ipynb) : Averaged Word2Vec word vectors in Python to obtain a single fixed-length vector per sentence or document for downstream NLP tasks.
- [GloVe](GloVe.ipynb) : Used Stanford's pretrained GloVe vectors for word embedding in NLP tasks.
- [FastText](FastText.ipynb) : Used Gensim and the FastText library for text representation and classification, leveraging subword (character n-gram) information and the skip-gram architecture.
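
Average Word2Vec turns a variable-length sentence into one fixed-length vector by taking the element-wise mean of its word vectors. The sketch below shows only that averaging step, plus cosine similarity for comparing the resulting sentence vectors. `TOY_VECTORS` is a hypothetical, hand-made 3-dimensional lookup table standing in for a trained model. With Gensim, the vectors would instead come from a trained or pretrained model (for example `model.wv[word]`). The sketch assumes one common convention: skip out-of-vocabulary words, and return a zero vector when no word is known.

```python
from __future__ import annotations

import math

# Hypothetical 3-dimensional word vectors, made up for illustration.
# Real Word2Vec or GloVe vectors are learned from data and usually have 100-300 dimensions.
TOY_VECTORS = {
    "cat": [1.0, 0.0, 2.0],
    "dog": [3.0, 0.0, 0.0],
    "sat": [0.0, 2.0, 0.0],
    "ran": [0.0, 4.0, 2.0],
}


def average_vector(tokens: list[str], vectors: dict[str, list[float]]) -> list[float]:
    """Element-wise mean of the vectors of in-vocabulary tokens (OOV tokens are skipped)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    s1 = average_vector(["cat", "sat", "quietly"], TOY_VECTORS)  # "quietly" is OOV
    s2 = average_vector(["dog", "ran"], TOY_VECTORS)
    print(s1)
    print(s2)
    print(round(cosine_similarity(s1, s2), 4))
```

Here `s1` is `[0.5, 1.0, 1.0]` and `s2` is `[1.5, 2.0, 1.0]`. Averaging discards word order, which is the main trade-off of this representation compared with sequence models.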

## Requirements

- Python 3
- Jupyter Notebook or Google Colab
- NLTK
- SpaCy
- scikit-learn
- Gensim

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.