Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ninadpatil09/nlp-notebooks
Explore NLP tasks with Python using NLTK, SpaCy & scikit-learn: Tokenization, Normalization, NER, POS tagging, Encoding, Word embedding.
natural-language-processing nlp nlp-machine-learning nltk python spacy
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/ninadpatil09/nlp-notebooks
- Owner: ninadpatil09
- License: mit
- Created: 2024-04-02T06:46:51.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-09T12:48:35.000Z (10 months ago)
- Last Synced: 2024-10-14T04:02:54.916Z (3 months ago)
- Topics: natural-language-processing, nlp, nlp-machine-learning, nltk, python, spacy
- Language: Jupyter Notebook
- Homepage:
- Size: 80.1 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# NLP-Notebooks
This repository contains notebooks that demonstrate common Natural Language Processing (NLP) tasks in Python using NLTK, SpaCy, scikit-learn, and Gensim. The notebooks cover tokenization, normalization (stemming and lemmatization), stopword removal, named entity recognition (NER), part-of-speech (POS) tagging, text encoding (one-hot encoding, bag of words, and TF-IDF, i.e. Term Frequency-Inverse Document Frequency), and word embeddings (Word2Vec, GloVe, and FastText).
## Notebooks
- [Tokenization](Tokenization.ipynb) : Notebook demonstrating tokenization techniques using NLTK and SpaCy.
- [Stemming](Stemming.ipynb) : Implemented stemming techniques with NLTK and SpaCy in Python.
- [Lemmatization](Lemmatization.ipynb) : Explored lemmatization methods in Python using NLTK and SpaCy.
- [Named Entity Recognition](NER.ipynb) : Performed Named Entity Recognition (NER) using NLTK and SpaCy in Python. Understand how to identify and extract named entities such as person names, organization names, locations, etc.
- [Part-of-Speech Tagging](POS_Tagging.ipynb) : Implemented POS tagging techniques with NLTK and SpaCy in Python. Learn how to assign grammatical categories to words in a text corpus, such as noun, verb, adjective, etc.
- [Stopwords](Stopwords.ipynb) : Demonstrated stopwords removal techniques using NLTK and SpaCy in Python. Understand how to filter out common words that do not carry significant meaning in text analysis tasks.
Encoding Techniques -
- [One Hot Encoding](OneHotEncoding.ipynb) : Encoded text documents as binary vectors using one-hot encoding (OHE), demonstrated with NLTK and SpaCy in Python.
- [Bag of Words](BagofWords.ipynb) : Represented text documents as vectors based on word frequency, using NLTK and SpaCy in Python.
- [TF-IDF](TF_IDF.ipynb) : Assigns scores to words in documents based on their frequency (term frequency) and rarity (inverse document frequency), using NLTK and SpaCy in Python.
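The three encodings listed above can be illustrated on a toy corpus. The following is a minimal sketch using only the Python standard library, not the code from the notebooks; the corpus, the whitespace tokenizer, and the function names `one_hot`, `bag_of_words`, and `tf_idf` are assumptions made for illustration. It uses the plain textbook formula `tf * log(N / df)`; library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their scores will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus; "the" occurs in every document on purpose.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]

# Naive whitespace tokenization (the notebooks use NLTK and SpaCy instead).
tokenized = [doc.split() for doc in docs]

# Shared, alphabetically sorted vocabulary across all documents.
vocab = sorted({token for doc in tokenized for token in doc})


def one_hot(tokens):
    """Binary vector: 1 if the vocabulary term occurs in the document."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]


def bag_of_words(tokens):
    """Count vector: how often each vocabulary term occurs in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tf_idf(tokens):
    """Map each term of one document to tf * idf (unsmoothed variant)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                          # term frequency
        df = sum(1 for doc in tokenized if term in doc)   # document frequency
        scores[term] = tf * math.log(n_docs / df)         # rarity-weighted score
    return scores


print(vocab)
print(one_hot(tokenized[0]))
print(bag_of_words(tokenized[0]))
print(tf_idf(tokenized[0]))
```

Because "the" appears in all three documents, its inverse document frequency is `log(3 / 3) = 0`, so its TF-IDF score is zero even though it is the most frequent word in the first document. A term such as "cat", which appears in only one document, receives a higher score.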
Word Embedding -
- [Word2Vec](Word2Vec.ipynb) : Implementation of Word2Vec in Python using both pretrained and scratch-built models.
- [Avg Word2Vec](AvgWord2Vec.ipynb) : Utilization of average Word2Vec embeddings in Python, demonstrating efficient word embedding techniques for natural language processing tasks.
- [GloVe](GloVe.ipynb) : Utilized Stanford's pre-trained GloVe model for efficient word embedding in natural language processing tasks.
- [FastText](FastText.ipynb) : Leveraged Gensim and the FastText library for effective text representation and classification using subword information and Skipgram architecture.
## Requirements
- Python 3
- Jupyter Notebook/Google Colab
- NLTK
- SpaCy
- scikit-learn
- Gensim
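Before opening the notebooks, it may help to confirm that the libraries listed above are importable in the current environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names (`nltk`, `spacy`, `sklearn`, `gensim`) are the usual module names for these packages.

```python
from importlib.util import find_spec

# Display name -> import name for each library in the requirements list.
REQUIRED = {
    "NLTK": "nltk",
    "SpaCy": "spacy",
    "scikit-learn": "sklearn",
    "Gensim": "gensim",
}


def missing_packages(required=REQUIRED):
    """Return display names of packages that cannot be found for import."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Note that some notebooks may also need extra resources, such as NLTK corpora or a SpaCy language model, which this check does not cover.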
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.