# NLP-Notebooks

This repository contains notebooks showcasing various Natural Language Processing (NLP) tasks implemented in Python with popular NLP libraries such as NLTK, SpaCy, and scikit-learn. The notebooks cover a wide range of NLP tasks, including tokenization, normalization (stemming and lemmatization), stopword removal, named entity recognition (NER), part-of-speech (POS) tagging, encoding techniques such as one-hot encoding, bag of words, and TF-IDF (Term Frequency-Inverse Document Frequency), and word embeddings using Word2Vec, GloVe, and FastText.

## Notebooks

- [Tokenization](Tokenization.ipynb) : Demonstrated tokenization techniques using NLTK and SpaCy in Python (a combined sketch of these preprocessing steps follows this list).
- [Stemming](Stemming.ipynb) : Implemented stemming techniques with NLTK and SpaCy in Python.
- [Lemmatization](Lemmatization.ipynb) : Explored lemmatization methods in Python using NLTK and SpaCy.
- [Named Entity Recognition](NER.ipynb) : Performed Named Entity Recognition (NER) using NLTK and SpaCy in Python, identifying and extracting named entities such as person names, organization names, and locations.
- [Part-of-Speech Tagging](POS_Tagging.ipynb) : Implemented POS tagging with NLTK and SpaCy in Python, assigning grammatical categories such as noun, verb, and adjective to words in a text corpus.
- [Stopwords](Stopwords.ipynb) : Demonstrated stopword removal using NLTK and SpaCy in Python, filtering out common words that carry little meaning in text analysis.
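
The snippet below is a minimal sketch (not taken from the notebooks) that combines these preprocessing steps with NLTK and spaCy. It assumes the relevant NLTK data packages (`punkt`, `wordnet`, `stopwords`) and the spaCy model `en_core_web_sm` have already been downloaded.

```python
# Minimal preprocessing sketch with NLTK and spaCy.
# Assumes NLTK data ("punkt", "wordnet", "stopwords") and the spaCy model
# "en_core_web_sm" are already installed/downloaded.
import spacy
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

text = "Apple is looking at buying a U.K. startup for $1 billion."

# Tokenization with NLTK
tokens = word_tokenize(text)

# Normalization: stemming and lemmatization
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stems = [stemmer.stem(t) for t in tokens]
lemmas = [lemmatizer.lemmatize(t.lower()) for t in tokens]

# Stopword removal
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t.lower() not in stop_words]

# POS tagging and NER with spaCy
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
pos_tags = [(token.text, token.pos_) for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(tokens, stems, lemmas, content_tokens, pos_tags, entities, sep="\n")
```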

Encoding Techniques -
- [One Hot Encoding](OneHotEncoding.ipynb) : Encoded text documents as binary vectors with one-hot encoding, demonstrated using NLTK and SpaCy in Python.
- [Bag of Words](BagofWords.ipynb) : Represented text documents as vectors based on word frequency, using NLTK and SpaCy in Python.
- [TF-IDF](TF_IDF.ipynb) : Scored words in documents by their frequency (term frequency) and rarity (inverse document frequency), using NLTK and SpaCy in Python (a short sketch follows this list).
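
As a quick illustration of these encodings, the sketch below uses scikit-learn's `CountVectorizer` and `TfidfVectorizer` on a toy corpus; the notebooks may build the representations differently (for example, with NLTK/SpaCy tokenization), so treat this only as a reference point.

```python
# Sketch of one-hot encoding, bag of words, and TF-IDF with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# One-hot style encoding: binary presence/absence of each vocabulary term
onehot = CountVectorizer(binary=True)
print(onehot.fit_transform(corpus).toarray())

# Bag of words: raw term counts per document
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())

# TF-IDF: term frequency weighted by inverse document frequency
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray())
print(tfidf.get_feature_names_out())
```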

Word Embedding -
- [Word2Vec](Word2Vec.ipynb) : Implemented Word2Vec in Python using both pretrained and scratch-built models.
- [Avg Word2Vec](AvgWord2Vec.ipynb) : Utilized average Word2Vec embeddings in Python, representing a document by the mean of its word vectors.
- [GloVe](GloVe.ipynb) : Utilized Stanford's pre-trained GloVe model for efficient word embedding in natural language processing tasks.
- [FastText](FastText.ipynb) : Leveraged Gensim and the FastText library for effective text representation and classification using subword information and the skip-gram architecture (a Gensim-based sketch follows this list).
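
The sketch below illustrates the embedding workflow with Gensim: training Word2Vec and FastText from scratch on a toy corpus and averaging word vectors for a document. Loading Stanford's pre-trained GloVe vectors through `gensim.downloader` (e.g. `glove-wiki-gigaword-100`) is an assumption here, not necessarily how the GloVe notebook does it.

```python
# Minimal embedding sketch with Gensim (Word2Vec, FastText, average Word2Vec).
# Pre-trained GloVe vectors could alternatively be loaded via gensim.downloader
# (e.g. "glove-wiki-gigaword-100") -- an assumption, not the notebooks' exact approach.
import numpy as np
from gensim.models import Word2Vec, FastText

sentences = [
    ["natural", "language", "processing", "with", "python"],
    ["word", "embeddings", "capture", "word", "meaning"],
]

# Word2Vec (skip-gram) trained from scratch on the toy corpus
w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# FastText uses subword (character n-gram) information, so it can embed unseen words
ft = FastText(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

def avg_vector(tokens, model):
    """Average Word2Vec: mean of the vectors of tokens found in the vocabulary."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

print(w2v.wv.most_similar("language", topn=3))
print(avg_vector(["natural", "language", "processing"], w2v))
print(ft.wv["embeddingz"][:5])  # subword info yields a vector even for an unseen word
```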

## Requirements

- Python 3
- Jupyter Notebook/Google Colab
- NLTK
- SpaCy
- scikit-learn
- Gensim

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.