https://github.com/fyt3rp4til/lexicon-nlp-lab


bag-of-words bag-of-words-model gensim gensim-word2vec lemmatization n-grams named-entity-recognition nltk parts-of-speech regex spacy spacy-word-embeddings stemming stop-words tf-idf word-embeddings


# 🌐 Lexicon-NLP-Lab

![NLP](https://img.shields.io/badge/NLP-Text_Processing-blue)
![Python](https://img.shields.io/badge/Python-3.8%2B-brightgreen)
![License: MIT](https://img.shields.io/badge/License-MIT-yellow)

Welcome to **Lexicon-NLP-Lab**! This repository collects Jupyter notebooks and datasets covering Natural Language Processing (NLP) tasks, from regex-based extraction and tokenization through Bag of Words, TF-IDF, and word embeddings to FastText classification.

## 📂 Repository Structure

### 🔍 Data Preprocessing
- `1_Regex_for_information_extraction.ipynb` - Regular expressions for information extraction.
- `2_Spacy_vs_Nltk.ipynb` - Comparison of spaCy and NLTK for tokenization.
- `3_Spacy_Tokenize.ipynb` - Tokenization techniques using spaCy.
- `4_Spacy_Pipelines.ipynb` - Pipelines in spaCy: Stemming and Lemmatization.
- `5_Stemming_Lemmatization.ipynb` - Stemming and lemmatization methods.
- `5_Stemming_Lemmatization_2.ipynb` - Continuation of stemming, lemmatization, and POS tagging.
- `6_Parts_of_Speech_2.ipynb` - POS tagging, Bag of Words, and NER with spaCy.
- `6_Parts_of_Speech_in_Spacy.ipynb` - Detailed POS tagging with spaCy.
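
As a rough illustration of the regex-based information extraction covered by the first notebook, here is a minimal sketch using only Python's standard `re` module. The sample text, the patterns, and the `extract_contacts` helper are assumptions made for this example and are not taken from the notebooks; real-world email and phone formats need more permissive patterns.

```python
import re

# Hypothetical sample text; not taken from the repository's notebooks.
text = "Contact Asha at asha.rao@example.com or call 555-123-4567 before 5 pm."

# Deliberately simple patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_contacts(s):
    """Return every email address and NNN-NNN-NNNN phone number found in s."""
    return {"emails": EMAIL_RE.findall(s), "phones": PHONE_RE.findall(s)}


contacts = extract_contacts(text)
print(contacts)
```

Compiling the patterns once and reusing them keeps the extraction function cheap to call on many documents.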

### 🏷️ Named Entity Recognition (NER)
- `7_NER.ipynb` - Named entity recognition with spaCy.
- `7_NER_2.ipynb` - Additional NER tasks and implementations.

### 🗃️ Bag of Words and N-Grams
- `8_Bag_of_Words_2_SentimentAnalysis.ipynb` - Sentiment analysis using Bag of Words.
- `8_Bag_of_Words_SpamClassifier.ipynb` - Spam classification with Bag of Words.
- `9_Stop_Words.ipynb` - Handling stop words in text preprocessing.
- `9_Stop_Words_2.ipynb` - Further exploration of stop words, Bag of Words, and N-grams.
- `10_Bag_of_N_Grams_2_Fake_News_Prediction.ipynb` - Fake news prediction using N-grams.
- `10_Bag_of_N_Grams_News_Classification.ipynb` - News classification with N-grams.
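
The ideas behind this group of notebooks (stop-word removal, Bag of Words, and bags of n-grams) can be sketched with the standard library alone. This is only an illustration of the technique: the tokenizer, the tiny `STOP_WORDS` set, and the `bag_of_ngrams` helper are assumptions for this example, and the notebooks themselves may rely on library vectorizers instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_values=(1, 2), stop_words=frozenset()):
    """Count n-grams of each requested size after dropping stop words."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = frozenset({"the", "a", "is", "to"})

bag = bag_of_ngrams("The offer is free, free to claim!", stop_words=STOP_WORDS)
print(bag.most_common(3))
```

Note that removing stop words before forming n-grams makes words adjacent that were not adjacent in the original sentence; whether that is desirable depends on the task.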

### 🔤 TF-IDF (Term Frequency-Inverse Document Frequency)
- `11_TF_IDF_2_EmotionDetection.ipynb` - Emotion detection using TF-IDF.
- `11_TF_IDF_TextClassification_Ecommerce_Goods.ipynb` - E-commerce goods classification using TF-IDF.
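
For readers new to TF-IDF, the sketch below computes the textbook weighting, term frequency times `log(N / df)`, with the standard library. The `tf_idf` helper and the three sample documents are assumptions for illustration. Library implementations often differ in detail; for example, scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will not match this sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical product titles, not rows from the repository's datasets.
docs = ["cheap phone case", "cheap laptop bag", "leather laptop bag"]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, `phone` and `case` occur in only one document and therefore receive a higher weight than `cheap`, which occurs in two.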

### 💡 Word Embeddings and Vectors
- `12_Overview_Spacy_Word_Vectors.ipynb` - Overview of word vectors using spaCy and Gensim.
- `13_Spacy_Word_Embeddings_News_Category_Classification.ipynb` - News category classification using spaCy word embeddings.
- `14_Nlp_Word_Vectors_Gensim_Overview.ipynb` - Overview of word vectors using Gensim.
- `15_Gensim_w2v_Google_Fake_News_Detection.ipynb` - Fake news detection with Gensim Word2Vec embeddings.

### 🚀 FastText Classifier
- `16_Fasttext_Indian_Food_Receipe_Classification.ipynb` - Classification of Indian food recipes using FastText.
- `17_Fasttext_Ecommerce_Classification.ipynb` - E-commerce classification using FastText.

### 🔧 Miscellaneous
- `cosine_similarity.ipynb` - Computing cosine similarity between text vectors.
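
Cosine similarity measures the angle between two vectors: the dot product divided by the product of their lengths. The following is a minimal sketch over sparse word-count vectors; the `cosine_similarity` helper and the two sample sentences are assumptions for this example and do not reproduce the notebook's code.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: similarity with an empty vector is 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentences turned into word-count vectors.
v1 = Counter("the cat sat on the mat".split())
v2 = Counter("the cat lay on the rug".split())

sim = cosine_similarity(v1, v2)
print(round(sim, 2))
```

The same function works on TF-IDF weights or any other `{term: weight}` mapping, since it only needs the values for shared keys.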

## 📊 Datasets
- `Cleaned_Indian_Food_Dataset.csv` - Dataset for Indian food recipe classification.
- `Fake_Real_Data.csv` - Dataset containing fake and real news.
- `news_story.txt` - Text file with a sample news story.
- `spam.csv` - Spam dataset for classification tasks.
- `students.txt` - Additional text file for experimentation.