Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fyt3rp4til/lexicon-nlp-lab
- Host: GitHub
- URL: https://github.com/fyt3rp4til/lexicon-nlp-lab
- Owner: FYT3RP4TIL
- Created: 2024-08-31T16:21:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-04T20:17:52.000Z (4 months ago)
- Last Synced: 2024-10-09T11:43:16.026Z (2 months ago)
- Topics: bag-of-words, bag-of-words-model, gensim, gensim-word2vec, lemmatization, n-grams, named-entity-recognition, nltk, parts-of-speech, regex, spacy, spacy-word-embeddings, stemming, stop-words, tf-idf, word-embeddings
- Language: Jupyter Notebook
- Homepage:
- Size: 59.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- Changelog: news_dataset.json
Awesome Lists containing this project
README
# 🌐 Lexicon-NLP-Lab
![NLP](https://img.shields.io/badge/NLP-Text_Processing-blue)
![Python](https://img.shields.io/badge/Python-3.8%2B-brightgreen)
![License: MIT](https://img.shields.io/badge/License-MIT-yellow)

Welcome to the **Lexicon**! This repository contains a comprehensive collection of Jupyter notebooks and datasets focused on various Natural Language Processing (NLP) tasks.
## 📂 Repository Structure
### 🔍 Data Preprocessing
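Before diving into the notebooks, here is a minimal standard-library sketch of the kind of pattern-based extraction covered in `1_Regex_for_information_extraction.ipynb` (the email/phone patterns and the sample text below are illustrative, not taken from the notebook):

```python
import re

text = "Contact us at support@example.com or call (555) 123-4567."

# Illustrative patterns: email addresses and US-style phone numbers
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
phones = re.findall(r"\(\d{3}\)\s*\d{3}-\d{4}", text)

print(emails)  # ['support@example.com']
print(phones)  # ['(555) 123-4567']
```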
- `1_Regex_for_information_extraction.ipynb` - Regular expressions for information extraction.
- `2_Spacy_vs_Nltk.ipynb` - Comparison between Spacy and NLTK for tokenization.
- `3_Spacy_Tokenize.ipynb` - Tokenization techniques using Spacy.
- `4_Spacy_Pipelines.ipynb` - Pipelines in Spacy: Stemming and Lemmatization.
- `5_Stemming_Lemmatization.ipynb` - Stemming and lemmatization methods.
- `5_Stemming_Lemmatization_2.ipynb` - Continuation of stemming, lemmatization, and POS tagging.
- `6_Parts_of_Speech_2.ipynb` - POS tagging, Bag of Words, and NER with Spacy.
- `6_Parts_of_Speech_in_Spacy.ipynb` - Detailed POS tagging with Spacy.

### 🏷️ Named Entity Recognition (NER)
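The NER notebooks rely on Spacy's pretrained pipelines (iterating over `nlp(text).ents` with a model such as `en_core_web_sm`). As a dependency-free illustration of the underlying idea, here is a toy gazetteer (dictionary-lookup) tagger; it is deliberately far cruder than the statistical models used in the notebooks, and the names below are made up for the example:

```python
# Toy gazetteer-based entity tagger: a stdlib stand-in for the
# pretrained Spacy pipelines the notebooks actually use.
GAZETTEER = {
    "Tesla": "ORG",
    "Elon Musk": "PERSON",
    "California": "GPE",
}

def tag_entities(text):
    """Return (span, label) pairs for known names found in text,
    ordered by where they appear."""
    found = []
    for name, label in GAZETTEER.items():
        if text.find(name) != -1:
            found.append((name, label))
    return sorted(found, key=lambda pair: text.find(pair[0]))

print(tag_entities("Elon Musk founded Tesla in California."))
# [('Elon Musk', 'PERSON'), ('Tesla', 'ORG'), ('California', 'GPE')]
```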
- `7_NER.ipynb` - Named entity recognition with Spacy.
- `7_NER_2.ipynb` - Additional NER tasks and implementations.

### 🗃️ Bag of Words and N-Grams
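A bag-of-words model simply counts how often each vocabulary word appears in a document, and n-grams extend the idea to short contiguous word sequences. A pure-Python sketch with two made-up toy documents (libraries such as scikit-learn's `CountVectorizer` implement the same idea at scale):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

docs = ["the spam is spam", "the ham is good"]

# Build a shared vocabulary, then count term frequencies per document
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)    # ['good', 'ham', 'is', 'spam', 'the']
print(vectors)  # [[0, 0, 1, 2, 1], [1, 1, 1, 0, 1]]
print(ngrams("the spam is spam".split(), 2))
```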
- `8_Bag_of_Words_2_SentimentAnalysis.ipynb` - Sentiment analysis using Bag of Words.
- `8_Bag_of_Words_SpamClassifier.ipynb` - Spam classification with Bag of Words.
- `9_Stop_Words.ipynb` - Handling stop words in text preprocessing.
- `9_Stop_Words_2.ipynb` - Further exploration of stop words, Bag of Words, and N-grams.
- `10_Bag_of_N_Grams_2_Fake_News_Prediction.ipynb` - Fake news prediction using N-grams.
- `10_Bag_of_N_Grams_News_Classification.ipynb` - News classification with N-grams.

### 🔤 TF-IDF (Term Frequency-Inverse Document Frequency)
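TF-IDF weights each term count by how rare the term is across the corpus, so words that appear in every document contribute nothing. A sketch of the raw definition (note that scikit-learn's `TfidfVectorizer` adds smoothing and L2 normalization, so its numbers will differ):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain tf * log(N / df) weighting over whitespace tokens."""
    tokenized = [d.split() for d in docs]
    # df[w] = number of documents containing w
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(toks).items()}
        for toks in tokenized
    ]

scores = tfidf(["good movie", "bad movie"])
# "movie" appears in every document, so its idf (and score) is 0
print(scores[0]["movie"], scores[0]["good"])
```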
- `11_TF_IDF_2_EmotionDetection.ipynb` - Emotion detection using TF-IDF.
- `11_TF_IDF_TextClassification_Ecommerce_Goods.ipynb` - E-commerce goods classification using TF-IDF.
### 💡 Word Embeddings and Vectors
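Word embeddings are compared by cosine similarity, the normalized dot product of two vectors. The notebooks use real 300-dimensional vectors from Spacy and Gensim models; the tiny 3-d "embeddings" below are made up purely to exercise the metric:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d vectors; real models use hundreds of dimensions.
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

print(cosine(vec["king"], vec["queen"]))  # close to 1.0
print(cosine(vec["king"], vec["apple"]))  # much lower
```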
- `12_Overview_Spacy_Word_Vectors.ipynb` - Overview of word vectors using Spacy and Gensim.
- `13_Spacy_Word_Embeddings_News_Category_Classification.ipynb` - News category classification using Spacy word embeddings.
- `14_Nlp_Word_Vectors_Gensim_Overview.ipynb` - Overview of word vectors using Gensim.
- `15_Gensim_w2v_Google_Fake_News_Detection.ipynb` - Fake news detection with Gensim.

### 🚀 FastText Classifier
- `16_Fasttext_Indian_Food_Receipe_Classification.ipynb` - Classification of Indian food recipes using FastText.
- `17_Fasttext_Ecommerce_Classification.ipynb` - E-commerce classification using FastText.

### 🔧 Miscellaneous
- `cosine_similarity.ipynb` - Computing cosine similarity between text vectors.

## 📊 Datasets
- `Cleaned_Indian_Food_Dataset.csv` - Dataset for Indian food recipes classification.
- `Fake_Real_Data.csv` - Dataset containing fake and real news.
- `news_story.txt` - Text file with a sample news story.
- `spam.csv` - Spam dataset for classification tasks.
- `students.txt` - Additional text file for experimentation.