Projects in Awesome Lists tagged with stopwords
A curated list of projects in awesome lists tagged with stopwords.
https://github.com/sing1ee/elasticsearch-jieba-plugin
jieba analysis plugin for Elasticsearch
dict elasticsearch elasticsearch-jieba-plugin jieba stopwords
Last synced: 23 Jan 2026
https://github.com/stopwords-iso/stopwords-iso
Stopwords collection for all languages
Last synced: 20 Feb 2026
https://github.com/mihaivalentin/lunr-languages
A collection of language stemmers and stopwords for the Lunr JavaScript library
language-stemmer localization lunr lunr-languages stemmer stopwords
Last synced: 22 Oct 2025
https://github.com/lining0806/textmining
Python text mining system (Research of Text Mining System)
jieba sklearn stopwords text-mining tf-idf user-dict
Last synced: 07 Apr 2025
https://github.com/alir3z4/stop-words
List of common stop words in various languages.
Last synced: 18 Jul 2025
https://github.com/mohataher/arabic-stop-words
Largest list of Arabic stop words on GitHub.
arabic-language arabic-nlp stopwords
Last synced: 27 Mar 2025
https://github.com/igorbrigadir/stopwords
Default English stopword lists from many different sources
en-stopwords english-stopwords natural-language-processing nlp stopwords
Last synced: 06 Apr 2025
https://github.com/kharazi/persian-stopwords
Persian (Farsi) Stop Words List
farsi natural-language-processing persian stopwords
Last synced: 09 Apr 2025
https://github.com/milaan9/python_natural_language_processing
This repository consists of a complete guide to natural language processing (NLP) in Python, where we'll learn various techniques for implementing NLP, including parsing and text processing, and understand how to use NLP for text feature engineering.
bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching
Last synced: 09 Apr 2025
https://github.com/biolab/orange3-text
🍊 📄 Text Mining add-on for Orange3
bag-of-words lemmatization newspapers nltk orange sentiment-analysis stemming stopwords text text-analysis text-mining twitter
Last synced: 14 Aug 2025
https://github.com/trinker/lexicon
A data package containing lexicons and dictionaries for text analysis
hash lexicon lookup names-frequent r stopwords text-dictionaries text-mining
Last synced: 22 Aug 2025
https://github.com/voku/stop-words
PHP | A collection of stop words, e.g. for search functions.
hacktoberfest php stop-words stopwords
Last synced: 08 Apr 2025
https://github.com/ziaa/persian-stopwords-collection
A collection of Persian stopwords (a list of Persian stop words)
persian persian-stopwords stoplist stopwords
Last synced: 05 Mar 2026
https://github.com/yihleego/trie
📒 An Aho-Corasick-based string-searching utility for Go. It supports tokenization, case-insensitive matching, and text replacement, so you can use it to find keywords in an article, filter sensitive words, etc.
aho-corasick go java keywords sensitive stopwords string-searching
Last synced: 15 Jul 2025
https://github.com/hantang/data-corpus
Corpus data and lexicon collection: Chinese and English stop words, sentiment analysis, classification dictionaries (thesaurus), and sensitive-word lexicons (banned and censored words).
corpus nlp stopwords thesaurus
Last synced: 13 Feb 2026
https://github.com/yihleego/trie4j
📒 An Aho-Corasick-based string-searching utility for Java. It supports tokenization, case-insensitive matching, and text replacement, so you can use it to find keywords in an article, filter sensitive words, etc.
aho-corasick go java keywords sensitive stopwords string-searching
Last synced: 15 Jul 2025
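Both trie entries above are described as Aho-Corasick based. As a rough illustration of what that means, here is a minimal Python sketch of the algorithm itself: a trie of the patterns plus breadth-first failure links, scanned in a single left-to-right pass. It is not a port of either library, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


class AhoCorasick:
    """Minimal Aho-Corasick automaton (hypothetical API): a trie of the
    patterns plus failure links, so all matches are found in one pass."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.goto: list[dict[str, int]] = [{}]  # goto[state][char] -> next state
        self.fail: list[int] = [0]              # failure link of each state
        self.out: list[list[str]] = [[]]        # patterns recognised at each state
        for pattern in patterns:
            if pattern:  # ignore empty patterns
                self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        """Insert one pattern into the trie."""
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first, so every shallower state's links are final before use.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure state (suffix matches).
                self.out[nxt].extend(self.out[self.fail[nxt]])

    def search(self, text: str) -> Iterator[tuple[int, str]]:
        """Yield (start_index, pattern) for every occurrence in `text`."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


if __name__ == "__main__":
    ac = AhoCorasick(["he", "she", "his", "hers"])
    print(list(ac.search("ushers")))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Case-insensitive matching, which both libraries advertise, can be approximated in this sketch by lowercasing the patterns and the text before searching.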
https://github.com/mustafaturan/omnicat-bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
naive-bayes-classifier ruby sentiment-analysis stopwords text-classification tokenizer
Last synced: 01 Sep 2025
https://github.com/cmccomb/rust-stop-words
Common stop words in a variety of languages
languages natural-language-procressing nlp nltk rust-crate stopwords
Last synced: 12 Dec 2025
https://github.com/eklem/stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
document-processing information-retrieval nlp stopwords stopwords-removal
Last synced: 05 Jul 2025
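One plausible way to derive a stopword list from a set of documents is to keep the words that occur in a large share of them. The sketch below assumes exactly that document-frequency heuristic; it is not necessarily what stopword-trainer does, and `train_stopwords` and its `min_doc_ratio` threshold are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens (Unicode-aware word pattern)."""
    return re.findall(r"\w+", text.lower())


def train_stopwords(documents: list[str], min_doc_ratio: float = 0.5) -> list[str]:
    """Return words found in at least `min_doc_ratio` of the documents,
    ordered by descending document frequency, then alphabetically."""
    if not documents:
        return []
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    threshold = min_doc_ratio * len(documents)
    candidates = [word for word, df in doc_freq.items() if df >= threshold]
    return sorted(candidates, key=lambda word: (-doc_freq[word], word))


if __name__ == "__main__":
    docs = [
        "the river flows into the sea",
        "a fox jumped over the fence",
        "the sea is calm at night",
        "a night falls over the river",
    ]
    print(train_stopwords(docs, min_doc_ratio=0.75))  # ['the']
    print(train_stopwords(docs, min_doc_ratio=0.5))
    # ['the', 'a', 'night', 'over', 'river', 'sea']
```

With a corpus this small, lowering the threshold to 0.5 also admits content words such as "river" and "sea", which is why a heuristic like this needs a reasonably large and varied document set.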
https://github.com/dohliam/more-stoplists
Stoplists for African languages generated from the ASP corpus
africa african-languages afrikaans corpus corpus-linguistics frequency-lists hausa lugbarati sesotho somali stoplist stoplists stopwords swahili yoruba zulu
Last synced: 25 Oct 2025
https://github.com/dohliam/hawaiian-corpus
Data from a corpus of written Hawaiian
bigrams corpora corpus corpus-data corpus-linguistics frequency frequency-list hawaii hawaiian hawaiian-electronic-library hawaiian-language n-grams ngram olelo-hawaii stoplist stopwords ulukau
Last synced: 05 Jan 2026
https://github.com/icflorescu/postgresql-tsearch-utils
A collection of files and patterns to improve PostgreSQL text search
engine i18n internationalization postgresql stopwords text text-search unaccent
Last synced: 08 Jul 2025
https://github.com/iamkankan/natural-language-processing-nlp-tutorial
NLP tutorials and guidelines for learning efficiently
bigrams bow cbow glove lemmatization one-hot-encoding stemming stopwords tf-idf-vectorizer tokenization unigram word-embeddings word2vec
Last synced: 08 Jan 2026
https://github.com/kavgan/stop-words
Stop word lists
natural-language-processing nlp stopwords text-mining
Last synced: 28 Jan 2026
https://github.com/lykmapipo/mongoose-taggable
Mongoose plugin to add tags and taggable behaviour.
keywords lykmapipo mongoose mongoose-plugin stopwords taggable tags
Last synced: 27 Oct 2025
https://github.com/orsinium-labs/stopwords
🙅 Go package for detecting and removing stopwords from text.
go golang stopwords text-processing tokenizer
Last synced: 07 May 2025
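Detecting and removing stopwords, as the package above does, boils down to tokenizing the text and dropping every token found in a fixed list. A minimal Python sketch, assuming a tiny hypothetical stopword set and a simple ASCII regex tokenizer (neither is taken from the package):

```python
import re

# Hypothetical, deliberately tiny English stopword set for illustration only;
# the curated lists in this index contain far more entries.
STOPWORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of ASCII letters, digits and
    apostrophes (non-ASCII letters are not matched by this simple pattern)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(text: str, stopwords: frozenset[str] = STOPWORDS) -> list[str]:
    """Return the tokens of `text` that are not stopwords, in their original order."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The cat is on the mat and the dog is in the yard."))
    # ['cat', 'mat', 'dog', 'yard']
```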
https://github.com/cvcio/go-plagiarism
Plagiarism detection using stopword n-grams
algorithm golang n-grams plagiarism plagiarism-detection stopwords
Last synced: 07 May 2025
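Stopword n-gram plagiarism detection rests on the idea that function words tend to survive when a copied passage has its content words swapped out. The sketch below assumes one simple scoring scheme, keeping only each text's stopwords and comparing their trigram sets with Jaccard similarity; go-plagiarism's actual n-gram size, stopword list, and scoring may differ, and all names here are hypothetical.

```python
import re

# Hypothetical stopword list; the technique only needs a fixed set of frequent
# function words, on the assumption that they survive paraphrasing.
STOPWORDS = frozenset({
    "a", "an", "and", "as", "at", "by", "for", "from", "in", "is", "it",
    "of", "on", "or", "that", "the", "to", "was", "with",
})


def stopword_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Keep only the stopwords of `text`, in order, and return the set of
    their consecutive n-grams."""
    seq = [w for w in re.findall(r"[a-z]+", text.lower()) if w in STOPWORDS]
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Jaccard similarity of the two documents' stopword n-gram sets."""
    a, b = stopword_ngrams(doc_a, n), stopword_ngrams(doc_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "The results of the study were published in a journal and shared with the team."
    reworded = "The findings of the survey were printed in a magazine and sent with the group."
    unrelated = "Stars shine at night for anyone who looks up."
    print(similarity(original, reworded))   # 1.0
    print(similarity(original, unrelated))  # 0.0
```

The reworded sentence shares no content words with the original, yet its stopword skeleton is identical, which is the signal this approach relies on.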
https://github.com/nano-bot01/fake-news-prediction-system-
Fake News Prediction System using logistic regression, stopwords, and NLTK
ankit-nainwal classification dataframe fake-news fake-news-classification fake-news-detection logistic-regression machine-learning ml ml-mini-project nano-bot01 nltk pandas python seaborn stopwords text-classification tfidfvectorizer
Last synced: 19 Oct 2025
https://github.com/openderocknlp/extract-lemmatized-nonstop-words
Extracts a plain list of the stemmed words in a text, filtered by stop words
javascript lemma nlp npm stemming stopwords tokenizer
Last synced: 06 Oct 2025
https://github.com/loony-bean/stopwords-rs
Stopwords from popular text processing frameworks
Last synced: 06 Apr 2026
https://github.com/nano-bot01/sms-spam-classifier-web-app-using-machine-learning
SMS Spam Classifier web application that classifies the text messages we receive on our phones as spam or ham
bag-of-words classification deep-learning machine-learning ml nltk pickle python sms spam-classification spam-detection spam-filtering spam-messages spam-prevention stopwords streamlit streamlit-cloud streamlit-webapp supervised-learning
Last synced: 22 Oct 2025
https://github.com/abdullahashfaqvirk/NLP-Workshops
Embark on your NLP journey by learning essential techniques through a series of notebooks designed to kickstart your career in this field.
lemmatization named-entity-recognition nlp nltk notebooks pos-tagging python stemming stopwords tokenization workshops
Last synced: 27 Sep 2025
https://github.com/harshit7962/cse3024-web-mining
Lab assignments for the Web Mining course (CSE-3024)
centrality cse-3024 decsion-tree encoding-decoding k-means-clustering nltk page-rank prestige random-forest stopwords web-mining web-scraping
Last synced: 03 Apr 2025
https://github.com/raghavendranhp/dynamic-hotel-recommendation-system-using-nlp
Developing a Python-based system for personalized hotel recommendations. The goal is to match user descriptions with hotel features, enhancing user satisfaction and decision-making in the hospitality industry.
ast lemmatization machine-learning nltk-python numpy pandas stopwords wordtoken-python
Last synced: 09 May 2026
https://github.com/pharo-ai/stopwords
Load the stopwords that you need in Pharo
nlp nlp-machine-learning pharo pharo-smalltalk stopwords tf-idf
Last synced: 13 Feb 2026
https://github.com/dohliam/corpus-tools
A collection of scripts for working with multilingual text corpora
corpora corpus corpus-linguistics frequency language linguistics ngram ngrams ruby salience stoplist stopwords
Last synced: 21 Mar 2025
https://github.com/arssite/naturalinguisticprogramming
Repository related to Natural Language Processing and Social Media Analytics.
deep-learning lemmatization named-entity-recognition natural-language-processing social-network-analysis socialmediaanalytics stemming stopwords tokenization
Last synced: 27 Feb 2026
https://github.com/rekram1-node/tokenizer
Natural Language Processing (NLP) tokenization library designed for English. Fast, lean, customizable. Tokenizes text, replaces abbreviations, replaces contractions, lowercases words, and can optionally remove stop words as well.
blazingly-fast contractions customization fast go golang machine-learning minimal natural-language-processing nlp speed stopwords token tokenization tokenizer
Last synced: 10 Apr 2025
https://github.com/harsh0713/sms-spam-classification
The "SMS Spam Classification" project aims to develop a machine learning model to automatically identify and classify SMS messages as either spam or legitimate (ham).
bernoulli gaussian-naive-bayes jupyter-notebook multinomial-naive-bayes nltk-python punkt python sklearn-library stopwords streamlit string
Last synced: 18 Feb 2026
https://github.com/geekquad/text-learning
Basic usage of NLTK, with implementations of concepts such as a stemmer, TF-IDF, and text.CountVectorizer
corpus countvectorizer nltk sklearn stopwords tfidf
Last synced: 15 May 2026
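TF-IDF comes up in several entries here (this one, as well as the text-mining and fake-news projects). A minimal pure-Python sketch of the unsmoothed textbook variant, assuming already-tokenized input; scikit-learn's `TfidfVectorizer` defaults to a smoothed idf, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization, so its weights will not match these.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document, using
    tf(t, d) = count(t, d) / len(d) and idf(t) = ln(N / df(t))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    docs = [["cat", "sat", "mat"], ["dog", "sat", "log"]]
    w = tf_idf(docs)
    print(w[0]["sat"])  # 0.0, because "sat" occurs in every document
    print(round(w[0]["cat"], 4))  # 0.231 (= ln 2 / 3)
```

Terms that appear in every document get an idf of zero, which is why very common words (stopwords included) contribute little to TF-IDF features even when they are not removed explicitly.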
https://github.com/kmock930/natural-language-processing
This project contains code and written work for the course CSI5386 at the University of Ottawa (taught by Professor Diana Inkpen).
bert bigram-modeling corpus-linguistics distilbert fasttext-embeddings glove-embeddings hugging-face-transformers large-language-models lemmatizer logistic-regression macro-micro-f1 natural-language-processing paraphrase-minilm pos-tagging roberta-large sbert stopwords text-embedding-ada-002 universal-sentence-encoder word-tokenizer
Last synced: 12 Jul 2025
https://github.com/abdelrahman-amen/web_scraping-and-text_processing-nlp
Web scraping involves extracting data from websites. Text processing techniques like tokenization, stemming, lemmatization, and removing stopwords refine raw text for analysis.
beautifulsoup csv nltk porterstemmer python stopwords tokenization
Last synced: 02 May 2026
https://github.com/antononcube/raku-lingua-stopwordsiso
Raku package for stop words of different languages and stop words deletion. Provides corresponding CLI scripts.
nlp stopwords stopwords-removal
Last synced: 24 Jun 2025
https://github.com/guo-yong-zhi/stopwords.jl
A Julia package containing a collection of stop words for multiple languages.
Last synced: 22 Jul 2025
https://github.com/harisali-git/naturallanguageprocessing
natural-language-processing nlp python python3 stopwords text token
Last synced: 28 Mar 2025
https://github.com/jersongb22/datascience_mlops_movierecommendations_project
Simulating a Data Scientist's role in a startup aggregating streaming platforms. Building movie queries and ML-based recommendation system with MLOps focus. ML model web app deployed with Render.
data-science fastapi machine-learning matplotlib pandas python render scikit-learn stopwords
Last synced: 10 Apr 2026
https://github.com/aryanbalaji/geospatialanalysis
Conducted an extensive geospatial analysis on Zomato's customer data, utilizing GIS tools like Folium and Plotly to map customer density and restaurant locations. Conducted cluster analysis, providing actionable insights for optimizing restaurant marketing.
crosstab folium-maps freqdist heatmap-visualization matplotlib-pyplot numpy-python pandas-dataframe regexp stopwords tokenizer
Last synced: 12 Aug 2025
https://github.com/realeroberto/stopwords-nap
Neapolitan stopwords collection.
italian linguistics neapolitan stopwords
Last synced: 20 May 2026
https://github.com/ndamulelonemakh/our-stopwords
Auto-generated stopwords for South African Bantu Languages
african-languages africanlp dataset low-resource-languages natural-language-processing nlp stopwords tshivenda
Last synced: 06 Oct 2025
https://github.com/eklem/stopword-sami
Sami stopword lists for natural language processing. Example uses include search engines, machine learning, and chatbots.
lule-sami nlp northern-sami southern-sami stopwords
Last synced: 22 Jan 2026
https://github.com/m-rishab/patient-condition_classification
Patient condition classification, which predicts the medical condition described in a sentence and recommends drugs to prevent or treat it, uses natural language processing (NLP) and machine learning techniques to analyze text input and provide relevant medical information.
flask nlp nltk python3 recommender-system stopwords text-classification wordcount
Last synced: 04 May 2026
https://github.com/elifftosunn/bert-bank-model
A Turkish BERT-based model that analyzes people's bank complaints and classifies each one into one of eight categories.
countvectorizer doc2vec f1-score huggingface huggingface-transformer huggingface-transformers nlp nltk python3 scikit-learn stopwords tagged tfidf-transformer train-test-split word-tokenizer wordnetlemmatizer
Last synced: 12 May 2026
https://github.com/gopireddy99/daily_ad_nlp_assignments
AD Training classes in NLP - Daily Assignments
cleaning-text regularexpression stemming stopwords textprocessing tokenization
Last synced: 12 Aug 2025
https://github.com/youssef155/sentiment_analysis
Sentiment Analysis For Restaurant Reviews
flask jupyter-notebook nlp pkl-model python stemming stopwords text-cleaning
Last synced: 12 May 2026
https://github.com/gehad-ahmed30/natural-language-processing
This repository showcases a collection of practical NLP projects, ranging from sentiment analysis to spam detection. The implementations leverage both Machine Learning (ML) and Deep Learning (DL) approaches to explore various natural language processing tasks and techniques.
deep-learning lstm machine-learning naive-bayes nlp nltk preprocessing stopwords tokenization
Last synced: 05 Oct 2025
https://github.com/kplanisphere/grimm-text-processor
Laboratory 1 - Information Retrieval
data-preprocessing educational-project information-retrieval lowercase-conversion nltk punctuation-removal python stopwords text-processing tokenization
Last synced: 24 Jun 2025
https://github.com/ecrmnn/norwegian-stop-words
natural-language-processing nlp stopwords
Last synced: 16 Jul 2025
https://github.com/bramblexu/jp-stopword-filter
A lightweight Python library designed to filter stopwords from Japanese text based on customizable rules.
japanese machine-learning nlp python stopwords
Last synced: 14 Mar 2025
https://github.com/ewdlop/nlpnote
NLP(Natural Language Processing) Note. https://en.wikipedia.org/wiki/Natural_language_processing
attention-mechanism bag-of-words bert entity-recognition gpt holonymy-meronymy hypernymy-hyponymy inverted-index language-model large-language-models lemmatization n-gram natural-language-processing part-of-speech-tagging sentiment-analysis sequence-to-sequence stemming stopwords tf-idf word-embeddings
Last synced: 09 Aug 2025
https://github.com/sayande01/fake_news_detection_logisticregression
This project detects fake news using Logistic Regression with NLP techniques, including NLTK stopword removal, Porter Stemmer for text normalization, and TF-IDF vectorization for feature extraction. It achieves high accuracy and precision, offering a reliable solution to combat misinformation.
logistic-regression nltk porter-stemmer stopwords tf-idf-vectorizer
Last synced: 06 Apr 2025
https://github.com/ahmedabdalkreem/sentiment-analysis
This project performs sentiment analysis on a Twitter dataset, aiming to classify tweets into positive, negative, or neutral sentiments. Sentiment analysis is crucial for understanding public opinion on various topics, brands, or events based on social media data.
bert-model lematization matplotlib nlp nltk numpy pandas python3 sentiment-analysis stopwords streamlit
Last synced: 08 Apr 2026
https://github.com/elifftosunn/textdataclean
An application that makes scraped dirty data ready for model training without the need for preprocessing steps.
corpus deasciifier morphological-analysis ngram nltk numpy pandas sentence-embedding sentence-tokenizer stemmer stopwords string turkish turkish-sentence-tokenizer word-tokenizer
Last synced: 20 May 2026
https://github.com/yeremi/stopwords
A lightweight and efficient PHP library tailored for developers working on Natural Language Processing (NLP) tasks in Brazilian Portuguese.
elasticsearch extract-information fulltext-search indexing-querying natural-language-processing php portuguese search-engine snowball stemming stop stop-words stopwords
Last synced: 09 Feb 2026
https://github.com/open-technology-foundation/stopwords.bash
Pure Bash filter that removes stopwords from input text. Faster than Python for texts under 2,000 words.
Last synced: 18 May 2026
https://github.com/pantpujan017/nenglish-stopwords-chat-analysis
Nepali stop words
chat-analysis code-mixed-language messenger-nlp nenglish nepali-nlp social-media-nlp stopwords text-preprocessing viber-chat whatsapp-chat
Last synced: 20 Jul 2025
https://github.com/jersongb22/moocs_dataanalytics_project
Analysis of Udemy, edX, and Coursera datasets for a tech startup entering the online course market, aiming to understand their impact on demand. Publicly accessible Power BI report published on web.
matplotlib pandas powerbi python stopwords zebrabi
Last synced: 13 Jul 2025
https://github.com/nickenshidqia/natural_language_processing_of_books_using_python
Natural language processing of books: analyzing books programmatically using Python and extracting valuable insights
natural-language-processing nltk-python python regex stopwords
Last synced: 29 Apr 2026
https://github.com/rachakondaganesh/using-nlp-online-and-retail-order-review-project
Analyzed customer reviews from online and retail orders. Performed sentiment analysis, keyword extraction, and topic modeling to identify trends, satisfaction drivers, and pain points. Used Python (NLTK, spaCy) and visualization tools to present actionable insights for improving customer experience and product strategy.
bivariate-analysis lambda matplotlib-pyplot nlp-machine-learning pandas sent-tokenize stemming stopwords trivariate-analysis unicode-data univariate-analysis word-tokenization
Last synced: 12 Apr 2026
https://github.com/asayem172153/chat_parse_from_txt_and_summarize
Chat Log Analyzer with TF-IDF & NLTK: a Python script to analyze chat logs between a user and an AI. It extracts messages, preprocesses them using NLTK (tokenization, POS tagging, lemmatization), and computes TF-IDF scores to summarize the most relevant topics discussed.
lammatization nltk python stopwords tf-idf
Last synced: 05 Jul 2025
https://github.com/ilirhushi/mysql-swedish-stopwords
full-text full-text-search mysql stopwords swedish
Last synced: 08 Apr 2025
https://github.com/atheeralzhrani/nlp_projects
NLP projects I worked on using different natural language processing libraries.
nlp-datasets nltk-library rnn-lstm rnn-pytorch rnn-tensorflow spacy-nlp stemming stopwords tokenization
Last synced: 19 Aug 2025
https://github.com/niteshchawla/nlp-content-classification
The goal of this project is to use a bunch of news articles extracted from the companies’ internal database and categorize them into several categories like politics, technology, sports, business and entertainment based on their content.
bag-of-words lemmatization multiclass-classification natural-language-processing stopwords text-classification text-processing tf-idf tokenization
Last synced: 04 Oct 2025