An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with stopwords

A curated list of projects in awesome lists tagged with stopwords.

https://github.com/stopwords-iso/stopwords-iso

Stopwords collection for all languages

language stopwords

Last synced: 20 Feb 2026

https://github.com/mihaivalentin/lunr-languages

A collection of language stemmers and stopwords for the Lunr JavaScript library

language-stemmer localization lunr lunr-languages stemmer stopwords

Last synced: 22 Oct 2025

https://github.com/lining0806/textmining

A Python text mining system (Research of Text Mining System)

jieba sklearn stopwords text-mining tf-idf user-dict

Last synced: 07 Apr 2025

https://github.com/alir3z4/stop-words

List of common stop words in various languages.

language stopwords

Last synced: 18 Jul 2025

https://github.com/mohataher/arabic-stop-words

Largest list of Arabic stop words on GitHub.

arabic-language arabic-nlp stopwords

Last synced: 27 Mar 2025

https://github.com/igorbrigadir/stopwords

Default English stopword lists from many different sources

en-stopwords english-stopwords natural-language-processing nlp stopwords

Last synced: 06 Apr 2025

https://github.com/Donatello-za/rake-php-plus

A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).

extract keyword language php phrases stopwords

Last synced: 04 Apr 2025
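
RAKE, as named in the rake-php-plus entry, splits text into candidate phrases wherever a stopword or punctuation mark occurs, scores each word by degree/frequency, and scores a phrase as the sum of its word scores. The Python sketch below shows that general idea only; it is not the rake-php-plus API, and the small `STOPWORDS` set is a hypothetical placeholder for a real stopword list.

```python
import re
from collections import defaultdict

# Hypothetical tiny stopword list; a real extractor would load a full list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "is", "in", "on", "to", "with", "are", "be", "by"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and at stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in stopwords:
                # A stopword ends the current phrase.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: co-occurrence count within phrases, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

Multi-word phrases that recur together score higher than isolated words, which is why the stopword list quality largely determines which phrases RAKE can find.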

https://github.com/kharazi/persian-stopwords

Persian (Farsi) Stop Words List

farsi natural-language-processing persian stopwords

Last synced: 09 Apr 2025

https://github.com/milaan9/python_natural_language_processing

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching

Last synced: 09 Apr 2025

https://github.com/trinker/lexicon

A data package containing lexicons and dictionaries for text analysis

hash lexicon lookup names-frequent r stopwords text-dictionaries text-mining

Last synced: 22 Aug 2025

https://github.com/voku/stop-words

PHP: a collection of stop words for uses such as search functions.

hacktoberfest php stop-words stopwords

Last synced: 08 Apr 2025

https://github.com/ziaa/persian-stopwords-collection

A collection of Persian stopwords.

persian persian-stopwords stoplist stopwords

Last synced: 05 Mar 2026

https://github.com/yihleego/trie

📒 An Aho-Corasick-based string-searching utility for Go. It supports tokenization, case-insensitive matching, and text replacement, so you can use it to find keywords in an article, filter sensitive words, etc.

aho-corasick go java keywords sensitive stopwords string-searching

Last synced: 15 Jul 2025
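
Aho-Corasick, used by yihleego/trie (and its Java counterpart trie4j), builds a trie of all keywords plus "failure links" so that every keyword occurrence in a text is found in a single pass. The following is a minimal Python sketch of the textbook algorithm; the `AhoCorasick` class and its `search` method are hypothetical names and do not reflect either library's API.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton for multi-keyword search (illustrative sketch)."""

    def __init__(self, keywords):
        self.goto = [{}]    # trie transitions for each node
        self.fail = [0]     # failure link for each node (0 = root)
        self.output = [[]]  # keywords recognized when reaching each node
        for kw in keywords:
            self._add(kw)
        self._build_failure_links()

    def _add(self, keyword):
        node = 0
        for ch in keyword:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.output[node].append(keyword)

    def _build_failure_links(self):
        # Breadth-first, so shallower nodes are finished before deeper ones use them.
        queue = deque(self.goto[0].values())  # depth-1 nodes already fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure target (suffix matches).
                self.output[child] += self.output[self.fail[child]]

    def search(self, text):
        """Yield (start_index, keyword) for every match, including overlapping ones."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for kw in self.output[node]:
                yield i - len(kw) + 1, kw
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).search("ushers")` reports "she", "he", and "hers", including the overlapping matches. Case-insensitive matching, as the library advertises, could be approximated here by lowercasing both keywords and text.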

https://github.com/hantang/data-corpus

Corpus data and lexicon collection: Chinese and English stop words, sentiment analysis data, classification dictionaries (thesaurus), and sensitive-word lists (prohibited and censored words).

corpus nlp stopwords thesaurus

Last synced: 13 Feb 2026

https://github.com/yihleego/trie4j

📒 An Aho-Corasick-based string-searching utility for Java. It supports tokenization, case-insensitive matching, and text replacement, so you can use it to find keywords in an article, filter sensitive words, etc.

aho-corasick go java keywords sensitive stopwords string-searching

Last synced: 15 Jul 2025

https://github.com/mustafaturan/omnicat-bayes

Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)

naive-bayes-classifier ruby sentiment-analysis stopwords text-classification tokenizer

Last synced: 01 Sep 2025

https://github.com/cmccomb/rust-stop-words

Common stop words in a variety of languages

languages natural-language-procressing nlp nltk rust-crate stopwords

Last synced: 12 Dec 2025

https://github.com/koheiw/marimo

Multilingual stopword lists

stopwords text-mining

Last synced: 08 Mar 2026

https://github.com/eklem/stopword-trainer

A module for creating stopword lists for any language, based on a set of documents.

document-processing information-retrieval nlp stopwords stopwords-removal

Last synced: 05 Jul 2025
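
One common way to derive a stopword list from a document set is a document-frequency heuristic: words that occur in a large share of documents carry little discriminating information. This is an assumption about the general approach, not necessarily what stopword-trainer does; `train_stopwords`, `min_doc_ratio`, and `top_n` are hypothetical names chosen for this Python sketch.

```python
import re
from collections import Counter


def train_stopwords(documents, min_doc_ratio=0.5, top_n=None):
    """Return words found in at least `min_doc_ratio` of the documents.

    Hypothetical document-frequency heuristic; results are sorted by
    descending document frequency, then alphabetically.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each word at most once per document.
        doc_freq.update(set(re.findall(r"\w+", doc.lower())))
    threshold = min_doc_ratio * len(documents)
    ranked = sorted(
        (w for w, df in doc_freq.items() if df >= threshold),
        key=lambda w: (-doc_freq[w], w),
    )
    return ranked[:top_n] if top_n is not None else ranked
```

Because `\w+` is Unicode-aware in Python 3, this works for space-delimited scripts in any language; languages written without spaces (such as Chinese or Japanese) would need a word segmenter first.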

https://github.com/icflorescu/postgresql-tsearch-utils

A collection of files and patterns to improve PostgreSQL text search

engine i18n internationalization postgresql stopwords text text-search unaccent

Last synced: 08 Jul 2025

https://github.com/lykmapipo/mongoose-taggable

Mongoose plugin to add tags and taggable behaviour.

keywords lykmapipo mongoose mongoose-plugin stopwords taggable tags

Last synced: 27 Oct 2025

https://github.com/orsinium-labs/stopwords

🙅 Go package for detecting and removing stopwords from text.

go golang stopwords text-processing tokenizer

Last synced: 07 May 2025

https://github.com/cvcio/go-plagiarism

Plagiarism detection using stopword n-grams

algorithm golang n-grams plagiarism plagiarism-detection stopwords

Last synced: 07 May 2025
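
The idea behind stopword n-gram plagiarism detection is that a text reduced to its sequence of stopwords keeps its syntactic "skeleton" even when content words are swapped for synonyms. The Python sketch below compares two texts by the Jaccard overlap of their stopword n-grams; it illustrates the general technique under that assumption and is not go-plagiarism's scoring. The `STOPWORDS` set, `n=3`, and the function names are hypothetical choices.

```python
import re

# Hypothetical small English stopword list; a real detector would use a fuller list.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was",
             "it", "that", "with", "for", "by"}


def stopword_ngrams(text, n=3, stopwords=STOPWORDS):
    """Keep only the stopwords (in order) and return the set of their n-grams."""
    seq = [w for w in re.findall(r"[a-z']+", text.lower()) if w in stopwords]
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity between the stopword n-gram sets of two texts."""
    ga, gb = stopword_ngrams(a, n), stopword_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

A paraphrase that only replaces nouns, verbs, and adjectives leaves the stopword sequence intact, so it scores as highly similar even though few content words are shared.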

https://github.com/openderocknlp/extract-lemmatized-nonstop-words

Extracts a plain list of the stemmed words in a text, with stop words filtered out

javascript lemma nlp npm stemming stopwords tokenizer

Last synced: 06 Oct 2025

https://github.com/n8brooks/snowball

⛄ Snowball stemmers for Deno.

deno languages nlp snowball stopwords

Last synced: 21 Aug 2025

https://github.com/loony-bean/stopwords-rs

Stopwords from popular text processing frameworks

nlp rust stopwords

Last synced: 06 Apr 2026

https://github.com/abdullahashfaqvirk/NLP-Workshops

Embark on your NLP journey by learning essential techniques through a series of notebooks designed to kickstart your career in this field.

lemmatization named-entity-recognition nlp nltk notebooks pos-tagging python stemming stopwords tokenization workshops

Last synced: 27 Sep 2025

https://github.com/raghavendranhp/dynamic-hotel-recommendation-system-using-nlp

Developing a Python-based system for personalized hotel recommendations. The goal is to match user descriptions with hotel features, enhancing user satisfaction and decision-making in the hospitality industry.

ast lemmatization machine-learning nltk-python numpy pandas stopwords wordtoken-python

Last synced: 09 May 2026

https://github.com/217heidai/stopwords

Collects and consolidates Chinese and English stopwords.

stop-words stopwords

Last synced: 15 May 2025

https://github.com/pharo-ai/stopwords

Load the stopwords that you need in Pharo

nlp nlp-machine-learning pharo pharo-smalltalk stopwords tf-idf

Last synced: 13 Feb 2026

https://github.com/dohliam/corpus-tools

A collection of scripts for working with multilingual text corpora

corpora corpus corpus-linguistics frequency language linguistics ngram ngrams ruby salience stoplist stopwords

Last synced: 21 Mar 2025

https://github.com/rekram1-node/tokenizer

Natural Language Processing (NLP) tokenization library designed for English. Fast, lean, and customizable. Tokenizes text, replaces abbreviations, replaces contractions, lowercases words, and can optionally remove stop words.

blazingly-fast contractions customization fast go golang machine-learning minimal natural-language-processing nlp speed stopwords token tokenization tokenizer

Last synced: 10 Apr 2025

https://github.com/harsh0713/sms-spam-classification

The "SMS Spam Classification" project aims to develop a machine learning model to automatically identify and classify SMS messages as either spam or legitimate (ham).

bernoulli gaussian-naive-bayes jupyter-notebook multinomial-naive-bayes nltk-python punkt python sklearn-library stopwords streamlit string

Last synced: 18 Feb 2026

https://github.com/geekquad/text-learning

Basic usage of NLTK. Implementation of concepts like Stemmer, TfIdf, and text.CountVectors

corpus countvectorizer nltk sklearn stopwords tfidf

Last synced: 15 May 2026

https://github.com/abdelrahman-amen/web_scraping-and-text_processing-nlp

Web scraping involves extracting data from websites. Text processing techniques like tokenization, stemming, lemmatization, and removing stopwords refine raw text for analysis.

beautifulsoup csv nltk porterstemmer python stopwords tokenization

Last synced: 02 May 2026

https://github.com/antononcube/raku-lingua-stopwordsiso

Raku package for stop words of different languages and stop words deletion. Provides corresponding CLI scripts.

nlp stopwords stopwords-removal

Last synced: 24 Jun 2025

https://github.com/guo-yong-zhi/stopwords.jl

A Julia package containing a collection of stop words for multiple languages.

julia stop-words stopwords

Last synced: 22 Jul 2025

https://github.com/jersongb22/datascience_mlops_movierecommendations_project

Simulating a Data Scientist's role in a startup aggregating streaming platforms. Building movie queries and ML-based recommendation system with MLOps focus. ML model web app deployed with Render.

data-science fastapi machine-learning matplotlib pandas python render scikit-learn stopwords

Last synced: 10 Apr 2026

https://github.com/aryanbalaji/geospatialanalysis

Conducted an extensive geospatial analysis of Zomato's customer data, using GIS tools like Folium and Plotly to map customer density and restaurant locations. Conducted cluster analysis providing actionable insights for optimizing restaurant marketing.

crosstab folium-maps freqdist heatmap-visualization matplotlib-pyplot numpy-python pandas-dataframe regexp stopwords tokenizer

Last synced: 12 Aug 2025

https://github.com/realeroberto/stopwords-nap

Neapolitan stopwords collection.

italian linguistics neapolitan stopwords

Last synced: 20 May 2026

https://github.com/eklem/stopword-sami

Sami stopword lists for natural language processing. Example uses include search engines, machine learning, and chatbots.

lule-sami nlp northern-sami southern-sami stopwords

Last synced: 22 Jan 2026

https://github.com/m-rishab/patient-condition_classification

*Patient condition classification*, which predicts the medical issue of a sentence and recommends drugs to prevent or treat that issue, involves the use of natural language processing (NLP) and machine learning techniques to analyze text input and provide relevant medical information.

flask nlp nltk python3 recommender-system stopwords text-classification wordcount

Last synced: 04 May 2026

https://github.com/elifftosunn/bert-bank-model

A Turkish BERT-based model that analyzes people's bank complaints and classifies each into one of eight categories.

countvectorizer doc2vec f1-score huggingface huggingface-transformer huggingface-transformers nlp nltk python3 scikit-learn stopwords tagged tfidf-transformer train-test-split word-tokenizer wordnetlemmatizer

Last synced: 12 May 2026

https://github.com/gehad-ahmed30/natural-language-processing

This repository showcases a collection of practical NLP projects, ranging from sentiment analysis to spam detection. The implementations leverage both Machine Learning (ML) and Deep Learning (DL) approaches to explore various natural language processing tasks and techniques.

deep-learning lstm machine-learning naive-bayes nlp nltk preprocessing stopwords tokenization

Last synced: 05 Oct 2025

https://github.com/bramblexu/jp-stopword-filter

A lightweight Python library designed to filter stopwords from Japanese text based on customizable rules.

japanese machine-learning nlp python stopwords

Last synced: 14 Mar 2025

https://github.com/sayande01/fake_news_detection_logisticregression

This project detects fake news using Logistic Regression with NLP techniques, including NLTK stopword removal, Porter Stemmer for text normalization, and TF-IDF vectorization for feature extraction. It achieves high accuracy and precision, offering a reliable solution to combat misinformation.

logistic-regression nltk porter-stemmer stopwords tf-idf-vectorizer

Last synced: 06 Apr 2025

https://github.com/ahmedabdalkreem/sentiment-analysis

This project performs sentiment analysis on a Twitter dataset, aiming to classify tweets into positive, negative, or neutral sentiments. Sentiment analysis is crucial for understanding public opinion on various topics, brands, or events based on social media data.

bert-model lematization matplotlib nlp nltk numpy pandas python3 sentiment-analysis stopwords streamlit

Last synced: 08 Apr 2026

https://github.com/carloocchiena/web_pages_words_counter

Scrapes a list of URLs and, after some cleaning, exports the most common words to Excel.

linkedin nltk python scraper stopwords

Last synced: 21 Mar 2025

https://github.com/elifftosunn/textdataclean

An application that takes raw (dirty) data as it is collected and makes it ready for model training without separate preprocessing steps.

corpus deasciifier morphological-analysis ngram nltk numpy pandas sentence-embedding sentence-tokenizer stemmer stopwords string turkish turkish-sentence-tokenizer word-tokenizer

Last synced: 20 May 2026

https://github.com/yeremi/stopwords

A lightweight and efficient PHP library tailored for developers working on Natural Language Processing (NLP) tasks in Brazilian Portuguese.

elasticsearch extract-information fulltext-search indexing-querying natural-language-processing php portuguese search-engine snowball stemming stop stop-words stopwords

Last synced: 09 Feb 2026

https://github.com/open-technology-foundation/stopwords.bash

Pure Bash stopword filter for input text. Faster than Python for texts under 2,000 words.

bash nltk stopwords

Last synced: 18 May 2026

https://github.com/jersongb22/moocs_dataanalytics_project

Analysis of Udemy, edX, and Coursera datasets for a tech startup entering the online course market, aiming to understand their impact on demand. Publicly accessible Power BI report published on web.

matplotlib pandas powerbi python stopwords zebrabi

Last synced: 13 Jul 2025

https://github.com/nickenshidqia/natural_language_processing_of_books_using_python

Natural language processing of books: analyzes books programmatically using Python and extracts valuable insights.

natural-language-processing nltk-python python regex stopwords

Last synced: 29 Apr 2026

https://github.com/rachakondaganesh/using-nlp-online-and-retail-order-review-project

Analyzed customer reviews from online and retail orders. Performed sentiment analysis, keyword extraction, and topic modeling to identify trends, satisfaction drivers, and pain points. Used Python (NLTK, spaCy) and visualization tools to present actionable insights for improving customer experience and product strategy.

bivariate-analysis lambda matplotlib-pyplot nlp-machine-learning pandas sent-tokenize stemming stopwords trivariate-analysis unicode-data univariate-analysis word-tokenization

Last synced: 12 Apr 2026

https://github.com/asayem172153/chat_parse_from_txt_and_summarize

Chat Log Analyzer with TF-IDF & NLTK: a Python script that analyzes chat logs between a user and an AI. It extracts messages, preprocesses them using NLTK (tokenization, POS tagging, lemmatization), and computes TF-IDF scores to summarize the most relevant topics discussed.

lammatization nltk python stopwords tf-idf

Last synced: 05 Jul 2025

https://github.com/atheeralzhrani/nlp_projects

NLP projects I worked on using different natural language processing libraries.

nlp-datasets nltk-library rnn-lstm rnn-pytorch rnn-tensorflow spacy-nlp stemming stopwords tokenization

Last synced: 19 Aug 2025

https://github.com/niteshchawla/nlp-content-classification

The goal of this project is to take a set of news articles extracted from the company's internal database and categorize them by content into categories such as politics, technology, sports, business, and entertainment.

bag-of-words lemmatization multiclass-classification natural-language-processing stopwords text-classification text-processing tf-idf tokenization

Last synced: 04 Oct 2025