An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with stemming

A curated list of projects in awesome lists tagged with stemming.

https://github.com/milaan9/python_natural_language_processing

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching

Last synced: 09 Apr 2025

https://github.com/milaan9/Python_Natural_Language_Processing

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching

Last synced: 28 Aug 2025

https://github.com/words/stemmer

Fast Porter stemmer implementation

natural-language porter stemmer stemming

Last synced: 12 Dec 2025

https://github.com/tokenmill/beagle

Beagle helps you identify keywords, phrases, regexes, and complex search queries of interest in streams of text documents.

clojure java lucene luwak nlp real-time-search stemming stored-query-engine stream-search

Last synced: 23 Jul 2025

https://github.com/trinker/textstem

Tools for fast text stemming & lemmatization

lemmatization r stemming text-mining

Last synced: 16 Mar 2025

https://github.com/wooorm/stmr.c

Porter Stemmer algorithm in C

porter stemmer stemming

Last synced: 19 Apr 2025

https://github.com/bastienbot/nlp-js-tools-french

POS tagger, lemmatizer, and stemmer for the French language in JavaScript

lemmatization lemmatizer nlp postagging postgresql stemmer stemming tokenization tokenizer

Last synced: 01 Aug 2025

https://github.com/words/lancaster-stemmer

Lancaster stemming algorithm

lancaster natural-language stemmer stemming

Last synced: 05 Apr 2026

https://github.com/master/spark-stemming

Spark MLlib wrapper for the Snowball framework

nlp snowball spark stemming

Last synced: 04 Jul 2025

https://github.com/stephanj/bm25

A BM25 Java implementation using streams, stop words and stemming.

bm25 llm nlp rerank stemming

Last synced: 13 Oct 2025

https://github.com/liderman/rustemmer

Golang implementation of Porter stemming for the Russian language

fast golang package porter russian stemmer stemmers stemming

Last synced: 13 Jul 2025

https://github.com/fzn0x/idnaive

🧠 A Simple Node.js Naive Bayes Library.

hacktoberfest javascript multi-language naive-bayes stemming

Last synced: 23 Mar 2025

https://github.com/kangfend/bahasa

Natural language toolkit for Indonesian Language (Bahasa)

bahasa indonesia natural-language-processing nlp nlp-python python sastrawi stemmer stemming

Last synced: 21 Jan 2026

https://github.com/dbklim/uk_stemmer

A small modification of the stemmer for the Ukrainian language (https://github.com/Amice13/ukr_stemmer)

natural-language-processing nlp stemmer stemmers stemming stemming-algorithm uk ukr ukrainian ukrainian-morphology

Last synced: 29 Apr 2025

https://github.com/nadar/stemming

PHP Stemming Collection

languages php stemmer stemmers stemming

Last synced: 13 Apr 2025

https://github.com/scaraux/swift-porter-stemmer-2

⛄ A Swift wrapper over the Porter Stemmer 2 / libstemmer

porter-stemmer-algorithm porter-stemmer-v2 snowball stemming swift

Last synced: 11 Jun 2025

https://github.com/dariasmyr/fts-engine

A modular full-text search engine in Go with instant indexing, pluggable indexers, and configurable pre-search filters.

fulltext-search fuzzy-search ngram-analysis ngrams stemming trie

Last synced: 01 Apr 2026

https://github.com/ksdkamesh99/phony-news-classifier

Phony News Classifier is a repository that analyzes a natural language processing application, a fake news classifier, built with text preprocessing strategies such as bag of words, TF-IDF vectorization, lemmatization, and stemming, combined with Naive Bayes and deep learning RNN (LSTM) models, and reports detailed accuracy results.

bag-of-words deep-learning fake-news lemmatization lstm-neural-networks multinomial-naive-bayes naive-bayes-classifier natural-language-processing python3 stemming tfidfvectorizer

Last synced: 12 May 2025

https://github.com/mtumilowicz/elasticsearch7-ngrams-fuzzy-shingles-stemming-workshop

Gentle introduction to basic Elasticsearch constructs for boosting search: ngrams, shingles, stemmers, suggesters, and fuzzy queries.

edge-ngram elasticsearch fuzzy-query fuzzy-search kibana ngram search-as-you-type shingles stemmer stemming suggester workshop workshop-materials

Last synced: 11 Apr 2025

https://github.com/labrijisaad/twitter-sentiment-analysis-with-python

I aim in this project to analyze the sentiment of tweets provided from the Sentiment140 dataset by developing a machine learning sentiment analysis model involving the use of classifiers. The performance of these classifiers is then evaluated using accuracy and F1 scores.

accuracy-score bernoulli-naive-bayes confusion-matrix f1-score lemmatization logistic-regression machine-learning nlp roc-auc-curve sentiment-analysis sentiment140-dataset stemming support-vector-machine tokenization twitter-sentiment-analysis

Last synced: 08 Apr 2025

https://github.com/singhpratyush/index-search-query

Inverted Index, Query Formulation and Ranking from Scratch in Python

indexing multithreading pipenv python query query-building ranking searching stemming

Last synced: 12 Apr 2025

https://github.com/assem-ch/snowball-sublime-syntax

Snowball framework syntax definition for Sublime Text 3

snowball stemming sublime-text syntax-highlighting

Last synced: 19 Feb 2026

https://github.com/prakharjadaun/feature-extraction-for-spam-email-detection

Implements preprocessing steps, feature extraction techniques, and a Naive Bayes classifier in C++. All the steps are also implemented in Python for comparative analysis.

bag-of-words-cpp email-spam-classifier naive-bayes-classifier-cpp nlp-machine-learning stemming text-classification

Last synced: 07 May 2025

https://github.com/wooorm/stmr

Porter Stemmer CLI

porter stemmer stemming

Last synced: 19 Apr 2025

https://github.com/shaadclt/twitter-hashtag-analysis

This project provides a website that allows users to analyze real-time tweets from Twitter based on a specific hashtag. The website includes a tweet sentiment analyzer to determine the sentiment (positive, negative, or neutral) of the collected tweets.

lemmization logistic-regression nltk stemming textblob wordcloud

Last synced: 10 Apr 2025

https://github.com/mrrefactoring/multilingual-stemmer

A Node.js WebAssembly implementation of some popular Snowball stemming algorithms

javascript nodejs stemmer stemmers stemming stemming-algorithm webassembly

Last synced: 16 Dec 2025

https://github.com/maxpatiiuk/porter-stemming

TypeScript implementation of the Porter Stemmer algorithm

porter stemmer stemming

Last synced: 22 Mar 2025

https://github.com/antonbaumann/german-go-stemmer

An efficient implementation of the German porter-stemming algorithm in Golang.

language-processing nlp porter-stemmer snowball stemming stemming-algorithm

Last synced: 05 Mar 2026

https://github.com/openderocknlp/extract-lemmatized-nonstop-words

Extracts a pure list of stemmed words of a text filtered by stop words

javascript lemma nlp npm stemming stopwords tokenizer

Last synced: 06 Oct 2025

https://github.com/anishlearnstocode/nlp-playground

Small code snippets written in Python covering fundamental concepts in NLP used in all major NLP projects.

lemmatization natural-language-processing nlp porter-stemmer stemming

Last synced: 10 Apr 2025

https://github.com/fardinhash/chatbot-deep-learning

This chatbot was built with a combination of deep learning, the Natural Language Toolkit (NLTK), and a PyTorch model. The author reports achieving the highest accuracy with this setup.

ai-chatbot chatbot deep-learning lemmatization machine-learning ml natural-language-processing natural-language-toolkit nlp nltk python pytorch pytorch-model stemming tokenization

Last synced: 22 Sep 2025

https://github.com/burhanharoon/urdu-stemmer

A simple Python-based Urdu stemmer that tries to find the stem of a word using a list of affixes.

python python3 stemming stemming-algorithm urdu urdu-language urdu-nlp urdu-text-processsing

Last synced: 19 Apr 2026

https://github.com/mmahmoodictbd/solr-analysis-bn

Solr / Lucene Bangla Analyzer, Stem Filter, Stemmer.

bangla bengali solr solr-plugin solr-search stemmer stemming

Last synced: 26 Mar 2025

https://github.com/krisharul26/text-classification-dbpedia-ontology-classes-using-lstm

Text classification is the task of assigning a set of predefined categories to free text. Text classifiers can be used to organize, structure, and categorize pretty much anything. For example, new articles can be organized by topics, support tickets can be organized by urgency, chat conversations can be organized by language, brand mentions can be organized by sentiment, and so on.

attention-mechanism bagofwords flask-application gensim-doc2vec gensim-word2vec glove-embeddings lemmatization lstm-neural-networks nlp-machine-learning nltk-python restapi-framework rnn-tensorflow stemming tensorflow2 word2vec-embeddinngs word2vec-model

Last synced: 22 Jan 2026

https://github.com/abdullahashfaqvirk/NLP-Workshops

Embark on your NLP journey by learning essential techniques through a series of notebooks designed to kickstart your career in this field.

lemmatization named-entity-recognition nlp nltk notebooks pos-tagging python stemming stopwords tokenization workshops

Last synced: 27 Sep 2025

https://github.com/mrseanryan/data-type-predictor

Given the name of a property or attribute like 'BrandName' or 'AmountReceived', try to predict a data type like String, Boolean, Integer...

ai data-classification data-types nlp stemming

Last synced: 08 Nov 2025

https://github.com/donderom/stemerge

A collection of stemmers in Erlang

erlang nlp stemming

Last synced: 17 Jan 2026

https://github.com/juliatext/snowball.jl

Snowball stemming algorithms

nlp stemming

Last synced: 20 Feb 2026

https://github.com/shutterstock/stemming-exceptions

A collection of stemming exceptions for different languages.

iso natlang stemming

Last synced: 28 Jan 2026

https://github.com/eilvelia/porter2.js

Fastest JavaScript implementation of the porter2 stemming algorithm

english porter snowball stemmer stemming

Last synced: 29 Apr 2025

https://github.com/putuwaw/linggapy

Library for stemming Balinese-language text

balinese nlp python stemmer stemming thesis

Last synced: 20 Feb 2026

https://github.com/cosmoduende/r-twitter

Explore your Twitter activity with R: Sentiment Analysis and Data Visualization. How to analyze your Twitter account (or any account), discover your habits and sentiments with the "rtweet" package and NLP.

data-analysis data-visualization lemmatization nlp nlp-library nlp-resources nltk nltk-library r-package r-programming r-studio rtweet stemming twitter twitter-api twitter-data twitter-data-analysis twitter-data-extraction twitter-sentiment-analysis udpipe

Last synced: 10 Oct 2025

https://github.com/hernanmd/libstemmer

Pharo uFFI wrapper for the Porter Stemmer algorithm

pharo pharo-smalltalk porter-stemmer smalltalk stemming stemming-algorithm

Last synced: 27 Feb 2026

https://github.com/tomsquest/lucene-stemmers

Stem words like Lucene (port of Lucene's stemmers to JavaScript)

lucene stem stemmer stemming

Last synced: 24 Jun 2026

https://github.com/aquilax/go-stemmer

Bulgarian language stemmer library in Go for the BULSTEM rules

bulgarian language stemming

Last synced: 15 Mar 2025

https://github.com/aarryasutar/hate_speech_detection

This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.

confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud

Last synced: 09 Apr 2026

https://github.com/iiiioreo/data-cleaning-w-gui

AIO Data Cleaning: Python application using Tkinter for text file manipulation, featuring functions such as case conversion, lemmatization, stemming, and more.

ai datacleaning lemmatization python stemming text-editor text-to-pdf tkinter

Last synced: 30 Mar 2025

https://github.com/mishamyrt/seshat

🔎 Search engine

php search-engine stemming

Last synced: 24 Apr 2025

https://github.com/hangsbreaker/stemming-ind

JavaScript, PHP, and Python stemming for Bahasa Indonesia

javascript nodejs php stem stemmer stemming stemming-algorithm

Last synced: 07 May 2026

https://github.com/sayande01/natural_language_processing

This repository contains Jupyter notebooks and Python scripts that cover foundational concepts and practical implementations of NLP preprocessing techniques. Each topic is accompanied by clear explanations and code examples using the Natural Language Toolkit (NLTK) library.

bag-of-words natural-language-processing nltk stemming word2vec

Last synced: 06 Apr 2025

https://github.com/04bhavyaa/sms-spam-classification-system

A Machine Learning project that identifies whether a given message is spam or not. It uses Natural Language Processing (NLP) techniques (Stemming and TF-IDF Vectorization) for text transformation and a trained Multinomial Naive Bayes Classifier for predictions.

bernoulli-naive-bayes nlp-machine-learning nltk-library spam-classification stemming streamlit tfidf-vectorizer

Last synced: 24 Apr 2026

https://github.com/chandkund/sms-spam-detection

The goal is to develop a classification model that can accurately differentiate between spam and non-spam messages. This is crucial for applications like email filtering, SMS spam detection, and improving overall user experience by reducing the influx of unwanted or malicious content.

matplotlib nlp-machine-learning numpy pandas seaborn stemming tfidf-vectorizer tokenization

Last synced: 19 Jan 2026

https://github.com/jigyasag18/fake-news-prediction-project

The Fake News Prediction App Repository offers a machine learning project that focuses on identifying the authenticity of news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and the Porter stemming algorithm, achieving around 97% classification accuracy with a logistic regression model.

data datapreprocessing logistic-regression machine-learning machine-learning-algorithms numpy pandas prediction stemming vectorization

Last synced: 08 Jun 2026

https://github.com/atharvapathak/customer_sentiment_analysis

Customer sentiment analysis is the process of using natural language processing (NLP) and machine learning techniques to analyze and understand the feelings, opinions, and attitudes expressed by customers in textual data, such as reviews, feedback, and social media posts.

cnn naive-bayes nlp nltk spacy stemming text-mining tokenization

Last synced: 21 Feb 2026

https://github.com/mitica/root-name

Extracts the root name of a name.

name root root-name stemmer stemming

Last synced: 12 May 2026

https://github.com/fusi3/natural_language_coursework

Assessing the impact of different pre-processing techniques for classifying the sentiment of movie reviews

bag-of-words latent-semantic-analysis lemmatization multilayer-perceptron nlp sentiment-analysis stemming support-vector-machines tfidf

Last synced: 18 Mar 2025

https://github.com/yeremi/stopwords

A lightweight and efficient PHP library tailored for developers working on Natural Language Processing (NLP) tasks in Brazilian Portuguese.

elasticsearch extract-information fulltext-search indexing-querying natural-language-processing php portuguese search-engine snowball stemming stop stop-words stopwords

Last synced: 09 Feb 2026

https://github.com/cyberfantics/naturallanguageprocessing

A comprehensive repository for the Natural Language Processing course, featuring lecture notes, slides, and practical implementations of key NLP concepts using Python and popular libraries.

chatbots hacktoberfest lemmatization nltk nltk-python spacy-nlp stemming tokenization transformer

Last synced: 12 Jun 2026

https://github.com/mayankmittal29/duplifinder-quora-clone-catcher

An advanced system for detecting semantically duplicate question pairs using cutting-edge NLP techniques. Combines traditional ML models (XGBoost, SVM, Random Forest) with deep learning architectures (BiLSTM, Siamese Networks, Transformers) and contextual embeddings (BERT, RoBERTa). Features engineered using token similarity, fuzzy matching, and em

bert bilstm cross-validation eda fastext fuzzy-matching glove numpy pandas python3 quora-question-pairs random-forest roberta seaborn stemming svm tf-idf transformers word2vec xgboost

Last synced: 15 Apr 2026

https://github.com/tggo/steblo

Zero-dependency rule-based Ukrainian stemmer in pure Go

bleve golang nlp stemmer stemming text-processing ukrainian ukrainian-language

Last synced: 03 Jun 2026

https://github.com/abinashsahoo007/project-resume-classification

The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

corpus count-vectorizer label-encoding lemmitization machine-learning nltk part-of-speech-tagging resume-classification spacy stemming text-mining text-preprocessing textract tfidf-vectorizer tokenization wordcloud

Last synced: 02 Feb 2026

https://github.com/somjit101/nlp-stackeroverflow-tag-prediction

A multi-class classification problem where the objective is to read a question posted on the popular reference website StackOverflow and predict the primary topics it deals with, i.e. the tags the question will be associated with.

bag-of-words countvectorizer logistic-regression multi-class-classification multiclass-logistic-regression natural-language-processing nlp one-vs-rest onevsrestclassifier stackoverflow-tags stemming text-mining tf-idf tfidf-vectorizer word-cloud

Last synced: 05 Jun 2026

https://github.com/atheeralzhrani/nlp_projects

NLP projects that I worked on using different natural language processing libraries.

nlp-datasets nltk-library rnn-lstm rnn-pytorch rnn-tensorflow spacy-nlp stemming stopwords tokenization

Last synced: 19 Aug 2025

https://github.com/sdpdas/sm_sentiment_analysis

Uses natural language processing (NLP) with pandas, NumPy, scikit-learn, and NLTK for classification, applying logistic regression as a supervised model. The pickle library is used to save the model so it can be run anywhere.

logistic-regression machine-learning nlp scikit-learn sentiment-analysis stemming vectorizer

Last synced: 03 Jan 2026

https://github.com/gfyoung/stemming

Web Application for Counting Words in a Document

flask-application python stemming

Last synced: 15 Mar 2025

https://github.com/arya-io/nlp-explorer

NLP Explorer is an interactive Streamlit app that lets users explore various NLP techniques like Tokenization, POS Tagging, Stemming, Lemmatization, and NER. It provides real-time analysis of text, making it a great tool for learning and experimenting with NLP concepts.

datascience lemmatization machinelearning naturallanguageprocessing ner nlp nltk postagging python stemming streamlit textanalysis textprocessing tokenization

Last synced: 01 May 2026

https://github.com/kajuberdut/porter2

A Python wrapper around surgebase's porter2 implementation.

nlp snowball stemming stemming-porters

Last synced: 20 May 2026

https://github.com/fnando/stemmers

Stemming and language detection bindings for Ruby

gem language-detection ruby stemming

Last synced: 30 Jun 2025

https://github.com/jigyasag18/fake-news-prediction-app

The Fake News Prediction App Repository offers a machine learning project that focuses on identifying the authenticity of news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and lemmatization, achieving ~95% classification accuracy with a random forest classifier model.

data datapreprocessing logistic-regression machine-learning machine-learning-algorithms numpy pandas prediction stemming streamlit streamlit-webapp vectorization

Last synced: 11 Apr 2026

https://github.com/nurfawaiq/ir-stemming-nazief

Information Retrieval - Stemming Nazief

information-retrieval stemming

Last synced: 18 Mar 2025

https://github.com/rachakondaganesh/using-nlp-online-and-retail-order-review-project

Analyzed customer reviews from online and retail orders. Performed sentiment analysis, keyword extraction, and topic modeling to identify trends, satisfaction drivers, and pain points. Used Python (NLTK, spaCy) and visualization tools to present actionable insights for improving customer experience and product strategy.

bivariate-analysis lambda matplotlib-pyplot nlp-machine-learning pandas sent-tokenize stemming stopwords trivariate-analysis unicode-data univariate-analysis word-tokenization

Last synced: 12 Apr 2026