An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-analysis

A curated list of projects in awesome lists tagged with text-analysis.

https://github.com/abilzerian/llm-prompt-library

My personal prompt library for various LLMs + scripts & tools. Suitable for models from Deepseek, OpenAI, Claude, Meta, Mistral, Google, Grok, and others.

adaptive-learning meta-prompting multimodal prompt prompt-engineering prompt-evaluation prompt-generator prompt-injection prompt-learning prompt-management prompt-optimization prompt-template prompt-toolkit prompt-tuning promptengineering prompting rag text-analysis

Last synced: 20 Apr 2025

https://github.com/opensemanticsearch/open-semantic-search

Open source research tool to search, browse, analyze, and explore large document collections using a semantic search engine and an open source text mining and text analytics platform. It integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations, and locations, metadata management via thesauri and ontologies, and a search user interface and search apps for full-text search, faceted search, and knowledge graphs.

annotation faceted-search fulltext-search investigative-journalism journalism named-entity-recognition ocr ontologies osint python research-tool search search-engine search-interface semantic skos text-analysis text-mining thesaurus ui

Last synced: 16 May 2025

https://github.com/greyblake/whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

ai algorithm classifier detect-language language language-recognition nlp rust rustlang text-analysis text-classification text-classifier whatlang

Last synced: 13 May 2025

https://github.com/abilzerian/LLM-Prompt-Library

Advanced Code and Text Manipulation Prompts for Various LLMs. Suitable for Siri, GPT-4o, Claude, Llama3, Gemini, and other high-performance open-source LLMs.

ai apple-intelligence artificial-intelligence chatbot chatgpt chatgpt-api gpt gpt-3 gpt-4 machine-learning openai prompt prompt-engineering prompt-injection prompt-toolkit prompting prompts python siri text-analysis

Last synced: 27 Mar 2025

https://github.com/wyounas/homer

Homer, a text analyser in Python, can help make your text more clear, simple and useful for your readers.

nlp nlp-library python python-library python-script python3 text-analysis

Last synced: 09 Apr 2026

https://github.com/yooper/php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

nlp php php-language php-text-analysis text-analysis tokenization

Last synced: 12 Feb 2026

https://github.com/fhamborg/giveme5w1h

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

5w 5w1h answering event-detection event-extraction fivew fivewoneh news news-articles nlp nlp-library question question-answering text-analysis

Last synced: 26 Jan 2026

https://github.com/fbkarsdorp/python-course

Tutorial and introduction to programming with Python for the humanities and social sciences

humanities python-course teaching text-analysis

Last synced: 15 Mar 2025

https://github.com/Mathux/TEMOS

Official PyTorch implementation of the paper "TEMOS: Generating diverse human motions from textual descriptions", ECCV 2022 (Oral)

human-motion motion-generation text-analysis

Last synced: 03 Apr 2025

https://github.com/sansan0/bilibili-comment-analyzer

🎯 Data visualization and analysis software for Bilibili comment sections. Uploaders can use it to guide their choice of topics and identify their fan base.

bangumi bilibili comments dashboard data-visualization geo-mapping gui-application heatmap interactive-visualization macos python sentiment-analysis social-media-analytics spider text-analysis trend-analysis web-scraping windows wordcloud

Last synced: 21 Jan 2026

https://github.com/5j9/wikitextparser

A Python library to parse MediaWiki WikiText

mediawiki parsing python text-analysis

Last synced: 15 May 2025

https://github.com/EmilHvitfeldt/smltar

Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge

bookdown supervised-machine-learning text-analysis

Last synced: 14 Jul 2025

https://github.com/trinker/textclean

Tools for cleaning and normalizing text data

data-munging emoticons r regex text-analysis text-cleaning

Last synced: 05 Apr 2025

https://github.com/textvec/textvec

Text vectorization tool to outperform TFIDF for classification tasks

machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf

Last synced: 05 Apr 2025

https://github.com/trinker/qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting

Last synced: 05 Apr 2025

https://github.com/karolzak/support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

ai artificial-intelligence azure azure-app-service azure-machine-learning azure-web-app-service azure-webapp classification classifier machine-learning ml model numpy pandas python text-analysis text-classification text-mining text-processing web-service

Last synced: 04 Oct 2025

https://github.com/rstudio-conf-2020/applied-ml

Code and Resources for "Applied Machine Learning"

classification machine-learning regression text-analysis tidymodels

Last synced: 11 Feb 2026

https://github.com/emilhvitfeldt/r-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 18 Jan 2026

https://github.com/stanfordnlp/stanza-old

Stanford NLP group's shared Python tools.

natural-language-processing nlp python text-analysis text-processing

Last synced: 02 Aug 2025

https://github.com/eellak/nlpbuddy

A text analysis application for performing common NLP tasks through a web dashboard interface and an API

fasttext gensim natural-language-processing spacy text-analysis text-classification

Last synced: 12 Apr 2025

https://github.com/brucewlee/lftk

[BEA @ ACL 2023] General-purpose tool for linguistic feature extraction; tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.

bea-workshop feature-extraction handcrafted-features linguistic-features natural-language-processing python readability-scores reading-time spacy text-analysis word-difficulty

Last synced: 12 Apr 2025

https://github.com/johnbumgarner/wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

antonyms bag-of-words definitions dictionary homophones hypernyms hyponyms lexicon nlp python python3 synonyms text-analysis textual-analysis wordlists wordnet wordnets wordsearch

Last synced: 14 Jan 2026

https://github.com/dondealban/learning-stm

Learning structural topic modeling using the stm R package.

automated-content-analysis machine-learning stm text-analysis topic-modeling

Last synced: 15 Mar 2025

https://github.com/quanteda/stopwords

Multilingual Stopword Lists in R

r text-analysis

Last synced: 12 Dec 2025

https://github.com/dhowe/ritajs-v2

RiTa: generative language tools

generative-text natural-language rita text-analysis

Last synced: 12 May 2025

https://github.com/jbgruber/lexisnexistools

📰 Working with newspaper data from 'LexisNexis'

r r-package rstats text-analysis

Last synced: 07 Apr 2025

https://github.com/forTEXT/catma

Computer Assisted Text Markup and Analysis

annotations digital-humanities java text-analysis text-markup webapp

Last synced: 15 Apr 2025

https://github.com/dhowe/rita

Website, documentation and examples for RiTa

generative-text natural-language rita text-analysis text-generation

Last synced: 07 Apr 2025

https://github.com/remram44/taguette

Free and open source qualitative research tool -- MIRROR OF GITLAB REPOSITORY

hacktoberfest highlighting notes qualitative-analysis research-tool tagging tags text-analysis

Last synced: 07 Apr 2025

https://github.com/zaratsian/spark

Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References

machine-learning nlp pyspark spark text-analysis

Last synced: 11 Apr 2025

https://github.com/kgjerde/corporaexplorer

An R package for dynamic exploration of text collections

corpora corpus r shiny text-analysis

Last synced: 22 Oct 2025

https://github.com/apache/uima-uimaj

Apache UIMA Java SDK

apache java text-analysis uima

Last synced: 08 Apr 2025

https://github.com/juba/rainette

R implementation of the Reinert text clustering method

r text-analysis text-classification

Last synced: 04 Oct 2025

https://github.com/jonclayden/ore

An R interface to the Onigmo regular expression library

r regex regular-expressions text-analysis

Last synced: 13 Apr 2025

https://github.com/koheiw/newsmap

Semi-supervised algorithm for geographical document classification

machine-learning news-stories quanteda text-analysis

Last synced: 08 Sep 2025

https://github.com/koheiw/LSX

Semi-supervised algorithm for document scaling

lsa quanteda sentiment-analysis text-analysis

Last synced: 13 Jul 2025

https://github.com/pjhampton/woolly

The Text Mining Elixir

text-analysis text-mining

Last synced: 07 May 2025

https://github.com/ropensci/jstor

Import journal data from DfR (JSTOR)

jstor peer-reviewed r r-package rstats text-analysis text-mining

Last synced: 22 Oct 2025

https://github.com/codewithdark-git/darkgpt

DarkGPT Chat Explorer is an interactive web application that allows users to engage in conversations with various GPT (Generative Pre-trained Transformer) models in real-time. This repository contains the source code for the application.

app chatbot database gemini gemini-ai gemini-pro-vision gen-ai google gpt huggingface-transformers image-generation latest python pytorch sqlite3 text-analysis text-classification text-summarization transformer

Last synced: 30 Aug 2025

https://github.com/oneai-nlp/oneai-python

Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming texts from any source into structured data to use in code.

ai api api-rest artificial-intelligence language language-ai natural-language-processing natural-language-understanding nlp oneai python python-library python3 rest-api summarization summary text text-analysis text-classification text-processing

Last synced: 14 Feb 2026

https://github.com/bank-of-england/occupationcoder

Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.

bankofengland boe economics jobs python soc text-analysis tf-idf vacancies

Last synced: 04 Apr 2026

https://github.com/mit-lcp/bloatectomy

A Python package for removing duplicate text in clinical notes or other documents

fda mimic mimic-iii nlp-resources plagarism plagiarism-evaluation python-3 python3 text-analysis text-mining text-processing

Last synced: 13 Apr 2025

https://github.com/prakharrathi25/text-analytics-tool

This is an application that automates the process of text analysis with a user-friendly GUI. 📱 It has been implemented using Python and deployed with the Streamlit package.

hacktoberfest machine-learning natural-language-processing nlp python sentiment-analysis streamlit-webapp text-analysis text-classification

Last synced: 07 May 2025

https://github.com/leslie-huang/stylest

R package for estimating speaker style distinctiveness in texts. Install it from CRAN!

classification r text-analysis

Last synced: 25 Apr 2025

https://github.com/moment-of-peace/EventForecast

Time series prediction and text analysis using Keras LSTM, plus clustering, association rules mining

association-rules clustering keras lstm text-analysis time-series

Last synced: 11 May 2025

https://github.com/apache/uima-uimafit

Apache UIMA uimaFIT

apache java text-analysis uima

Last synced: 10 Apr 2025

https://github.com/koheiw/workshop-ijta

Introduction to Japanese text analysis with R

japanese-language quanteda r text-analysis

Last synced: 05 Apr 2025

https://github.com/twardoch/split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

data-preprocessing gpt gpt-3 gpt-35-turbo gpt-35-turbo-16k gpt-4 markdown markdown-processing mistletoe natural-language-processing nlp openai openai-gpt python split-text summarization text-analysis text-processing text-summarization text-tokenization

Last synced: 08 Jul 2025

https://github.com/microsoft/autobrewml

With the AutoBrewML Framework, getting production-ready ML models becomes much faster, easier, and more efficient.

anomaly-detection azure-automl cleansing-data data-science datavisualization machine-learning microsoft nlp-machine-learning responsible-ml sampling-strategies text-analysis text-classification text-summarization

Last synced: 12 Mar 2026

https://github.com/dbklim/russian_subtitles_dataset

Preprocessing of a dataset of 347 subtitles for TV series (thanks to Taiga Corpus), for use in building a word2vec or JamSpell model, training a neural network or chat bot, or any other NLP task.

bot cnn corpus dataset lstm machine-learning ml natural-language-processing nlp nlu rnn russian subtitles text text-analysis text-processing word2vec

Last synced: 29 Apr 2025

https://github.com/lithika-damnod/russ

Get instant answers to your questions about any text with Russ, an AI-powered reading companion that analyzes and summarizes any text you provide and answers questions based on the information in the passage.

chatgpt collaborate react saas text-analysis text-summarization ui ux

Last synced: 07 Sep 2025

https://github.com/pablobarbera/big-data-upf

RECSM-UPF Summer School: Social Media and Big Data Research

big-data facebook rstudio scraping-websites social-media social-network-analysis text-analysis twitter

Last synced: 02 Jan 2026

https://github.com/dmytrovoytko/sublimetext-translate

🌐 Translation plugin (multi-engine, fast, flexible) for SublimeText 3 & 4, works without API keys, works in China

hacktoberfest plugin python readability sublime-text text-analysis translation

Last synced: 11 Apr 2025

https://github.com/smkrv/ha-text-ai

Cutting-edge AI solution for Home Assistant. Multi-LLM provider support to transform your smart home experience with intelligent, adaptive automation.

ai anthropic anthropic-claude artificial-intelligence deepseek deepseek-api gpt hacs hacs-custom hacs-integration home-assistant home-assistant-integration homeassistant llm natural-language-processing openai-api openrouter openrouter-api sonnet text-analysis

Last synced: 15 Apr 2025

https://github.com/chainsawriot/rectr

💒 Reproducible Extraction of Cross-lingual Topics using R

r text-analysis topic-model

Last synced: 29 Oct 2025

https://github.com/quanteda/quanteda.corpora

A collection of corpora for quanteda

quanteda text-analysis

Last synced: 05 Apr 2025

https://github.com/knime/knime-textprocessing

KNIME - Text Processing Extension (Labs)

knime nlp-machine-learning text-analysis text-processing workflow

Last synced: 21 Jan 2026

https://github.com/apache/uima-ruta

Apache UIMA Ruta

apache java ruta text-analysis uima

Last synced: 19 Oct 2025

https://github.com/dario-github/notion-nlp

Read the text from a Notion database and perform NLP analysis.

flomo nlp notion notion-api notion-database python text-analysis text-summarization tf-idf

Last synced: 16 Mar 2026

https://github.com/nlpie/biomedicus

BioMedICUS: A biomedical and clinical NLP engine.

biomedical-informatics health-informatics natural-language-processing nlp text-analysis

Last synced: 16 Oct 2025

https://github.com/sandsmark/scp-wiki

Mirror of the SCP wiki, approx. 20 million words. If you just want the text, e.g., for training some version of GPT, download the latest release (half the size without the git history).

dataset scp scp-foundation text-analysis text-generation text-mining text-processing wikidot

Last synced: 04 Jan 2026

https://github.com/gjtorikian/what_you_say

Natural language detection library. Written in Rust, wrapped in Ruby.

text-analysis text-classification

Last synced: 11 Sep 2025

https://github.com/dhchenx/rsnltk

Rust-based Natural Language Toolkit using Python Bindings

human-language natural-language-processing nlp-in-rust rsnltk rust-text-analysis stanza text-analysis

Last synced: 24 Jul 2025

https://github.com/apache/uima-uimacpp

C++ support for Apache UIMA

apache java text-analysis uima

Last synced: 10 Apr 2025

https://github.com/masurii/fbscrapeideas

Modern CLI tool for scraping & analyzing Facebook groups using Playwright & Gemini AI. Features self-healing selectors, session security, and local offline analysis.

academic-research ai cli data-extraction data-mining facebook-scraper gemini-api idea-generation nlp python selenium text-analysis

Last synced: 28 Apr 2026

https://github.com/darkliquid/textstats

Generate information about text including syllable counts and Flesch-Kincaid, Gunning-Fog, Coleman-Liau, SMOG and Automated Readability scores.

automated-readability-scores coleman-liau dale-chall flesch-kincaid go smog syllable-counts text-analysis

Last synced: 25 Jan 2026

https://github.com/koheiw/newspapers

R package to import articles from newspaper databases

r text-analysis

Last synced: 05 Apr 2025

https://github.com/chainsawriot/textplex

Calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019).

r text-analysis

Last synced: 31 Aug 2025

https://github.com/jonathanraiman/ciseau

🚀 Tokenize and clean strings in Python

natural-language-processing python text text-analysis tokenizer xml

Last synced: 11 Jul 2025

https://github.com/direct-phonology/dphon

Uncover Old Chinese textual parallels based on sound

chinese-traditional nlp phonology python text-analysis

Last synced: 07 Apr 2025