Projects in Awesome Lists tagged with text-analysis
A curated list of projects in awesome lists tagged with text-analysis.
https://github.com/obsei/obsei
Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative study, and more.
anonymization artificial-intelligence business-process-automation customer-engagement customer-support issue-tracking-system low-code lowcode natural-language-processing nlp process-automation python sentiment-analysis social-listening social-network-analysis text-analysis text-analytics text-classification workflow workflow-automation
Last synced: 14 Jan 2026
https://github.com/abilzerian/llm-prompt-library
My personal prompt library for various LLMs + scripts & tools. Suitable for models from Deepseek, OpenAI, Claude, Meta, Mistral, Google, Grok, and others.
adaptive-learning meta-prompting multimodal prompt prompt-engineering prompt-evaluation prompt-generator prompt-injection prompt-learning prompt-management prompt-optimization prompt-template prompt-toolkit prompt-tuning promptengineering prompting rag text-analysis
Last synced: 20 Apr 2025
https://github.com/opensemanticsearch/open-semantic-search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
annotation faceted-search fulltext-search investigative-journalism journalism named-entity-recognition ocr ontologies osint python research-tool search search-engine search-interface semantic skos text-analysis text-mining thesaurus ui
Last synced: 16 May 2025
https://github.com/greyblake/whatlang-rs
Natural language detection library for Rust. Try demo online: https://whatlang.org/
ai algorithm classifier detect-language language language-recognition nlp rust rustlang text-analysis text-classification text-classifier whatlang
Last synced: 13 May 2025
https://github.com/abilzerian/LLM-Prompt-Library
Advanced Code and Text Manipulation Prompts for Various LLMs. Suitable for Siri, GPT-4o, Claude, Llama3, Gemini, and other high-performance open-source LLMs.
ai apple-intelligence artificial-intelligence chatbot chatgpt chatgpt-api gpt gpt-3 gpt-4 machine-learning openai prompt prompt-engineering prompt-injection prompt-toolkit prompting prompts python siri text-analysis
Last synced: 27 Mar 2025
https://github.com/meta-toolkit/meta
A Modern C++ Data Sciences Toolkit
c-plus-plus graph-algorithms inverted-index language-modeling nlp nlp-parsing pos-tag search-engine text-analysis text-analytics text-classification word-embeddings
Last synced: 15 Mar 2025
https://github.com/wyounas/homer
Homer, a text analyser in Python, can help make your text more clear, simple and useful for your readers.
nlp nlp-library python python-library python-script python3 text-analysis
Last synced: 09 Apr 2026
https://github.com/graphbrain/graphbrain
Language, Knowledge, Cognition
artificial-intelligence cognitive-science computational-social-science hypergraphs knowledge knowledge-base knowledge-graph knowledge-representation natural-language-processing natural-language-understanding nlp philosophy python text-analysis text-mining
Last synced: 07 Apr 2025
https://github.com/yooper/php-text-analysis
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
nlp php php-language php-text-analysis text-analysis tokenization
Last synced: 12 Feb 2026
https://github.com/programminghistorian/jekyll
Jekyll-based static site for The Programming Historian
api data-management data-manipulation data-mining dh digital-humanities exhibits linked-open-data mapping multi-lingual network-analysis open-educational-resources open-source pedagogy programming-historian python r-studio scraping text-analysis web-scraping
Last synced: 14 Mar 2025
https://github.com/fhamborg/giveme5w1h
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
5w 5w1h answering event-detection event-extraction fivew fivewoneh news news-articles nlp nlp-library question question-answering text-analysis
Last synced: 26 Jan 2026
https://github.com/get-woke/woke
Detect non-inclusive language in your source code.
ci codereview conscious-language go golang inclusive inclusive-coding inclusive-language inclusive-lint inclusive-linter lint linter linting static-analysis text-analysis
Last synced: 08 Apr 2025
https://github.com/fbkarsdorp/python-course
Tutorial and introduction to programming with Python for the humanities and social sciences
humanities python-course teaching text-analysis
Last synced: 15 Mar 2025
https://github.com/airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
adversarial-examples black-box-attacks black-box-benchmarking classification data-mining data-science machine-learning metrics python python2 python3 spam spam-classification spam-detection spam-filtering text text-analysis text-classification text-mining text-processing
Last synced: 08 Oct 2025
https://github.com/Mathux/TEMOS
Official PyTorch implementation of the paper "TEMOS: Generating diverse human motions from textual descriptions", ECCV 2022 (Oral)
human-motion motion-generation text-analysis
Last synced: 03 Apr 2025
https://github.com/sansan0/bilibili-comment-analyzer
🎯 Data visualization and analysis software for Bilibili comment sections. Uploaders can use it to guide their choice of topics and identify their fan base.
bangumi bilibili comments dashboard data-visualization geo-mapping gui-application heatmap interactive-visualization macos python sentiment-analysis social-media-analytics spider text-analysis trend-analysis web-scraping windows wordcloud
Last synced: 21 Jan 2026
https://github.com/shcherbak-ai/contextgem
ContextGem: Effortless LLM extraction from documents
ai contract-analysis data-extraction document-intelligence docx docx2md docx2txt generative-ai legaltech llm llm-extraction llm-framework llm-pipeline llms nlp prompt-engineering text-analysis unstructured-data
Last synced: 13 May 2025
https://github.com/5j9/wikitextparser
A Python library to parse MediaWiki WikiText
mediawiki parsing python text-analysis
Last synced: 15 May 2025
https://github.com/textpipe/textpipe
Textpipe: clean and extract metadata from text
language-identification named-entities named-entity-recognition nlp text-analysis text-processing
Last synced: 07 Apr 2025
https://github.com/jboynyc/textnets
Text analysis with networks.
computational-social-science network-analysis nlp sociology text-analysis text-as-data visualization
Last synced: 21 Feb 2026
https://github.com/ryanjgallagher/shifterator
Interpretable data visualizations for understanding how texts differ at the word level
computational-social-science data-visualization digital-humanities information-theory natural-language-processing sentiment-analysis text-analysis text-as-data
Last synced: 13 Jul 2025
https://github.com/emilhvitfeldt/smltar
Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge
bookdown supervised-machine-learning text-analysis
Last synced: 16 May 2025
https://github.com/trinker/textclean
Tools for cleaning and normalizing text data
data-munging emoticons r regex text-analysis text-cleaning
Last synced: 05 Apr 2025
https://github.com/textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks
machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf
Last synced: 05 Apr 2025
https://github.com/trinker/qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
qdap quantitative-discourse-analysis text-analysis text-mining text-plotting
Last synced: 05 Apr 2025
https://github.com/karolzak/support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
ai artificial-intelligence azure azure-app-service azure-machine-learning azure-web-app-service azure-webapp classification classifier machine-learning ml model numpy pandas python text-analysis text-classification text-mining text-processing web-service
Last synced: 04 Oct 2025
https://github.com/rstudio-conf-2020/applied-ml
Code and Resources for "Applied Machine Learning"
classification machine-learning regression text-analysis tidymodels
Last synced: 11 Feb 2026
https://github.com/mycroftai/padatious
A neural network intent parser
intent intent-classification language-detection language-processing text-analysis text-processing
Last synced: 05 Apr 2025
https://github.com/emilhvitfeldt/r-text-data
List of textual data sources to be used for text mining in R
data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext
Last synced: 18 Jan 2026
https://github.com/stanfordnlp/stanza-old
Stanford NLP group's shared Python tools.
natural-language-processing nlp python text-analysis text-processing
Last synced: 02 Aug 2025
https://github.com/dirkhovy/text_analysis_for_social_science
Code for the CUP Elements on text analysis in Python for social scientists
analysis classification-models clustering data-analysis embeddings neural-networks prediction predictive-modeling social-sciences text-analysis text-classification topic-modeling
Last synced: 15 Mar 2025
https://github.com/biolab/orange3-text
🍊 📄 Text Mining add-on for Orange3
bag-of-words lemmatization newspapers nltk orange sentiment-analysis stemming stopwords text text-analysis text-mining twitter
Last synced: 14 Aug 2025
https://github.com/eellak/nlpbuddy
A text analysis application for performing common NLP tasks through a web dashboard interface and an API
fasttext gensim natural-language-processing spacy text-analysis text-classification
Last synced: 12 Apr 2025
https://github.com/brucewlee/lftk
[BEA @ ACL 2023] General-purpose tool for linguistic features extraction; Tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.
bea-workshop feature-extraction handcrafted-features linguistic-features natural-language-processing python readability-scores reading-time spacy text-analysis word-difficulty
Last synced: 12 Apr 2025
https://github.com/johnbumgarner/wordhoard
This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
antonyms bag-of-words definitions dictionary homophones hypernyms hyponyms lexicon nlp python python3 synonyms text-analysis textual-analysis wordlists wordnet wordnets wordsearch
Last synced: 14 Jan 2026
https://github.com/dondealban/learning-stm
Learning structural topic modeling using the stm R package.
automated-content-analysis machine-learning stm text-analysis topic-modeling
Last synced: 15 Mar 2025
https://github.com/dhowe/ritajs-v2
RiTa: generative language tools
generative-text natural-language rita text-analysis
Last synced: 12 May 2025
https://github.com/jbgruber/lexisnexistools
📰 Working with newspaper data from 'LexisNexis'
r r-package rstats text-analysis
Last synced: 07 Apr 2025
https://github.com/forTEXT/catma
Computer Assisted Text Markup and Analysis
annotations digital-humanities java text-analysis text-markup webapp
Last synced: 15 Apr 2025
https://github.com/andrewtavis/kwx
BERT, LDA, and TFIDF based keyword extraction in Python
bert data-analysis data-science data-visualization keyword-extraction latent-dirichlet-allocation lda machine-learning multilingual natural-language-processing nlp open-source python python3 text-analysis text-classification text-mining tfidf topic-modeling unsupervised-learning
Last synced: 14 Apr 2025
https://github.com/dhowe/rita
Website, documentation and examples for RiTa
generative-text natural-language rita text-analysis text-generation
Last synced: 07 Apr 2025
https://github.com/remram44/taguette
Free and open source qualitative research tool -- MIRROR OF GITLAB REPOSITORY
hacktoberfest highlighting notes qualitative-analysis research-tool tagging tags text-analysis
Last synced: 07 Apr 2025
https://github.com/zaratsian/spark
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
machine-learning nlp pyspark spark text-analysis
Last synced: 11 Apr 2025
https://github.com/hasinhayder/javascript-text-expander
Expands texts as you type, naturally
javascript javascript-plugin text-analysis text-processing
Last synced: 02 Jul 2025
https://github.com/kgjerde/corporaexplorer
An R package for dynamic exploration of text collections
corpora corpus r shiny text-analysis
Last synced: 22 Oct 2025
https://github.com/apache/uima-uimaj
Apache UIMA Java SDK
apache java text-analysis uima
Last synced: 08 Apr 2025
https://github.com/juba/rainette
R implementation of the Reinert text clustering method
r text-analysis text-classification
Last synced: 04 Oct 2025
https://github.com/jonclayden/ore
An R interface to the Onigmo regular expression library
r regex regular-expressions text-analysis
Last synced: 13 Apr 2025
https://github.com/koheiw/newsmap
Semi-supervised algorithm for geographical document classification
machine-learning news-stories quanteda text-analysis
Last synced: 08 Sep 2025
https://github.com/koheiw/LSX
Semi-supervised algorithm for document scaling
lsa quanteda sentiment-analysis text-analysis
Last synced: 13 Jul 2025
https://github.com/zayedrais/documentsearchengine
Document Search Engine project with TF-IDF and Google Universal Sentence Encoder model
data-science deep-learning document-search document-similarity juypter machine-learning python python-text-analysis semantic-search semantic-search-engine tensorflow tensorflow-models tensorflow-tutorials text-analysis text-search text-semantic-similarity tfidf tfidf-text-analysis tfidf-vectorizer universal-sentence-encoder
Last synced: 02 May 2025
https://github.com/ropensci/jstor
Import journal data from DfR (JSTOR)
jstor peer-reviewed r r-package rstats text-analysis text-mining
Last synced: 22 Oct 2025
https://github.com/codewithdark-git/darkgpt
DarkGPT Chat Explorer is an interactive web application that allows users to engage in conversations with various GPT (Generative Pre-trained Transformer) models in real-time. This repository contains the source code for the application.
app chatbot database gemini gemini-ai gemini-pro-vision gen-ai google gpt huggingface-transformers image-generation latest python pytorch sqlite3 text-analysis text-classification text-summarization transformer
Last synced: 30 Aug 2025
https://github.com/oneai-nlp/oneai-python
Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming texts from any source into structured data to use in code.
ai api api-rest artificial-intelligence language language-ai natural-language-processing natural-language-understanding nlp oneai python python-library python3 rest-api summarization summary text text-analysis text-classification text-processing
Last synced: 14 Feb 2026
https://github.com/rosette-api/python
Babel Street Analytics Client Library for Python
categorization entity-extraction fuzzy-matching language-detection language-identification lemmatization machine-learning morphology name-generation name-similarity name-translation natural-language-processing nlp python relation-extraction sentiment-analysis text text-analysis text-mining tokenization
Last synced: 04 Apr 2025
https://github.com/bank-of-england/occupationcoder
Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.
bankofengland boe economics jobs python soc text-analysis tf-idf vacancies
Last synced: 04 Apr 2026
https://github.com/giacbrd/python-dandelion-eu
A Python client for connecting to all the services provided by https://dandelion.eu
api api-client api-wrapper entity-extraction entity-linking language-detection machine-learning python semantic-analysis semantic-similarity sentiment-analysis text-analysis text-classification text-mining text-similarity wikification wikipedia wikipedia-api
Last synced: 17 Mar 2025
https://github.com/mit-lcp/bloatectomy
A Python package for removing duplicate text in clinical notes or other documents
fda mimic mimic-iii nlp-resources plagarism plagiarism-evaluation python-3 python3 text-analysis text-mining text-processing
Last synced: 13 Apr 2025
https://github.com/prakharrathi25/text-analytics-tool
This is an application that automates the process of text analysis with a user-friendly GUI. 📱 It has been implemented using Python and deployed with the Streamlit package.
hacktoberfest machine-learning natural-language-processing nlp python sentiment-analysis streamlit-webapp text-analysis text-classification
Last synced: 07 May 2025
https://github.com/leslie-huang/stylest
R package for estimating speaker style distinctiveness in texts. Install it from CRAN!
classification r text-analysis
Last synced: 25 Apr 2025
https://github.com/moment-of-peace/EventForecast
Time series prediction and text analysis using Keras LSTM, plus clustering, association rules mining
association-rules clustering keras lstm text-analysis time-series
Last synced: 11 May 2025
https://github.com/apache/uima-uimafit
Apache UIMA uimaFIT
apache java text-analysis uima
Last synced: 10 Apr 2025
https://github.com/koheiw/workshop-ijta
Introduction to Japanese text analysis with R
japanese-language quanteda r text-analysis
Last synced: 05 Apr 2025
https://github.com/dpalmasan/trunajod2.0
An easy-to-use library to extract indices from texts.
coherence cohesion entity-graph lexical-diversity natural-language-processing readability-metrics semantic-measurements spacy spacy-extensions text-analysis text-mining text-processing ttr type-token-ratio
Last synced: 09 Apr 2025
https://github.com/notesjor/corpusexplorer2.0
Corpus linguistics has never been so easy...
big-data cleaning-data cooccurrence corpus-linguistics corpus-processing data-minig data-mining data-science datajournalism journalism linguistics natural-language-processing natural-language-understanding nlp sdk tagger text-analysis text-mining text-processing visualization
Last synced: 17 Jan 2026
https://github.com/rileynwong/spotify-analysis
Data analysis on my monthly playlists
audio-features data-analysis data-scraping lyrics machine-learning natural-language-processing nlp nlp-machine-learning sentiment-analysis spotify-analysis supervised-learning supervised-machine-learning text text-analysis
Last synced: 12 Apr 2025
https://github.com/twardoch/split-markdown4gpt
A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.
data-preprocessing gpt gpt-3 gpt-35-turbo gpt-35-turbo-16k gpt-4 markdown markdown-processing mistletoe natural-language-processing nlp openai openai-gpt python split-text summarization text-analysis text-processing text-summarization text-tokenization
Last synced: 08 Jul 2025
https://github.com/microsoft/autobrewml
The AutoBrewML Framework shortens the time it takes to get production-ready ML models, with ease and efficiency.
anomaly-detection azure-automl cleansing-data data-science datavisualization machine-learning microsoft nlp-machine-learning responsible-ml sampling-strategies text-analysis text-classification text-summarization
Last synced: 12 Mar 2026
https://github.com/dbklim/russian_subtitles_dataset
Preprocessing of a dataset of 347 subtitles for TV series (thanks to Taiga Corpus) for building a word2vec model or a JamSpell model, neural network training, chatbot training, or any other NLP task.
bot cnn corpus dataset lstm machine-learning ml natural-language-processing nlp nlu rnn russian subtitles text text-analysis text-processing word2vec
Last synced: 29 Apr 2025
https://github.com/lithika-damnod/russ
Get instant answers to your questions about any text with Russ, an AI-powered reading companion that analyzes and summarizes any text you provide and answers questions based on the information in the passage
chatgpt collaborate react saas text-analysis text-summarization ui ux
Last synced: 07 Sep 2025
https://github.com/pablobarbera/big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
big-data facebook rstudio scraping-websites social-media social-network-analysis text-analysis twitter
Last synced: 02 Jan 2026
https://github.com/dmytrovoytko/sublimetext-translate
🌐 Translation plugin (multi-engine, fast, flexible) for SublimeText 3 & 4, works without API keys, works in China
hacktoberfest plugin python readability sublime-text text-analysis translation
Last synced: 11 Apr 2025
https://github.com/smkrv/ha-text-ai
Cutting-edge AI solution for Home Assistant. Multi-LLM provider support to transform your smart home experience with intelligent, adaptive automation.
ai anthropic anthropic-claude artificial-intelligence deepseek deepseek-api gpt hacs hacs-custom hacs-integration home-assistant home-assistant-integration homeassistant llm natural-language-processing openai-api openrouter openrouter-api sonnet text-analysis
Last synced: 15 Apr 2025
https://github.com/chainsawriot/rectr
💒 Reproducible Extraction of Cross-lingual Topics using R
Last synced: 29 Oct 2025
https://github.com/quanteda/quanteda.corpora
A collection of corpora for quanteda
Last synced: 05 Apr 2025
https://github.com/knime/knime-textprocessing
KNIME - Text Processing Extension (Labs)
knime nlp-machine-learning text-analysis text-processing workflow
Last synced: 21 Jan 2026
https://github.com/apache/uima-ruta
Apache UIMA Ruta
apache java ruta text-analysis uima
Last synced: 19 Oct 2025
https://github.com/dario-github/notion-nlp
Read the text from a Notion database and perform NLP analysis.
flomo nlp notion notion-api notion-database python text-analysis text-summarization tf-idf
Last synced: 16 Mar 2026
https://github.com/arvindshmicrosoft/yelpdatasetsql
Working with the Yelp Dataset in Azure SQL and SQL Server
azuresql azuresqldb graph machine-learning mssql multi-class-classification r-server sql-server t-sql text-analysis yelp yelp-challenge yelp-dataset yelp-dataset-analysis
Last synced: 15 Apr 2025
https://github.com/nlpie/biomedicus
BioMedICUS: A biomedical and clinical NLP engine.
biomedical-informatics health-informatics natural-language-processing nlp text-analysis
Last synced: 16 Oct 2025
https://github.com/zgornel/stringanalysis.jl
Hard-Forked from JuliaText/TextAnalysis.jl
corpus-processing latent-semantic-analysis random-projections text-analysis text-processing
Last synced: 24 Jul 2025
https://github.com/sandsmark/scp-wiki
Mirror of the SCP wiki, approx. 20 million words. If you just want the text, e.g. for training some version of GPT, download the latest release (half the size without the git history).
dataset scp scp-foundation text-analysis text-generation text-mining text-processing wikidot
Last synced: 04 Jan 2026
https://github.com/gjtorikian/what_you_say
Natural language detection library. Written in Rust, wrapped in Ruby.
text-analysis text-classification
Last synced: 11 Sep 2025
https://github.com/dhchenx/rsnltk
Rust-based Natural Language Toolkit using Python Bindings
human-language natural-language-processing nlp-in-rust rsnltk rust-text-analysis stanza text-analysis
Last synced: 24 Jul 2025
https://github.com/apache/uima-uimacpp
C++ support for Apache UIMA
apache java text-analysis uima
Last synced: 10 Apr 2025
https://github.com/masurii/fbscrapeideas
Modern CLI tool for scraping & analyzing Facebook groups using Playwright & Gemini AI. Features self-healing selectors, session security, and local offline analysis.
academic-research ai cli data-extraction data-mining facebook-scraper gemini-api idea-generation nlp python selenium text-analysis
Last synced: 28 Apr 2026
https://github.com/darkliquid/textstats
Generate information about text including syllable counts and Flesch-Kincaid, Gunning-Fog, Coleman-Liau, SMOG and Automated Readability scores.
automated-readability-scores coleman-liau dale-chall flesch-kincaid go smog syllable-counts text-analysis
Last synced: 25 Jan 2026
https://github.com/koheiw/newspapers
R package to import articles from newspaper databases
Last synced: 05 Apr 2025
https://github.com/chainsawriot/textplex
Calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019).
Last synced: 31 Aug 2025
https://github.com/raymondjavaxx/swearjar-php
Profanity detection PHP library
php profanity-detection profanity-validator text-analysis
Last synced: 22 Jun 2025
https://github.com/phastmike/tags
A simple text tagger
gnome gnome-app tagger text-analysis vala vala-applications
Last synced: 26 Feb 2026
https://github.com/bradleyboehmke/text-mining-tutorials
Various text analytics tutorials
r text-analysis text-mining tutorial-code tutorials
Last synced: 13 Apr 2025
https://github.com/jonathanraiman/ciseau
🚀 Tokenize and clean strings in Python
natural-language-processing python text text-analysis tokenizer xml
Last synced: 11 Jul 2025
https://github.com/direct-phonology/dphon
Uncover Old Chinese textual parallels based on sound
chinese-traditional nlp phonology python text-analysis
Last synced: 07 Apr 2025