Projects in Awesome Lists tagged with text-processing
A curated list of projects in awesome lists tagged with text-processing.
https://github.com/learnbyexample/Command-line-text-processing
:zap: From finding text to search and replace, from sorting to beautifying text and more :art:
awk command-line ebook grep linux perl regex ruby sed text-processing
Last synced: 22 Mar 2025
https://github.com/learnbyexample/command-line-text-processing
:zap: From finding text to search and replace, from sorting to beautifying text and more :art:
awk command-line ebook grep linux perl regex ruby sed text-processing
Last synced: 17 Jan 2025
https://github.com/google/diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
diff difference match patch text-processing
Last synced: 25 Jan 2025
https://github.com/pymupdf/pymupdf
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps
Last synced: 22 Apr 2025
https://github.com/pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps
Last synced: 08 Apr 2025
https://github.com/chmln/sd
Intuitive find & replace CLI (sed alternative)
cli command-line regex rust terminal text-processing
Last synced: 22 Apr 2025
https://github.com/fastnlp/fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
chinese-nlp deep-learning natural-language-processing nlp-library nlp-parsing text-classification text-processing
Last synced: 13 Apr 2025
https://github.com/fastnlp/fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
chinese-nlp deep-learning natural-language-processing nlp-library nlp-parsing text-classification text-processing
Last synced: 07 Apr 2025
https://github.com/chonkie-ai/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search
Last synced: 10 Apr 2025
https://github.com/pyparsing/pyparsing
Python library for creating PEG parsers
parser-combinators parsing parsing-expression-grammar parsing-library peg-parsers python python-2 python-3 python2 python3 text-processing
Last synced: 22 Apr 2025
https://github.com/kk7nc/text_classification
Text Classification Algorithms: A Survey
boosting-algorithms conditional-random-fields convolutional-neural-networks decision-trees deep-belief-network deep-learning deep-neural-network dimensionality-reduction document-classification hierarchical-attention-networks k-nearest-neighbours logistic-regression naive-bayes-classifier nlp-machine-learning random-forest recurrent-neural-networks rocchio-algorithm support-vector-machines text-classification text-processing
Last synced: 11 Apr 2025
https://github.com/kk7nc/Text_Classification
Text Classification Algorithms: A Survey
boosting-algorithms conditional-random-fields convolutional-neural-networks decision-trees deep-belief-network deep-learning deep-neural-network dimensionality-reduction document-classification hierarchical-attention-networks k-nearest-neighbours logistic-regression naive-bayes-classifier nlp-machine-learning random-forest recurrent-neural-networks rocchio-algorithm support-vector-machines text-classification text-processing
Last synced: 07 Apr 2025
https://github.com/bhavnicksm/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
ai chunking rag retrieval-augmented-generation text-processing
Last synced: 05 Dec 2024
https://github.com/pemistahl/lingua-go
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
go golang-library language-classification language-detection language-identification language-modeling language-processing language-recognition natural-language-processing nlp nlp-machine-learning text-processing
Last synced: 11 Apr 2025
https://github.com/birchb1024/frangipanni
Program to convert lines of text into a tree structure.
go golang text-processing tree-structure
Last synced: 09 Apr 2025
https://github.com/roshan-research/hazm
Persian NLP Toolkit
dependency-parser embeddings farsi lemmatization natural-language-processing nlp normalization persian persian-nlp pos-tagging python text-processing tokenizer
Last synced: 17 Nov 2024
https://github.com/burntsushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
aho-corasick finite-state-machine search substring-matching text-processing
Last synced: 10 Apr 2025
https://github.com/PyThaiNLP/pythainlp
Thai natural language processing in Python
computational-linguistics hacktoberfest natural-language-processing nlp-library python soundex text-processing thai thai-language thai-nlp thai-nlp-library thai-soundex word-segmentation
Last synced: 23 Apr 2025
https://github.com/BurntSushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
aho-corasick finite-state-machine search substring-matching text-processing
Last synced: 19 Nov 2024
https://github.com/pythainlp/pythainlp
Thai natural language processing in Python
computational-linguistics hacktoberfest natural-language-processing nlp-library python soundex text-processing thai thai-language thai-nlp thai-nlp-library thai-soundex word-segmentation
Last synced: 25 Apr 2025
https://github.com/wannaphongcom/pythainlp
Thai natural language processing in Python
computational-linguistics hacktoberfest natural-language-processing nlp-library python soundex text-processing thai thai-language thai-nlp thai-nlp-library thai-soundex word-segmentation
Last synced: 10 Mar 2025
https://github.com/helix-editor/nucleo
A fast and convenient fuzzy matcher library for rust
fuzzy-matching fuzzy-search performance rust text-processing
Last synced: 13 Apr 2025
https://github.com/sstadick/hck
A sharp cut(1) clone.
command-line rust text-processing
Last synced: 07 Apr 2025
https://github.com/cbaziotis/ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation
Last synced: 09 Apr 2025
https://github.com/ChenghaoMou/text-dedup
All-in-one text de-duplication
data-processing de-duplication nlp text-processing
Last synced: 03 Apr 2025
https://github.com/derek73/python-nameparser
A simple Python module for parsing human names into their individual components
python python-module text-parser text-processing
Last synced: 27 Nov 2024
https://github.com/abadojack/whatlangGo
Natural language detection library for Go
go language nlp text-processing
Last synced: 12 Mar 2025
https://github.com/abadojack/whatlanggo
Natural language detection library for Go
go language nlp text-processing
Last synced: 14 Mar 2025
https://github.com/open-korean-text/open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
korean korean-text-processing korean-tokenizer natural-language-processing text-processing tokenizer
Last synced: 01 Apr 2025
https://github.com/wenet-e2e/wetextprocessing
Text Normalization & Inverse Text Normalization
normalization production-ready text-processing
Last synced: 11 Apr 2025
https://github.com/wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
normalization production-ready text-processing
Last synced: 28 Nov 2024
https://github.com/proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing
Last synced: 09 Apr 2025
https://github.com/linuxscout/pyarabic
pyarabic
arabic-language nlp-library text-processing
Last synced: 15 Apr 2025
https://github.com/andrewbihl/bsed
Simple SQL-like syntax on top of Perl text processing.
awk csv domain-specific-language grep perl python sed text-processing
Last synced: 05 Apr 2025
https://github.com/airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
adversarial-examples black-box-attacks black-box-benchmarking classification data-mining data-science machine-learning metrics python python2 python3 spam spam-classification spam-detection spam-filtering text text-analysis text-classification text-mining text-processing
Last synced: 04 Apr 2025
https://github.com/BurntSushi/regex-automata
A low level regular expression library that uses deterministic finite automata.
automata automaton dfa nfa regex regex-engine regexp rust text-processing
Last synced: 19 Nov 2024
https://github.com/ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
character-converter japanese-kana japanese-language julius preprocessing pure-python text-processing transliteration
Last synced: 12 Apr 2025
https://github.com/lukaszliniewicz/Pandrator
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
audiobook audiobook-creator audiobook-maker audiobooks customtkinterprojects dubbing llm pdf-to-audio rvc silero subtitle-to-speech subtitle-to-voice text-processing text-to-speech tkinter-gui voice-clone voice-cloning voicecraft xtts xttsv2
Last synced: 25 Jan 2025
https://github.com/gagolews/stringi
Fast and portable character string processing in R (with the Unicode ICU)
icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode
Last synced: 08 Apr 2025
https://github.com/textpipe/textpipe
Textpipe: clean and extract metadata from text
language-identification named-entities named-entity-recognition nlp text-analysis text-processing
Last synced: 07 Apr 2025
https://github.com/open-i18n/rust-unic
UNIC: Unicode and Internationalization Crates for Rust
cldr crates internationalization locale-data rust text-processing unic unicode unicode-algorithms unicode-characters
Last synced: 07 Apr 2025
https://github.com/catatsuy/purl
Streamlining Text Processing
grep-like regexp sed text-processing
Last synced: 04 Apr 2025
https://github.com/himkt/konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
janome japanese kytea mecab natural-language-processing nlp sentencepiece sudachi text-processing
Last synced: 12 Apr 2025
https://github.com/larrykollar/Unix-Text-Processing
Recreated sources for the book "UNIX Text Processing," published in 1987.
formatting gnu-troff groff publishing text-processing unix utp utp-revival
Last synced: 28 Nov 2024
https://github.com/daac-tools/daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.
aho-corasick double-array finite-state-machine no-std rust search substring-matching text-processing
Last synced: 14 Apr 2025
https://github.com/aappleby/matcheroni
A minimalist single-header library for building pattern-matchers, lexers, and parsers.
c cplusplus-20 lexer lexing parser parsing parsing-expression-grammar parsing-expression-grammars pattern-matching regex regular-expression regular-expression-engine regular-expressions text-processing
Last synced: 24 Apr 2025
https://github.com/textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks
machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf
Last synced: 05 Apr 2025
https://github.com/WZBSocialScienceCenter/tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
evaluation nlp parallel-processing python socialscience text-processing topic-modeling
Last synced: 13 Nov 2024
https://github.com/learnbyexample/cli_text_processing_coreutils
Example based guide for specialized text processing with GNU Coreutils
command-line coreutils ebook gnu linux text-processing
Last synced: 10 Jan 2025
https://github.com/s3nh/text-detector
A tool that allows you to detect and translate text.
craft crnn deep-learning nlp ocr-recognition pytorch recognition scene-text-detection scene-text-detectors text text-processing text-recognition
Last synced: 02 Apr 2025
https://github.com/learnbyexample/learn_ruby_oneliners
Example based guide for text processing with Ruby from the command line
command-line ebooks exercises learn-by-doing one-liners ruby text-processing
Last synced: 19 Dec 2024
https://learnbyexample.github.io/learn_ruby_oneliners/
Example based guide for text processing with Ruby from the command line
command-line ebooks exercises learn-by-doing one-liners ruby text-processing
Last synced: 08 Apr 2025
https://github.com/karolzak/support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
ai artificial-intelligence azure azure-app-service azure-machine-learning azure-web-app-service azure-webapp classification classifier machine-learning ml model numpy pandas python text-analysis text-classification text-mining text-processing web-service
Last synced: 08 Apr 2025
https://github.com/hakatashi/japanese.js
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
hiragana japanese javascript katakana romanize text-processing utility
Last synced: 07 Apr 2025
https://github.com/mycroftai/padatious
A neural network intent parser
intent intent-classification language-detection language-processing text-analysis text-processing
Last synced: 05 Apr 2025
https://github.com/lyeoni/prenlp
Preprocessing Library for Natural Language Processing
natural-language-processing nlp preprocessing-library text-preprocessing text-processing
Last synced: 10 Apr 2025
https://github.com/assafmo/xioc
Extract indicators of compromise from text, including "escaped" ones.
command-line command-line-tool data-mining defang escaping extract extraction indicators-of-compromise ioc iocs regex regexp text-mining text-processing
Last synced: 26 Mar 2025
https://github.com/goplus/bpl
Binary Processing Language
binary-parser bpl go golang language text-processing
Last synced: 12 Nov 2024
https://github.com/microsoft/browsecloud
A web app to create and browse text visualizations for automated customer listening.
bayesian-networks counting-grids nlp text-classification text-processing visualization
Last synced: 22 Nov 2024
https://github.com/zerox-dg/vi-rs
Vietnamese Input Method library
ime input-method text-processing vietnamese-language
Last synced: 12 Apr 2025
https://github.com/alihoseiny/word_cloud_fa
A wrapper for wordcloud module for creating Persian word clouds.
data-visualization python python3 text-processing
Last synced: 20 Nov 2024
https://github.com/stanfordnlp/stanza-old
Stanford NLP group's shared Python tools.
natural-language-processing nlp python text-analysis text-processing
Last synced: 14 Apr 2025
https://github.com/proycon/colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing
Last synced: 12 Apr 2025
https://github.com/brothersincode/virastar
Cleaning-up Persian Texts!
farsi javascript persian persian-language spelling-correction text-processing virastar
Last synced: 17 Dec 2024
https://github.com/milescranmer/vim-stream
vims - use vim like sed
awk ex regex sed stdin text-processing unix-command unix-pipes vim vim-stream vims
Last synced: 14 Apr 2025
https://github.com/01walid/goarabic
A Go Lang package for dealing with Arabic text.
arabic arabic-language glyphs go golang special-characters text-processing
Last synced: 15 Apr 2025
https://github.com/claustromaniac/Compare-UserJS
PowerShell script for comparing user.js (or prefs.js) files.
compare compare-files comparison-tool diff firefox powershell powershell-script text-processing
Last synced: 27 Mar 2025
https://github.com/claustromaniac/compare-userjs
PowerShell script for comparing user.js (or prefs.js) files.
compare compare-files comparison-tool diff firefox powershell powershell-script text-processing
Last synced: 13 Feb 2025
https://github.com/sdleffler/qp-trie-rs
An idiomatic and fast QP-trie implementation in pure Rust.
bytes data-structures qp-trie radix rust text-processing trie
Last synced: 05 Apr 2025
https://github.com/automattic/go-search-replace
Search & replace URLs in WordPress SQL files.
golang text-processing wordpress
Last synced: 05 Apr 2025
https://github.com/cloudflare/sliceslice-rs
A fast implementation of single-pattern substring search using SIMD acceleration.
avx2 search-in-text simd simd-instructions simd-programming substring-search text-processing
Last synced: 09 Apr 2025
https://github.com/Automattic/go-search-replace
Search & replace URLs in WordPress SQL files.
golang text-processing wordpress
Last synced: 02 Apr 2025
https://github.com/nschneid/unix-text-commands
Unix Text Processing Command Reference
command-line nlp reference text-processing unix
Last synced: 20 Feb 2025
https://github.com/Thomas-George-T/HackerRank-The-Linux-Shell-Challenges-Solutions
Complete Solutions and related tutorials for the Linux Shell - Bash, text processing, Arrays in Bash, Grep Sed Awk Challenges on HackerRank
awk bash challenge cut grep hackerrank hackerrank-solutions head linux linux-shell paste sed shell sort tail text-processing tr tutorial uniq unix
Last synced: 20 Apr 2025
https://github.com/elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
elixir machine-learning rust text-processing
Last synced: 05 Apr 2025
https://github.com/elektito/finglish
A Finglish to Persian converter.
languages persian text-processing transliteration
Last synced: 20 Nov 2024
https://github.com/n3mo/data-science
Data science tooling for Racket
data-science racket sentiment-analysis statistics text-processing
Last synced: 18 Nov 2024
https://github.com/sayamalt/fake-reviews-detection
A machine learning model that predicts whether an online review is fraudulent. A review is treated as fake if it was computer generated through unfair means; a review written manually is considered legitimate and original.
fake-review-detection machine-learning machine-learning-algorithms natural-language-processing text-processing
Last synced: 12 Apr 2025
https://github.com/AllenDang/PipeIt
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Last synced: 12 Nov 2024
https://github.com/allendang/pipeit
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Last synced: 14 Apr 2025
https://github.com/mycroftai/lingua-franca
Mycroft's multilingual text parsing and formatting library
hacktoberfest library natural-language-processing text-processing
Last synced: 05 Apr 2025
https://github.com/MycroftAI/lingua-franca
Mycroft's multilingual text parsing and formatting library
hacktoberfest library natural-language-processing text-processing
Last synced: 15 Nov 2024
https://github.com/LanguageMachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing
Last synced: 27 Mar 2025
https://github.com/languagemachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing
Last synced: 09 Apr 2025
https://github.com/learnbyexample/learn_perl_oneliners
Example based guide for text processing with Perl from the command line
command-line ebooks exercises learn-by-doing one-liners perl text-processing
Last synced: 13 Nov 2024
https://github.com/alirezatheh/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 19 Dec 2024
https://github.com/rmncldyo/gemini-ai-toolkit
Unlock the potential of Google's Gemini AI models with this versatile toolkit. It offers seamless chat, text generation, and multimodal interactions, and supports various file types, including PDFs, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.
artificial-intelligence audio-transcribing chatbot conversational-ai gemini gemini-2-0-flash gemini-2-0-flash-exp gemini-advanced gemini-api gemini-flash gemini-pro gemini-pro-api gemini-pro-vision google google-api google-deepmind google-gemini image-analysis text-processing video-processing
Last synced: 09 Apr 2025
https://github.com/AlirezaTheH/perke
A keyphrase extractor for Persian
data-mining data-processing information-retrieval keyphrase keyphrase-extraction keyphrase-extractor keyword keyword-extraction keyword-extractor machine-learning ml natural-language-processing nlp persian persian-language python text-mining text-processing unsupervised-learning
Last synced: 20 Nov 2024
https://github.com/hasinhayder/javascript-text-expander
Expands texts as you type, naturally
javascript javascript-plugin text-analysis text-processing
Last synced: 18 Apr 2025
https://github.com/whitfin/bytelines
Read input lines as byte slices for high efficiency
algorithms memory-efficiency performance text-processing
Last synced: 09 Apr 2025
https://github.com/gatenlp/python-gatenlp
Python text processing, pattern matching, and NLP framework
annotations gatenlp language-engineering natural-language-processing nlp pattern-matching python python-gatenlp python3 text-processing
Last synced: 19 Dec 2024
https://github.com/learnbyexample/ruby_scripting
Example-based tutorial for Ruby scripting
ebook linux ruby scripting text-processing workshop-materials
Last synced: 13 Nov 2024
https://github.com/dbklim/voice_chatbot
Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.
attention-model bot chatbot flask gensim keras lstm natural-language-processing nlp pocketsphinx restful-api rhvoice russian seq2seq speech-recognition speech-synthesis tensorflow text-processing word2vec
Last synced: 11 Nov 2024
https://github.com/dbklim/Voice_ChatBot
Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.
attention-model bot chatbot flask gensim keras lstm natural-language-processing nlp pocketsphinx restful-api rhvoice russian seq2seq speech-recognition speech-synthesis tensorflow text-processing word2vec
Last synced: 27 Nov 2024
https://github.com/voidful/tfkit
Handling multiple NLP tasks in one pipeline
multi-label-classification multi-task nlp tagger tagging text-classification text-generation text-processing transformer-models transformers
Last synced: 31 Dec 2024
https://github.com/thomasp85/hr
Easy Access to Uppercase H
rstudio rstudio-addin text-processing
Last synced: 22 Mar 2025
https://github.com/whitfin/s3-utils
Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI
aws aws-s3 command-line text-processing
Last synced: 16 Apr 2025
https://github.com/anaclumos/hangulbreak
👨‍💻 Playing with Hangul 한글
Last synced: 05 Dec 2024