An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with text-processing

A curated list of projects in awesome lists tagged with text-processing.

https://github.com/learnbyexample/Command-line-text-processing

:zap: From finding text to search and replace, from sorting to beautifying text and more :art:

awk command-line ebook grep linux perl regex ruby sed text-processing

Last synced: 22 Mar 2025

https://github.com/learnbyexample/command-line-text-processing

:zap: From finding text to search and replace, from sorting to beautifying text and more :art:

awk command-line ebook grep linux perl regex ruby sed text-processing

Last synced: 17 Jan 2025

https://github.com/google/diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

diff difference match patch text-processing

Last synced: 25 Jan 2025

https://github.com/pymupdf/pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 22 Apr 2025

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 08 Apr 2025

https://github.com/chmln/sd

Intuitive find & replace CLI (sed alternative)

cli command-line regex rust terminal text-processing

Last synced: 22 Apr 2025

https://github.com/fastnlp/fastnlp

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

chinese-nlp deep-learning natural-language-processing nlp-library nlp-parsing text-classification text-processing

Last synced: 13 Apr 2025

https://github.com/fastnlp/fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

chinese-nlp deep-learning natural-language-processing nlp-library nlp-parsing text-classification text-processing

Last synced: 07 Apr 2025

https://github.com/chonkie-ai/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

ai chunking etl nlp python rag retrieval semantic-segmentation text-chunking text-processing text-splitting vector-search

Last synced: 10 Apr 2025

https://github.com/bhavnicksm/chonkie

🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

ai chunking rag retrieval-augmented-generation text-processing

Last synced: 05 Dec 2024

https://github.com/birchb1024/frangipanni

Program to convert lines of text into a tree structure.

go golang text-processing tree-structure

Last synced: 09 Apr 2025

https://github.com/burntsushi/aho-corasick

A fast implementation of Aho-Corasick in Rust.

aho-corasick finite-state-machine search substring-matching text-processing

Last synced: 10 Apr 2025

https://github.com/BurntSushi/aho-corasick

A fast implementation of Aho-Corasick in Rust.

aho-corasick finite-state-machine search substring-matching text-processing

Last synced: 19 Nov 2024

https://github.com/helix-editor/nucleo

A fast and convenient fuzzy matcher library for rust

fuzzy-matching fuzzy-search performance rust text-processing

Last synced: 13 Apr 2025

https://github.com/sstadick/hck

A sharp cut(1) clone.

command-line rust text-processing

Last synced: 07 Apr 2025

https://github.com/cbaziotis/ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330mil English tweets).

nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation

Last synced: 09 Apr 2025

https://github.com/ChenghaoMou/text-dedup

All-in-one text de-duplication

data-processing de-duplication nlp text-processing

Last synced: 03 Apr 2025

https://github.com/derek73/python-nameparser

A simple Python module for parsing human names into their individual components

python python-module text-parser text-processing

Last synced: 27 Nov 2024

https://github.com/abadojack/whatlangGo

Natural language detection library for Go

go language nlp text-processing

Last synced: 12 Mar 2025

https://github.com/abadojack/whatlanggo

Natural language detection library for Go

go language nlp text-processing

Last synced: 14 Mar 2025

https://github.com/wenet-e2e/wetextprocessing

Text Normalization & Inverse Text Normalization

normalization production-ready text-processing

Last synced: 11 Apr 2025

https://github.com/wenet-e2e/WeTextProcessing

Text Normalization & Inverse Text Normalization

normalization production-ready text-processing

Last synced: 28 Nov 2024

https://github.com/proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing

Last synced: 09 Apr 2025

https://github.com/andrewbihl/bsed

Simple SQL-like syntax on top of Perl text processing.

awk csv domain-specific-language grep perl python sed text-processing

Last synced: 05 Apr 2025

https://github.com/BurntSushi/regex-automata

A low level regular expression library that uses deterministic finite automata.

automata automaton dfa nfa regex regex-engine regexp rust text-processing

Last synced: 19 Nov 2024

https://github.com/ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

character-converter japanese-kana japanese-language julius preprocessing pure-python text-processing transliteration

Last synced: 12 Apr 2025

https://github.com/lukaszliniewicz/Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.

audiobook audiobook-creator audiobook-maker audiobooks customtkinterprojects dubbing llm pdf-to-audio rvc silero subtitle-to-speech subtitle-to-voice text-processing text-to-speech tkinter-gui voice-clone voice-cloning voicecraft xtts xttsv2

Last synced: 25 Jan 2025

https://github.com/gagolews/stringi

Fast and portable character string processing in R (with the Unicode ICU)

icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode

Last synced: 08 Apr 2025

https://github.com/catatsuy/purl

Streamlining Text Processing

grep-like regexp sed text-processing

Last synced: 04 Apr 2025

https://github.com/himkt/konoha

🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.

janome japanese kytea mecab natural-language-processing nlp sentencepiece sudachi text-processing

Last synced: 12 Apr 2025

https://github.com/larrykollar/Unix-Text-Processing

Recreated sources for the book "UNIX Text Processing," published in 1987.

formatting gnu-troff groff publishing text-processing unix utp utp-revival

Last synced: 28 Nov 2024

https://github.com/daac-tools/daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

aho-corasick double-array finite-state-machine no-std rust search substring-matching text-processing

Last synced: 14 Apr 2025

https://github.com/textvec/textvec

Text vectorization tool to outperform TFIDF for classification tasks

machine-learning natural-language-processing nlp python text-analysis text-classification text-processing tf-idf

Last synced: 05 Apr 2025

https://github.com/cloudflare/wildcard

Wildcard matching

text-processing wildcard

Last synced: 15 Apr 2025

https://github.com/WZBSocialScienceCenter/tmtoolkit

Text Mining and Topic Modeling Toolkit for Python with parallel processing power

evaluation nlp parallel-processing python socialscience text-processing topic-modeling

Last synced: 13 Nov 2024

https://github.com/learnbyexample/cli_text_processing_coreutils

Example based guide for specialized text processing with GNU Coreutils

command-line coreutils ebook gnu linux text-processing

Last synced: 10 Jan 2025

https://github.com/learnbyexample/learn_ruby_oneliners

Example based guide for text processing with Ruby from the command line

command-line ebooks exercises learn-by-doing one-liners ruby text-processing

Last synced: 19 Dec 2024

https://learnbyexample.github.io/learn_ruby_oneliners/

Example based guide for text processing with Ruby from the command line

command-line ebooks exercises learn-by-doing one-liners ruby text-processing

Last synced: 08 Apr 2025

https://github.com/karolzak/support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

ai artificial-intelligence azure azure-app-service azure-machine-learning azure-web-app-service azure-webapp classification classifier machine-learning ml model numpy pandas python text-analysis text-classification text-mining text-processing web-service

Last synced: 08 Apr 2025

https://github.com/hakatashi/japanese.js

Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.

hiragana japanese javascript katakana romanize text-processing utility

Last synced: 07 Apr 2025

https://github.com/lyeoni/prenlp

Preprocessing Library for Natural Language Processing

natural-language-processing nlp preprocessing-library text-preprocessing text-processing

Last synced: 10 Apr 2025

https://github.com/goplus/bpl

Binary Processing Language

binary-parser bpl go golang language text-processing

Last synced: 12 Nov 2024

https://github.com/microsoft/browsecloud

A web app to create and browse text visualizations for automated customer listening.

bayesian-networks counting-grids nlp text-classification text-processing visualization

Last synced: 22 Nov 2024

https://github.com/zerox-dg/vi-rs

Vietnamese Input Method library

ime input-method text-processing vietnamese-language

Last synced: 12 Apr 2025

https://github.com/alihoseiny/word_cloud_fa

A wrapper for wordcloud module for creating Persian word clouds.

data-visualization python python3 text-processing

Last synced: 20 Nov 2024

https://github.com/stanfordnlp/stanza-old

Stanford NLP group's shared Python tools.

natural-language-processing nlp python text-analysis text-processing

Last synced: 14 Apr 2025

https://github.com/proycon/colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.

c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing

Last synced: 12 Apr 2025

https://github.com/01walid/goarabic

A Go Lang package for dealing with Arabic text.

arabic arabic-language glyphs go golang special-characters text-processing

Last synced: 15 Apr 2025

https://github.com/claustromaniac/Compare-UserJS

PowerShell script for comparing user.js (or prefs.js) files.

compare compare-files comparison-tool diff firefox powershell powershell-script text-processing

Last synced: 27 Mar 2025

https://github.com/claustromaniac/compare-userjs

PowerShell script for comparing user.js (or prefs.js) files.

compare compare-files comparison-tool diff firefox powershell powershell-script text-processing

Last synced: 13 Feb 2025

https://github.com/sdleffler/qp-trie-rs

An idiomatic and fast QP-trie implementation in pure Rust.

bytes data-structures qp-trie radix rust text-processing trie

Last synced: 05 Apr 2025

https://github.com/automattic/go-search-replace

🚀 Search & replace URLs in WordPress SQL files.

golang text-processing wordpress

Last synced: 05 Apr 2025

https://github.com/kyubyong/mtp

Multi-lingual Text Processing

text-processing

Last synced: 24 Feb 2025

https://github.com/cloudflare/sliceslice-rs

A fast implementation of single-pattern substring search using SIMD acceleration.

avx2 search-in-text simd simd-instructions simd-programming substring-search text-processing

Last synced: 09 Apr 2025

https://github.com/Automattic/go-search-replace

🚀 Search & replace URLs in WordPress SQL files.

golang text-processing wordpress

Last synced: 02 Apr 2025

https://github.com/nschneid/unix-text-commands

Unix Text Processing Command Reference

command-line nlp reference text-processing unix

Last synced: 20 Feb 2025

https://github.com/Thomas-George-T/HackerRank-The-Linux-Shell-Challenges-Solutions

Complete Solutions and related tutorials for the Linux Shell - Bash, text processing, Arrays in Bash, Grep Sed Awk Challenges on HackerRank

awk bash challenge cut grep hackerrank hackerrank-solutions head linux linux-shell paste sed shell sort tail text-processing tr tutorial uniq unix

Last synced: 20 Apr 2025

https://github.com/elixir-nx/tokenizers

Elixir bindings for 🤗 Tokenizers

elixir machine-learning rust text-processing

Last synced: 05 Apr 2025

https://github.com/elektito/finglish

A Finglish to Persian converter.

languages persian text-processing transliteration

Last synced: 20 Nov 2024

https://github.com/sayamalt/fake-reviews-detection

A machine learning model that predicts whether an online review is fraudulent. A review is treated as fake if it is computer generated through unfair means; a review written manually is considered legitimate and original.

fake-review-detection machine-learning machine-learning-algorithms natural-language-processing text-processing

Last synced: 12 Apr 2025

https://github.com/AllenDang/PipeIt

PipeIt is a text transformation, conversion, cleansing and extraction tool.

text-mining text-processing

Last synced: 12 Nov 2024

https://github.com/allendang/pipeit

PipeIt is a text transformation, conversion, cleansing and extraction tool.

text-mining text-processing

Last synced: 14 Apr 2025

https://github.com/mycroftai/lingua-franca

Mycroft's multilingual text parsing and formatting library

hacktoberfest library natural-language-processing text-processing

Last synced: 05 Apr 2025

https://github.com/MycroftAI/lingua-franca

Mycroft's multilingual text parsing and formatting library

hacktoberfest library natural-language-processing text-processing

Last synced: 15 Nov 2024

https://github.com/LanguageMachines/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing

Last synced: 27 Mar 2025

https://github.com/languagemachines/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing

Last synced: 09 Apr 2025

https://github.com/learnbyexample/learn_perl_oneliners

Example based guide for text processing with perl from the command line

command-line ebooks exercises learn-by-doing one-liners perl text-processing

Last synced: 13 Nov 2024

https://github.com/rmncldyo/gemini-ai-toolkit

Unlock the potential of Google's Gemini AI models with this versatile toolkit. It offers seamless chat, text generation, and multimodal interactions, and supports various file types, including PDFs, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.

artificial-intelligence audio-transcribing chatbot conversational-ai gemini gemini-2-0-flash gemini-2-0-flash-exp gemini-advanced gemini-api gemini-flash gemini-pro gemini-pro-api gemini-pro-vision google google-api google-deepmind google-gemini image-analysis text-processing video-processing

Last synced: 09 Apr 2025

https://github.com/whitfin/bytelines

Read input lines as byte slices for high efficiency

algorithms memory-efficiency performance text-processing

Last synced: 09 Apr 2025

https://github.com/learnbyexample/ruby_scripting

Example-based tutorial for Ruby scripting

ebook linux ruby scripting text-processing workshop-materials

Last synced: 13 Nov 2024

https://github.com/dbklim/voice_chatbot

Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.

attention-model bot chatbot flask gensim keras lstm natural-language-processing nlp pocketsphinx restful-api rhvoice russian seq2seq speech-recognition speech-synthesis tensorflow text-processing word2vec

Last synced: 11 Nov 2024

https://github.com/dbklim/Voice_ChatBot

Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.

attention-model bot chatbot flask gensim keras lstm natural-language-processing nlp pocketsphinx restful-api rhvoice russian seq2seq speech-recognition speech-synthesis tensorflow text-processing word2vec

Last synced: 27 Nov 2024

https://github.com/thomasp85/hr

Easy Access to Uppercase H

rstudio rstudio-addin text-processing

Last synced: 22 Mar 2025

https://github.com/whitfin/s3-utils

Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI

aws aws-s3 command-line text-processing

Last synced: 16 Apr 2025

https://github.com/anaclumos/hangulbreak

👨‍💻 Playing with Hangul 한글

korean python text-processing

Last synced: 05 Dec 2024