Projects in Awesome Lists tagged with ngrams
A curated list of projects in awesome lists tagged with ngrams.
https://github.com/bakwc/JamSpell
Modern spell checking library - accurate, fast, multi-language
cpp csharp java ngrams nlp python ruby spellcheck spellchecker spelling-correction
Last synced: 04 May 2025
https://github.com/thepanacealab/covid19_twitter
Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development
dataset dissemination frequent-terms ngrams retweets tweets tweets-acquired twitter-stream
Last synced: 04 Apr 2025
https://github.com/proycon/colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing
Last synced: 12 Apr 2025
https://github.com/landrok/language-detector
A fast and reliable PHP library for detecting languages
Last synced: 05 Apr 2025
https://github.com/winkjs/wink-nlp-utils
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
bag-of-words natural-language-processing ngrams nlp phonetize sentence-boundary-detection stem stop-words tokenize
Last synced: 06 Apr 2025
https://github.com/postmodern/raingrams
A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.
Last synced: 15 Jun 2025
https://github.com/orgtre/google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
google language-learning linguistics ngrams wordlist
Last synced: 12 Apr 2025
https://github.com/kampersanda/tongrams-rs
Rust library providing fast language model queries in compressed space
compression elias-fano language-model ngrams nlp trie
Last synced: 23 Apr 2025
https://github.com/slowikj/seqr
fast and comprehensive k-mer counting package
bioinformatics bioinformatics-tool dna-processing feature-engineering feature-extraction genomics hashing hashing-algorithms k-mer k-mer-counting kmer kmer-counting kmer-frequency-count kmers ngram ngrams protein-sequences rcpp rcppparallel rpackage
Last synced: 15 May 2025
https://github.com/kchapelier/ngram-word-generator
Word generation based on n-gram models, and a cli utility to generate said models.
javascript ngrams procedural-generation text
Last synced: 22 Sep 2025
https://github.com/khaledashrafh/auto-filling-text
This project is an auto-filling text program implemented in Python using N-gram models. The program suggests the next word based on the input given by the user. It utilizes N-gram models, specifically Trigrams and Bigrams, to generate predictions.
auto-complete auto-complete-text auto-filling bigram-model bigrams n-gram n-grams natural-language-processing news-articles ngram ngram-analysis ngram-language-model ngram-model ngrams nlp tkinter tkinter-gui trigram trigram-model trigrams
Last synced: 17 Oct 2025
https://github.com/go-generalize/volcago
Model Generator for Firestore
code-generation firebase firestore firestore-database generator go golang n-grams ngrams
Last synced: 12 Oct 2025
https://github.com/stephangeorg/trigram-similarity
Determining the similarity of alphanumeric text based on trigram matching.
ngrams postgres similarity text-similarity trigram trigrams
Last synced: 01 May 2025
https://github.com/dayyass/language-modeling
Pipeline for training Language Models using PyTorch.
decoding deep-learning gpt-2 language-modeling lstm natural-language-processing ngrams nlp python pytorch rnn sampling text-generation
Last synced: 13 Apr 2025
https://github.com/dariasmyr/fts-engine
A modular full-text search engine in Go with instant indexing, pluggable indexers, and configurable pre-search filters.
fulltext-search fuzzy-search ngram-analysis ngrams stemming trie
Last synced: 01 Apr 2026
https://github.com/saptak625/quizolytics
Quizolytics is a natural language processor to analyze question datasets and extract meaningful insights from them.
natural-language-processing ngrams nlp pmi question-answering quizbowl text-analysis
Last synced: 12 Feb 2026
https://github.com/camel-lab/arafix_ocr
A tool for improving the output of generic Arabic OCR systems using an n-gram based post-correction approach.
Last synced: 25 Jan 2026
https://github.com/cloudkj/ngram-syllables
Syllable counting and detection using an n-gram language model.
clojure language-model lisp ngrams syllable-count
Last synced: 16 Apr 2025
https://github.com/mochi-co/ngrams
A Go n-gram indexer for natural language processing with modular tokenizers and data stores
bigrams go golang language-model natural-language-processing ngram ngrams nlp tokenization trigrams
Last synced: 19 Mar 2025
https://github.com/jacksonllee/nskipgrams
A lightweight Python package to work with ngrams and skipgrams
computational-linguistics language linguistics natural-language-processing ngrams nlp skipgrams
Last synced: 01 Apr 2026
https://github.com/mbforbes/textmetrics
Automatic text metrics (BLEU, ROUGE, METEOR, +++)
bleu meteor metrics ngrams nlp rouge text text-metrics vocab vocabulary
Last synced: 27 Jan 2026
https://github.com/dapper91/schindel
Rust min-shingle hashing implementation
fuzzy-matching fuzzy-search minshingle ngrams rust shingles
Last synced: 09 Aug 2025
https://github.com/onlyphantom/textcomplete
A next-word prediction app à la SwiftKey
data-science datascience ngrams r rsqlite shiny shiny-apps sqlite textmining
Last synced: 09 Apr 2025
https://github.com/pharo-ai/NgramModel
Ngram language model implemented in Pharo
language-model natural-language-processing ngram-language-model ngrams pharo statistics
Last synced: 11 May 2025
https://github.com/dineshkarthik/n-gram_processor
Uses n-grams to get the set of words, and their frequency of occurrence, that appear in a specific order at a specific distance from a given word in a directory, subdirectory, or text file.
Last synced: 05 May 2025
https://github.com/mazzzystar/n-grams-novel
An English & Chinese novel generator based on N-Grams.
ngrams nlp novel-generator text-generation
Last synced: 28 Apr 2025
https://github.com/pharo-ai/ngrammodel
Ngram language model implemented in Pharo
language-model natural-language-processing ngram-language-model ngrams pharo statistics
Last synced: 13 Jul 2025
https://github.com/preciz/fast_ngram
A fast and unicode aware letter N-gram library written in Elixir
Last synced: 10 Apr 2025
https://github.com/phughesmcr/simplengrams
The easiest way to get n-grams from strings!
ngram ngrams nlp nlp-parsing nlp-resources text-mining
Last synced: 07 Sep 2025
https://github.com/rafatbiin/gongram
Ngram generator in Go that just works
go go-package golang ngram ngrams nlp
Last synced: 14 Jan 2026
https://github.com/pasaopasen/stem-lem-pipeline
Russian text transformer using several stemming and lemmatization backends
ngrams nlp nltk preparation pypi-package text transformer
Last synced: 15 Feb 2026
https://github.com/linguistic-dev/n-gram-extractor
A PHP Library to extract n-grams from a text. Simple preprocessing tools (cleaning, tokenizing) included.
natural-language-processing ngram ngram-analysis ngrams nlp php php-library php7 tokenization tokenize tokenized-sentences tokenizer
Last synced: 22 Jul 2025
https://github.com/howerj/ngram
Print out a list of ngrams for a file; works on binary data as well as text
c command-line command-line-tool compression language-model library ngram ngrams
Last synced: 19 May 2026
https://github.com/tomeraberbach/wikipedia-ngrams
📚 A Kotlin project which extracts ngram counts from Wikipedia data dumps.
cli extracts-ngram-counts kotlin ngram ngrams nlp wikiextractor wikipedia wikipedia-corpus wikipedia-data-dump wikipedia-dump wikipedia-ngrams
Last synced: 13 Nov 2025
https://github.com/capjamesg/linguist.link
Find the most surprising words and most common n-grams on a web page.
Last synced: 03 Apr 2025
https://github.com/nusretipek/languagefinder
A simple-to-use language detection package written in Julia using bigrams, trigrams and quadrigrams. 25 default languages with a built-in option to train new ones.
detection julia language languagedetection ngrams nlp
Last synced: 21 Oct 2025
https://github.com/wmentor/qgram
N-gram Go library
go go-library golang golang-library ngram-analysis ngram-extraction ngram-language-model ngram-model ngrams quadrigram
Last synced: 29 May 2026
https://github.com/euskadi31/go-ngram
An n-gram is a contiguous sequence of n items from a given sequence of text or speech.
go golang golang-library machine-learning ngram ngram-analysis ngrams nlp
Last synced: 20 Jun 2025
https://github.com/pprzetacznik/nlp-n-grams
Natural Language Processing - n-grams statistics
Last synced: 19 Sep 2025
https://github.com/pharo-ai/ngram
N-gram functionality for Pharo
language-modeling natural-language-processing ngrams nlp pharo
Last synced: 11 May 2025
https://github.com/dohliam/corpus-tools
A collection of scripts for working with multilingual text corpora
corpora corpus corpus-linguistics frequency language linguistics ngram ngrams ruby salience stoplist stopwords
Last synced: 21 Mar 2025
https://github.com/savaged/recursionfun
Just some fun with recursion
csharp levenshtein-distance ngrams recursion
Last synced: 20 Aug 2025
https://github.com/mmsaki/crypto-nlp
Using Natural language processing to understand the sentiment in the latest news articles featuring Bitcoin and Ethereum.
natural-language-processing newsapi ngrams nltk sentiment-analysis
Last synced: 06 Apr 2025
https://github.com/rjzak/gogrammer
Generates byte ngrams from a collection of files with customisable parameters.
data-science golang malware-analysis ngrams
Last synced: 17 Jul 2025
https://github.com/michael-rapp/textminingutil
Provides various utility classes for use in text mining
distance-measures hamming-distance levenshtein-distance ngrams similarity-measures text-mining tokenizer
Last synced: 04 Oct 2025
https://github.com/cschen1205/cs-nlp-language-identifier
A language identifier built using NLP NGram language model
language-identification language-model ngrams nlp
Last synced: 19 May 2026
https://github.com/robcyberlab/ngram-similarity-engine
🤖Ngram Similarity Engine📚
code-analysis data-filtering data-science database-management feature-extraction jaccard-similarity machine-learning ngrams plagiarism-detection similarity-analysis sqlite
Last synced: 19 May 2026
https://github.com/remram44/ngram-search
Ngram-based indexing of strings into a binary file
full-text-search indexing ngram ngrams search text-search trigram trigrams
Last synced: 28 Jan 2026
https://github.com/saransh-cpp/memetastic-backend
Created for Elasticsearch X Code for Cause Hackathon. This is the backend for my MemeTastic Application. (Winner for the hackathon)
axios backend backend-api cicd elasticsearch express github-actions hackathon kibana ngrams nodejs
Last synced: 12 Apr 2026
https://github.com/pngo1997/n-gram-language-models
Builds N-gram language models and applies text generation.
bigrams cfd conditional-frequency-distribution greedy-algorithms laplace-smoothing natural-language-processing ngrams nltk nucleus-sampling perplexity python random-sampling text-generation text-preprocessing trigrams unigram
Last synced: 14 May 2026
https://github.com/rlvtick/autocomplete-webapp
An Autocomplete WebApp to predict the next word or phrase by utilizing the MLE and simple N-gram probability model.
mle natural-language-processing ngrams streamlit-webapp
Last synced: 16 Jun 2025
https://github.com/jayvatti/spellchecker
Spell Checker using a Hash Table
cpp edit-distance-algorithm hashtable ipynb jaccard-similarity longest-common-subsequence ngrams spellchecker trie
Last synced: 06 Mar 2026
https://github.com/supersjgk/plagiarism-detector-ngrams-containment
A plagiarism detection model based on a linear support vector machine that uses n-gram containment as similarity features
data-science machine-learning machine-learning-algorithms ngrams plagiarism-detection python sklearn svm
Last synced: 13 Apr 2026
https://github.com/e-panourgia/text-analytics
Text Analytics
cnn mlp ngrams pytorch rnn transformers
Last synced: 04 May 2026
https://github.com/daedalus/distiller
Model distiller automator
ai bloom-filter huggingface large-language-models model-distillation ngrams openai scikit-learn sqlite tfidf torch transformers unsloth
Last synced: 17 Jul 2025
https://github.com/linggarm/bigram-percentage-counter-from-two-sentences
Bigram Percentage Counter from 2 Sentences (ROUGE Metric : Recall-Oriented Understudy for Gisting Evaluation)
metrics ngram ngrams nlp rouge rouge-metric
Last synced: 24 Jun 2025
https://github.com/textcorpuslabs/vlngramcounter
NGram counter for large datasets
Last synced: 22 Mar 2025
https://github.com/ology/midi-ngram
Find the top repeated note phrases of a MIDI file
Last synced: 30 Mar 2025
https://github.com/iglee/hmms-and-pcfg
POS tagging by using ngram based hidden markov models.
bigrams hmm hmm-viterbi-algorithm ngrams nlp pos trigrams
Last synced: 25 Jul 2025
https://github.com/allanreda/n-grams-analysis-for-internal-linking
N-grams analysis for identifying internal linking opportunities for SEO
internal-linking ngrams nlp python seo web-scraping
Last synced: 08 Nov 2025
https://github.com/haimonmon/nein-eleven
A retro Tetris game where the blocks carry words instead of being plain blocks, making it a word-based Tetris game. 🎉 🎉
game game-development ngrams nlp python python3 tetra tetris tetrominoes word
Last synced: 16 Aug 2025
https://github.com/shruthimohan03/basic-sentence-generation-using-ngram
Generating similar sentences given input sentences using n-gram approach
natural-language-processing ngrams sentence-generation
Last synced: 03 Apr 2025
https://github.com/loichyan/noodler
🍜 A port of python-ngram that provides fuzzy search using N-grams
ngrams rust search text-processing
Last synced: 06 Mar 2025
https://github.com/shenxiangzhuang/bleuscore
BLEU Score in Rust
bleu bleu-score deep-learning maturin ngrams nlp pyo3 python rust tokenizer
Last synced: 01 Jul 2025
https://github.com/pritamgouda11/modelling-indian-names
Explores patterns in Indian names that models can learn, modelling them first with n-gram models, then with neural n-gram and RNN models.
fnn machine-learning natural-language-processing ngrams rnn smoothing
Last synced: 15 Mar 2025
https://github.com/jamiefutch/basicutils
Utility library for .Net applications
console-tool csharp ngrams utility-library
Last synced: 10 Jun 2025
https://github.com/alwaysvivek/next-word-prediction
🔮 Predicts the next word in a text sequence using either an N-gram statistical model or an LSTM-based neural network.
argparse laplace-smoothing machine-learning neural-network ngrams nlp nltk numpy python3 tensorflow
Last synced: 12 Apr 2026
https://github.com/box1bs/wfts
A multi-threaded web crawler with HTML processing, naive ranking task prioritization, and a scheduler, with an integrated full-text search engine, n-gram support, and typo correction
fulltext-search ngrams scheduler shingles spelling-correction web webscraping
Last synced: 30 May 2026
https://github.com/girishji/ngramview-complete.vim
Next word completion for Vim based on Google Ngrams Viewer.
autocomplete autocomplete-search autocompletion google-ngram-api google-ngram-viewer ngram ngrams vim vim-cmd vim-plugin vim-search vim9-script
Last synced: 10 Jun 2026
https://github.com/linggarm/natural-language-processing
Collection of code from a Natural Language Processing college course
artificial-intelligence bigrams data-science jupyter-notebook machine-learning natural-language-processing ngrams python rouge-metric
Last synced: 21 Apr 2026
https://github.com/girishji/ngram-complete.vim
Next word completion for Vim based on Google Ngrams Viewer.
autocomplete autocomplete-search autocompletion google-ngram-api google-ngram-viewer ngram ngrams vim vim-cmd vim-plugin vim-search vim9-script
Last synced: 07 Jun 2026
https://github.com/daniel-lima-lopez/n-gram-example
Implementation of a BiGram-based language system in Python
ngram ngram-language-model ngrams nlp nlp-machine-learning python
Last synced: 27 Apr 2026
https://github.com/amber-abuah/ngram-text-generation
Text generation for autocomplete using N-Grams and Maximum Likelihood Estimators.
mle ngram-language-model ngrams nlp nltk streamlit
Last synced: 09 May 2026
https://github.com/veralvx/docker-languagetool-cli
LanguageTool client for Docker/Podman with ngrams and fasttext installed by default
docker docker-image dockerfile dockerfiles fasttext languagetool ngram ngrams podman podman-image spellcheck spellchecker spelling text-analysis
Last synced: 11 May 2026