Projects in Awesome Lists tagged with sentence-tokenizer
A curated list of projects in awesome lists tagged with sentence-tokenizer.
https://github.com/nipunsadvilkar/pysbd
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.
python rule-based segmentation sentence sentence-boundary-detection sentence-tokenizer
Last synced: 14 May 2025
https://github.com/neurosnap/sentences
A multilingual command-line sentence tokenizer in Golang
cli sentence-tokenizer sentences tokenizer
Last synced: 16 May 2025
https://github.com/vngrs-ai/vnlp
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
deasciifier deep-learning dependency-parsing fasttext morphological-analysis morphological-disambiguation named-entity-recognition nlp normalization number-to-words part-of-speech-tagging sentence-splitting sentence-tokenizer sentiment-analysis spelling-correction stemming stopword-removal turkish-nlp word-embeddings word2vec
Last synced: 10 Feb 2025
https://github.com/megagonlabs/bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
japanese python sentence-boundary-detection sentence-tokenizer
Last synced: 05 Apr 2025
https://github.com/lfcipriani/punkt-segmenter
Ruby port of the NLTK Punkt sentence segmentation algorithm
nlp-library nltk punkt-segmenter ruby ruby-port rubynlp sentence-boundaries sentence-tokenizer tokenized-sentences
Last synced: 09 Apr 2025
https://github.com/cbilgili/zemberek-nlp-server
REST Docker server built on the Zemberek Turkish NLP Java library
docker javascript nlp part-of-speech-tagger rest sentence-tokenizer spark turkish turkish-language zemberek
Last synced: 03 May 2025
https://github.com/Flight-School/sentences
A command-line utility that splits natural language text into sentences.
cli macos nlp sentence-tokenizer swift
Last synced: 23 Nov 2024
https://github.com/ikegami-yukino/sengiri
Yet another sentence-level tokenizer for Japanese text
japanese-language japanese-sentences sentence-tokenizer tokenizer
Last synced: 21 Mar 2025
https://github.com/apdullahyayik/TrTokenizer
🧩 A simple sentence tokenizer.
regular-expression sentence-tokenizer turkish-language turkish-nlp word-segmentation word-tokenizing
Last synced: 10 Feb 2025
https://github.com/bhattbhavesh91/sentence-transformers-example
HuggingFace's Transformer models for sentence / text embedding generation.
huggingface huggingface-transformers huggingface-transformers-pipeline sentence-embeddings sentence-similarity sentence-tokenizer
Last synced: 17 Apr 2025
https://github.com/kmint21/html2sent
HTML2SENT modifies HTML to improve sentence tokenizer quality
nlp nltk python sentence-segmentation sentence-tokenizer text-mining tokenizer
Last synced: 12 May 2025
https://github.com/elifftosunn/textdataclean
An application that makes scraped dirty data ready for model training without requiring separate preprocessing steps.
corpus deasciifier morphological-analysis ngram nltk numpy pandas sentence-embedding sentence-tokenizer stemmer stopwords string turkish turkish-sentence-tokenizer word-tokenizer
Last synced: 15 Mar 2025
https://github.com/aburraq/stanfordcorenlp
My legal background gave me a deep appreciation for language's importance. It's not just words; it's a profound understanding woven into every case. This connection led me to coding, where I coded a potent pipeline system with Stanford CoreNLP.
java lemmatizer named-entity-recognition nlp oop partofspeech-tagger sentence-tokenizer sentiment-analysis stanfordnlp tokenizer
Last synced: 27 Feb 2025