Projects in Awesome Lists tagged with tokenization
A curated list of projects in awesome lists tagged with tokenization.
https://github.com/explosion/spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
ai artificial-intelligence cython data-science deep-learning entity-linking machine-learning named-entity-recognition natural-language-processing neural-network neural-networks nlp nlp-library python spacy text-classification tokenization
Last synced: 11 Nov 2025
https://github.com/nvidia/cosmos-tokenizer
A suite of image and video neural tokenizers
diffusion tokenization transformers
Last synced: 30 Oct 2025
https://github.com/lunasec-io/lunasec
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
compliance continuous-delivery cve-scanning cybersecurity dependency-analysis devsecops gdpr log4shell pci-dss sbom sbom-generator scanning scanning-tool security security-tools soc2 software-composition-analysis tokenization web-security zero-trust
Last synced: 15 May 2025
https://github.com/securitybunker/databunker
Secure Vault for Customer PII/PHI/PCI/KYC Records
anonymization application-server ccpa compliance data-anonymization data-protection database encryption gdpr legaltech passportjs pii piidata privacy privacy-by-design secure-storage security tokenization user-consent vault
Last synced: 14 May 2025
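In the data-security sense used by entries such as Databunker, tokenization means replacing a sensitive value with a surrogate token and keeping the mapping in a protected store. The following is a minimal Python sketch of that idea. It assumes an in-memory dictionary as the "vault", and the `TokenVault` class and `tok_` prefix are hypothetical. It does not reflect Databunker's API.

```python
import secrets


class TokenVault:
    """Toy in-memory vault that swaps sensitive values for random tokens.

    Hypothetical illustration only: a real vault persists and encrypts its
    mapping and enforces access control.
    """

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Random token: it carries no information about the original value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens the vault never issued.
        return self._token_to_value[token]


vault = TokenVault()
t = vault.tokenize("4111 1111 1111 1111")
print(t)
print(vault.detokenize(t))
```

Downstream systems store and pass around only the token. Only code with access to the vault can recover the original value.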
https://github.com/ravenproject/ravencoin
Ravencoin Core integration/staging tree
asset bitcoin blockchain raven ravencoin token tokenization
Last synced: 15 May 2025
https://github.com/VKCOM/YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
bpe natural-language-processing nlp tokenization word-segmentation
Last synced: 03 Apr 2025
https://github.com/explosion/spacy-streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
dependency-parsing machine-learning named-entity-recognition natural-language-processing ner nlp part-of-speech-tagging spacy streamlit text-classification tokenization visualizer visualizers word-vectors
Last synced: 19 Oct 2025
https://github.com/amodinho/datacamp-python-data-science-track
All the slides, accompanying code, and exercises are stored in this repo. 🎈
bokeh data-science datacamp datacamp-course datacamp-exercises datacamp-machine-learning datacamp-projects datacamp-python datacamp-solutions-python datascience machinelearning natural-language-processing neural-network neural-networks nlp pandas python scikit-learn tokenization
Last synced: 24 Oct 2025
https://github.com/nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
adapters artificial-intelligence deeplearning dependency-parsing language-model lemmatization machine-learning morphological-tagging multilingual natural-language-processing nlp part-of-speech-tagging pytorch sentence-segmentation tokenization universal-dependencies xlm-roberta
Last synced: 14 May 2025
https://github.com/cbaziotis/ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation
Last synced: 14 Jan 2026
https://github.com/alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
text-tokenization tokenisation tokenization tokenize tokenizer tokenizing vocabulary vocabulary-builder vocabulary-generator
Last synced: 16 Jan 2026
https://github.com/adobe/NLP-Cube
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
dependency-parser dependency-parsing embeddings information-extraction language-pipeline lemmatization machine-translation nlp-cube parse part-of-speech-tagger sentence-splitting tokenization universal-dependencies
Last synced: 27 Mar 2025
https://github.com/macmade/clangkit
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics, and fix-its are currently implemented.
c c-plus-plus clang code diagnostics llvm objective-c parsing source static-analysis syntax-highlighting tokenization
Last synced: 07 Apr 2025
https://github.com/daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
japanese morphological-analysis nlp rust segmentation tokenization tokenizer
Last synced: 15 May 2025
https://github.com/opennmt/tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
bpe cpp icu machine-translation natural-language-processing python sentencepiece tokenization tokenizer unicode
Last synced: 08 Oct 2025
https://github.com/foundationvision/omnitokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for joint image-video tokenization.
auto-regressive-model image-generation tokenization vae video-generation vqvae
Last synced: 07 Apr 2025
https://github.com/WorksApplications/sudachi.rs
Sudachi in Rust 🦀 and the new generation of SudachiPy
morphological-analysis nlp-libary pos-tagging python rust segmentation sudachi tokenization
Last synced: 04 Apr 2025
https://github.com/natasha/razdel
Rule-based token and sentence segmentation for the Russian language
nlp python russian sentence-boundary-detection sentence-segmentation tokenization
Last synced: 04 Apr 2025
https://github.com/CodeChain-io/codechain
CodeChain's official implementation in Rust.
asset blockchain digital-securities rust tokenization
Last synced: 30 Mar 2025
https://github.com/daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
analyzer japanese morphological-analysis nlp rust segmentation tokenization tokenizer
Last synced: 12 Apr 2025
https://github.com/SaberaTalukder/TOTEM
The official code 👩‍💻 for TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
foundation-models representation-learning time-series time-series-analysis time-series-anomaly-detection time-series-forecasting time-series-foundation-model time-series-imputation tokenization
Last synced: 26 Aug 2025
https://github.com/janlukasschroeder/nlp-cheat-sheet-python
NLP Cheat Sheet: Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
cheat-sheet dependency-parsing introduction lemmatization lexnlp machine-learning named-entity-recognition nlp nltk pos-tagging python sentence-similarity spacy spacy-nlp spans starter-kit tokenization
Last synced: 18 Oct 2025
https://github.com/milaan9/python_natural_language_processing
This repository contains a complete guide to natural language processing (NLP) in Python. It covers various techniques for implementing NLP, including parsing and text processing, and shows how to use NLP for text feature engineering.
bag-of-words inversedocumentfrequency ipython-notebook lemmatization named-entity-recognition nlp partofspeech-tagger python4datascience python4everybody sentence-segmentation stemming stopwords termfrequency tf-idf tokenization tutor-milaan9 vocabulary-matching
Last synced: 09 Apr 2025
https://github.com/agentops-ai/tokencost
Easy token price estimates for LLMs
analytics claude large-language-models llm observability openai price price-tracker token tokenization
Last synced: 04 Apr 2025
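The price estimate that tools like tokencost provide reduces to multiplying token counts by per-token prices. Below is a minimal sketch that assumes hypothetical per-million-token prices. The model name `example-model` and its prices are placeholders, not real rates, and `estimate_cost` is not tokencost's API.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens; placeholders only.
PRICES_PER_MILLION = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price prompt and completion tokens separately, then sum the two."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / Decimal(1_000_000)


print(estimate_cost("example-model", input_tokens=2000, output_tokens=500))
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error when many calls are summed.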
https://github.com/adbar/simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
corpus-tools language-detection language-identification lemmatiser lemmatization lemmatizer low-resource-nlp morphological-analysis nlp tokenization tokenizer wordlist
Last synced: 24 Dec 2025
https://github.com/gautierdag/bpeasy
Fast bare-bones BPE for modern tokenizer training
Last synced: 06 Apr 2025
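Byte pair encoding (BPE) training, which bpeasy and several other entries implement, repeatedly merges the most frequent adjacent symbol pair in the corpus. The following is a toy, character-level Python sketch for illustration only. It is not bpeasy's implementation. It breaks ties by taking the first pair encountered, which is an arbitrary choice.

```python
from collections import Counter


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merge rules from a list of words."""
    # Each distinct word becomes a tuple of single-character symbols, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

For example, `learn_bpe(["low", "low", "lower", "lowest"], 3)` returns `[("l", "o"), ("lo", "w"), ("low", "e")]`. Production trainers typically work on bytes and update pair counts incrementally instead of recounting on every merge.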
https://github.com/cohere-ai/magikarp
Code for the paper "Fishing for Magikarp"
large-language-models tokenization
Last synced: 05 Apr 2025
https://github.com/thudm/icetk
A unified tokenization tool for Images, Chinese and English.
Last synced: 06 Apr 2025
https://github.com/rth/vtext
Simple NLP in Rust with Python bindings
bag-of-words information-retrieval nlp tf-idf tokenization
Last synced: 06 Apr 2025
https://github.com/bminixhofer/zett
Code for Zero-Shot Tokenizer Transfer
language-model llm llms multilingual tokenization transfer-learning
Last synced: 05 Apr 2025
https://github.com/lucidrains/charformer-pytorch
Implementation of the GBST block from the Charformer paper, in PyTorch
artificial-intelligence deep-learning tokenization transformer
Last synced: 20 Aug 2025
https://github.com/mit-ccc/tweebanknlp
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
dependency-parser lemmatization machine-learning named-entity-recognition natural-language-processing ner nlp-toolkit pos-tagging text-annotation tokenization tweet-analysis twitter-nlp
Last synced: 11 May 2025
https://github.com/clipperhouse/uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
go golang nlp tokenization tokenizer uax29 unicode
Last synced: 01 Feb 2026
https://github.com/dluc/openai-tools
A collection of tools for working with OpenAI
gpt-3 gpt3 openai tokenization tokenizer
Last synced: 19 Apr 2025
https://github.com/googlecloudplatform/dlp-dataflow-deidentification
Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP
beam bigquery data dataflow dlp pii tokenization
Last synced: 11 Apr 2025
https://github.com/nlpcloud/nlpcloud-python
NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbots, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, code generation, and more.
ad-generator chatbot code-generation embeddings grammar-correction keyword-extraction language-detection machine-translation ner nlp paraphrasing question-answering semantic-similarity sentiment-analysis spelling-correction text-classification text-generation text-summarization tokenization
Last synced: 28 Jan 2026
https://github.com/ARBML/tkseem
Arabic Tokenization Library. It provides many tokenization algorithms.
arabic-nlp nlp tkseem tokenization
Last synced: 19 Mar 2025
https://github.com/pythainlp/attacut
A Fast and Accurate Neural Thai Word Segmenter
cnn hacktoberfest hactoberfest2022 nlp tokenization
Last synced: 13 Apr 2025
https://github.com/av/klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
inference llm prompts tokenization
Last synced: 23 Aug 2025
https://github.com/liuzl/ling
Natural Language Processing Toolkit in Golang
corenlp lemmatization nlp normalization opencc spacy tokenization
Last synced: 30 Oct 2025
https://github.com/winkjs/wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
devanagari french german hindi konkani latin marathi multilingual tagging tokenization tokenizer wink
Last synced: 28 Oct 2025
https://github.com/cedricrupb/code_tokenize
Fast tokenization and structural analysis of any programming language
ast code-analysis language parser tokenization
Last synced: 19 Nov 2025
https://github.com/nlpcloud/nlpcloud-js
NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbots, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, code generation, and much more.
ad-generator chatbot code-generation conversational-ai embeddings intent-classification keywords-extraction language-detection machine-translation ner nlp paraphrasing question-answering semantic-similarity sentiment-analysis text-classification text-generation text-summarization tokenization
Last synced: 28 Jan 2026
https://github.com/trainingbypackt/natural-language-processing-fundamentals
Use Python and NLTK to build out your own text classifiers and solve common NLP problems
api binary-classifier latent-dirichlet-allocation lda linear-regression markov-chain natural-language-processing nlp pandas python scikit-learn supervised tokenization unsupervised
Last synced: 10 Apr 2025
https://github.com/georg-jung/fastberttokenizer
Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.
ai bert bert-embeddings llm machine-learning natural-language-processing nlp nlp-machine-learning tokenization tokens wordpiece wordpiece-tokenization
Last synced: 04 Apr 2025
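WordPiece tokenization, as used by BERT, splits each word greedily. It takes the longest vocabulary match first and marks word-internal pieces with a `##` prefix. The following is a minimal Python sketch of that greedy longest-match-first loop, assuming a toy vocabulary set. It is not FastBertTokenizer's code, and it omits the normalization and basic pre-tokenization that BERT runs first.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # if any part cannot be matched, the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able", "aff"}
print(wordpiece("unaffable", vocab))
```

With this toy vocabulary, `"unaffable"` splits into `['un', '##aff', '##able']`.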
https://github.com/cashtokens/cashtokens
A proposal to enable two new primitives on Bitcoin Cash: fungible tokens and non-fungible tokens.
bitcoin bitcoin-cash bitcoin-cash-chip cashtokens cryptocurrency tokenization
Last synced: 04 Apr 2025
https://github.com/zouharvi/tokenization-scorer
Simple-to-use scoring function for arbitrarily tokenized texts.
bpe segmentation subword tokenization
Last synced: 06 Oct 2025
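One intrinsic way to score a tokenized text is Rényi efficiency: the Rényi entropy of the token unigram distribution divided by the log of the vocabulary size. The following is a hedged Python sketch of that metric, not tokenization-scorer's code. Two choices here are assumptions: it normalizes by the number of distinct token types observed (a fuller version might use the tokenizer's whole vocabulary), and it defaults to `alpha = 2.5`.

```python
import math
from collections import Counter


def renyi_efficiency(tokens: list[str], alpha: float = 2.5) -> float:
    """Rényi entropy of the unigram token distribution over log(number of types)."""
    counts = Counter(tokens)
    vocab_size = len(counts)  # assumption: observed types stand in for |V|
    if vocab_size < 2:
        return 0.0  # log(1) = 0, so the ratio is undefined; report 0 by convention
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if alpha == 1.0:
        entropy = -sum(p * math.log(p) for p in probs)  # Shannon entropy, the alpha -> 1 limit
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
    return entropy / math.log(vocab_size)


print(renyi_efficiency("the cat sat on the mat".split()))
```

A perfectly uniform token distribution scores 1.0. The more a few tokens dominate, the closer the score drops toward 0.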
https://github.com/Quillhash/Real-World-Assets-RWA
This repository covers the theoretical and technical aspects of tokenizing real-world assets.
blockchain smart-contracts tokenization web3
Last synced: 27 Apr 2025
https://github.com/anyks/alm
Smart Language Model
alm arpa cpp language-models tokenization tokenizer vocab-pruning
Last synced: 28 Apr 2025
https://github.com/anki-code/xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
cli command-line console python shell terminal tmux tmux-plugin tmux-plugins tokenization tokenizer xonsh xontrib zellij
Last synced: 12 Dec 2025
https://github.com/googlecloudplatform/auto-data-tokenize
Identify and tokenize sensitive data automatically using Cloud DLP and Dataflow
cloud-migration data-governance data-loss-prevention dataflow deidentification tokenization
Last synced: 02 Jul 2025
https://github.com/rosette-api/python
Babel Street Analytics Client Library for Python
categorization entity-extraction fuzzy-matching language-detection language-identification lemmatization machine-learning morphology name-generation name-similarity name-translation natural-language-processing nlp python relation-extraction sentiment-analysis text text-analysis text-mining tokenization
Last synced: 04 Apr 2025
https://github.com/bastienbot/nlp-js-tools-french
POS tagger, lemmatizer, and stemmer for the French language in JavaScript
lemmatization lemmatizer nlp postagging postgresql stemmer stemming tokenization tokenizer
Last synced: 01 Aug 2025
https://github.com/mysto/node-fpe
FPE: Format-Preserving Encryption with FF3 in Node.js
anonymization crypto cryptography encryption ff3 format-preserving-encryption fpe nist-recommendation nist-specification node nodejs privacy-tools tokenization
Last synced: 16 Jan 2026
https://github.com/JackHCC/Chinese-Tokenization
Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained models (BERT, etc.)
bert-crf bilstm-crf hmm-viterbi-algorithm ngram nlp tokenization
Last synced: 12 May 2025
https://github.com/thisiscetin/textoken
Simple and customizable text tokenization gem.
Last synced: 09 Jul 2025
https://github.com/thalesgroup/ciphertrust_application_protection
Public code samples and resources for the Thales CipherTrust Application Protection products of the CipherTrust Data Security Platform
Last synced: 16 Jul 2025
https://github.com/aboudjem/erc-3643
ERC-3643 - Raptor Version is a simple, educational look at the T-REX standard. Using Solidity and Web3, this project demystifies tokenized securities. Remember, Raptor is for learning, not production. Dive in for an accessible peek into blockchain finance!
cedefi cefi defi eip-3643 eip3643 erc-3643 erc3643 evm hardhat real-world-asset real-world-assets rwa security-token security-tokens smart-contracts solidity t-rex tokenization
Last synced: 01 Mar 2025
https://github.com/Sovichea/khmer_segmenter
A zero-dependency, high-performance Khmer word segmenter using the Viterbi algorithm. Optimized for dictionary accuracy, ultra-low memory footprint, and edge deployment.
c-language dictionary-based khmer khmer-language khmer-nlp lightweight nlp portable python tokenization viterbi-algorithm word-segmentation zero-dependency zig-build-system
Last synced: 14 Jan 2026
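Dictionary-based segmenters such as khmer_segmenter choose the split of a string whose words score best under a probability model, using Viterbi-style dynamic programming. The following Python sketch assumes a plain unigram model with hypothetical probabilities and uses English strings for readability. It is not the project's actual scoring, and lattice tokenizers such as vibrato use richer costs.

```python
import math


def viterbi_segment(text: str, word_logp: dict[str, float], max_len: int = 10) -> list[str]:
    """Split text into dictionary words with the highest total log-probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score of any segmentation of text[:i]
    back = [0] * (n + 1)          # back[i]: where the last word of that segmentation starts
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logp and best[start] > -math.inf:
                score = best[start] + word_logp[word]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        return [text]  # no full segmentation exists; return the input unsplit
    # Follow back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical unigram probabilities for a toy dictionary.
logp = {w: math.log(p) for w, p in {
    "the": 0.1, "them": 0.02, "theme": 0.005, "men": 0.02, "end": 0.02, "d": 0.0001,
}.items()}
print(viterbi_segment("themend", logp))
```

Here `"them" + "end"` wins over `"the" + "men" + "d"` because its summed log-probability is higher.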
https://github.com/verygoodsecurity/vgs-collect-ios
VGS Collect iOS SDK
collect credit-card ios pci pci-dss security ssn swift team-developer-experience tokenization vgs zerodata
Last synced: 09 Oct 2025
https://github.com/julienkay/com.doji.transformers
A Unity package to run pretrained transformer models with Unity Sentis
ai clip machine-learning sentis tokenization tokenizer transformer-models transformers unity
Last synced: 10 Apr 2025
https://github.com/dnbaker/bioseq
Tokenizers and Machine Learning Models for biological sequence data
biological-sequences machine-learning tokenization transformers
Last synced: 19 Sep 2025
https://github.com/eliben/go-sentencepiece
Go implementation of the SentencePiece tokenizer
encoding go golang language-model llm sentencepiece tokenization
Last synced: 11 Aug 2025
https://github.com/johannschopplich/tokenx
📐 GPT token estimation and context size utilities without a full tokenizer
tiktoken token-counter tokenization tokenizer
Last synced: 01 May 2025
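Estimating token counts without a full tokenizer usually relies on a character-based heuristic. The following sketch assumes the common rule of thumb of roughly four characters per token for English text. tokenx's own heuristics are more detailed, and `estimate_tokens` and `fits_in_context` are hypothetical helpers, not its API.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (a heuristic, not a real tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_size: int, reserved_for_output: int = 0) -> bool:
    # Leave room for the model's reply when checking a prompt against the window.
    return estimate_tokens(text) + reserved_for_output <= context_size


print(estimate_tokens("Tokenization splits text into units."))
```

Rounding up errs on the side of overestimating, which is the safer direction when checking a prompt against a context limit.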
https://github.com/ankane/youtokentome-ruby
High performance unsupervised text tokenization for Ruby
bpe byte-pair-encoding npl tokenization unsupervised-learning word-segmentation
Last synced: 16 Jul 2025
https://github.com/daac-tools/python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
analyzer japanese morphological-analysis nlp python rust segmentation tokenization tokenizer
Last synced: 11 Oct 2025
https://github.com/bminixhofer/tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
distillation jax llms machine-learning tokenization tokenizer-transfer transfer-learning
Last synced: 15 May 2025
https://github.com/yieldhabitat/yieldhabitat_
Real estate tokenization platform built on multiple blockchains, enabling fractional ownership of premium properties.
blockchain cross-chain defi ethereum fractional-ownership property-investment real-estate solana tokenization web3
Last synced: 02 Apr 2025
https://github.com/vsce-toolroom/vscode-textmate-languageservice
Language APIs and support features from Textmate tokenization in Visual Studio Code.
grammar language-features syntax textmate tokenization tokenizer visual-studio-code vscode vscode-extension
Last synced: 13 Apr 2025
https://github.com/zsmoore/lexr
Lexical analyzer for JavaScript developers
flex lex lexer lexer-generator lexical lexical-analysis lexical-analyzer lexical-parser lexing scan scanner scanning token tokenization tokenizer tokens
Last synced: 16 Jan 2026
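Lexers like lexr and the scanner in tiny-compiler turn source text into a stream of typed tokens. A common way to sketch this in Python is to build one master regular expression from named groups and scan it with `re.finditer`. The token names and patterns below are hypothetical and do not come from either project.

```python
import re

# Hypothetical token specification: (name, pattern). Earlier entries win when
# two patterns could match at the same position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),       # whitespace, including newlines, is discarded
    ("MISMATCH", r"."),     # any other single character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for each token in source."""
    tokens: list[tuple[str, str]] = []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        tokens.append((kind, text))
    return tokens


print(lex("x = 3.5 * (y + 2)"))
```

The catch-all `MISMATCH` group ensures that every character is either consumed by a real token or reported. No input is silently skipped.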
https://github.com/flolu/mongo-search
Fuzzy Text Search And Autocompletion With MongoDB And Node.js
autocomplete-search autocompletion fuzzy-search mongo mongodb mongodb-atlas nodejs search search-index text-search tokenization typescript
Last synced: 27 Apr 2025
https://github.com/taurushq-io/private-cmtat-aztec
Private version of CMTAT security token in Noir (Aztec network DSL)
security-token smart-contracts tokenization zero-knowledge
Last synced: 23 Jan 2026
https://github.com/bnosac/tokenizers.bpe
R package for Byte Pair Encoding based on YouTokenToMe
bpe byte-pair-encoding text-mining tokenization
Last synced: 13 Jun 2025
https://github.com/jkrukowski/swift-sentencepiece
Use SentencePiece in Swift for tokenization and detokenization.
Last synced: 11 Oct 2025
https://github.com/bureaucratic-labs/models
Pre-trained models for tokenization, sentence segmentation and so on
conditional-random-fields machine-learning natural-language-processing russian-specific sentence-segmentation tokenization
Last synced: 14 Jan 2026
https://github.com/LoopscaleLabs/rwa-token
The RWA Token Program is a wrapper and extension program for Solana Token Extensions that provides a uniform approach to permissioned tokens on SVM blockchains.
real-world-assets solana solana-token tokenization
Last synced: 02 Apr 2025
https://github.com/khaledashrafh/tiny-compiler
This project is a fully functional compiler for the TINY programming language, which is a language that supports basic arithmetic, boolean, and control flow operations. The compiler can scan, parse, and run code written in the TINY language.
compiler cpp parser semantic-analyzer syntax-analyzer tiny tiny-compiler tiny-language tokenization
Last synced: 17 Oct 2025
https://github.com/davzim/rtiktoken
BPE Tokenizer for OpenAI's models
bpe openai r rust tokenization
Last synced: 06 May 2025
https://github.com/eklem/words-n-numbers
Tokenizing strings of text. Regex extracting arrays of words and optionally numbers, emojis, tags, usernames and email addresses from strings. For Node.js and the browser. When you need more than just [a-z] regular expressions.
nlp offline-first regex tokenization tokenizer
Last synced: 02 Sep 2025
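words-n-numbers targets JavaScript, but its core idea of matching runs of Unicode letters instead of just `[a-z]` carries over directly. The following Python sketch works under that assumption. `extract` and its patterns are illustrative, not the library's API, and the sketch leaves out the emoji, tag, username, and email options the library describes.

```python
import re
import unicodedata

# Runs of Unicode letters: [^\W\d_] means "word characters minus digits and underscore".
WORD_RE = re.compile(r"[^\W\d_]+")
# The same, plus integers or decimals written with "." or "," (e.g. 7,50).
WORD_OR_NUMBER_RE = re.compile(r"[^\W\d_]+|\d+(?:[.,]\d+)?")


def extract(text: str, numbers: bool = False) -> list[str]:
    """Lowercase the text and return its words (and optionally numbers) in order."""
    # NFC first, so e.g. "e" + combining acute accent becomes the single letter "é".
    text = unicodedata.normalize("NFC", text).lower()
    pattern = WORD_OR_NUMBER_RE if numbers else WORD_RE
    return pattern.findall(text)


print(extract("Crème brûlée costs 7,50 in Zürich!", numbers=True))
```

An ASCII-only pattern such as `[a-z]+` would split "crème" into "cr" and "me". The Unicode-aware class keeps accented words whole.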
https://github.com/jparkerweb/llm-distillery
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
ai-text-reduction large-language-model llm openai-api semantic-chunking text-compression text-distillation text-processing text-summarization token-management tokenization
Last synced: 01 May 2025
https://github.com/winkjs/wink-eng-lite-model
English lite language model for wink-nlp.
custom-entity-detection english model named-entity-recognition natural-language-processing negation-handling ner nlp pos-tagging sentence-boundary-detection sentiment-analysis tokenization winkjs winknlp
Last synced: 14 Oct 2025
https://github.com/kemingy/plane
A text processing tool that supports tag (HTML, URL, email) extraction and removal, punctuation normalization, simple segmentation, and more.
chinese-nlp data-cleaning nlp preprocess regex tokenization tokenizer
Last synced: 17 Mar 2025
https://github.com/cqb13/ti-tools
TI Tools is a CLI tool designed for converting 8xp files (used by TI-83 and TI-84 calculators) to text files and vice versa. It also supports various other features for working with 8xp files.
8xp 8xp-files texas-instruments texas-instruments-calculators ti-84 ti-basic ti-calculators tokenization
Last synced: 07 Jan 2026
https://github.com/just-krivi/ethereum-kryptonite-asset-tokenization
Crowdsale Dapp for ERC-20 Kryptonite token (fake stablecoin backed by Kryptonite mineral).
dapp erc-20 ethereum ico initial-coin-offering kyc open-zeppelin reactjs tokenization truffle
Last synced: 25 Sep 2025
https://github.com/rosette-api/java
Rosette API Client Library for Java
entity-extraction entity-linking fuzzy-matching java machine-learning name-translation natural-language-processing nlp rosette text-analytics text-mining tokenization
Last synced: 07 Apr 2025
https://github.com/aboudjem/erc-6960
ERC-6960 - DLT standard (Dual Layer Token) for RWA
dlt dual-layer-token eip-6960 eip6960 erc-6960 erc6960 fractionalization real-world-asset real-world-assets rwa tokenization
Last synced: 01 Mar 2025
https://github.com/andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
hokkien natural-language-processing nlp nlp-library poj python romanisation romanization taigi taiwanese tl tokenisation tokeniser tokenization tokenizer transliteration transliterator zhuyin
Last synced: 21 Mar 2025
https://github.com/labrijisaad/twitter-sentiment-analysis-with-python
In this project, I analyze the sentiment of tweets from the Sentiment140 dataset by developing a machine learning sentiment analysis model that uses several classifiers. The performance of these classifiers is then evaluated using accuracy and F1 scores.
accuracy-score bernoulli-naive-bayes confusion-matrix f1-score lemmatization logistic-regression machine-learning nlp roc-auc-curve sentiment-analysis sentiment140-dataset stemming support-vector-machine tokenization twitter-sentiment-analysis
Last synced: 08 Apr 2025