{"id":255,"url":"https://github.com/arbox/nlp-with-ruby","last_synced_at":"2025-09-27T09:31:57.596Z","repository":{"id":50683260,"uuid":"53977944","full_name":"arbox/nlp-with-ruby","owner":"arbox","description":"Curated List: Practical Natural Language Processing done in Ruby","archived":false,"fork":false,"pushed_at":"2023-06-27T09:38:04.000Z","size":1254,"stargazers_count":1035,"open_issues_count":5,"forks_count":70,"subscribers_count":59,"default_branch":"master","last_synced_at":"2024-05-20T04:38:32.980Z","etag":null,"topics":["awesome","awesome-list","computational-linguistics","list","machine-learning","natural-language-processing","nlp","pos-tag","ruby","rubyml","rubynlp","sentiment-analysis"],"latest_commit_sha":null,"homepage":"http://rubynlp.org","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arbox.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-03-15T20:56:12.000Z","updated_at":"2024-05-11T09:23:58.000Z","dependencies_parsed_at":"2024-01-11T13:19:56.920Z","dependency_job_id":null,"html_url":"https://github.com/arbox/nlp-with-ruby","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fnlp-with-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fnlp-with-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fnlp-with-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fnlp-with-ruby/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arbox","download_url":"https://codeload.github.com/arbox/nlp-with-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219871582,"owners_count":16554424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","awesome-list","computational-linguistics","list","machine-learning","natural-language-processing","nlp","pos-tag","ruby","rubyml","rubynlp","sentiment-analysis"],"created_at":"2024-01-05T20:12:50.254Z","updated_at":"2025-09-27T09:31:57.460Z","avatar_url":"https://github.com/arbox.png","language":"Ruby","readme":"\u003cimg src=\"header.png\" align=\"center\"\u003e\n\n[![Awesome](https://awesome.re/badge-flat.svg)](https://github.com/sindresorhus/awesome#readme) [![Support Me](https://img.shields.io/badge/%F0%9F%92%97-Support%20Me-blue.svg?style=flat-square)](https://www.patreon.com/arbox)\n\n[[RubyML](https://github.com/arbox/machine-learning-with-ruby) |\n [RubyDataScience](https://github.com/arbox/data-science-with-ruby) |\n [RubyInterop](https://github.com/arbox/ruby-interoperability)]\n\n\n# Awesome NLP with Ruby [\u003cimg src=\"ruby.jpg\" align=\"left\" width=\"30px\" height=\"30px\" /\u003e][ruby]\n\n\u003e Useful resources for text processing in Ruby\n\nThis curated list comprises [_awesome_](https://github.com/sindresorhus/awesome/blob/master/awesome.md)\nresources, libraries, information sources about computational processing of texts\nin human languages with the [Ruby programming language](ruby).\nThat field is often 
referred to as\n[NLP](https://en.wikipedia.org/wiki/Natural_language_processing),\n[Computational Linguistics](https://en.wikipedia.org/wiki/Computational_linguistics),\n[HLT](https://en.wikipedia.org/wiki/Language_technology) (Human Language Technology)\nand can be brought in conjunction with\n[Artificial Intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence),\n[Machine Learning](https://en.wikipedia.org/wiki/Machine_learning),\n[Information Retrieval](https://en.wikipedia.org/wiki/Information_retrieval),\n[Text Mining](https://en.wikipedia.org/wiki/Text_mining),\n[Knowledge Extraction](https://en.wikipedia.org/wiki/Knowledge_extraction)\nand other related disciplines.\n\nThis list comes from our day to day work on Language Models and NLP Tools.\nRead [why](motivation.md) this list is awesome. Our [FAQ](FAQ.md) describes the\nimportant decisions and useful answers you may be interested in.\n\n:sparkles: Every [contribution](#contributing) is welcome! Add links through pull\nrequests or create an issue to start a discussion.\n\nFollow us on [Twitter](https://twitter.com/NonWebRuby)\nand please spread the word using the `#RubyNLP` hash tag!\n\n\u003c!-- nodoc --\u003e\n## Contents\n\n\u003c!-- toc --\u003e\n\n- [:sparkles: Tutorials](#sparkles-tutorials)\n- [NLP Pipeline Subtasks](#nlp-pipeline-subtasks)\n  * [Pipeline Generation](#pipeline-generation)\n  * [Multipurpose Engines](#multipurpose-engines)\n    + [On-line APIs](#on-line-apis)\n  * [Language Identification](#language-identification)\n  * [Segmentation](#segmentation)\n  * [Lexical Processing](#lexical-processing)\n    + [Stemming](#stemming)\n    + [Lemmatization](#lemmatization)\n    + [Lexical Statistics: Counting Types and Tokens](#lexical-statistics-counting-types-and-tokens)\n    + [Filtering Stop Words](#filtering-stop-words)\n  * [Phrasal Level Processing](#phrasal-level-processing)\n  * [Syntactic Processing](#syntactic-processing)\n    + [Constituency 
Parsing](#constituency-parsing)\n  * [Semantic Analysis](#semantic-analysis)\n  * [Pragmatical Analysis](#pragmatical-analysis)\n- [High Level Tasks](#high-level-tasks)\n  * [Spelling and Error Correction](#spelling-and-error-correction)\n  * [Text Alignment](#text-alignment)\n  * [Machine Translation](#machine-translation)\n  * [Sentiment Analysis](#sentiment-analysis)\n  * [Numbers, Dates, and Time Parsing](#numbers-dates-and-time-parsing)\n  * [Named Entity Recognition](#named-entity-recognition)\n  * [Text-to-Speech-to-Text](#text-to-speech-to-text)\n- [Dialog Agents, Assistants, and Chatbots](#dialog-agents-assistants-and-chatbots)\n- [Linguistic Resources](#linguistic-resources)\n- [Machine Learning Libraries](#machine-learning-libraries)\n- [Data Visualization](#data-visualization)\n- [Optical Character Recognition](#optical-character-recognition)\n- [Text Extraction](#text-extraction)\n- [Full Text Search, Information Retrieval, Indexing](#full-text-search-information-retrieval-indexing)\n- [Language Aware String Manipulation](#language-aware-string-manipulation)\n- [Articles, Posts, Talks, and Presentations](#articles-posts-talks-and-presentations)\n- [Projects and Code Examples](#projects-and-code-examples)\n- [Books](#books)\n- [Community](#community)\n- [Needs your Help!](#needs-your-help)\n- [Related Resources](#related-resources)\n- [License](#license)\n\n\u003c!-- tocstop --\u003e\n\n\u003c!-- doc --\u003e\n\n## :sparkles: Tutorials\n\nPlease help us to fill out this section! 
:smiley:\n\n## NLP Pipeline Subtasks\n\nAn NLP Pipeline starts with a plain text.\n\n### Pipeline Generation\n\n- [composable_operations](https://github.com/t6d/composable_operations) -\n  Definition framework for operation pipelines.\n- [ruby-spark](https://github.com/ondra-m/ruby-spark) -\n  Spark bindings with an easy to understand DSL.\n- [phobos](https://github.com/phobos/phobos) -\n  Simplified Ruby Client for [Apache Kafka](https://kafka.apache.org/).\n- [parallel](https://github.com/grosser/parallel) -\n  Supervisor for parallel execution on multiple CPUs or in many threads.\n- [pwrake](https://github.com/masa16/pwrake) -\n  Rake extensions to run local and remote tasks in parallel.\n\n### Multipurpose Engines\n\n- [open-nlp](https://github.com/louismullie/open-nlp) -\n  Ruby Bindings for the [OpenNLP](https://opennlp.apache.org/) Toolkit.\n- [stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp) -\n  Ruby Bindings for the Stanford [CoreNLP](https://github.com/stanfordnlp/CoreNLP) tools.\n- [treat](https://github.com/louismullie/treat) -\n  Natural Language Processing framework for Ruby (like [NLTK](http://www.nltk.org/) for Python).\n- [nlp_toolz](https://github.com/LeFnord/nlp_toolz) -\n  Wrapper over some [OpenNLP](https://opennlp.apache.org/) classes and\n  the original [Berkeley Parser](https://github.com/slavpetrov/berkeleyparser).\n- [open_nlp](https://github.com/hck/open_nlp) -\n  JRuby Bindings for the [OpenNLP](https://opennlp.apache.org/) Toolkit.\n- [ruby-spacy](https://github.com/yohasebe/ruby-spacy) \u0026mdash;\n  Wrapper module for spaCy NLP library via [PyCall](https://github.com/mrkn/pycall.rb).\n\n#### On-line APIs\n\n- [alchemyapi_ruby](https://github.com/alchemyapi/alchemyapi_ruby) -\n  Legacy Ruby SDK for AlchemyAPI/Bluemix.\n- [wit-ruby](https://github.com/wit-ai/wit-ruby) -\n  Ruby client library for the [Wit.ai](https://wit.ai/) Language Understanding Platform.\n- [wlapi](https://github.com/arbox/wlapi) - Ruby client 
library for\n  [Wortschatz Leipzig](http://wortschatz.uni-leipzig.de/de) web services.\n- [monkeylearn-ruby](https://github.com/monkeylearn/monkeylearn-ruby) - Sentiment\n  Analysis, Topic Modelling, Language Detection, Named Entity Recognition via\n  a Ruby based Web API client.\n- [google-cloud-language](https://github.com/googleapis/google-cloud-ruby/tree/master/google-cloud-language) -\n  Google's Natural Language service API for Ruby.\n\n### Language Identification\n\nLanguage Identification is one of the first crucial steps in every NLP Pipeline.\n\n- [scylla](https://github.com/hashwin/scylla) -\n  Language Categorization and Identification.\n\n### Segmentation\n\nTools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.\n\n- [tokenizer](https://github.com/arbox/tokenizer) -\n  Simple multilingual tokenizer.\n  \u003csup\u003e[[tutorial](tutorials/tokenizer.md)]\u003c/sup\u003e\n- [pragmatic_tokenizer](https://github.com/diasks2/pragmatic_tokenizer) -\n  Multilingual tokenizer to split a string into tokens.\n- [nlp-pure](https://github.com/parhamr/nlp-pure) -\n  Natural language processing algorithms implemented in pure Ruby with minimal dependencies.\n- [textoken](https://github.com/manorie/textoken) -\n  Simple and customizable text tokenization library.\n- [pragmatic_segmenter](https://github.com/diasks2/pragmatic_segmenter) -\n  Word Boundary Disambiguation with many cookies.\n- [punkt-segmenter](https://github.com/lfcipriani/punkt-segmenter) -\n  Pure Ruby implementation of the Punkt Segmenter.\n- [tactful_tokenizer](https://github.com/zencephalon/Tactful_Tokenizer) -\n  RegExp based tokenizer for different languages.\n- [scalpel](https://github.com/louismullie/scalpel) -\n  Sentence Boundary Disambiguation tool.\n\n### Lexical Processing\n\n#### Stemming\n\nStemming is the term used in information retrieval to describe the process of\nreducing wordforms to some base representation. 
Stemming should be distinguished\nfrom [Lemmatization](#lemmatization) since `stems` do not necessarily have\nlinguistic motivation.\n\n- [ruby-stemmer](https://github.com/aurelian/ruby-stemmer) -\n  Ruby-Stemmer exposes the SnowBall API to Ruby.\n- [uea-stemmer](https://github.com/ealdent/uea-stemmer) -\n  Conservative stemmer for search and indexing.\n\n#### Lemmatization\n\nLemmatization is considered a process of finding a base form of a word. Lemmas\nare often collected in dictionaries.\n\n- [lemmatizer](https://github.com/yohasebe/lemmatizer) -\n  WordNet based Lemmatizer for English texts.\n\n#### Lexical Statistics: Counting Types and Tokens\n\n- [wc](https://github.com/thesp0nge/wc) -\n  Facilities to count word occurrences in a text.\n- [word_count](https://github.com/AtelierConvivialite/word_count) -\n  Word counter for `String` and `Hash` objects.\n- [words_counted](https://github.com/abitdodgy/words_counted) -\n  Pure Ruby library counting word statistics with different custom options.\n\n#### Filtering Stop Words\n\n- [stopwords-filter](https://github.com/brenes/stopwords-filter) - Filter and\n  Stop Word Lexicon based on the SnowBall lemmatizer.\n\n### Phrasal Level Processing\n\n- [n_gram](https://github.com/reddavis/N-Gram) -\n  N-Gram generator.\n- [ruby-ngram](https://github.com/tkellen/ruby-ngram) -\n  Break words and phrases into ngrams.\n- [raingrams](https://github.com/postmodern/raingrams) -\n  Flexible and general-purpose ngrams library written in pure Ruby.\n\n### Syntactic Processing\n\n#### Constituency Parsing\n\n- [stanfordparser](https://rubygems.org/gems/stanfordparser) -\n  Ruby based wrapper for the Stanford Parser.\n- [rley](https://github.com/famished-tiger/Rley) -\n  Pure Ruby implementation of the [Earley](https://en.wikipedia.org/wiki/Earley_parser)\n  Parsing Algorithm for Context-Free Constituency Grammars.\n- [rsyntaxtree](https://github.com/yohasebe/rsyntaxtree) -\n  Visualization for syntactic trees in Ruby based on 
[RMagick](https://github.com/rmagick/rmagick).\n  \u003csup\u003e[dep: [ImageMagick](#imagemagick)]\u003c/sup\u003e\n\n### Semantic Analysis\n\n- [amatch](https://github.com/flori/amatch) -\n  Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').\n- [damerau-levenshtein](https://github.com/GlobalNamesArchitecture/damerau-levenshtein) -\n  Calculates edit distance using the Damerau-Levenshtein algorithm.\n- [hotwater](https://github.com/colinsurprenant/hotwater) -\n  Fast Ruby FFI string edit distance algorithms.\n- [levenshtein-ffi](https://github.com/dbalatero/levenshtein-ffi) -\n  Fast string edit distance computation, using the Damerau-Levenshtein algorithm.\n- [tf_idf](https://github.com/reddavis/TF-IDF) -\n  Term Frequency / Inverse Document Frequency in pure Ruby.\n- [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity) -\n  Calculate the similarity between texts using TF/IDF.\n\n### Pragmatical Analysis\n- [SentimentLib](https://github.com/nzaillian/sentiment_lib) -\n  Simple extensible sentiment analysis gem.\n\n## High Level Tasks\n\n### Spelling and Error Correction\n\n- [gingerice](https://github.com/subosito/gingerice) -\n  Spelling and Grammar corrections via the [Ginger](https://www.gingersoftware.com/) API.\n- [hunspell-i18n](https://github.com/romanbsd/hunspell) -\n  Ruby bindings to the standard [Hunspell](https://hunspell.github.io/) Spell Checker.\n- [ffi-hunspell](https://github.com/postmodern/ffi-hunspell) -\n  FFI based Ruby bindings for [Hunspell](https://hunspell.github.io/).\n- [hunspell](https://github.com/segabor/Hunspell) -\n  Ruby bindings to [Hunspell](https://hunspell.github.io/) via Ruby C API.\n\n### Text Alignment\n\n- [alignment](https://github.com/povilasjurcys/alignment) -\n  Alignment routines for bilingual texts (Gale-Church implementation).\n\n### Machine Translation\n\n- [google-api-client](https://github.com/googleapis/google-api-ruby-client) -\n  Google 
API Ruby Client.\n- [microsoft_translator](https://github.com/ikayzo/microsoft_translator) -\n  Ruby client for the microsoft translator API.\n- [termit](https://github.com/pawurb/termit) -\n  Google Translate with speech synthesis in your terminal.\n- [zipf](https://github.com/pks/zipf) -\n  implementation of BLEU and other base algorithms.\n\n### Sentiment Analysis\n\n- [stimmung](https://github.com/pachacamac/stimmung) -\n  Semantic Polarity based on the\n  [SentiWS](http://wortschatz.uni-leipzig.de/en/download) lexicon.\n\n### Numbers, Dates, and Time Parsing\n\n- [chronic](https://github.com/mojombo/chronic) -\n  Pure Ruby natural language date parser.\n- [chronic_between](https://github.com/jrobertson/chronic_between) -\n  Simple Ruby natural language parser for date and time ranges.\n- [chronic_duration](https://github.com/henrypoydar/chronic_duration) -\n  Pure Ruby parser for elapsed time.\n- [kronic](https://github.com/xaviershay/kronic) -\n  Methods for parsing and formatting human readable dates.\n- [nickel](https://github.com/iainbeeston/nickel) -\n  Extracts date, time, and message information from naturally worded text.\n- [tickle](https://github.com/yb66/tickle) -\n  Parser for recurring and repeating events.\n- [numerizer](https://github.com/jduff/numerizer) -\n  Ruby parser for English number expressions.\n\n### Named Entity Recognition\n\n- [ruby-ner](https://github.com/mblongii/ruby-ner) -\n  Named Entity Recognition with Stanford NER and Ruby.\n- [ruby-nlp](https://github.com/tiendung/ruby-nlp) -\n  Ruby Binding for Stanford Pos-Tagger and Name Entity Recognizer.\n\n### Text-to-Speech-to-Text\n\n- [espeak-ruby](https://github.com/dejan/espeak-ruby) -\n  Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.\n- [tts](https://github.com/c2h2/tts) -\n  Text-to-Speech conversion using the Google translate service.\n- [att_speech](https://github.com/adhearsion/att_speech) -\n  Ruby wrapper over the AT\u0026T Speech API 
for speech to text.\n- [pocketsphinx-ruby](https://github.com/watsonbox/pocketsphinx-ruby) -\n  Pocketsphinx bindings.\n\n## Dialog Agents, Assistants, and Chatbots\n\n- [chatterbot](https://github.com/muffinista/chatterbot) -\n  Straightforward ruby-based Twitter Bot Framework, using OAuth to authenticate.\n- [lita](https://github.com/litaio/lita) -\n  Highly extensible chat operation bot framework written with persistent storage on [Redis](https://redis.io/).\n\n## Linguistic Resources\n\n- [rwordnet](https://github.com/doches/rwordnet) -\n  Pure Ruby self contained API library for the [Princeton WordNet®](https://wordnet.princeton.edu/).\n- [wordnet](https://github.com/ged/ruby-wordnet/blob/master/README.rdoc) -\n  Performance tuned bindings for the [Princeton WordNet®](https://wordnet.princeton.edu/).\n\n## Machine Learning Libraries\n\n[Machine Learning](https://en.wikipedia.org/wiki/Machine_learning) Algorithms\nin pure Ruby or written in other programming languages with appropriate bindings\nfor Ruby.\n\nFor a more up-to-date list, please look at the [Awesome ML with Ruby][ml-with-ruby] list.\n\n- [rb-libsvm](https://github.com/febeling/rb-libsvm) -\n  Support Vector Machines with Ruby.\n- [weka](https://github.com/paulgoetze/weka-jruby) -\n  JRuby bindings for Weka, different ML algorithms implemented through Weka.\n- [decisiontree](https://github.com/igrigorik/decisiontree) -\n  Decision Tree ID3 Algorithm in pure Ruby\n  \u003csup\u003e[[post](https://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/)]\u003c/sup\u003e.\n- [rtimbl](https://github.com/maspwr/rtimbl) -\n  Memory based learners from the Timbl framework.\n- [classifier-reborn](https://github.com/jekyll/classifier-reborn) -\n  General classifier module to allow Bayesian and other types of classifications.\n- [lda-ruby](https://github.com/ealdent/lda-ruby) -\n  Ruby implementation of the [LDA](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)\n  (Latent Dirichlet Allocation) for 
automatic Topic Modelling and Document Clustering.\n- [liblinear-ruby-swig](https://github.com/tomz/liblinear-ruby-swig) -\n  Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).\n- [linnaeus](https://github.com/djcp/linnaeus) -\n  Redis-backed Bayesian classifier.\n- [maxent_string_classifier](https://github.com/mccraigmccraig/maxent_string_classifier) -\n  JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.\n- [naive_bayes](https://github.com/reddavis/Naive-Bayes) -\n  Simple Naive Bayes classifier.\n- [nbayes](https://github.com/oasic/nbayes) -\n  Full-featured, Ruby implementation of Naive Bayes.\n- [omnicat](https://github.com/mustafaturan/omnicat) -\n  Generalized rack framework for text classifications.\n- [omnicat-bayes](https://github.com/mustafaturan/omnicat-bayes) -\n  Naive Bayes text classification implementation as an OmniCat classifier strategy.\n- [ruby-fann](https://github.com/tangledpath/ruby-fann) -\n  Ruby bindings to the [Fast Artificial Neural Network Library (FANN)](http://leenissen.dk/fann/wp/).\n- [rblearn](https://github.com/himkt/rblearn) - Feature Extraction and Crossvalidation library.\n\n## Data Visualization\n\nPlease refer to the [Data Visualization](https://github.com/arbox/data-science-with-ruby#visualization)\nsection on the [Data Science with Ruby][ds-with-ruby] list.\n\n## Optical Character Recognition\n\n* [tesseract-ocr](https://github.com/meh/ruby-tesseract-ocr) -\n  FFI based wrapper over the [Tesseract OCR Engine](https://github.com/tesseract-ocr/tesseract).\n\n## Text Extraction\n\n- [yomu](https://github.com/yomurb/yomu) -\n  library for extracting text and metadata from files and documents\n  using the [Apache Tika](https://tika.apache.org/) content analysis toolkit.\n\n## Full Text Search, Information Retrieval, Indexing\n\n- [rsolr](https://github.com/rsolr/rsolr) -\n  Ruby and Rails client library for [Apache 
Solr](http://lucene.apache.org/solr/).\n- [sunspot](https://github.com/sunspot/sunspot) -\n  Rails centric client for [Apache Solr](http://lucene.apache.org/solr/).\n- [thinking-sphinx](https://github.com/pat/thinking-sphinx) -\n  [Active Record](https://guides.rubyonrails.org/active_record_basics.html)\n  plugin for using [Sphinx](http://sphinxsearch.com/) in (not only) Rails based projects.\n- [elasticsearch](https://github.com/elastic/elasticsearch-ruby/tree/master/elasticsearch) -\n  Ruby client and API for [Elasticsearch](https://www.elastic.co/).\n- [elasticsearch-rails](https://github.com/elastic/elasticsearch-rails) -\n  Ruby and Rails integrations for [Elasticsearch](https://www.elastic.co/).\n- [google-api-client](https://github.com/googleapis/google-api-ruby-client) -\n  Ruby API library for [Google](https://developers.google.com/api-client-library/ruby/) services.\n\n## Language Aware String Manipulation\n\nLibraries for language aware string manipulation, i.e. search, pattern matching,\ncase conversion, transcoding, regular expressions which need information about\nthe underlying language.\n\n- [fuzzy_match](https://github.com/seamusabshere/fuzzy_match) -\n  Fuzzy string comparison with Distance measures and Regular Expression.\n- [fuzzy-string-match](https://github.com/kiyoka/fuzzy-string-match) -\n  Fuzzy string matching library for Ruby.\n- [active_support](https://github.com/rails/rails/tree/master/activesupport/lib/active_support) -\n  RoR `ActiveSupport` gem has various string extensions that can handle case.\n- [fuzzy_tools](https://github.com/brianhempel/fuzzy_tools) -\n  Toolset for fuzzy searches in Ruby tuned for accuracy.\n- [u](http://disu.se/software/u-1.0/) -\n  U extends Ruby’s Unicode support.\n- [unicode](https://github.com/blackwinter/unicode) -\n  Unicode normalization library.\n- [CommonRegexRuby](https://github.com/talyssonoc/CommonRegexRuby) -\n  Find a lot of kinds of common information in a string.\n- 
[regexp-examples](https://github.com/tom-lord/regexp-examples) -\n  Generate strings that match a given regular expression.\n- [verbal_expressions](https://github.com/ryan-endacott/verbal_expressions) -\n  Make difficult regular expressions easy.\n- [translit_kit](https://github.com/AnalyzePlatypus/TranslitKit) -\n  Transliterate Hebrew \u0026 Yiddish text into Latin characters.\n- [re2](https://github.com/mudge/re2) -\n  high-speed Regular Expression library for Text Mining and Text Extraction.\n- [regex_sample](https://github.com/mochizukikotaro/regex_sample) -\n  sample string generation from a given Regular Expression.\n- [iuliia](https://github.com/adnikiforov/iuliia-rb) \u0026mdash;\n  transliteration of Cyrillic to Latin in many possible ways (defined by the [reference implementation](https://github.com/nalgeon/iuliia)).\n\n## Articles, Posts, Talks, and Presentations\n\n- 2019\n  - _Extracting Text From Images Using Ruby_ by [aonemd](https://twitter.com/aonemd)\n    \u003csup\u003e[[post](https://aonemd.github.io/blog/extracting-text-from-images-using-ruby) |\n    [code](https://gist.github.com/aonemd/7bb3c4760d9e47a9ce8e270198cb40a0)]\u003c/sup\u003e\n- 2018\n  - _Natural Language Processing and Tweet Sentiment Analysis_ by [Cassandra Corrales](https://twitter.com/casita305)\n    \u003csup\u003e[[post](https://medium.com/@cmcorrales3/natural-language-processing-and-tweet-sentiment-analysis-fa1edbb5ddd5)]\u003c/sup\u003e\n- 2017\n  - _The Google NLP API Meets Ruby_ by [Aja Hammerly](https://twitter.com/the_thagomizer)\n    \u003csup\u003e[[post](http://www.thagomizer.com/blog/2017/04/13/the-google-nlp-api-meets-ruby.html)]\u003c/sup\u003e\n  - _Syntax Isn't Everything: NLP For Rubyists_ by [Aja Hammerly](https://twitter.com/the_thagomizer)\n    \u003csup\u003e[[slides](http://www.thagomizer.com/files/NLP_RailsConf2017.pdf)]\u003c/sup\u003e\n  - _Scientific Computing on JRuby_ by [Prasun Anand](https://twitter.com/prasun_anand)\n    
\u003csup\u003e[[slides](https://www.slideshare.net/PrasunAnand2/fosdem2017-scientific-computing-on-jruby) |\n    [video](https://ftp.fau.de/fosdem/2017/K.4.201/ruby_scientific_computing_on_jruby.mp4) |\n    [slides](https://www.slideshare.net/PrasunAnand2/scientific-computing-on-jruby) |\n    [slides](https://www.slideshare.net/PrasunAnand2/scientific-computation-on-jruby)]\u003c/sup\u003e\n  - _Unicode Normalization in Ruby_ by [Starr Horne](https://twitter.com/starrhorne)\n    \u003csup\u003e[[post](https://blog.honeybadger.io/ruby_unicode_normalization/)]\u003c/sup\u003e\n- 2016\n  - _Quickly Create a Telegram Bot in Ruby_ by [Ardian Haxha](https://twitter.com/ArdianHaxha)\n    \u003csup\u003e[[tutorial](https://www.sitepoint.com/quickly-create-a-telegram-bot-in-ruby/)]\u003c/sup\u003e\n  - _Deep Learning: An Introduction for Ruby Developers_ by [Geoffrey Litt](https://twitter.com/geoffreylitt)\n    \u003csup\u003e[[slides](https://speakerdeck.com/geoffreylitt/deep-learning-an-introduction-for-ruby-developers)]\u003c/sup\u003e\n  - _How I made a pure-Ruby word2vec program more than 3x faster_ by [Kei Sawada](https://twitter.com/remore)\n    \u003csup\u003e[[slides](https://speakerdeck.com/remore/how-i-made-a-pure-ruby-word2vec-program-more-than-3x-faster)]\u003c/sup\u003e\n  - _Dōmo arigatō, Mr. 
Roboto: Machine Learning with Ruby_ by [Eric Weinstein](https://twitter.com/ericqweinstein)\n    \u003csup\u003e[[slides](https://speakerdeck.com/ericqweinstein/domo-arigato-mr-roboto-machine-learning-with-ruby) | [video](https://www.youtube.com/watch?v=T1nFQ49TyeA)]\u003c/sup\u003e\n- 2015\n  - _N-gram Analysis for Fun and Profit_ by [Jesus Castello](https://github.com/matugm)\n    \u003csup\u003e[[tutorial](https://www.rubyguides.com/2015/09/ngram-analysis-ruby/)]\u003c/sup\u003e\n  - _Machine Learning made simple with Ruby_ by [Lorenzo Masini](https://github.com/rugginoso)\n    \u003csup\u003e[[tutorial](https://www.leanpanda.com/blog/2015/08/24/machine-learning-automatic-classification/)]\u003c/sup\u003e\n  - _Using Ruby Machine Learning to Find Paris Hilton Quotes_ by [Rick Carlino](https://github.com/RickCarlino)\n    \u003csup\u003e[[tutorial](http://web.archive.org/web/20160414072324/http://datamelon.io/blog/2015/using-ruby-machine-learning-id-paris-hilton-quotes.html)]\u003c/sup\u003e\n  - _Exploring Natural Language Processing in Ruby_ by [Kevin Dias](https://github.com/diasks2)\n    \u003csup\u003e[[slides](https://www.slideshare.net/diasks2/exploring-natural-language-processing-in-ruby)]\u003c/sup\u003e\n  - _Machine Learning made simple with Ruby_ by [Lorenzo Masini](https://twitter.com/rugginoso)\n    \u003csup\u003e[[post](https://www.leanpanda.com/blog/2015/08/24/machine-learning-automatic-classification/)]\u003c/sup\u003e\n  - _Practical Data Science in Ruby_ by Bobby Grayson\n    \u003csup\u003e[[slides](http://slides.com/bobbygrayson/p#/)]\u003c/sup\u003e\n- 2014\n  - _Natural Language Parsing with Ruby_ by [Glauco Custódio](https://github.com/glaucocustodio)\n    \u003csup\u003e[[tutorial](http://glaucocustodio.github.io/2014/11/10/natural-language-parsing-with-ruby/)]\u003c/sup\u003e\n  - _Demystifying Data Science: Analyzing Conference Talks with Rails and Ngrams_ by\n    [Todd Schneider](https://github.com/toddwschneider)\n    
\u003csup\u003e[[video](https://www.youtube.com/watch?v=2ZDCxwB29Bg) | [code](https://github.com/Genius/abstractogram)]\u003c/sup\u003e\n  - _Natural Language Processing with Ruby_ by [Konstantin Tennhard](https://github.com/t6d)\n    \u003csup\u003e[[video](https://www.youtube.com/watch?v=5u86qVh8r0M) | [video](https://www.youtube.com/watch?v=oFmy_QBQ5DU) |\n    [video](https://www.youtube.com/watch?v=sPkeeWnsMn0) |\n    [slides](http://euruko2013.org/speakers/presentations/natural_language_processing_with_ruby_and_opennlp-tennhard.pdf)]\u003c/sup\u003e\n- 2013\n  - _How to parse 'go' - Natural Language Processing in Ruby_ by\n    [Tom Cartwright](https://twitter.com/tomcartwrightuk)\n    \u003csup\u003e[[slides](https://www.slideshare.net/TomCartwright/natual-language-processing-in-ruby) |\n    [video](https://skillsmatter.com/skillscasts/4883-how-to-parse-go)]\u003c/sup\u003e\n  - _Natural Language Processing in Ruby_ by [Brandon Black](https://twitter.com/brandonmblack)\n    \u003csup\u003e[[slides](https://speakerdeck.com/brandonblack/natural-language-processing-in-ruby) |\n    [video](http://confreaks.tv/videos/railsconf2013-natural-language-processing-with-ruby)]\u003c/sup\u003e\n  - _Natural Language Processing with Ruby: n-grams_ by [Nathan Kleyn](https://github.com/nathankleyn)\n    \u003csup\u003e[[tutorial](https://www.sitepoint.com/natural-language-processing-ruby-n-grams/) |\n    [code](https://github.com/nathankleyn/ruby-nlp)]\u003c/sup\u003e\n  - _Seeking Lovecraft, Part 1: An introduction to NLP and the Treat Gem_ by\n    [Robert Qualls](https://github.com/rlqualls)\n    \u003csup\u003e[[tutorial](https://www.sitepoint.com/seeking-lovecraft-part-1-an-introduction-to-nlp-and-the-treat-gem/)]\u003c/sup\u003e\n- 2012\n  - _Machine Learning with Ruby, Part One_ by [Vasily Vasinov](https://twitter.com/vasinov)\n    \u003csup\u003e[[tutorial](http://www.vasinov.com/blog/machine-learning-with-ruby-part-one/)]\u003c/sup\u003e\n- 2011\n  - _Ruby one-liners_ 
by [Benoit Hamelin](https://twitter.com/benoithamelin)\n    \u003csup\u003e[[post](http://benoithamelin.tumblr.com/ruby1line)]\u003c/sup\u003e\n  - _Clustering in Ruby_ by [Colin Drake](https://twitter.com/colinfdrake)\n    \u003csup\u003e[[post](https://colindrake.me/post/k-means-clustering-in-ruby/)]\u003c/sup\u003e\n- 2010\n  - _bayes_motel – Bayesian classification for Ruby_ by [Mike Perham](https://twitter.com/mperham)\n    \u003csup\u003e[[post](http://www.mikeperham.com/2010/04/28/bayes_motel-bayesian-classification-for-ruby/)]\u003c/sup\u003e\n- 2009\n  - _Porting the UEA-Lite Stemmer to Ruby_ by [Jason Adams](https://twitter.com/ealdent)\n    \u003csup\u003e[[post](https://ealdent.wordpress.com/2009/07/16/porting-the-uea-lite-stemmer-to-ruby/)]\u003c/sup\u003e\n  - _NLP Resources for Ruby_ by [Jason Adams](https://twitter.com/ealdent)\n    \u003csup\u003e[[post](https://ealdent.wordpress.com/2009/09/13/nlp-resources-for-ruby/)]\u003c/sup\u003e\n- 2008\n  - _Support Vector Machines (SVM) in Ruby_ by [Ilya Grigorik](https://twitter.com/igrigorik)\n    \u003csup\u003e[[post](https://www.igvita.com/2008/01/07/support-vector-machines-svm-in-ruby/)]\u003c/sup\u003e\n  - _Practical text classification with Ruby_ by [Gleicon Moraes](https://twitter.com/gleicon)\n    \u003csup\u003e[[post](https://zenmachine.wordpress.com/practical-text-classification-with-ruby/) |\n    [code](https://github.com/gleicon/zenmachine)]\u003c/sup\u003e\n- 2007\n  - _Decision Tree Learning in Ruby_ by [Ilya Grigorik](https://twitter.com/igrigorik)\n    \u003csup\u003e[[post](https://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/)]\u003c/sup\u003e\n- 2006\n  - _Speak My Language: Natural Language Processing With Ruby_ by [Michael Granger](https://deveiate.org/resume.html)\n    \u003csup\u003e[[slides](https://deveiate.org/misc/Speak-My-Language.pdf) |\n          
[write-up](http://blog.nicksieger.com/articles/2006/10/22/rubyconf-natural-language-generation-and-processing-in-ruby/) |\n          [write-up](http://juixe.com/papers/RubyConf2006.pdf)]\u003c/sup\u003e\n\n## Projects and Code Examples\n\n- [Going the Distance](https://github.com/schneems/going_the_distance) -\n  Implementations of various distance algorithms with example calculations.\n- [Named entity recognition with Stanford NER and Ruby](https://github.com/mblongii/ruby-ner) -\n  NER examples in Ruby and Java with some [explanations](https://web.archive.org/web/20120722225402/http://mblongii.com/2012/04/15/named-entity-recognition-with-stanford-ner-and-ruby/).\n- [Words Counted](http://rubywordcount.com/) -\n  Examples of customizable word statistics powered by\n  [words_counted](https://github.com/abitdodgy/words_counted).\n- [RSyntaxTree](https://yohasebe.com/rsyntaxtree/) -\n  Web-based demonstration of the syntactic tree visualization.\n\n## Books\n\n-  [Miller, Rob](https://twitter.com/robmil/).\n   _Text Processing with Ruby: Extract Value from the Data That Surrounds You._\n   Pragmatic Programmers, 2015.\n   \u003csup\u003e[[link](https://www.amazon.com/Text-Processing-Ruby-Extract-Surrounds/dp/1680500708)]\u003c/sup\u003e\n-  [Watson, Mark](https://twitter.com/mark_l_watson).\n   _Scripting Intelligence: Web 3.0 Information Gathering and Processing._\n   APRESS, 2010.\n   \u003csup\u003e[[link](https://www.amazon.de/Scripting-Intelligence-Information-Gathering-Processing/dp/1430223510)]\u003c/sup\u003e\n-  [Watson, Mark](https://twitter.com/mark_l_watson).\n   _Practical Semantic Web and Linked Data Applications._ Lulu, 2010.\n   \u003csup\u003e[[link](http://www.lulu.com/shop/mark-watson/practical-semantic-web-and-linked-data-applications-java-edition/paperback/product-10915016.html)]\u003c/sup\u003e\n\n## Community\n\n- [Reddit](https://www.reddit.com/r/LanguageTechnology/search?q=ruby\u0026restrict_sr=on)\n- [Stack 
Overflow](https://stackoverflow.com/search?q=%5Bnlp%5D+and+%5Bruby%5D)\n- [Twitter](https://twitter.com/search?q=Ruby%20NLP%20%23ruby%20OR%20%23nlproc%20OR%20%23rubynlp%20OR%20%23nlp\u0026src=typd\u0026lang=en)\n\n## Needs your Help!\n\nAll projects in this section are really important for the community but need\nmore attention. If you have spare time and dedication, please spend some hours\non the code here.\n\n- [ferret](https://github.com/dbalmain/ferret) -\n  Information Retrieval in C and Ruby.\n- [summarize](https://github.com/ssoper/summarize) -\n  Ruby native wrapper for [Open Text Summarizer](https://github.com/neopunisher/Open-Text-Summarizer).\n\n## Related Resources\n\n- [Neural Machine Translation Implementations](https://github.com/jonsafari/nmt-list)\n- [Awesome Ruby](https://github.com/markets/awesome-ruby#natural-language-processing) -\n  Among other awesome items, a short list of NLP-related projects.\n- [Ruby NLP](https://github.com/diasks2/ruby-nlp) -\n  State-of-the-art collection of Ruby libraries for NLP.\n- [Speech and Natural Language Processing](https://github.com/edobashira/speech-language-processing) -\n  General list of NLP-related resources (mostly not for Ruby programmers).\n- [Scientific Ruby](http://sciruby.com/) -\n  Linear Algebra, Visualization and Scientific Computing for Ruby.\n- [iRuby](https://github.com/SciRuby/iruby) - IRuby kernel for Jupyter (formerly IPython).\n- [Awesome OCR](https://github.com/kba/awesome-ocr) -\n  Multitude of OCR (Optical Character Recognition) resources.\n- [Awesome TensorFlow](https://github.com/jtoy/awesome-tensorflow) -\n  Machine Learning with TensorFlow libraries.\n- \u003ca name=\"imagemagic\"\u003e\u003c/a\u003e\n  [ImageMagick](https://imagemagick.org/index.php)\n\n## License\n\n[![Creative Commons Zero 1.0](http://mirrors.creativecommons.org/presskit/buttons/80x15/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/) `Awesome NLP with Ruby` by [Andrei 
Beliankou](https://github.com/arbox) and\n[Contributors](https://github.com/arbox/nlp-with-ruby/graphs/contributors).\n\nTo the extent possible under law, the person who associated CC0 with\n`Awesome NLP with Ruby` has waived all copyright and related or neighboring rights\nto `Awesome NLP with Ruby`.\n\nYou should have received a copy of the CC0 legalcode along with this\nwork. If not, see \u003chttps://creativecommons.org/publicdomain/zero/1.0/\u003e.\n\n\u003c!--- Links ---\u003e\n[ruby]: https://www.ruby-lang.org/en/\n[motivation]: https://github.com/arbox/nlp-with-ruby/blob/master/motivation.md\n[faq]: https://github.com/arbox/nlp-with-ruby/blob/master/FAQ.md\n[ds-with-ruby]: https://github.com/arbox/data-science-with-ruby\n[ml-with-ruby]: https://github.com/arbox/machine-learning-with-ruby\n[change-pr]: https://github.com/RichardLitt/knowledge/blob/master/github/amending-a-commit-guide.md\n","funding_links":["https://www.patreon.com/arbox"],"categories":["Data Science","Computer Science","Technical","Ruby","Uncategorized","计算机科学","Other Lists","Natural Language Processing","Packages","函式庫","Live Site:   [searchAwesome](https://search-awesome.vercel.app/)","Libraries","Natural Language Understanding","Knowledge","Themed Directories"],"sub_categories":["General-Purpose Machine Learning","Uncategorized","TeX Lists","ramanihiteshc@gmail.com","Libraries","書籍","Books"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Fnlp-with-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farbox%2Fnlp-with-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Fnlp-with-ruby/lists"}