{"id":13858507,"url":"https://github.com/arbox/data-science-with-ruby","last_synced_at":"2025-07-20T09:31:31.448Z","repository":{"id":20804165,"uuid":"88170991","full_name":"arbox/data-science-with-ruby","owner":"arbox","description":"Practical Data Science with Ruby based tools.","archived":false,"fork":false,"pushed_at":"2023-07-19T13:51:33.000Z","size":217,"stargazers_count":695,"open_issues_count":1,"forks_count":51,"subscribers_count":40,"default_branch":"master","last_synced_at":"2024-05-23T06:25:50.799Z","etag":null,"topics":["awesome","awesome-list","data-analysis","data-analytics","data-mining","data-science","data-visualization","list","ruby","rubydatascience","visualization"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arbox.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-04-13T14:01:18.000Z","updated_at":"2024-05-13T09:43:50.000Z","dependencies_parsed_at":"2024-01-07T10:50:48.063Z","dependency_job_id":"a964452c-e755-45d1-b120-8967655eb9aa","html_url":"https://github.com/arbox/data-science-with-ruby","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fdata-science-with-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fdata-science-with-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arbox%2Fdata-science-with-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/arbox%2Fdata-science-with-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arbox","download_url":"https://codeload.github.com/arbox/data-science-with-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226449352,"owners_count":17626899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","awesome-list","data-analysis","data-analytics","data-mining","data-science","data-visualization","list","ruby","rubydatascience","visualization"],"created_at":"2024-08-05T03:02:11.666Z","updated_at":"2024-11-27T15:19:43.643Z","avatar_url":"https://github.com/arbox.png","language":"Ruby","readme":"\u003cimg src=\"header.png\" align=\"center\"\u003e\n\n[[RubyNLP](https://github.com/arbox/nlp-with-ruby) |\n [RubyML](https://github.com/arbox/machine-learning-with-ruby) |\n [RubyInterop](https://github.com/arbox/ruby-interoperability)]\n\n\n# Awesome Data Science with Ruby [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)\n\n\u003e Links and Resources for Data Processing and Analysis in Ruby\n\n[Data Science](https://en.wikipedia.org/wiki/Data_science) is a new\n\"sexy\" buzzword without specific meaning but often used to substitute\nStatistics, Scientific Computing, Text and Data Mining and\nVisualization, Machine Learning, Data Processing and Warehousing as\nwell as Retrieval Algorithms of any kind.\n\nThis curated list comprises [_awesome_][awesome] tutorials, 
libraries,\ninformation sources about various Data Science applications using\nthe [Ruby programming language][ruby].\n\nA lot of useful resources on this list come from the development by\n[The Ruby Science Foundation][sciruby], our [contributors][contributors] and\nour own day to day work on various data intensive applications.\nRead [why](#wait-but-why) this list is awesome.\n\n:sparkles: Every [contribution](contributing.md) is welcome!\nAdd links through pull requests or create an issue to start a discussion.\n\nFollow us on [Twitter](https://twitter.com/NonWebRuby)\nand please spread the word using the `#RubyDataScience` hash tag!\n\n\u003c!-- nodoc --\u003e\n\n## Contents\n\n\u003c!-- toc --\u003e\n\n- [Ruby vs. Python vs. Julia vs. R](#ruby-vs-python-vs-julia-vs-r)\n- [Standing on the shoulders of giants](#standing-on-the-shoulders-of-giants)\n- [Data Manipulation](#data-manipulation)\n- [Distributed Computing](#distributed-computing)\n- [Data Structures](#data-structures)\n- [Data sets](#data-sets)\n- [Statistics](#statistics)\n- [Numeric and Symbolic Computation](#numeric-and-symbolic-computation)\n- [Visualization](#visualization)\n- [Interactive Computing](#interactive-computing)\n- [Input and Output](#input-and-output)\n  * [General formats](#general-formats)\n  * [Database Adapters](#database-adapters)\n  * [Domain specific formats](#domain-specific-formats)\n- [Provisioning Infrastructure](#provisioning-infrastructure)\n- [Machine Learning](#machine-learning)\n- [Articles, Posts, Talks, and Presentations](#articles-posts-talks-and-presentations)\n- [Related resources](#related-resources)\n- [Wait but why?](#wait-but-why)\n- [License](#license)\n\n\u003c!-- tocstop --\u003e\n\n\u003c!-- doc --\u003e\n\n## Ruby vs. Python vs. Julia vs. 
R\n\n| Ruby         | Python | Julia | R   |\n| ---          | ---    | ---   | --- |\n| Daru / Rover | Pandas |       |     |\n| NArray       | NumPy  |       |     |\n\n## Standing on the shoulders of giants\n\nRuby is (for now) not a Data Science centric language with a very large established library.\nLeveraging libraries from R, Python, and Julia helps Ruby to solve your tasks!\n\u003c!--- TODO: Add the talk by @mrkn ---\u003e\n\n- [pycall](https://github.com/mrkn/pycall.rb) \u0026mdash; Bridge into the Python world.\n- [rserve-client](https://github.com/clbustos/Rserve-Ruby-client) \u0026mdash;\n  Ruby connector for [Rserve](http://www.rforge.net/Rserve/), R's binary server.\n\n## Data Manipulation\n\n- [kiba](https://github.com/thbar/kiba/) \u0026mdash;\n  lightweight Ruby ETL (Extract-Transform-Load) framework.\n- [jongleur](https://gitlab.com/RedFred7/Jongleur) \u0026mdash;\n  Workflow manager using DAG definitions to execute ETL tasks.\n\n## Distributed Computing\n\n- [ruby-spark](https://github.com/ondra-m/ruby-spark) \u0026mdash;\n  Ruby Interface to [Apache Spark](https://spark.apache.org/) 1.x.x.\n- [jruby-spark](https://github.com/chyh1990/jruby-spark) \u0026mdash;\n  JRuby based bindings for [Apache Spark](https://spark.apache.org/).\n\n## Data Structures\n\n- [daru](https://github.com/SciRuby/daru) \u0026mdash;\n  Data Frame and Vector structures with comprehensive manipulating and visualization methods.\n- [Rover](https://github.com/ankane/rover) \u0026mdash;\n  Data Frame and Vector structures with comprehensive manipulating and visualization methods.\n- [numo-narray](https://github.com/ruby-numo/numo-narray) \u0026mdash;\n  n-dimensional Numerical Array for Ruby.\n- [nmatrix](https://github.com/sciruby/nmatrix) \u0026mdash;\n  dense and sparse linear algebra library for Ruby via [SciRuby](http://sciruby.com/).\n- [kdtree](https://github.com/gurgeous/kdtree) \u0026mdash;\n  blazingly fast native 2d k-d tree.\n- 
[mdarray](https://github.com/rbotafogo/mdarray) \u0026mdash;\n  Array structure for `JRuby`.\n- [spreadsheet](https://github.com/zdavatz/spreadsheet) \u0026mdash;\n  manipulation library for MS Excel spreadsheets.\n- [networkx](https://github.com/SciRuby/networkx.rb) \u0026mdash;\n  Ruby based [NetworkX](https://networkx.github.io/) clone that handles various\n  use cases of the Graph Data Structure.\n- [cumo](https://github.com/sonots/cumo) \u0026mdash;\n  CUDA-aware numerical Array library with an interface similar to [NArray](https://github.com/ruby-numo/numo-narray).\n\n## Data sets\n\n- [rdatasets](https://github.com/kojix2/rdatasets) \u0026mdash;\n  Data sets available in R via [Rdatasets](https://github.com/vincentarelbundock/Rdatasets).\n- [red-datasets](https://github.com/red-data-tools/red-datasets) \u0026mdash;\n  Growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST, etc.\n\n## Statistics\n\n- [rb-gsl](https://github.com/blackwinter/rb-gsl) \u0026mdash;\n  Ruby interface to the GNU Scientific Library. \u003csup\u003e[[dep: GSL](#gsl)]\u003c/sup\u003e\n- [simple_stats](https://github.com/brianhempel/simple_stats) \u0026mdash;\n  `Enumerable` patches for descriptive statistics.\n- [enumerable-statistics](https://github.com/mrkn/enumerable-statistics) \u0026mdash;\n  fast implementation of descriptive statistics for the `Enumerable` module.\n- [statsample](https://github.com/sciruby/statsample) \u0026mdash;\n  basic and advanced statistics for Ruby. 
\u003csup\u003e[[dep: GSL](#gsl)]\u003c/sup\u003e\n- [statsample-glm](https://github.com/sciruby/statsample-glm) \u0026mdash;\n  extension of `statsample` with Generalized Linear Models.\n- [statsample-bivariate-extension](https://github.com/sciruby/statsample-bivariate-extension) \u0026mdash;\n  extension of `statsample` with Bivariate Correlations.\n- [statsample-timeseries](https://github.com/sciruby/statsample-timeseries) \u0026mdash;\n  extension of `statsample` with Time Series estimators.\n- [pca](https://github.com/gbuesing/pca) \u0026mdash;\n  Principal Component Analysis (PCA) in Ruby.\n- [descriptive-statistics](https://github.com/jtescher/descriptive-statistics) \u0026mdash;\n  descriptive extensions for the `Enumerable` module or standalone usage.\n- [distribution](https://github.com/sciruby/distribution) \u0026mdash;\n  probability distributions and descriptive measures for them.\n- [statistics2](https://github.com/abscondment/statistics2) \u0026mdash;\n  Normal, Chi-square, t- and F- probability distributions for Ruby.\n- [fast_statistics](https://github.com/Martin-Nyaga/fast_statistics) \u0026mdash;\n  fast computation of descriptive statistics (min, max, mean, median, 1st and 3rd quartiles, population standard deviation) for a multivariate dataset.\n\n## Numeric and Symbolic Computation\n\n- [numo-linalg](https://github.com/ruby-numo/numo-linalg) \u0026mdash;\n  linear algebraic operations for NArray.\n- [numo-gsl](https://github.com/ruby-numo/numo-gsl) \u0026mdash;\n  Math and Statistics for NArray using GSL. \u003csup\u003e[[dep: GSL](#gsl)]\u003c/sup\u003e\n- [symengine](https://github.com/symengine/symengine.rb) \u0026mdash;\n  Symbolic Computation with [SymEngine](https://github.com/symengine/symengine).\n- [numo-ffte](https://github.com/ruby-numo/numo-ffte) \u0026mdash;\n  Fast Fourier Transform for NArray using the FFTE package. \u003csup\u003e[[dep: FFTE](#ffte)]\u003c/sup\u003e\n\n## Visualization\n\nComprehensive tools for Data 
Visualization.\n\n- [matplotlib](https://github.com/mrkn/matplotlib.rb) \u0026mdash;\n  Ruby based wrapper around [matplotlib](https://matplotlib.org/).\n  \u003csup\u003e[[dep: matplotlib](#matplotlib)]\u003c/sup\u003e\n- [mathematical](https://github.com/gjtorikian/mathematical) \u0026mdash;\n  PNG and MathML renderings for your equations.\n- [daru-view](https://github.com/sciruby/daru-view) \u0026mdash;\n  daru-view is interactive plotting gem for web application\n  (any Ruby web application framework like Rails/Sinatra/Nanoc/Hanami) \u0026 IRuby notebook.\n  It is a plugin gem for daru.\n- [daru-plotly](https://github.com/genya0407/daru-plotly) \u0026mdash;\n  [Plotly](https://plot.ly/) based visualization for Daru.\n- [benchmark-plot](https://github.com/v0dro/benchmark-plot)\n- [Vega](https://github.com/ankane/vega) \u0026mdash;\n  [Vega](https://vega.github.io/vega/) and [Vega-lite](https://vega.github.io/vega-lite/)\n  based visualization for Rover.\n- [Gruff](https://github.com/topfunky/gruff) \u0026mdash;\n  graphing library built on top of [rmagick](https://github.com/rmagick/rmagick).\n- [Rubyplot](https://github.com/SciRuby/rubyplot) \u0026mdash;\n  graphing library built on top of [GR](https://gr-framework.org).\n- [Nyaplotjs](https://github.com/domitry/Nyaplotjs)\n- [nyaplot](https://github.com/domitry/nyaplot)\n- [gnuplotrb](https://github.com/SciRuby/gnuplotrb)\n- [ruby-graphviz](https://github.com/glejeune/Ruby-Graphviz)\n  \u003csup\u003e[[dep: Graphviz](#graphviz)]\u003c/sup\u003e\n- [gnuplot](https://github.com/rdp/ruby_gnuplot/tree/master)\n  \u003csup\u003e[[dep: gnuplot](#gnuplot)]\u003c/sup\u003e\n- https://github.com/zuhao/plotrb\n- https://github.com/brasten/scruffy\n- https://github.com/zverok/worldize\n- https://github.com/masa16/ruby-mathgl\n- [numo-gnuplot](https://github.com/ruby-numo/numo-gnuplot) \u0026mdash;\n  gnuplot interface for the Numo package.\n- [chartkick](https://github.com/ankane/chartkick) \u0026mdash;\n  Create 
beautiful JavaScript charts with one line of Ruby.\n- [iruby-chartkick](https://github.com/Absolventa/iruby-chartkick) \u0026mdash;\n  Use [chartkick](https://github.com/ankane/chartkick) within IRuby-backed jupyter notebooks\n- [ruby-gr](https://github.com/red-data-tools/GR.rb) \u0026mdash;\n  Ruby interface to [GR](https://gr-framework.org/), a framework for visualisation applications.\n  \u003csup\u003e[[dep: GR](#gr)]\u003c/sup\u003e\n\n## Interactive Computing\n\n- [iruby](https://github.com/sciruby/iruby) \u0026mdash;\n  Ruby kernel for [Jupyter](https://jupyter.org/).\n- [iruby-rails](https://github.com/mrkn/iruby-rails) \u0026mdash;\n  Integration library for IRuby and Rails.\n- [jupyter_on_rails](https://github.com/Yuki-Inoue/jupyter_on_rails/) \u0026mdash;\n  Another integration library for IRuby and Rails.\n\n\n## Input and Output\n\n### General formats\n\n- https://github.com/fiksu/rcsv\n- [ox](https://github.com/ohler55/ox) \u0026mdash;\n  Optimized for speed XML parser and object marshaller.\n- [oj](https://github.com/ohler55/oj) \u0026mdash;\n  High-speed JSON parser.\n- Markdown\n- Nokogiri\n- CSV\n\n### Database Adapters\n\n- pg\n- Mongo\n- MySQL\n\n### Domain specific formats\n\n- BibTeX\n- [inih](https://github.com/woodruffw/ruby-inih) \u0026mdash; fast C based INI parser for Ruby.\n- [bolognese](https://github.com/datacite/bolognese) \u0026mdash;\n  conversion tool for citation formats like BibTeX, RIS, or Crossref XML.\n\n\n## Provisioning Infrastructure\n\n- https://github.com/mrkn/gpu-instance\n- https://github.com/mrkn/computing_node\n- https://github.com/k1LoW/awspec\n\n## Machine Learning\n\nPlease look at our extensive [Awesome ML with Ruby][ml-with-ruby] list.\n\n## Articles, Posts, Talks, and Presentations\n\n- 2019\n  - _Parallelising ETL workflows with the Jongleur gem_ by [Fred Heath](https://github.com/RedFred7)\n  \u003csup\u003e[[post](http://bootstrap.me.uk/gems/2019/01/06/jongleur-etl.html)]\u003c/sup\u003e\n- 2018\n- 2017\n  - 
_Progress of Ruby-Numo: Numerical Computing for Ruby_ by [Masahiro Tanaka](https://github.com/masa16)\n    \u003csup\u003e[[slides](https://speakerdeck.com/masa16tanaka/progress-of-ruby-numo-numerical-computing-for-ruby)]\u003c/sup\u003e\n  - _Chartkick: data visualization made easy with Ruby_ by [Govind Unnikrishnan](https://twitter.com/govind_k_u)\n    \u003csup\u003e[[post](https://blog.redpanthers.co/chartkick-data-visualization-easy-ruby/)]\u003c/sup\u003e\n  - _Development of Data Science Ecosystem for Ruby_ by [Kenta Murata](https://twitter.com/mrkn)\n    \u003csup\u003e[[slides](https://speakerdeck.com/mrkn/development-of-data-science-ecosystem-for-ruby) |\n          [video](https://www.youtube.com/watch?v=U9GdgZowmGY) |\n          [page](https://rubykaigi.org/2017/presentations/mrkn.html)]\u003c/sup\u003e\n- 2016\n  - _Scientific Computation and Data Visualization with Ruby_ by [Sameer Deshmukh](https://twitter.com/v0dro)\n    \u003csup\u003e[[slides](https://www.slideshare.net/SrijanOne/webinar-scientific-computation-and-data-visualization-with-ruby) |\n          [video](https://www.youtube.com/watch?v=5970kC6MfBE)]\u003c/sup\u003e\n- 2015\n- 2014\n- 2013\n  - _Seeing the Big Picture: Quick and Dirty Data Visualization with Ruby_ by [Aja Hammerly](https://twitter.com/the_thagomizer)\n    \u003csup\u003e[[video](https://www.youtube.com/watch?v=dWPRLCU39AU) |\n          [slides](http://www.thagomizer.com/files/dataviz_windy_city_13.pdf) |\n          [code](https://github.com/thagomizer/data_visualization_talk)]\u003c/sup\u003e\n- 2012\n- 2011\n- 2010\n  - _NArray and scientific computing with Ruby_ by [Masahiro Tanaka](https://twitter.com/masa16tanaka)\n    \u003csup\u003e[[video](https://vimeo.com/14823720) |\n          [slides](https://www.slideshare.net/masa16tanaka/narray-and-scientific-computing-with-ruby)]\u003c/sup\u003e\n\n## Community\n\n- https://gitter.im/red-data-tools/en\n- https://gitter.im/red-data-tools/ja\n- http://ruby-data.org/\n- 
https://twitter.com/RubyData\n- https://discourse.ruby-data.org/\n\n## Related resources\n\n- [Awesome Data Science with Python](https://github.com/r0f1/datascience)\n- \u003ca name=\"imagemagic\"\u003e\u003c/a\u003e\n  [ImageMagick](https://imagemagick.org/index.php)\n- \u003ca name=\"gsl\"\u003e\u003c/a\u003e\n  [GSL](https://www.gnu.org/software/gsl/)\n- \u003ca name=\"ffte\"\u003e\u003c/a\u003e\n  [FFTE](http://www.ffte.jp/)\n- \u003ca name=\"symengine\"\u003e\u003c/a\u003e\n  [SymEngine](https://github.com/symengine/symengine)\n- [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata#data-visualization) -\n  awesome curated list covering all things Big Data.\n- [Awesome Spark](https://github.com/awesome-spark/awesome-spark) \u0026mdash;\n  awesome list of Apache Spark goodies.\n\n## Wait but why?\n\nThere are a lot of software lists with tools related to Data Science.\nThere are a couple of lists with Ruby related projects. There are no lists of\nonly working and tested software with documented scope. We'll try to make one!\n\nWhat is awesome? Awesome tools are documented, maintained, and focused.\n\nCan something stop being awesome at some point? Yes! Abandoned projects with broken\ndependencies aren't awesome any more! They leave this list.\n\n## License\n\n[![Creative Commons Zero 1.0](http://mirrors.creativecommons.org/presskit/buttons/80x15/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/) `Awesome Data Science with Ruby` by [Andrei Beliankou](https://github.com/arbox) and\n[Contributors][contributors].\n\nTo the extent possible under law, the person who associated CC0 with\n`Awesome Data Science with Ruby` has waived all copyright and related or neighboring rights\nto `Awesome Data Science with Ruby`.\n\nYou should have received a copy of the CC0 legalcode along with this\nwork. 
If not, see \u003chttps://creativecommons.org/publicdomain/zero/1.0/\u003e.\n\n\u003c!--- Links ---\u003e\n[ruby]: https://www.ruby-lang.org/en/\n[ml-with-ruby]: https://github.com/arbox/machine-learning-with-ruby\n[awesome]: https://github.com/sindresorhus/awesome/blob/master/awesome.md\n[change-pr]: https://github.com/RichardLitt/knowledge/blob/master/github/amending-a-commit-guide.md\n[sciruby]: https://github.com/sciruby\n[contributors]: https://github.com/arbox/data-science-with-ruby/graphs/contributors\n","funding_links":[],"categories":["Ruby","Programming Language Lists","Awesome List","Data Visualization","Data structures"],"sub_categories":["Ruby Lists","Ukraine","Text-to-Speech-to-Text","Vector search"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Fdata-science-with-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farbox%2Fdata-science-with-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farbox%2Fdata-science-with-ruby/lists"}