{"id":13484500,"url":"https://github.com/SciRuby/daru","last_synced_at":"2025-03-27T16:30:54.589Z","repository":{"id":21466511,"uuid":"24784998","full_name":"SciRuby/daru","owner":"SciRuby","description":"Data Analysis in RUby","archived":false,"fork":false,"pushed_at":"2023-08-15T13:17:11.000Z","size":4646,"stargazers_count":1041,"open_issues_count":92,"forks_count":140,"subscribers_count":36,"default_branch":"master","last_synced_at":"2024-10-29T15:35:05.790Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SciRuby.png","metadata":{"files":{"readme":"README.md","changelog":"History.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2014-10-04T08:23:21.000Z","updated_at":"2024-10-23T04:29:43.000Z","dependencies_parsed_at":"2022-07-27T02:32:13.183Z","dependency_job_id":"423e60d7-5be6-4bb1-a1ed-855a9938383c","html_url":"https://github.com/SciRuby/daru","commit_stats":{"total_commits":632,"total_committers":48,"mean_commits":"13.166666666666666","dds":0.4082278481012658,"last_synced_commit":"3ac3d15f3c1b2e7f5a00b61e6cc81c652d4d0250"},"previous_names":["v0dro/daru"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciRuby%2Fdaru","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciRuby%2Fdaru/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciRuby%2Fdaru/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SciRuby%2Fdaru/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/SciRuby","download_url":"https://codeload.github.com/SciRuby/daru/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245882289,"owners_count":20687860,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T17:01:25.310Z","updated_at":"2025-03-27T16:30:53.736Z","avatar_url":"https://github.com/SciRuby.png","language":"Ruby","readme":"# daru - Data Analysis in RUby\n\n[![Gem Version](https://badge.fury.io/rb/daru.svg)](http://badge.fury.io/rb/daru)\n[![Build Status](https://travis-ci.org/SciRuby/daru.svg?branch=master)](https://travis-ci.org/SciRuby/daru)\n[![Gitter](https://badges.gitter.im/v0dro/daru.svg)](https://gitter.im/v0dro/daru?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n[![Open Source Helpers](https://www.codetriage.com/sciruby/daru/badges/users.svg)](https://www.codetriage.com/sciruby/daru)\n\n## Introduction\n\ndaru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.\n\ndaru makes it easy and intuitive to process data, predominantly through two data structures:\n`Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby, it works with all Ruby implementations.\nTested with MRI 2.5.1 and 2.7.1.\n\n## daru plugin gems\n\n- **[daru-view](https://github.com/SciRuby/daru-view)**\n\ndaru-view is for easy and interactive plotting in web applications \u0026 IRuby \nnotebooks. 
It can work in any Ruby web application framework, such as Rails, Sinatra and Nanoc, and hopefully in others too.\n\nArticles and blog posts that summarize the features of daru-view:\n\n* [GSoC 2017 daru-view](http://sciruby.com/blog/2017/09/01/gsoc-2017-data-visualization-using-daru-view/)\n* [GSoC 2018 Progress Report](https://github.com/SciRuby/daru-view/wiki/GSoC-2018---Progress-Report)\n* [HighCharts Official blog post regarding daru-view](https://www.highcharts.com/blog/post/i-am-ruby-developer-how-can-i-use-highcharts/)\n\n- **[daru-io](https://github.com/SciRuby/daru-io)**\n\nThis gem extends `Daru::DataFrame` with support for many import and export methods. It is intended to help Rubyists who work in data analysis or web development by serving as a general-purpose conversion library that takes input in one format (say, JSON) and converts it to another format (say, Avro), while also making it easy to get started on analyzing data with daru. One can read more in [SciRuby/blog/daru-io](http://sciruby.com/blog/2017/08/29/gsoc-2017-support-to-import-export-of-more-formats/).\n\n\n## Features\n\n* Data structures:\n    - Vector - A basic 1-D vector.\n    - DataFrame - A 2-D spreadsheet-like structure for manipulating and storing data sets. 
This is daru's primary data structure.\n* Compatible with [IRuby notebook](https://github.com/SciRuby/iruby), [statsample](https://github.com/SciRuby/statsample), [statsample-glm](https://github.com/SciRuby/statsample-glm) and [statsample-timeseries](https://github.com/SciRuby/statsample-timeseries).\n* Support for time series.\n* Singly and hierarchically indexed data structures.\n* Flexible and intuitive API for manipulation and analysis of data.\n* Easy plotting, statistics and arithmetic.\n* Plentiful iterators.\n* Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix) and GSL.\n* Easy splitting, aggregation and grouping of data.\n* Quickly reducing data with pivot tables for quick data summary.\n* Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.\n\n## Installation\n\n```console\n$ gem install daru\n```\n\n## Notebooks\n\n#### Notebooks on most use cases\n\n* [Overview of most daru functions](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Daru%20Demo.ipynb)\n* [Basic Creation of Vectors and DataFrame](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Creation%20of%20Vector%20and%20DataFrame.ipynb)\n* [Detailed Usage of Daru::Vector](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Usage%20of%20Vector.ipynb)\n* [Detailed Usage of Daru::DataFrame](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Usage%20of%20DataFrame.ipynb)\n* [Searching and combining data in daru](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Searching%20and%20Combining%20Data.ipynb)\n* [Grouping, Splitting and Pivoting Data](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Grouping%2C%20Splitting%20and%20Pivoting.ipynb)\n* [Usage of Categorical 
Data](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Categorical%20Data/Categorical%20Data.ipynb)\n\n#### Visualization\n* [Visualizing Data With Daru::DataFrame](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Visualization/Visualizing%20data%20with%20daru%20DataFrame.ipynb)\n* [Plotting using Nyaplot](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Plotting/Visualization.ipynb)\n* [Plotting using GnuplotRB](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Plotting/Gnuplotrb.ipynb)\n* [Vector plotting with Gruff](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Plotting/Gruff%20Vector.ipynb)\n* [DataFrame plotting with Gruff](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Plotting/Gruff%20DataFrame.ipynb)\n\n#### Notebooks on Time series\n\n* [Basic Time Series](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Basic%20Time%20Series.ipynb)\n* [Time Series Analysis and Plotting](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Time%20Series%20Functions.ipynb)\n\n#### Notebooks on Indexing\n* [Indexing in Vector](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Categorical%20Data/Indexing%20in%20Vector.ipynb)\n* [Indexing in DataFrame](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Categorical%20Data/Indexing%20in%20DataFrame.ipynb)\n\n### Case Studies\n\n* [Logistic Regression Analysis with daru and statsample-glm](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Logistic%20Regression%20with%20daru%20and%20statsample-glm.ipynb)\n* [Finding and Plotting most heard artists from a Last.fm 
dataset](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Finding%20and%20plotting%20the%20most%20heard%20artists%20on%20last%20fm.ipynb)\n* [Analyzing baby names with daru](http://nbviewer.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Analyzing%20baby%20names/Use%20Case%20-%20Daru%20for%20analyzing%20baby%20names%20data.ipynb)\n* [Example usage of Categorical Data](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Categorical%20Data/examples/%5BExample%5D%20Categorical%20Data.ipynb)\n* [Example usage of Categorical Index](http://nbviewer.jupyter.org/github/SciRuby/sciruby-notebooks/blob/master/Data%20Analysis/Categorical%20Data/examples/%5BExample%5D%20Categorical%20Index.ipynb)\n\n## Blog Posts\n\n* [Data Analysis in RUby: Basic data manipulation and plotting](http://v0dro.github.io/blog/2014/11/25/data-analysis-in-ruby-basic-data-manipulation-and-plotting/)\n* [Data Analysis in RUby: Splitting, sorting, aggregating data and data types](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/)\n* [Finding and Combining data in daru](http://v0dro.github.io/blog/2015/08/03/finding-and-combining-data-in-daru/)\n* [Introduction to analyzing datasets with daru library](http://gafur.me/2018/02/05/analysing-datasets-with-daru-library.html)\n\n### Time series\n\n* [Analysis of Time Series in daru](http://v0dro.github.io/blog/2015/07/31/analysis-of-time-series-in-daru/)\n* [Date Offsets in Daru](http://v0dro.github.io/blog/2015/07/27/date-offsets-in-daru/)\n\n### Categorical Data\n\n* [Categorical Index](http://lokeshh.github.io/gsoc2016/blog/2016/06/14/categorical-index/)\n* [Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/06/21/categorical-data/)\n* [Visualization with Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/07/02/visualization/)\n\n## Basic Usage\n\ndaru exposes two major data structures: `DataFrame` and `Vector`. 
The `Vector` is a basic 1-D structure corresponding to a labelled Array, while the `DataFrame` - daru's primary data structure - is a 2-D spreadsheet-like structure for manipulating and storing data sets.\n\nBasic DataFrame initialization.\n\n``` ruby\ndata_frame = Daru::DataFrame.new(\n  {\n    'Beer' =\u003e ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],\n    'Gallons sold' =\u003e [500, 400, 450, 200, 250]\n  },\n  index: ['India', 'China', 'USA', 'Malaysia', 'Canada']\n)\ndata_frame\n```\n![init0](images/init0.png)\n\n\nLoad data from CSV files.\n``` ruby\ndf = Daru::DataFrame.from_csv('TradeoffData.csv')\n```\n![init1](images/init1.png)\n\n*Basic Data Manipulation*\n\nSelecting rows.\n``` ruby\ndata_frame.row['USA']\n```\n![man0](images/man0.png)\n\nSelecting columns.\n``` ruby\ndata_frame['Beer']\n```\n![man1](images/man1.png)\n\nA range of rows.\n``` ruby\ndata_frame.row['India'..'USA']\n```\n![man2](images/man2.png)\n\nThe first 2 rows.\n``` ruby\ndata_frame.first(2)\n```\n![man3](images/man3.png)\n\nThe last 2 rows.\n``` ruby\ndata_frame.last(2)\n```\n![man4](images/man4.png)\n\nAdding a new column.\n``` ruby\ndata_frame['Gallons produced'] = [550, 500, 600, 210, 240]\n```\n![man5](images/man5.png)\n\nCreating a new column based on data in other columns.\n``` ruby\ndata_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']\n```\n![man6](images/man6.png)\n\n*Condition based selection*\n\nSelecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by [Arel](https://github.com/rails/arel), i.e. 
by using the `where` clause.\n``` ruby\ndata_frame.where(data_frame['Gallons sold'].lt(300))\n```\n![con0](images/con0.png)\n\nYou can pass a combination of boolean operations into the `#where` method and it should work fine:\n``` ruby\ndata_frame.where(\n  data_frame['Beer']\n  .in(['Snow', 'Kingfisher','Tiger Beer'])\n  .and(\n    data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))\n  )\n)\n```\n![con1](images/con1.png)\n\n*Plotting*\n\nDaru supports plotting of interactive graphs with [nyaplot](https://github.com/domitry/nyaplot). You can easily create a plot with the `#plot` method. Here we plot the gallons sold on the Y axis and name of the brand on the X axis in a bar graph.\n``` ruby\ndata_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|\n  plot.x_label \"Beer\"\n  plot.y_label \"Gallons Sold\"\n  plot.yrange [0,600]\n  plot.width 500\n  plot.height 400\nend\n```\n![plot0](images/plot0.png)\n\nIn addition to nyaplot, daru also supports plotting out of the box with [gnuplotrb](https://github.com/SciRuby/gnuplotrb).\n\n## Documentation\n\nDocs can be found [here](http://www.rubydoc.info/gems/daru).\n\n## Contributing\n\nPick a feature from the Roadmap or the issue tracker or think of your own and send me a Pull Request!\n\nFor details see [CONTRIBUTING](https://github.com/SciRuby/daru/blob/master/CONTRIBUTING.md).\n\n## Acknowledgements\n\n* Google and the Ruby Science Foundation for the Google Summer of Code 2016 grant for speed enhancements and implementation of support for categorical data. 
Special thanks to [@lokeshh](https://github.com/lokeshh), [@zverok](https://github.com/zverok) and [@agisga](https://github.com/agisga) for their efforts.\n* Google and the Ruby Science Foundation for the Google Summer of Code 2015 grant for further developing daru and integrating it with other ruby gems.\n* Thank you [last.fm](http://www.last.fm/) for making user data accessible to the public.\n\nCopyright (c) 2015, Sameer Deshmukh\nAll rights reserved\n","funding_links":[],"categories":["Scientific","Data Structures","Ruby","Libraries"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSciRuby%2Fdaru","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSciRuby%2Fdaru","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSciRuby%2Fdaru/lists"}