{"id":13600789,"url":"https://github.com/ankane/rover","last_synced_at":"2025-11-17T14:05:12.180Z","repository":{"id":40153038,"uuid":"263795414","full_name":"ankane/rover","owner":"ankane","description":"Simple, powerful data frames for Ruby","archived":false,"fork":false,"pushed_at":"2024-12-29T20:24:22.000Z","size":223,"stargazers_count":363,"open_issues_count":2,"forks_count":18,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-13T13:18:40.049Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-14T02:30:50.000Z","updated_at":"2025-03-26T18:29:18.000Z","dependencies_parsed_at":"2025-02-28T08:18:44.545Z","dependency_job_id":"1939c124-6ed5-4ca0-8f16-0da8abd19126","html_url":"https://github.com/ankane/rover","commit_stats":{"total_commits":294,"total_committers":7,"mean_commits":42.0,"dds":"0.41156462585034015","last_synced_commit":"6f6de6ed8a55ff7da8b9d70fa290530ae1d2f7f9"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Frover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Frover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Frover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Frover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/rover/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717239,"owners_count":21150390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T18:00:48.662Z","updated_at":"2025-11-17T14:05:12.173Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Data Structures","Ruby","Libraries"],"sub_categories":[],"readme":"# Rover\n\nSimple, powerful data frames for Ruby\n\n:mountain: Designed for data exploration and machine learning, and powered by [Numo](https://github.com/ruby-numo/numo-narray)\n\n:evergreen_tree: Uses [Vega](https://github.com/ankane/vega) for visualization\n\n[![Build Status](https://github.com/ankane/rover/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/rover/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"rover-df\"\n```\n\n## Intro\n\nA data frame is an in-memory table. It’s a useful data structure for data analysis and machine learning. It uses columnar storage for fast operations on columns.\n\nTry it out for forecasting by clicking the button below (it can take a few minutes to start):\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ankane/ml-stack/master?filepath=Forecasting.ipynb)\n\nUse the `Run` button (or `SHIFT` + `ENTER`) to run each line.\n\n## Creating Data Frames\n\nFrom an array\n\n```ruby\nRover::DataFrame.new([\n  {a: 1, b: \"one\"},\n  {a: 2, b: \"two\"},\n  {a: 3, b: \"three\"}\n])\n```\n\nFrom a hash\n\n```ruby\nRover::DataFrame.new({\n  a: [1, 2, 3],\n  b: [\"one\", \"two\", \"three\"]\n})\n```\n\nFrom Active Record\n\n```ruby\nRover::DataFrame.new(User.all)\n```\n\nFrom a CSV\n\n```ruby\nRover.read_csv(\"file.csv\")\n# or\nRover.parse_csv(\"CSV,data,string\")\n```\n\nFrom Parquet (requires the [red-parquet](https://github.com/apache/arrow/tree/master/ruby/red-parquet) gem)\n\n```ruby\nRover.read_parquet(\"file.parquet\")\n# or\nRover.parse_parquet(\"PAR1...\")\n```\n\n## Attributes\n\nGet number of rows\n\n```ruby\ndf.count\n```\n\nGet column names\n\n```ruby\ndf.keys\n```\n\nCheck if a column exists\n\n```ruby\ndf.include?(name)\n```\n\n## Selecting Data\n\nSelect a column\n\n```ruby\ndf[:a]\n```\n\nSelect multiple columns\n\n```ruby\ndf[[:a, :b]]\n```\n\nSelect first rows\n\n```ruby\ndf.head\n# or\ndf.first(5)\n```\n\nSelect last rows\n\n```ruby\ndf.tail\n# or\ndf.last(5)\n```\n\nSelect rows by index\n\n```ruby\ndf[1]\n# or\ndf[1..3]\n# or\ndf[[1, 4, 5]]\n```\n\nIterate over rows\n\n```ruby\ndf.each_row { |row| ... }\n```\n\nIterate over a column\n\n```ruby\ndf[:a].each { |item| ... }\n# or\ndf[:a].each_with_index { |item, index| ... }\n```\n\n## Filtering\n\nFilter on a condition\n\n```ruby\ndf[df[:a] == 100]\ndf[df[:a] != 100]\ndf[df[:a] \u003e 100]\ndf[df[:a] \u003e= 100]\ndf[df[:a] \u003c 100]\ndf[df[:a] \u003c= 100]\n```\n\nIn\n\n```ruby\ndf[df[:a].in?([1, 2, 3])]\ndf[df[:a].in?(1..3)]\ndf[df[:a].in?([\"a\", \"b\", \"c\"])]\n```\n\nNot in\n\n```ruby\ndf[!df[:a].in?([1, 2, 3])]\n```\n\nAnd, or, and exclusive or\n\n```ruby\ndf[(df[:a] \u003e 100) \u0026 (df[:b] == \"one\")] # and\ndf[(df[:a] \u003e 100) | (df[:b] == \"one\")] # or\ndf[(df[:a] \u003e 100) ^ (df[:b] == \"one\")] # xor\n```\n\n## Operations\n\nBasic operations\n\n```ruby\ndf[:a] + 5\ndf[:a] - 5\ndf[:a] * 5\ndf[:a] / 5\ndf[:a] % 5\ndf[:a] ** 2\ndf[:a].sqrt\ndf[:a].cbrt\ndf[:a].abs\n```\n\nRounding\n\n```ruby\ndf[:a].round\ndf[:a].ceil\ndf[:a].floor\n```\n\nLogarithm\n\n```ruby\ndf[:a].ln # or log\ndf[:a].log(5)\ndf[:a].log10\ndf[:a].log2\n```\n\nExponentiation\n\n```ruby\ndf[:a].exp\ndf[:a].exp2\n```\n\nTrigonometric functions\n\n```ruby\ndf[:a].sin\ndf[:a].cos\ndf[:a].tan\ndf[:a].asin\ndf[:a].acos\ndf[:a].atan\n```\n\nHyperbolic functions\n\n```ruby\ndf[:a].sinh\ndf[:a].cosh\ndf[:a].tanh\ndf[:a].asinh\ndf[:a].acosh\ndf[:a].atanh\n```\n\nError function\n\n```ruby\ndf[:a].erf\ndf[:a].erfc\n```\n\nSummary statistics\n\n```ruby\ndf[:a].count\ndf[:a].sum\ndf[:a].mean\ndf[:a].median\ndf[:a].percentile(90)\ndf[:a].min\ndf[:a].max\ndf[:a].std\ndf[:a].var\n```\n\nCount occurrences\n\n```ruby\ndf[:a].tally\n```\n\nCross tabulation\n\n```ruby\ndf[:a].crosstab(df[:b])\n```\n\n## Grouping\n\nGroup\n\n```ruby\ndf.group(:a).count\n```\n\nWorks with all summary statistics\n\n```ruby\ndf.group(:a).max(:b)\n```\n\nMultiple groups\n\n```ruby\ndf.group(:a, :b).count\n```\n\n## Visualization\n\nAdd [Vega](https://github.com/ankane/vega) to your application’s Gemfile:\n\n```ruby\ngem \"vega\"\n```\n\nAnd use:\n\n```ruby\ndf.plot(:a, :b)\n```\n\nSpecify the chart type (`line`, `pie`, `column`, `bar`, `area`, or `scatter`)\n\n```ruby\ndf.plot(:a, :b, type: \"pie\")\n```\n\nGroup data\n\n```ruby\ndf.plot(:a, :b, group: :c)\n```\n\nStacked columns or bars\n\n```ruby\ndf.plot(:a, :b, group: :c, stacked: true)\n```\n\n## Updating Data\n\nAdd a new column\n\n```ruby\ndf[:a] = 1\n# or\ndf[:a] = [1, 2, 3]\n```\n\nUpdate a single element\n\n```ruby\ndf[:a][0] = 100\n```\n\nUpdate multiple elements\n\n```ruby\ndf[:a][0..2] = 1\n# or\ndf[:a][0..2] = [1, 2, 3]\n```\n\nUpdate all elements\n\n```ruby\ndf[:a] = df[:a].map { |v| v.gsub(\"a\", \"b\") }\n# or\ndf[:a].map! { |v| v.gsub(\"a\", \"b\") }\n```\n\nUpdate elements matching a condition\n\n```ruby\ndf[:a][df[:a] \u003e 100] = 0\n```\n\nClamp\n\n```ruby\ndf[:a].clamp!(0, 100)\n```\n\nDelete columns\n\n```ruby\ndf.delete(:a)\n# or\ndf.except!(:a, :b)\n```\n\nRename columns\n\n```ruby\ndf.rename(a: :new_a, b: :new_b)\n# or\ndf[:new_a] = df.delete(:a)\n```\n\nSort rows\n\n```ruby\ndf.sort_by! { |r| r[:a] }\n```\n\nClear all data\n\n```ruby\ndf.clear\n```\n\n## Combining Data Frames\n\nAdd rows\n\n```ruby\ndf.concat(other_df)\n```\n\nAdd columns\n\n```ruby\ndf.merge!(other_df)\n```\n\nInner join\n\n```ruby\ndf.inner_join(other_df)\n# or\ndf.inner_join(other_df, on: :a)\n# or\ndf.inner_join(other_df, on: [:a, :b])\n# or\ndf.inner_join(other_df, on: {df_col: :other_df_col})\n```\n\nLeft join\n\n```ruby\ndf.left_join(other_df)\n```\n\n## Encoding\n\nOne-hot encoding\n\n```ruby\ndf.one_hot\n```\n\nDrop a variable in each category to avoid the dummy variable trap\n\n```ruby\ndf.one_hot(drop: true)\n```\n\n## Conversion\n\nArray of hashes\n\n```ruby\ndf.to_a\n```\n\nHash of arrays\n\n```ruby\ndf.to_h\n```\n\nNumo array\n\n```ruby\ndf.to_numo\n```\n\nCSV\n\n```ruby\ndf.to_csv\n```\n\nParquet (requires the [red-parquet](https://github.com/apache/arrow/tree/master/ruby/red-parquet) gem)\n\n```ruby\ndf.to_parquet\n```\n\n## Types\n\nYou can specify column types when creating a data frame\n\n```ruby\nRover::DataFrame.new(data, types: {\"a\" =\u003e :int64, \"b\" =\u003e :float64})\n```\n\nOr\n\n```ruby\nRover.read_csv(\"data.csv\", types: {\"a\" =\u003e :int64, \"b\" =\u003e :float64})\n```\n\nSupported types are:\n\n- boolean - `:bool`\n- float - `:float64`, `:float32`\n- integer - `:int64`, `:int32`, `:int16`, `:int8`\n- unsigned integer - `:uint64`, `:uint32`, `:uint16`, `:uint8`\n- object - `:object`\n\nGet column types\n\n```ruby\ndf.types\n```\n\nFor a specific column\n\n```ruby\ndf[:a].type\n```\n\nChange the type of a column\n\n```ruby\ndf[:a].to!(:int32)\n```\n\n## History\n\nView the [changelog](https://github.com/ankane/rover/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/rover/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/rover/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/rover.git\ncd rover\nbundle install\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Frover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Frover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Frover/lists"}