{"id":25907803,"url":"https://github.com/ankane/polars-ruby","last_synced_at":"2025-03-03T07:01:34.694Z","repository":{"id":63603202,"uuid":"569122226","full_name":"ankane/ruby-polars","owner":"ankane","description":"Blazingly fast DataFrames for Ruby","archived":false,"fork":false,"pushed_at":"2025-02-25T20:54:46.000Z","size":2559,"stargazers_count":887,"open_issues_count":1,"forks_count":39,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-01T03:26:32.105Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-22T05:59:48.000Z","updated_at":"2025-02-25T20:54:50.000Z","dependencies_parsed_at":"2023-10-25T20:32:13.237Z","dependency_job_id":"b70b0258-ecf7-4e53-9277-7a240123d292","html_url":"https://github.com/ankane/ruby-polars","commit_stats":{"total_commits":1240,"total_committers":11,"mean_commits":"112.72727272727273","dds":0.00887096774193552,"last_synced_commit":"9e1d056a3594534f5fe431375724747e7535b758"},"previous_names":["ankane/ruby-polars","ankane/polars-ruby"],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/ruby-polars/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241465615,"owners_count":19967336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-03T07:01:17.823Z","updated_at":"2025-03-03T07:01:34.685Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","readme":"# Ruby Polars\n\n🔥 Blazingly fast DataFrames for Ruby, powered by [Polars](https://github.com/pola-rs/polars)\n\n[![Build Status](https://github.com/ankane/ruby-polars/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/ruby-polars/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"polars-df\"\n```\n\n## Getting Started\n\nThis library follows the [Polars Python API](https://docs.pola.rs/api/python/stable/reference/index.html).\n\n```ruby\nPolars.scan_csv(\"iris.csv\")\n  .filter(Polars.col(\"sepal_length\") \u003e 5)\n  .group_by(\"species\")\n  .agg(Polars.all.sum)\n  .collect\n```\n\nYou can follow [Polars tutorials](https://docs.pola.rs/user-guide/getting-started/) and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.\n\n## Reference\n\n- [Series](https://www.rubydoc.info/gems/polars-df/Polars/Series)\n- [DataFrame](https://www.rubydoc.info/gems/polars-df/Polars/DataFrame)\n- [LazyFrame](https://www.rubydoc.info/gems/polars-df/Polars/LazyFrame)\n\n## Examples\n\n### Creating DataFrames\n\nFrom a CSV\n\n```ruby\nPolars.read_csv(\"file.csv\")\n\n# or lazily with\nPolars.scan_csv(\"file.csv\")\n```\n\nFrom Parquet\n\n```ruby\nPolars.read_parquet(\"file.parquet\")\n\n# or lazily with\nPolars.scan_parquet(\"file.parquet\")\n```\n\nFrom Active Record\n\n```ruby\nPolars.read_database(User.all)\n# or\nPolars.read_database(\"SELECT * FROM users\")\n```\n\nFrom JSON\n\n```ruby\nPolars.read_json(\"file.json\")\n# or\nPolars.read_ndjson(\"file.ndjson\")\n\n# or lazily with\nPolars.scan_ndjson(\"file.ndjson\")\n```\n\nFrom Feather / Arrow IPC\n\n```ruby\nPolars.read_ipc(\"file.arrow\")\n\n# or lazily with\nPolars.scan_ipc(\"file.arrow\")\n```\n\nFrom Avro\n\n```ruby\nPolars.read_avro(\"file.avro\")\n```\n\nFrom Delta Lake (requires [deltalake-rb](https://github.com/ankane/delta-ruby)) [experimental]\n\n```ruby\nPolars.read_delta(\"./table\")\n\n# or lazily with\nPolars.scan_delta(\"./table\")\n```\n\nFrom a hash\n\n```ruby\nPolars::DataFrame.new({\n  a: [1, 2, 3],\n  b: [\"one\", \"two\", \"three\"]\n})\n```\n\nFrom an array of hashes\n\n```ruby\nPolars::DataFrame.new([\n  {a: 1, b: \"one\"},\n  {a: 2, b: \"two\"},\n  {a: 3, b: \"three\"}\n])\n```\n\nFrom an array of series\n\n```ruby\nPolars::DataFrame.new([\n  Polars::Series.new(\"a\", [1, 2, 3]),\n  Polars::Series.new(\"b\", [\"one\", \"two\", \"three\"])\n])\n```\n\n## Attributes\n\nGet number of rows\n\n```ruby\ndf.height\n```\n\nGet column names\n\n```ruby\ndf.columns\n```\n\nCheck if a column exists\n\n```ruby\ndf.include?(name)\n```\n\n## Selecting Data\n\nSelect a column\n\n```ruby\ndf[\"a\"]\n```\n\nSelect multiple columns\n\n```ruby\ndf[[\"a\", \"b\"]]\n```\n\nSelect first rows\n\n```ruby\ndf.head\n```\n\nSelect last rows\n\n```ruby\ndf.tail\n```\n\n## Filtering\n\nFilter on a condition\n\n```ruby\ndf[Polars.col(\"a\") == 2]\ndf[Polars.col(\"a\") != 2]\ndf[Polars.col(\"a\") \u003e 2]\ndf[Polars.col(\"a\") \u003e= 2]\ndf[Polars.col(\"a\") \u003c 2]\ndf[Polars.col(\"a\") \u003c= 2]\n```\n\nAnd, or, and exclusive or\n\n```ruby\ndf[(Polars.col(\"a\") \u003e 1) \u0026 (Polars.col(\"b\") == \"two\")] # and\ndf[(Polars.col(\"a\") \u003e 1) | (Polars.col(\"b\") == \"two\")] # or\ndf[(Polars.col(\"a\") \u003e 1) ^ (Polars.col(\"b\") == \"two\")] # xor\n```\n\n## Operations\n\nBasic operations\n\n```ruby\ndf[\"a\"] + 5\ndf[\"a\"] - 5\ndf[\"a\"] * 5\ndf[\"a\"] / 5\ndf[\"a\"] % 5\ndf[\"a\"] ** 2\ndf[\"a\"].sqrt\ndf[\"a\"].abs\n```\n\nRounding\n\n```ruby\ndf[\"a\"].round(2)\ndf[\"a\"].ceil\ndf[\"a\"].floor\n```\n\nLogarithm\n\n```ruby\ndf[\"a\"].log # natural log\ndf[\"a\"].log(10)\n```\n\nExponentiation\n\n```ruby\ndf[\"a\"].exp\n```\n\nTrigonometric functions\n\n```ruby\ndf[\"a\"].sin\ndf[\"a\"].cos\ndf[\"a\"].tan\ndf[\"a\"].asin\ndf[\"a\"].acos\ndf[\"a\"].atan\n```\n\nHyperbolic functions\n\n```ruby\ndf[\"a\"].sinh\ndf[\"a\"].cosh\ndf[\"a\"].tanh\ndf[\"a\"].asinh\ndf[\"a\"].acosh\ndf[\"a\"].atanh\n```\n\nSummary statistics\n\n```ruby\ndf[\"a\"].sum\ndf[\"a\"].mean\ndf[\"a\"].median\ndf[\"a\"].quantile(0.90)\ndf[\"a\"].min\ndf[\"a\"].max\ndf[\"a\"].std\ndf[\"a\"].var\n```\n\n## Grouping\n\nGroup\n\n```ruby\ndf.group_by(\"a\").count\n```\n\nWorks with all summary statistics\n\n```ruby\ndf.group_by(\"a\").max\n```\n\nMultiple groups\n\n```ruby\ndf.group_by([\"a\", \"b\"]).count\n```\n\n## Combining Data Frames\n\nAdd rows\n\n```ruby\ndf.vstack(other_df)\n```\n\nAdd columns\n\n```ruby\ndf.hstack(other_df)\n```\n\nInner join\n\n```ruby\ndf.join(other_df, on: \"a\")\n```\n\nLeft join\n\n```ruby\ndf.join(other_df, on: \"a\", how: \"left\")\n```\n\n## Encoding\n\nOne-hot encoding\n\n```ruby\ndf.to_dummies\n```\n\n## Conversion\n\nArray of hashes\n\n```ruby\ndf.rows(named: true)\n```\n\nHash of series\n\n```ruby\ndf.to_h\n```\n\nCSV\n\n```ruby\ndf.to_csv\n# or\ndf.write_csv(\"file.csv\")\n```\n\nParquet\n\n```ruby\ndf.write_parquet(\"file.parquet\")\n```\n\nJSON\n\n```ruby\ndf.write_json(\"file.json\")\n# or\ndf.write_ndjson(\"file.ndjson\")\n```\n\nFeather / Arrow IPC\n\n```ruby\ndf.write_ipc(\"file.arrow\")\n```\n\nAvro\n\n```ruby\ndf.write_avro(\"file.avro\")\n```\n\nDelta Lake [experimental]\n\n```ruby\ndf.write_delta(\"./table\")\n```\n\nNumo array\n\n```ruby\ndf.to_numo\n```\n\n## Types\n\nYou can specify column types when creating a data frame\n\n```ruby\nPolars::DataFrame.new(data, schema: {\"a\" =\u003e Polars::Int32, \"b\" =\u003e Polars::Float32})\n```\n\nSupported types are:\n\n- boolean - `Boolean`\n- float - `Float64`, `Float32`\n- integer - `Int64`, `Int32`, `Int16`, `Int8`\n- unsigned integer - `UInt64`, `UInt32`, `UInt16`, `UInt8`\n- string - `String`, `Binary`, `Categorical`\n- temporal - `Date`, `Datetime`, `Time`, `Duration`\n- nested -  `List`, `Struct`, `Array`\n- other - `Object`, `Null`\n\nGet column types\n\n```ruby\ndf.schema\n```\n\nFor a specific column\n\n```ruby\ndf[\"a\"].dtype\n```\n\nCast a column\n\n```ruby\ndf[\"a\"].cast(Polars::Int32)\n```\n\n## Visualization\n\nAdd [Vega](https://github.com/ankane/vega-ruby) to your application’s Gemfile:\n\n```ruby\ngem \"vega\"\n```\n\nAnd use:\n\n```ruby\ndf.plot(\"a\", \"b\")\n```\n\nSpecify the chart type (`line`, `pie`, `column`, `bar`, `area`, or `scatter`)\n\n```ruby\ndf.plot(\"a\", \"b\", type: \"pie\")\n```\n\nGroup data\n\n```ruby\ndf.group_by(\"c\").plot(\"a\", \"b\")\n```\n\nStacked columns or bars\n\n```ruby\ndf.group_by(\"c\").plot(\"a\", \"b\", stacked: true)\n```\n\n## History\n\nView the [changelog](https://github.com/ankane/ruby-polars/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/ruby-polars/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/ruby-polars/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/ruby-polars.git\ncd ruby-polars\nbundle install\nbundle exec rake compile\nbundle exec rake test\nbundle exec rake test:docs\n```\n","funding_links":[],"categories":["Libraries/Packages/Scripts","Libraries"],"sub_categories":["Ruby"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fpolars-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Fpolars-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fpolars-ruby/lists"}