{"id":13600795,"url":"https://github.com/ankane/ruby-polars","last_synced_at":"2025-11-17T14:02:32.716Z","repository":{"id":63603202,"uuid":"569122226","full_name":"ankane/ruby-polars","owner":"ankane","description":"Blazingly fast DataFrames for Ruby","archived":false,"fork":false,"pushed_at":"2025-11-15T05:31:22.000Z","size":3628,"stargazers_count":954,"open_issues_count":1,"forks_count":43,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-11-15T05:44:42.906Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-11-22T05:59:48.000Z","updated_at":"2025-11-15T05:31:25.000Z","dependencies_parsed_at":"2023-10-25T20:32:13.237Z","dependency_job_id":"b70b0258-ecf7-4e53-9277-7a240123d292","html_url":"https://github.com/ankane/ruby-polars","commit_stats":{"total_commits":1240,"total_committers":11,"mean_commits":"112.72727272727273","dds":0.00887096774193552,"last_synced_commit":"9e1d056a3594534f5fe431375724747e7535b758"},"previous_names":["ankane/ruby-polars","ankane/polars-ruby"],"tags_count":36,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/ruby-polars","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/ruby-polars/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fruby-polars/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284893575,"owners_count":27080531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-17T02:00:06.431Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T18:00:48.781Z","updated_at":"2025-11-17T14:02:32.711Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Libraries","Ruby","DataFrame Libraries"],"sub_categories":[],"readme":"# Ruby Polars\n\n🔥 Blazingly fast DataFrames for Ruby, powered by [Polars](https://github.com/pola-rs/polars)\n\n[![Build Status](https://github.com/ankane/ruby-polars/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/ruby-polars/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"polars-df\"\n```\n\n## Getting Started\n\nThis library follows the [Polars Python API](https://docs.pola.rs/api/python/stable/reference/index.html).\n\n```ruby\nPolars.scan_csv(\"iris.csv\")\n  .filter(Polars.col(\"sepal_length\") \u003e 5)\n  .group_by(\"species\")\n  .agg(Polars.all.sum)\n  .collect\n```\n\nYou can follow [Polars tutorials](https://docs.pola.rs/user-guide/getting-started/) and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.\n\n## Reference\n\n- [Series](https://www.rubydoc.info/gems/polars-df/Polars/Series)\n- [DataFrame](https://www.rubydoc.info/gems/polars-df/Polars/DataFrame)\n- [LazyFrame](https://www.rubydoc.info/gems/polars-df/Polars/LazyFrame)\n\n## Examples\n\n### Creating DataFrames\n\nFrom a CSV\n\n```ruby\nPolars.read_csv(\"file.csv\")\n\n# or lazily with\nPolars.scan_csv(\"file.csv\")\n```\n\nFrom Parquet\n\n```ruby\nPolars.read_parquet(\"file.parquet\")\n\n# or lazily with\nPolars.scan_parquet(\"file.parquet\")\n```\n\nFrom Active Record\n\n```ruby\nPolars.read_database(User.all)\n# or\nPolars.read_database(\"SELECT * FROM users\")\n```\n\nFrom JSON\n\n```ruby\nPolars.read_json(\"file.json\")\n# or\nPolars.read_ndjson(\"file.ndjson\")\n\n# or lazily with\nPolars.scan_ndjson(\"file.ndjson\")\n```\n\nFrom Feather / Arrow IPC\n\n```ruby\nPolars.read_ipc(\"file.arrow\")\n\n# or lazily with\nPolars.scan_ipc(\"file.arrow\")\n```\n\nFrom Avro\n\n```ruby\nPolars.read_avro(\"file.avro\")\n```\n\nFrom Iceberg (requires [iceberg](https://github.com/ankane/iceberg-ruby)) [experimental]\n\n```ruby\nPolars.scan_iceberg(table)\n```\n\nFrom Delta Lake (requires [deltalake-rb](https://github.com/ankane/delta-ruby)) [experimental]\n\n```ruby\nPolars.read_delta(\"./table\")\n\n# or lazily with\nPolars.scan_delta(\"./table\")\n```\n\nFrom a hash\n\n```ruby\nPolars::DataFrame.new({\n  a: [1, 2, 3],\n  b: [\"one\", \"two\", \"three\"]\n})\n```\n\nFrom an array of hashes\n\n```ruby\nPolars::DataFrame.new([\n  {a: 1, b: \"one\"},\n  {a: 2, b: \"two\"},\n  {a: 3, b: \"three\"}\n])\n```\n\nFrom an array of series\n\n```ruby\nPolars::DataFrame.new([\n  Polars::Series.new(\"a\", [1, 2, 3]),\n  Polars::Series.new(\"b\", [\"one\", \"two\", \"three\"])\n])\n```\n\n## Attributes\n\nGet number of rows\n\n```ruby\ndf.height\n```\n\nGet column names\n\n```ruby\ndf.columns\n```\n\nCheck if a column exists\n\n```ruby\ndf.include?(name)\n```\n\n## Selecting Data\n\nSelect a column\n\n```ruby\ndf[\"a\"]\n```\n\nSelect multiple columns\n\n```ruby\ndf[[\"a\", \"b\"]]\n```\n\nSelect first rows\n\n```ruby\ndf.head\n```\n\nSelect last rows\n\n```ruby\ndf.tail\n```\n\n## Filtering\n\nFilter on a condition\n\n```ruby\ndf[Polars.col(\"a\") == 2]\ndf[Polars.col(\"a\") != 2]\ndf[Polars.col(\"a\") \u003e 2]\ndf[Polars.col(\"a\") \u003e= 2]\ndf[Polars.col(\"a\") \u003c 2]\ndf[Polars.col(\"a\") \u003c= 2]\n```\n\nAnd, or, and exclusive or\n\n```ruby\ndf[(Polars.col(\"a\") \u003e 1) \u0026 (Polars.col(\"b\") == \"two\")] # and\ndf[(Polars.col(\"a\") \u003e 1) | (Polars.col(\"b\") == \"two\")] # or\ndf[(Polars.col(\"a\") \u003e 1) ^ (Polars.col(\"b\") == \"two\")] # xor\n```\n\n## Operations\n\nBasic operations\n\n```ruby\ndf[\"a\"] + 5\ndf[\"a\"] - 5\ndf[\"a\"] * 5\ndf[\"a\"] / 5\ndf[\"a\"] % 5\ndf[\"a\"] ** 2\ndf[\"a\"].sqrt\ndf[\"a\"].abs\n```\n\nRounding\n\n```ruby\ndf[\"a\"].round(2)\ndf[\"a\"].ceil\ndf[\"a\"].floor\n```\n\nLogarithm\n\n```ruby\ndf[\"a\"].log # natural log\ndf[\"a\"].log(10)\n```\n\nExponentiation\n\n```ruby\ndf[\"a\"].exp\n```\n\nTrigonometric functions\n\n```ruby\ndf[\"a\"].sin\ndf[\"a\"].cos\ndf[\"a\"].tan\ndf[\"a\"].asin\ndf[\"a\"].acos\ndf[\"a\"].atan\n```\n\nHyperbolic functions\n\n```ruby\ndf[\"a\"].sinh\ndf[\"a\"].cosh\ndf[\"a\"].tanh\ndf[\"a\"].asinh\ndf[\"a\"].acosh\ndf[\"a\"].atanh\n```\n\nSummary statistics\n\n```ruby\ndf[\"a\"].sum\ndf[\"a\"].mean\ndf[\"a\"].median\ndf[\"a\"].quantile(0.90)\ndf[\"a\"].min\ndf[\"a\"].max\ndf[\"a\"].std\ndf[\"a\"].var\n```\n\n## Grouping\n\nGroup\n\n```ruby\ndf.group_by(\"a\").count\n```\n\nWorks with all summary statistics\n\n```ruby\ndf.group_by(\"a\").max\n```\n\nMultiple groups\n\n```ruby\ndf.group_by([\"a\", \"b\"]).count\n```\n\n## Combining Data Frames\n\nAdd rows\n\n```ruby\ndf.vstack(other_df)\n```\n\nAdd columns\n\n```ruby\ndf.hstack(other_df)\n```\n\nInner join\n\n```ruby\ndf.join(other_df, on: \"a\")\n```\n\nLeft join\n\n```ruby\ndf.join(other_df, on: \"a\", how: \"left\")\n```\n\n## Encoding\n\nOne-hot encoding\n\n```ruby\ndf.to_dummies\n```\n\n## Conversion\n\nArray of hashes\n\n```ruby\ndf.rows(named: true)\n```\n\nHash of series\n\n```ruby\ndf.to_h\n```\n\nCSV\n\n```ruby\ndf.to_csv\n# or\ndf.write_csv(\"file.csv\")\n```\n\nParquet\n\n```ruby\ndf.write_parquet(\"file.parquet\")\n```\n\nJSON\n\n```ruby\ndf.write_json(\"file.json\")\n# or\ndf.write_ndjson(\"file.ndjson\")\n```\n\nFeather / Arrow IPC\n\n```ruby\ndf.write_ipc(\"file.arrow\")\n```\n\nAvro\n\n```ruby\ndf.write_avro(\"file.avro\")\n```\n\nIceberg [experimental]\n\n```ruby\ndf.write_iceberg(table, mode: \"append\")\n```\n\nDelta Lake [experimental]\n\n```ruby\ndf.write_delta(\"./table\")\n```\n\nNumo array\n\n```ruby\ndf.to_numo\n```\n\n## Types\n\nYou can specify column types when creating a data frame\n\n```ruby\nPolars::DataFrame.new(data, schema: {\"a\" =\u003e Polars::Int32, \"b\" =\u003e Polars::Float32})\n```\n\nSupported types are:\n\n- boolean - `Boolean`\n- decimal - `Decimal`\n- float - `Float32`, `Float64`\n- integer - `Int8`, `Int16`, `Int32`, `Int64`, `Int128`\n- unsigned integer - `UInt8`, `UInt16`, `UInt32`, `UInt64`, `UInt128`\n- string - `String`, `Categorical`, `Enum`\n- temporal - `Date`, `Datetime`, `Duration`, `Time`\n- nested - `Array`, `List`, `Struct`\n- other - `Binary`, `Object`, `Null`, `Unknown`\n\nGet column types\n\n```ruby\ndf.schema\n```\n\nFor a specific column\n\n```ruby\ndf[\"a\"].dtype\n```\n\nCast a column\n\n```ruby\ndf[\"a\"].cast(Polars::Int32)\n```\n\n## Visualization\n\nAdd [Vega](https://github.com/ankane/vega-ruby) to your application’s Gemfile:\n\n```ruby\ngem \"vega\"\n```\n\nAnd use:\n\n```ruby\ndf.plot(\"a\", \"b\", type: \"line\")\n```\n\nSupports `line`, `pie`, `column`, `bar`, `area`, and `scatter` plots\n\nGroup data\n\n```ruby\ndf.group_by(\"c\").plot(\"a\", \"b\", type: \"line\")\n```\n\nStacked columns or bars\n\n```ruby\ndf.group_by(\"c\").plot(\"a\", \"b\", type: \"column\", stacked: true)\n```\n\nPlot a series [unreleased]\n\n```ruby\ndf[\"a\"].plot.hist\n```\n\nSupports `hist`, `kde`, and `line` plots\n\n## History\n\nView the [changelog](https://github.com/ankane/ruby-polars/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/ruby-polars/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/ruby-polars/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/ruby-polars.git\ncd ruby-polars\nbundle install\nbundle exec rake compile\nbundle exec rake test\nbundle exec rake test:docs\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fruby-polars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Fruby-polars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fruby-polars/lists"}