Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ankane/ruby-polars
Blazingly fast DataFrames for Ruby
https://github.com/ankane/ruby-polars
Last synced: 1 day ago
JSON representation
Blazingly fast DataFrames for Ruby
- Host: GitHub
- URL: https://github.com/ankane/ruby-polars
- Owner: ankane
- License: other
- Created: 2022-11-22T05:59:48.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2025-01-02T17:19:41.000Z (17 days ago)
- Last Synced: 2025-01-10T21:02:31.524Z (8 days ago)
- Language: Ruby
- Size: 2.16 MB
- Stars: 872
- Watchers: 12
- Forks: 35
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-dataframes - polars-ruby - Blazingly fast DataFrames for Ruby. (Libraries)
README
# Ruby Polars
:fire: Blazingly fast DataFrames for Ruby, powered by [Polars](https://github.com/pola-rs/polars)
[![Build Status](https://github.com/ankane/ruby-polars/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/ruby-polars/actions)
## Installation
Add this line to your application’s Gemfile:
```ruby
gem "polars-df"
```## Getting Started
This library follows the [Polars Python API](https://docs.pola.rs/api/python/stable/reference/index.html).
```ruby
Polars.scan_csv("iris.csv")
.filter(Polars.col("sepal_length") > 5)
.group_by("species")
.agg(Polars.all.sum)
.collect
```You can follow [Polars tutorials](https://docs.pola.rs/user-guide/getting-started/) and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
## Reference
- [Series](https://www.rubydoc.info/gems/polars-df/Polars/Series)
- [DataFrame](https://www.rubydoc.info/gems/polars-df/Polars/DataFrame)
- [LazyFrame](https://www.rubydoc.info/gems/polars-df/Polars/LazyFrame)## Examples
### Creating DataFrames
From a CSV
```ruby
Polars.read_csv("file.csv")# or lazily with
Polars.scan_csv("file.csv")
```From Parquet
```ruby
Polars.read_parquet("file.parquet")# or lazily with
Polars.scan_parquet("file.parquet")
```From Active Record
```ruby
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")
```From JSON
```ruby
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")# or lazily with
Polars.scan_ndjson("file.ndjson")
```From Feather / Arrow IPC
```ruby
Polars.read_ipc("file.arrow")# or lazily with
Polars.scan_ipc("file.arrow")
```From Avro
```ruby
Polars.read_avro("file.avro")
```From Delta Lake (requires [deltalake-rb](https://github.com/ankane/delta-ruby)) [experimental]
```ruby
Polars.read_delta("./table")# or lazily with
Polars.scan_delta("./table")
```From a hash
```ruby
Polars::DataFrame.new({
a: [1, 2, 3],
b: ["one", "two", "three"]
})
```From an array of hashes
```ruby
Polars::DataFrame.new([
{a: 1, b: "one"},
{a: 2, b: "two"},
{a: 3, b: "three"}
])
```From an array of series
```ruby
Polars::DataFrame.new([
Polars::Series.new("a", [1, 2, 3]),
Polars::Series.new("b", ["one", "two", "three"])
])
```## Attributes
Get number of rows
```ruby
df.height
```Get column names
```ruby
df.columns
```Check if a column exists
```ruby
df.include?(name)
```## Selecting Data
Select a column
```ruby
df["a"]
```Select multiple columns
```ruby
df[["a", "b"]]
```Select first rows
```ruby
df.head
```Select last rows
```ruby
df.tail
```## Filtering
Filter on a condition
```ruby
df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]
```And, or, and exclusive or
```ruby
df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor
```## Operations
Basic operations
```ruby
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
```Rounding
```ruby
df["a"].round(2)
df["a"].ceil
df["a"].floor
```Logarithm
```ruby
df["a"].log # natural log
df["a"].log(10)
```Exponentiation
```ruby
df["a"].exp
```Trigonometric functions
```ruby
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan
```Hyperbolic functions
```ruby
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh
```Summary statistics
```ruby
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
```## Grouping
Group
```ruby
df.group_by("a").count
```Works with all summary statistics
```ruby
df.group_by("a").max
```Multiple groups
```ruby
df.group_by(["a", "b"]).count
```## Combining Data Frames
Add rows
```ruby
df.vstack(other_df)
```Add columns
```ruby
df.hstack(other_df)
```Inner join
```ruby
df.join(other_df, on: "a")
```Left join
```ruby
df.join(other_df, on: "a", how: "left")
```## Encoding
One-hot encoding
```ruby
df.to_dummies
```## Conversion
Array of hashes
```ruby
df.rows(named: true)
```Hash of series
```ruby
df.to_h
```CSV
```ruby
df.to_csv
# or
df.write_csv("file.csv")
```Parquet
```ruby
df.write_parquet("file.parquet")
```JSON
```ruby
df.write_json("file.json")
# or
df.write_ndjson("file.ndjson")
```Feather / Arrow IPC
```ruby
df.write_ipc("file.arrow")
```Avro
```ruby
df.write_avro("file.avro")
```Delta Lake [experimental]
```ruby
df.write_delta("./table")
```Numo array
```ruby
df.to_numo
```## Types
You can specify column types when creating a data frame
```ruby
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
```Supported types are:
- boolean - `Boolean`
- float - `Float64`, `Float32`
- integer - `Int64`, `Int32`, `Int16`, `Int8`
- unsigned integer - `UInt64`, `UInt32`, `UInt16`, `UInt8`
- string - `String`, `Binary`, `Categorical`
- temporal - `Date`, `Datetime`, `Time`, `Duration`
- nested - `List`, `Struct`, `Array`
- other - `Object`, `Null`Get column types
```ruby
df.schema
```For a specific column
```ruby
df["a"].dtype
```Cast a column
```ruby
df["a"].cast(Polars::Int32)
```## Visualization
Add [Vega](https://github.com/ankane/vega-ruby) to your application’s Gemfile:
```ruby
gem "vega"
```And use:
```ruby
df.plot("a", "b")
```Specify the chart type (`line`, `pie`, `column`, `bar`, `area`, or `scatter`)
```ruby
df.plot("a", "b", type: "pie")
```Group data
```ruby
df.group_by("c").plot("a", "b")
```Stacked columns or bars
```ruby
df.group_by("c").plot("a", "b", stacked: true)
```## History
View the [changelog](CHANGELOG.md)
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/ruby-polars/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/ruby-polars/pulls)
- Write, clarify, or fix documentation
- Suggest or add new featuresTo get started with development:
```sh
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs
```