https://github.com/rocketlaunchr/dataframe-go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
- Host: GitHub
- URL: https://github.com/rocketlaunchr/dataframe-go
- Owner: rocketlaunchr
- License: other
- Created: 2018-10-01T12:19:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-04-02T10:46:59.000Z (almost 3 years ago)
- Last Synced: 2025-01-04T12:46:49.678Z (10 days ago)
- Topics: data-science, dataframe, dataframes, go, golang, machine-learning, pandas, pandas-dataframe, python, statistics
- Language: Go
- Homepage:
- Size: 1010 KB
- Stars: 1,210
- Watchers: 37
- Forks: 95
- Open Issues: 17
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-go - dataframe-go - Dataframes for machine-learning and statistics (similar to pandas). (Science and Data Analysis / HTTP Clients)
- awesome-golang-ai - dataframe-go - learning, and data manipulation/exploration. (DataFrames)
- zero-alloc-awesome-go - dataframe-go - Dataframes for machine-learning and statistics (similar to pandas). (Science and Data Analysis / HTTP Clients)
- awesome-ccamel - rocketlaunchr/dataframe-go - DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration (Go)
- go-awesome - dataframe-go - package for data statistics and manipulation (Open source library / Data Structure)
- awesome-go - dataframe-go - Dataframes for machine-learning and statistics (similar to pandas). Stars:`1.2K`. (Science and Data Analysis / HTTP Clients)
- awesome-golang-repositories - dataframe-go - learning, and data manipulation/exploration (Repositories)
- awesome-go-quant - dataframe-go - DataFrame for statistics and data manipulation/exploration (Golang / Numerical Libraries & Data Structures)
- awesome-go - dataframe-go - DataFrame for statistics and data manipulation - ★ 20 (Science and Data Analysis)
- awesome-go-extra - dataframe-go - learning, and data manipulation/exploration|873|76|12|2018-10-01T12:19:31Z|2022-04-02T10:46:59Z| (Science and Data Analysis / HTTP Clients)
README
⭐ Star the project to show your appreciation.
Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet.
This package is designed to be light-weight and intuitive.
⚠️ The package is production ready but the API is not stable yet. Once Go 1.18 (Generics) is introduced, the **ENTIRE** package will be rewritten. For example, there will only be 1 generic Series type. After that, version `1.0.0` will be tagged.
It is recommended your package manager locks to a commit id instead of the master branch directly. ⚠️
# Features
1. Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
3. Developer Friendly
4. Flexible - Create custom Series (custom data types)
5. Performant
6. Interoperability with [gonum package](https://godoc.org/gonum.org/v1/gonum).
7. [pandas sub-package](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) ![Help Required](https://img.shields.io/badge/help-required-blueviolet)
8. Fake data generation
9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
10. Time-series Forecasting (SES, Holt-Winters)
11. Math functions
12. Plotting (cross-platform)
See [Tutorial](https://github.com/rocketlaunchr/dataframe-go#tutorial) here.
## Installation
```
go get -u github.com/rocketlaunchr/dataframe-go
```
```go
import dataframe "github.com/rocketlaunchr/dataframe-go"
```
# DataFrames
## Creating a DataFrame
```go
s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/eC5HYAEHjNI)
## Insert and Remove Row
```go
df.Append(nil, 9, 123.6)
df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/xwW_410vQ2p)
## Update Row
```go
df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})
```
## Sorting
```go
sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/lsJkKw3ZUJq)
## Iterating
You can change the step and starting row. It may be wise to lock the DataFrame before iterating.
The returned value is a map containing the name of the series (`string`) and the index of the series (`int`) as keys.
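To make the shape of that return value concrete, here is a minimal standard-library sketch of an iterator closure that yields a row number plus a map keyed by both the column name and the column position. The `column` type and `valuesIterator` function are hypothetical names invented for illustration; they are not part of this package's API, and the sketch assumes every column has the same length.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a named series; nil marks a missing value.
type column struct {
	name   string
	values []interface{}
}

// valuesIterator returns a closure that walks the rows from start in steps of step.
// Each call yields the current row number and a map keyed by BOTH the column
// name (string) and the column position (int). A nil row pointer signals the end.
// Assumption: all columns have the same number of values.
func valuesIterator(cols []column, start, step int) func() (*int, map[interface{}]interface{}) {
	row := start
	nRows := 0
	if len(cols) > 0 {
		nRows = len(cols[0].values)
	}
	return func() (*int, map[interface{}]interface{}) {
		if step <= 0 || row < 0 || row >= nRows {
			return nil, nil
		}
		vals := make(map[interface{}]interface{}, 2*len(cols))
		for i, c := range cols {
			vals[c.name] = c.values[row] // lookup by series name
			vals[i] = c.values[row]      // lookup by series index
		}
		current := row
		row += step
		return &current, vals
	}
}

func main() {
	cols := []column{
		{name: "day", values: []interface{}{int64(1), int64(2), int64(3)}},
		{name: "sales", values: []interface{}{50.3, nil, 56.2}},
	}
	next := valuesIterator(cols, 0, 1)
	for {
		row, vals := next()
		if row == nil {
			break
		}
		// The same cell is reachable by name or by position.
		fmt.Println(*row, vals["day"], vals[1])
	}
}
```

Storing each value under two keys doubles the map size but lets callers pick whichever lookup is more convenient. The sketch does no locking, so it says nothing about how the package synchronizes access.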
```go
iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1: day:4 0:4 sales:]
4 map[day:5 0:5 sales: 1:]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/eqjvu-vO8sr)
## Statistics
You can easily calculate statistics for a Series using the [gonum](https://godoc.org/gonum.org/v1/gonum/stat) or [montanaflynn/stats](https://godoc.org/github.com/montanaflynn/stats) package.
`SeriesFloat64` and `SeriesTime` provide access to the exported `Values` field to seamlessly interoperate with external math-based packages.
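Because `Values` is a plain `[]float64`, any function that accepts a float slice can work on it. As a rough illustration of what the statistics in this section compute, here is a standard-library sketch of mean, median and sample standard deviation. The function names are hypothetical, and the n-1 divisor is an assumption chosen to match the sample convention that gonum documents for `stat.StdDev`. Missing (NaN) values are not handled.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of xs, or NaN for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs (the average of the two middle values
// when the length is even). It sorts a copy, so the input is left untouched.
func median(xs []float64) float64 {
	n := len(xs)
	if n == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// sampleStdDev returns the sample standard deviation (divisor n-1),
// or NaN when fewer than two values are supplied.
func sampleStdDev(xs []float64) float64 {
	n := len(xs)
	if n < 2 {
		return math.NaN()
	}
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return math.Sqrt(ss / float64(n-1))
}

func main() {
	// Stand-in for the Values slice of a float64 series.
	values := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(mean(values), median(values), sampleStdDev(values))
}
```

In practice the external packages are the better choice; the sketch only shows that no conversion layer is needed between a float64 series and slice-based math code.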
### Example
Some series provide easy conversion using the `ToSeriesFloat64` method.
```go
import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx)
```
### Mean
```go
mean := stat.Mean(sf.Values, nil)
```
### Median
```go
import "github.com/montanaflynn/stats"
median, _ := stats.Median(sf.Values)
```
### Standard Deviation
```go
std := stat.StdDev(sf.Values, nil)
```
## Plotting (cross-platform)
```go
import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}
plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed
```
## Math Functions
```go
import "github.com/rocketlaunchr/dataframe-go/math/funcs"

res := 24
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1)
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/f4GfS2rUjaM)
## Importing Data
The `imports` sub-package supports importing from CSV, JSONL and Parquet, as well as directly from a SQL database. The `DictateDataType` option can be set to specify the true underlying data type of a column. Alternatively, the `InferDataTypes` option can be set so that the types are inferred from the data.
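To illustrate the general idea behind inferring column types, here is a standard-library sketch that reads a CSV with a header row and guesses a type per column. The rules used (try `int64`, then `float64`, then a `YYYY-MM-DD` date, otherwise `string`, with `NA` and empty cells treated as missing) and the names `inferColumnType` and `inferCSVTypes` are assumptions made for this example; the package's own inference logic may differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// inferColumnType guesses a type name for one column of raw CSV cells.
// Cells equal to "" or "NA" count as missing and are ignored (an assumed rule).
func inferColumnType(cells []string) string {
	isInt, isFloat, isTime, seen := true, true, true, false
	for _, c := range cells {
		if c == "" || c == "NA" {
			continue
		}
		seen = true
		if _, err := strconv.ParseInt(c, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(c, 64); err != nil {
			isFloat = false
		}
		if _, err := time.Parse("2006-01-02", c); err != nil {
			isTime = false
		}
	}
	switch {
	case !seen:
		return "string" // nothing to infer from
	case isInt:
		return "int64"
	case isFloat:
		return "float64"
	case isTime:
		return "time"
	default:
		return "string"
	}
}

// inferCSVTypes reads CSV data whose first record is a header and returns
// the header names together with one inferred type name per column.
func inferCSVTypes(data string) ([]string, []string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(records) == 0 {
		return nil, nil, fmt.Errorf("no header row")
	}
	header := records[0]
	types := make([]string, len(header))
	for col := range header {
		cells := make([]string, 0, len(records)-1)
		for _, rec := range records[1:] {
			cells = append(cells, rec[col])
		}
		types[col] = inferColumnType(cells)
	}
	return header, types, nil
}

func main() {
	data := "Country,Date,Age,Amount,Id\n" +
		"\"United States\",2012-02-01,50,112.1,01234\n" +
		"\"United Kingdom\",2012-05-07,NA,18.2,12345\n"
	header, types, err := inferCSVTypes(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i := range header {
		fmt.Printf("%s: %s\n", header[i], types[i])
	}
}
```

Inference like this is convenient but can guess wrong; an identifier such as `01234` parses as an integer and loses its leading zero. Dictating the type explicitly is the way to avoid that.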
### CSV
```go
csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+
```
[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/7hyUXnRy1pR)
## Exporting Data
The `exports` sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.
## Optimizations
* If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using `SeriesInit{}`. This will preallocate memory and provide speed improvements.
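The benefit of preallocating can be seen with plain Go slices, independent of this package. The sketch below counts how often `append` has to grow the backing array with and without a preset capacity; `countReallocations` is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// countReallocations appends n values to s and reports how many times the
// backing array had to grow, observed as a change in capacity.
func countReallocations(s []float64, n int) int {
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, float64(i))
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	const n = 100000

	// No capacity hint: append must grow (and copy) the array repeatedly.
	grown := countReallocations(nil, n)

	// Capacity known in advance: one allocation up front, no growth afterwards.
	preallocated := countReallocations(make([]float64, 0, n), n)

	fmt.Printf("without capacity: %d reallocations\n", grown)
	fmt.Printf("with capacity:    %d reallocations\n", preallocated)
}
```

Each growth step allocates a new array and copies the existing elements, so avoiding it matters most when a series is filled row by row.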
# Generic Series
Out of the box, there is support for `string`, `time.Time`, `float64` and `int64`. Automatic support exists for `float32` and all types of integers. There is a convenience function provided for dealing with `bool`. There is also support for `complex128` inside the `xseries` subpackage.
There may be times that you want to use your own custom data types. You can either implement your own `Series` type (more performant) or use the **Generic Series** (more convenient).
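As a rough picture of what a single type-parameterized series could look like, here is a small sketch using Go generics (Go 1.18 or later). The `Series[T]` type, its methods and the `Ptr` helper are hypothetical and written only for this example; they are not this package's `Series` interface or its `NewSeriesGeneric` function.

```go
package main

import "fmt"

// Series is a hypothetical generic column: a name plus nullable values.
type Series[T any] struct {
	Name   string
	values []*T // a nil pointer marks a missing value
}

// NewSeries builds a series from optional initial values.
func NewSeries[T any](name string, vals ...*T) *Series[T] {
	return &Series[T]{Name: name, values: append([]*T(nil), vals...)}
}

// Ptr returns a pointer to v, which makes non-missing values easy to write inline.
func Ptr[T any](v T) *T { return &v }

// NRows reports the number of rows, including missing ones.
func (s *Series[T]) NRows() int { return len(s.values) }

// Append adds one row; pass nil for a missing value.
func (s *Series[T]) Append(v *T) { s.values = append(s.values, v) }

// Value returns the value at row and whether it is present.
// Out-of-range rows are reported as not present.
func (s *Series[T]) Value(row int) (T, bool) {
	var zero T
	if row < 0 || row >= len(s.values) || s.values[row] == nil {
		return zero, false
	}
	return *s.values[row], true
}

// date is a stand-in for a custom data type.
type date struct{ Year, Month, Day int }

func main() {
	s := NewSeries[date]("date", Ptr(date{2018, 5, 1}), nil, Ptr(date{2018, 5, 3}))
	for i := 0; i < s.NRows(); i++ {
		if v, ok := s.Value(i); ok {
			fmt.Printf("%d: %04d-%02d-%02d\n", i, v.Year, v.Month, v.Day)
		} else {
			fmt.Printf("%d: NaN\n", i)
		}
	}
}
```

A type parameter moves type checks to compile time, whereas an `interface{}`-based series defers them to run time. That is the usual trade-off between the two approaches.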
## civil.Date
```go
import "time"
import "cloud.google.com/go/civil"

sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil, civil.Date{2018, time.May, 01}, civil.Date{2018, time.May, 02}, civil.Date{2018, time.May, 03})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+
```
# Tutorial
## Create some fake data
Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.
```go
import "time"
import "golang.org/x/exp/rand"
import "github.com/rocketlaunchr/dataframe-go/utils/faker"

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
```
```
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+
```
## Apply Function
Let's give a promotion to everyone by doubling their salary.
```go
s := df.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
```
```
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+
```
## Create a Time series
Let's inform all employees separately on sequential days.
```go
import "github.com/rocketlaunchr/dataframe-go/utils/utime"

mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)
```
```
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     | 2020-02-02 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 1:  | Nickolas Emard |      NaN       |    44     | 2020-02-03 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 2:  | Hollis Dickens | Representative |    44     | 2020-02-04 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 3:  | Stacy Dietrich |      NaN       |    86     | 2020-02-05 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 4:  |  Aleen Legros  |    Officer     |    42     | 2020-02-06 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 5:  |  Adelia Metz   |   Architect    |    36     | 2020-02-07 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 6:  | Sunny Gerlach  |      NaN       |    56     | 2020-02-08 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 7:  | Austin Hackett |      NaN       |    78     | 2020-02-09 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
```
## Filtering
Let's keep only our senior employees (those who have titles), for no particular reason.
```go
filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)
```
```
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     | 2020-02-02 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 1:  | Hollis Dickens | Representative |    44     | 2020-02-04 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 2:  |  Aleen Legros  |    Officer     |    42     | 2020-02-06 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
| 3:  |  Adelia Metz   |   Architect    |    36     | 2020-02-07 23:13:53.015324     |
|     |                |                |           | +0000 UTC                      |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
```
## Other useful packages
- [awesome-svelte](https://github.com/rocketlaunchr/awesome-svelte) - Resources for killing react
- [dbq](https://github.com/rocketlaunchr/dbq) - Zero boilerplate database operations for Go
- [electron-alert](https://github.com/rocketlaunchr/electron-alert) - SweetAlert2 for Electron Applications
- [google-search](https://github.com/rocketlaunchr/google-search) - Scrape google search results
- [igo](https://github.com/rocketlaunchr/igo) - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
- [mysql-go](https://github.com/rocketlaunchr/mysql-go) - Properly cancel slow MySQL queries
- [react](https://github.com/rocketlaunchr/react) - Build front end applications using Go
- [remember-go](https://github.com/rocketlaunchr/remember-go) - Cache slow database queries
- [testing-go](https://github.com/rocketlaunchr/testing-go) - Testing framework for unit testing
### Legal Information
The license is a modified MIT license. Refer to the `LICENSE` file for more details.
**© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.**