https://github.com/ptiger10/pd

A fast, tested, and predictable way to clean, aggregate, and transform data

Host: GitHub
URL: https://github.com/ptiger10/pd
Owner: ptiger10
License: mit
Created: 2019-04-15T20:39:15.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2019-07-20T03:00:20.000Z (over 6 years ago)
Last Synced: 2025-08-15T16:16:27.009Z (6 months ago)
Topics: analytics, data, go, spreadsheet
Language: Go
Homepage:
Size: 6.79 MB
Stars: 35
Watchers: 4
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# pd

[![Go Report Card](https://goreportcard.com/badge/github.com/ptiger10/pd)](https://goreportcard.com/report/github.com/ptiger10/pd) 

[![GoDoc](https://godoc.org/github.com/ptiger10/pd?status.svg)](https://godoc.org/github.com/ptiger10/pd) 

[![Build Status](https://travis-ci.org/ptiger10/pd.svg?branch=master)](https://travis-ci.org/ptiger10/pd)

[![codecov](https://codecov.io/gh/ptiger10/pd/branch/master/graph/badge.svg)](https://codecov.io/gh/ptiger10/pd)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

pd (informally known as "GoPandas") is a library for cleaning, aggregating, and transforming data using Series and DataFrames. GoPandas combines a flexible API familiar to Python pandas users with the qualities of Go, including type safety, predictable error handling, and fast concurrent processing.

The API is still version 0 and subject to major revisions. Use in production code at your own risk.

Some notable features of GoPandas:

* flexible constructor that supports float, int, string, bool, time.Time, and `interface{}` Series

* seamlessly handles null data and type conversions

* well-suited to either the Jupyter notebook style of data exploration or conventional programming

* advanced filtering, grouping, and pivoting

* hierarchical indexing (i.e., multi-level indexes and columns)

* reads from CSV, or from any spreadsheet or tabular data structured as `[][]interface{}` (e.g., Google Sheets)

* complete test coverage

* minimal dependencies (total package size is <10MB, compared to Pandas at >200MB)

* uses concurrent processing to achieve faster speeds than Pandas on many fundamental operations, and the performance differential becomes more pronounced with scale (6x+ superior performance summing two columns in a 500k row spreadsheet; see the most recent [benchmarking table](benchmarking/profiler/comparison_summary.txt))
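
To make the `[][]interface{}` shape mentioned in the list above concrete, here is a minimal sketch that builds it from CSV text using only the Go standard library. The helper name `toRows` and the rule "store a cell as float64 if it parses, otherwise keep the string" are assumptions made for illustration; they are not part of pd, and passing the result to a pd constructor is deliberately not shown because its signature is not documented in this README.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// toRows is a hypothetical helper (not part of pd). It parses CSV text into
// [][]interface{}: one inner slice per record, header row included.
// Cells that strconv.ParseFloat accepts are stored as float64;
// all other cells are kept as strings.
func toRows(text string) ([][]interface{}, error) {
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, err
	}
	rows := make([][]interface{}, len(records))
	for i, rec := range records {
		row := make([]interface{}, len(rec))
		for j, cell := range rec {
			if f, perr := strconv.ParseFloat(cell, 64); perr == nil {
				row[j] = f
			} else {
				row[j] = cell
			}
		}
		rows[i] = row
	}
	return rows, nil
}

func main() {
	rows, err := toRows("name,score\nfoo,1.5\nbar,2\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print each record; the first one is the header row.
	for _, row := range rows {
		fmt.Println(row...)
	}
}
```

Rows fetched from a spreadsheet API typically arrive in this same generic shape, so a conversion step like this is only needed when starting from raw text.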

## Getting Started

Check out the Jupyter notebook examples in the [guides](https://github.com/ptiger10/pd/tree/master/guides). GitHub sometimes has trouble rendering .ipynb files, so backup views are available here: [Series](https://nbviewer.jupyter.org/github/ptiger10/pd/blob/master/guides/Series.ipynb?flush_cache=true), [DataFrame](https://nbviewer.jupyter.org/github/ptiger10/pd/blob/master/guides/DataFrame.ipynb?flush_cache=true), [Options](https://nbviewer.jupyter.org/github/ptiger10/pd/blob/master/guides/Options.ipynb?flush_cache=true).

To run the Jupyter notebooks yourself, I recommend lgo (Docker required):

* `cd guides/docker`

* start: `./up.sh`

* stop: `./down.sh`

* rebuild package to newest version: `./up.sh -r`

## Replicating Benchmark Tests

* Requires Python 3.x and pandas

* Download the test data from [pdTestData](https://github.com/ptiger10/pdTestData) and save it in benchmarking/profiler

* `go run -tags=benchmarks benchmarking/profiler/main.go`
