https://github.com/JordiPolo/dataframe

Package providing functionality similar to Python's Pandas or R's data.frame()

# Dataframe
[![Build Status](https://travis-ci.org/JordiPolo/dataframe.svg?branch=master)](https://travis-ci.org/JordiPolo/dataframe)

DataFrame is a library that implements an API similar to [Python's Pandas](http://pandas.pydata.org/) or [R's data.frame()](http://www.r-tutor.com/r-introduction/data-frame).

## Installation

Add `dataframe` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [{:dataframe, "~> 0.1.0"}]
end
```

## Usage

### Tutorials

- [Lesson 1](tutorial/lesson1.md)

### Creation
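`DataFrame.new` takes the table values, the column names, and the row index, in that order:
```elixir
# 6 rows by 4 columns of random values
values = DataFrame.Table.build_random(6, 4)
# 6 consecutive dates starting at 2016-09-12, used as the row index
index = DataFrame.DateRange.new("2016-09-12", 6)
data = DataFrame.new(values, [1, 3, 4, 5], index)
```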

Output (the table is filled with random values, so yours will differ):
```
            1             3             4             5
2016-09-12  0.3216495192  0.3061978162  0.5240627861  0.3014870998
2016-09-13  0.7085624128  0.1027917034  0.0274851281  0.4999253931
2016-09-14  0.5409299230  0.7234486655  0.0902951353  0.9265397862
2016-09-15  0.8144437609  0.7566869039  0.5943981962  0.4555049347
2016-09-16  0.0228473208  0.9033617026  0.6984988237  0.9858222366
2016-09-17  0.6401066584  0.2700256640  0.4256911712  0.1085587668
```
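
The row index above is six consecutive dates. As a rough illustration of that idea, here is a minimal sketch in plain Elixir. `DateIndexSketch` is a hypothetical module, not part of this package, and it assumes the index labels are ISO 8601 strings; how `DataFrame.DateRange` stores them internally is not shown here.

```elixir
defmodule DateIndexSketch do
  # Hypothetical helper, not the package's implementation:
  # builds `count` consecutive ISO 8601 date strings starting at `start`.
  def build(start, count) when is_binary(start) and count > 0 do
    first = Date.from_iso8601!(start)

    for offset <- 0..(count - 1) do
      first |> Date.add(offset) |> Date.to_iso8601()
    end
  end
end

IO.inspect(DateIndexSketch.build("2016-09-12", 6))
# prints the six dates from "2016-09-12" to "2016-09-17"
```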

### Exploring
```elixir
DataFrame.head(data, 2)
```
```
            1             3             4             5
2016-09-12  0.3216495192  0.3061978162  0.5240627861  0.3014870998
2016-09-13  0.7085624128  0.1027917034  0.0274851281  0.4999253931
```

```elixir
DataFrame.tail(data, 1)
```
```
            1             3             4             5
2016-09-17  0.6401066584  0.2700256640  0.4256911712  0.1085587668
```
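
Conceptually, `head` and `tail` keep the first or last `n` rows. A minimal sketch of that behaviour on plain lists follows. `HeadTailSketch` and the `{index_label, values}` row layout are assumptions made for illustration, not the package's internals.

```elixir
defmodule HeadTailSketch do
  # rows is a list of {index_label, values} tuples (assumed layout).
  def head(rows, n) when n >= 0, do: Enum.take(rows, n)

  # A negative count makes Enum.take/2 take from the end of the list.
  def tail(rows, n) when n >= 0, do: Enum.take(rows, -n)
end

rows = [
  {"2016-09-12", [0.32, 0.31]},
  {"2016-09-13", [0.71, 0.10]},
  {"2016-09-14", [0.54, 0.72]}
]

IO.inspect(HeadTailSketch.head(rows, 2))
IO.inspect(HeadTailSketch.tail(rows, 1))
```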

```elixir
DataFrame.describe(data)
```
```
       1             3             4             5
count  6             6             6             6
mean   0.6465539263  0.5159964091  0.3872831261  0.3932447202
std    0.1529956837  0.3280592207  0.1795171140  0.3121805879
min    0.4016542004  0.0206350637  0.0337014209  0.0177659020
25%    0.6282734986  0.5048574951  0.3799407685  0.2747983874
50%    0.7006870983  0.6401629955  0.4141661547  0.4043847826
75%    0.7412280866  0.6620905719  0.4517382532  0.4916518963
max    0.8024114094  0.9682031054  0.6199458675  0.8934404147
```
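
To make the rows of that summary concrete, here is a sketch of how the statistics could be computed for a single column in plain Elixir. `DescribeSketch` is hypothetical. It assumes the sample standard deviation (dividing by `n - 1`) and linear interpolation for percentiles, which are the pandas defaults; this package may compute them differently.

```elixir
defmodule DescribeSketch do
  # Summary statistics for one numeric column (a non-empty list of numbers).
  # Assumptions: sample standard deviation (n - 1) and linearly
  # interpolated percentiles, as in pandas.
  def describe([_ | _] = values) do
    sorted = Enum.sort(values)
    count = length(sorted)
    mean = Enum.sum(sorted) / count

    %{
      count: count,
      mean: mean,
      std: std(sorted, mean, count),
      min: List.first(sorted),
      p25: percentile(sorted, 0.25),
      p50: percentile(sorted, 0.50),
      p75: percentile(sorted, 0.75),
      max: List.last(sorted)
    }
  end

  # A single value has no spread; this sketch returns 0.0 in that case.
  defp std(_values, _mean, 1), do: 0.0

  defp std(values, mean, count) do
    sum_sq = values |> Enum.map(&((&1 - mean) * (&1 - mean))) |> Enum.sum()
    :math.sqrt(sum_sq / (count - 1))
  end

  # Linear interpolation between the two closest ranks of the sorted list.
  defp percentile(sorted, q) do
    pos = q * (length(sorted) - 1)
    lower = floor(pos)
    upper = min(lower + 1, length(sorted) - 1)
    frac = pos - lower
    Enum.at(sorted, lower) * (1 - frac) + Enum.at(sorted, upper) * frac
  end
end

IO.inspect(DescribeSketch.describe([1, 2, 3, 4, 5]))
```

For `[1, 2, 3, 4, 5]` the mean and the median are both `3.0`, and the 25% and 75% values are `2.0` and `4.0`.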

### Transposing

```elixir
DataFrame.transpose(data)
```
```
   2016-09-12    2016-09-13    2016-09-14    2016-09-15    2016-09-16    2016-09-17
1  0.3216495192  0.7085624128  0.5409299230  0.8144437609  0.0228473208  0.6401066584
3  0.3061978162  0.1027917034  0.7234486655  0.7566869039  0.9033617026  0.2700256640
4  0.5240627861  0.0274851281  0.0902951353  0.5943981962  0.6984988237  0.4256911712
5  0.3014870998  0.4999253931  0.9265397862  0.4555049347  0.9858222366  0.1085587668
```
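
Transposing swaps rows and columns, so the index labels become the column names and vice versa. For the values alone, a minimal sketch on a list of lists could look like this. `TransposeSketch` is a hypothetical module that assumes every row has the same length.

```elixir
defmodule TransposeSketch do
  # Swaps rows and columns of a rectangular list of lists.
  def transpose(rows) do
    rows
    |> List.zip()
    |> Enum.map(&Tuple.to_list/1)
  end
end

values = [[1, 2, 3], [4, 5, 6]]
IO.inspect(TransposeSketch.transpose(values))
# prints [[1, 4], [2, 5], [3, 6]]
```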

### Sorting

Sort by index (descending by default):
```elixir
DataFrame.sort_index(data)
```
```
            1             3             4             5
2016-09-17  0.6401066584  0.2700256640  0.4256911712  0.1085587668
2016-09-16  0.0228473208  0.9033617026  0.6984988237  0.9858222366
2016-09-15  0.8144437609  0.7566869039  0.5943981962  0.4555049347
2016-09-14  0.5409299230  0.7234486655  0.0902951353  0.9265397862
2016-09-13  0.7085624128  0.1027917034  0.0274851281  0.4999253931
2016-09-12  0.3216495192  0.3061978162  0.5240627861  0.3014870998
```

Sort by the values of a column (pass `false` to sort in ascending order):
```elixir
DataFrame.sort_values(data, 4, false)
```
```
            1             3             4             5
2016-09-13  0.7085624128  0.1027917034  0.0274851281  0.4999253931
2016-09-14  0.5409299230  0.7234486655  0.0902951353  0.9265397862
2016-09-17  0.6401066584  0.2700256640  0.4256911712  0.1085587668
2016-09-12  0.3216495192  0.3061978162  0.5240627861  0.3014870998
2016-09-15  0.8144437609  0.7566869039  0.5943981962  0.4555049347
2016-09-16  0.0228473208  0.9033617026  0.6984988237  0.9858222366
```
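
Both sorts reorder whole rows, so each index label stays attached to its values. A sketch of that idea with `Enum.sort_by/3` follows. `SortSketch` is hypothetical: it addresses the column by zero-based position rather than by name, and it assumes descending order as the default for both functions. It needs Elixir 1.10 or later for the `:asc` and `:desc` sorters.

```elixir
defmodule SortSketch do
  # rows is a list of {index_label, values} tuples (assumed layout).
  # `column` is a zero-based position here, not a column name.
  def sort_values(rows, column, descending \\ true) do
    Enum.sort_by(rows, fn {_label, values} -> Enum.at(values, column) end, direction(descending))
  end

  # ISO 8601 date strings sort chronologically when compared as strings.
  def sort_index(rows, descending \\ true) do
    Enum.sort_by(rows, fn {label, _values} -> label end, direction(descending))
  end

  defp direction(true), do: :desc
  defp direction(false), do: :asc
end

rows = [
  {"2016-09-12", [0.52, 0.30]},
  {"2016-09-13", [0.03, 0.50]},
  {"2016-09-14", [0.09, 0.93]}
]

IO.inspect(SortSketch.sort_index(rows))
IO.inspect(SortSketch.sort_values(rows, 0, false))
```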

### Selecting

By name:
```elixir
DataFrame.loc(data, DataFrame.DateRange.new("2016-09-15", 2), [3,4])
```
```
            3             4
2016-09-15  0.5417848216  0.5546980818
2016-09-16  0.6621771048  0.5763923325
```

A single value by name:
```elixir
DataFrame.at(data, "2016-09-15", 4)
```
```
0.5546980818725673
```
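
Selecting by name means translating row labels and column names into positions first. The sketch below shows one way to do that on a plain map. `LabelSketch` and the `%{columns: ..., rows: ...}` layout are assumptions made for illustration, and the sketch does no error handling for unknown labels.

```elixir
defmodule LabelSketch do
  # frame is %{columns: [name, ...], rows: [{index_label, values}, ...]}
  # (an assumed layout, not the package's struct).

  # Returns the single value at the given row label and column name.
  def at(frame, row_label, column_name) do
    col = Enum.find_index(frame.columns, &(&1 == column_name))
    {_label, values} = Enum.find(frame.rows, fn {label, _values} -> label == row_label end)
    Enum.at(values, col)
  end

  # Returns a smaller frame with only the requested rows and columns.
  def loc(frame, row_labels, column_names) do
    positions =
      Enum.map(column_names, fn name ->
        Enum.find_index(frame.columns, &(&1 == name))
      end)

    rows =
      for {label, values} <- frame.rows, label in row_labels do
        {label, Enum.map(positions, &Enum.at(values, &1))}
      end

    %{columns: column_names, rows: rows}
  end
end

frame = %{
  columns: [1, 3, 4, 5],
  rows: [
    {"2016-09-15", [0.81, 0.76, 0.59, 0.46]},
    {"2016-09-16", [0.02, 0.90, 0.70, 0.99]}
  ]
}

IO.inspect(LabelSketch.at(frame, "2016-09-15", 4))
IO.inspect(LabelSketch.loc(frame, ["2016-09-16"], [3, 4]))
```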

By position:
```elixir
DataFrame.iloc(data, 4..6, 2..4)
```
```
            4             5
2016-09-16  0.6984988237  0.9858222366
2016-09-17  0.4256911712  0.1085587668
```
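
The output above is what zero-based, inclusive ranges give when positions past the end of the table are ignored: `4..6` keeps rows 4 and 5, and `2..4` keeps columns 2 and 3. `Enum.slice/2` behaves the same way on lists, so the selection can be sketched as below. `PositionSketch` is hypothetical, and whether the package relies on `Enum.slice/2` is an assumption.

```elixir
defmodule PositionSketch do
  # rows is a list of {index_label, values} tuples (assumed layout).
  # Ranges are zero-based and inclusive; Enum.slice/2 drops positions
  # that fall past the end of the list.
  def iloc(rows, row_range, col_range) do
    rows
    |> Enum.slice(row_range)
    |> Enum.map(fn {label, values} -> {label, Enum.slice(values, col_range)} end)
  end

  # Single value at a zero-based row and column position.
  def iat(rows, row, col) do
    {_label, values} = Enum.at(rows, row)
    Enum.at(values, col)
  end
end

labels = ["2016-09-12", "2016-09-13", "2016-09-14", "2016-09-15", "2016-09-16", "2016-09-17"]

# Row i holds the values i*10 + 1 .. i*10 + 4, so positions are easy to read off.
rows =
  for {label, i} <- Enum.with_index(labels) do
    {label, [i * 10 + 1, i * 10 + 2, i * 10 + 3, i * 10 + 4]}
  end

IO.inspect(PositionSketch.iloc(rows, 4..6, 2..4))
# prints [{"2016-09-16", [43, 44]}, {"2016-09-17", [53, 54]}]
IO.inspect(PositionSketch.iat(rows, 0, 0))
# prints 1
```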

A single value by position:
```elixir
DataFrame.iat(data, 0, 0)
```
```
0.31553155828919915
```

The library is at a very early stage of development. No effort has been made to optimize its performance, so expect it to be slow.

### Plotting

If you have Python and Matplotlib installed, you can plot the data in your DataFrame.
Check out the [Explot](https://github.com/JordiPolo/explot) package for installation details.

Let's plot the cumulative sum of the values:

```elixir
data |> DataFrame.cumsum |> DataFrame.plot
```

This produces the following graph:
![](readme_example.png)
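
A cumulative sum replaces each value with the running total of its column up to that row. As a sketch of that computation on a plain list of rows, `Enum.scan/2` can carry the previous totals forward. `CumsumSketch` is hypothetical, and summing down each column (rather than across each row) is an assumption borrowed from the pandas default.

```elixir
defmodule CumsumSketch do
  # Running total down each column of a rectangular list of rows.
  # Enum.scan/2 emits the first row unchanged and then combines every
  # following row with the totals accumulated so far.
  def cumsum(rows) do
    Enum.scan(rows, fn row, totals ->
      row
      |> Enum.zip(totals)
      |> Enum.map(fn {value, total} -> value + total end)
    end)
  end
end

IO.inspect(CumsumSketch.cumsum([[1, 10], [2, 20], [3, 30]]))
# prints [[1, 10], [3, 30], [6, 60]]
```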

## Development

Run the tests:
```
mix test
```

## TODO

- Handle invalid input (negative numbers, etc.)
- Setting of subtable data
- Column types (no statistics on text columns, etc.)