https://github.com/xaliphostes/dataframe

A minimalist library in pure C++, modeled on Python's pandas
algebra cplusplus cpp cpp23 functional-programming geometry mathematics pandas-dataframe pandas-python statistics

# Simple and efficient C++ Dataframe Library (header only)


Example of an interpolated scalar field (left: real field, right: interpolation with respect to the black points)

```c++
auto scattered = random_uniform(50, Vector2{-1.0, -1.0}, Vector2{1.0, 1.0});
auto values = map([](const Vector2 &p) {return sin(p[0]*2) * cos(p[1]*2);}, scattered);
auto grid = from_dims<2>({100, 100}, {0, 0}, {2.0, 2.0});
auto interp = interpolate_field(grid, scattered, values);
```



Example of distance field computation

```c++
auto ref_pts = random_uniform(10, Vector2{-1.0, -1.0}, Vector2{1.0, 1.0});
auto grid = from_dims<2>({100, 100}, {0.0, 0.0}, {2.0, 2.0});
auto distances = distance_field<2>(grid, ref_pts);
```
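
For readers unfamiliar with the term, a distance field can be read as "for every grid point, the distance to the nearest reference point". The brute-force sketch below illustrates that reading using only the standard library. It assumes Euclidean distance; `Vec2` and `brute_force_distance_field` are hypothetical names introduced here, and the library's `distance_field` may use a different algorithm or convention.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-in for the library's Vector2 (illustration only).
using Vec2 = std::array<double, 2>;

// For each query point, return the Euclidean distance to the closest
// reference point. Brute force: O(queries * references).
// Returns +infinity for every query when `refs` is empty.
inline std::vector<double> brute_force_distance_field(
    const std::vector<Vec2>& queries, const std::vector<Vec2>& refs) {
    std::vector<double> out;
    out.reserve(queries.size());
    for (const Vec2& q : queries) {
        double best = std::numeric_limits<double>::infinity();
        for (const Vec2& r : refs) {
            best = std::min(best, std::hypot(q[0] - r[0], q[1] - r[1]));
        }
        out.push_back(best);
    }
    return out;
}
```

With 100 x 100 grid points and 10 reference points, as in the example above, this amounts to about 100,000 distance evaluations, so brute force is adequate at that scale.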



Example of heat diffusion using HarmonicDiffusion

#
### ***...Work in progress for linear algebra, stats and geo(metry, logy, physics...) operations...***
#


A modern C++ library for data manipulation with a focus on functional programming patterns and type safety.

Only headers. No linking!

# [Read the doc (in progress...)](https://xaliphostes.github.io/dataframe/)

## Features

- Generic series container (`Serie`) for any data type (similar to a column in an Excel sheet)
- DataFrame for managing multiple named series
- Rich functional operations (map, reduce, filter, etc.)
- Parallel processing capabilities
- Type-safe operations with compile-time checks
- Modern C++ design (C++23 if available)
- Uses [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) for linear algebra (installed automatically)
- Uses [CGAL](https://www.cgal.org/) when needed (see [how to install CGAL](CGAL_INSTALL.md))
- Implements functions such as `chain`, `chunk`, `compose`, `concat`, `filter`, `find`, `flatten`, `format`, `forEach`, `groupBy`, `map_if`, `map`, `memoise`, `merge`, `ones`, `orderBy`, `parallel_map`, `partition`, `pipe` with operator `|`, `print`, `range`, `reduce`, `reject`, `skip`, `slice`, `sort`, `split`, `switch`, `take`, `unique`, `unzip`, `whenAll`, `where`, `zeros`, `zip`.

## Core Concepts

### `Serie`

A type-safe container for sequences of data with functional operations:
- Supports any data type
- Provides functional operations (map, reduce, filter)
- Enables chaining operations using pipe syntax

Compared with an Excel column, which can mix types and contain empty cells, a `Serie` is strongly typed: all elements must have the same type, which makes it better suited to type-safe data processing.
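
To make the idea of a strongly typed series concrete, here is a minimal standard-library sketch with `map`, `filter` and `reduce`. `MiniSerie` is a hypothetical stand-in written for illustration; it is not the library's `Serie` and leaves out most of its features.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical illustration only: not the library's df::Serie.
template <typename T>
class MiniSerie {
public:
    MiniSerie() = default;
    MiniSerie(std::initializer_list<T> values) : data_(values) {}
    explicit MiniSerie(std::vector<T> values) : data_(std::move(values)) {}

    std::size_t size() const { return data_.size(); }

    // Element access by value, with a debug-time bounds check.
    T operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

    // map: apply f to every element; the element type of the result
    // is deduced from f at compile time.
    template <typename F>
    auto map(F f) const {
        using R = std::decay_t<std::invoke_result_t<F&, const T&>>;
        std::vector<R> out;
        out.reserve(data_.size());
        for (const T& v : data_) out.push_back(f(v));
        return MiniSerie<R>(std::move(out));
    }

    // filter: keep the elements for which pred returns true.
    template <typename P>
    MiniSerie filter(P pred) const {
        std::vector<T> out;
        for (const T& v : data_) {
            if (pred(v)) out.push_back(v);
        }
        return MiniSerie(std::move(out));
    }

    // reduce: fold all elements into one accumulator, starting from init.
    template <typename F, typename A>
    A reduce(F f, A init) const {
        for (const T& v : data_) init = f(std::move(init), v);
        return init;
    }

private:
    std::vector<T> data_;
};
```

Because the element type is a template parameter, mapping a `MiniSerie<int>` with a function that returns `double` yields a `MiniSerie<double>`, and mixing element types inside one series fails to compile.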

### `Dataframe`

A container for managing multiple named series:
- Type-safe storage of different series types
- Named access to series
- Dynamic addition and removal of series
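
One way to store columns of different types under string names while keeping typed access is type erasure. The sketch below uses `std::any` and checks the requested type at run time. `MiniFrame` is a hypothetical name and this is an assumption about how such a container could be built, not a description of how `Dataframe` is implemented.

```cpp
#include <any>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only: not the library's df::Dataframe.
// Each column is a std::vector<T> stored type-erased inside a std::any.
class MiniFrame {
public:
    // Add a named column; refuses to overwrite an existing name.
    template <typename T>
    void add(const std::string& name, std::vector<T> column) {
        if (!columns_.emplace(name, std::move(column)).second) {
            throw std::runtime_error("column already exists: " + name);
        }
    }

    // Typed access: throws if the name is unknown or if T is not the
    // element type the column was stored with.
    template <typename T>
    const std::vector<T>& get(const std::string& name) const {
        auto it = columns_.find(name);
        if (it == columns_.end()) {
            throw std::out_of_range("no such column: " + name);
        }
        const auto* column = std::any_cast<std::vector<T>>(&it->second);
        if (column == nullptr) {
            throw std::runtime_error("type mismatch for column: " + name);
        }
        return *column;
    }

    bool has(const std::string& name) const { return columns_.count(name) != 0; }
    void remove(const std::string& name) { columns_.erase(name); }
    std::size_t size() const { return columns_.size(); }

private:
    std::map<std::string, std::any> columns_;
};
```

Asking for `get<double>("integers")` on a column that was added as `std::vector<int>` throws instead of silently reinterpreting the data, which is the property that "type-safe storage" refers to here.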

## Examples

### Basic Series Operations

```cpp
// Include the dataframe headers for Serie, map and filter here.

// Create series with default values
df::Serie<int> ints(5);       // Creates [0,0,0,0,0]
df::Serie<double> doubles(3); // Creates [0.0,0.0,0.0]

// Create series with specific values
df::Serie<int> ones(4, 1);     // Creates [1,1,1,1]
df::Serie<double> pi(3, 3.14); // Creates [3.14,3.14,3.14]

// -------------------------------------------

// Create a serie of numbers
df::Serie<int> numbers{1, 2, 3, 4, 5};

// Map operation: double each number
// Note: "size_t index" is optional
auto doubled = numbers.map([](int n, size_t index) { return n * 2; });

// Filter operation: keep only even numbers
auto evens = numbers | df::bind_filter([](int n) { return n % 2 == 0; });

// Create a reusable pipeline using chaining operations:
// first double each number, then keep the values greater than 5
auto pipeline = df::bind_map([](int n) { return n * 2; }) |
                df::bind_filter([](int n) { return n > 5; });

// Apply the pipeline to the numbers serie
auto result = pipeline(numbers);
```
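
The pipe syntax used above can be implemented with two `operator|` overloads: one that applies a stage to data, and one that composes two stages into a new stage. The sketch below shows the mechanism on `std::vector`. Everything in the `sketch` namespace is a hypothetical reimplementation for illustration and makes no claim about how the library's `bind_map` and `bind_filter` are written.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// A pipeline stage wraps a callable that takes a vector and returns a vector.
template <typename F>
struct Stage {
    F fn;
    template <typename T>
    auto operator()(const std::vector<T>& in) const { return fn(in); }
};

template <typename F>
Stage<F> make_stage(F fn) { return Stage<F>{std::move(fn)}; }

// bind_map: stage that applies f to every element.
template <typename F>
auto bind_map(F f) {
    return make_stage([f](const auto& in) {
        using R = std::decay_t<decltype(f(in.front()))>;
        std::vector<R> out;
        out.reserve(in.size());
        for (const auto& v : in) out.push_back(f(v));
        return out;
    });
}

// bind_filter: stage that keeps the elements satisfying pred.
template <typename P>
auto bind_filter(P pred) {
    return make_stage([pred](const auto& in) {
        std::decay_t<decltype(in)> out;
        for (const auto& v : in) {
            if (pred(v)) out.push_back(v);
        }
        return out;
    });
}

// data | stage: run the stage on the data.
template <typename T, typename F>
auto operator|(const std::vector<T>& data, const Stage<F>& stage) {
    return stage(data);
}

// stage | stage: build a new stage that runs the left one, then the right one.
template <typename F, typename G>
auto operator|(Stage<F> first, Stage<G> second) {
    return make_stage([first, second](const auto& in) { return second(first(in)); });
}

} // namespace sketch
```

With these definitions, `sketch::bind_map(f) | sketch::bind_filter(p)` produces a reusable object that can be called on any vector, and `numbers | stage` is shorthand for `stage(numbers)`.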

### Operator overloading

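```cpp
// Include the dataframe header for Serie here.

// Create four series of numbers
df::Serie<int> s1{1, 2, 3, 4, 5};
df::Serie<int> s2{1, 2, 3, 4, 5};
df::Serie<int> s3{1, 2, 3, 4, 5};
df::Serie<int> s4{1, 2, 3, 4, 5};

// Combine them with the overloaded arithmetic operators
auto s = (s1 + s2) * s3 / s4;
```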

### Linear algebra

```cpp
#include <iostream>
// Also include the dataframe headers for Serie and eigenSystem.

// Three symmetric tensors in 3D
// Row storage format, i.e., {xx, xy, xz, yy, yz, zz}
//
df::Serie serie({
    {2, 4, 6, 3, 6, 9},
    {1, 2, 3, 4, 5, 6},
    {9, 8, 7, 6, 5, 4}
});

auto [values, vectors] = df::eigenSystem(serie);

// Print the three eigenvectors of each tensor
df::forEach([](const EigenVectorType<3>& v) {
    std::cout << "1st eigenvector: " << v[0] << std::endl;
    std::cout << "2nd eigenvector: " << v[1] << std::endl;
    std::cout << "3rd eigenvector: " << v[2] << std::endl;
}, vectors);
```

### Parallel Processing (whenAll)

The library provides several ways to perform parallel computations on Series.

The parallel processing functions are particularly useful for:
- Large datasets where computation can be distributed
- CPU-intensive operations on each element
- Processing multiple series simultaneously
- Operations that can be executed independently

Note that for small datasets, the overhead of parallel execution might outweigh the benefits. Consider using parallel operations when:
- The dataset size is large (typically > 10,000 elements)
- The operation per element is computationally expensive
- The operation doesn't require maintaining order-dependent state

```cpp
// Include the dataframe headers for Serie and whenAll here.

// Process multiple series in parallel with transformation
df::Serie<double> s1{1.0, 2.0, 3.0, ...};
df::Serie<double> s2{4.0, 5.0, 6.0, ...};

auto result = df::whenAll([](const df::Serie<double>& s) {
    return s.map([](double x) { return x * 2; });
}, {s1, s2});

// Parallel processing with tuple results
auto [r1, r2] = df::whenAll(s1, s2);
```
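
The size trade-off described above can be illustrated with the standard library alone. The sketch below runs sequentially for small inputs and splits larger inputs into one chunk per hardware thread using `std::async`. `parallel_map_sketch` and its `min_parallel_size` parameter are hypothetical; the default of 10,000 mirrors the rule of thumb above and is not a measured value, and the library's own parallel functions may work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical helper, not part of the library. Maps f over `in`, using
// threads only when the input is large enough to amortise their overhead.
// Requirements: f must be safe to call concurrently, and its result type
// must be default-constructible and must not be bool (std::vector<bool>
// does not allow concurrent writes to distinct elements).
template <typename T, typename F>
auto parallel_map_sketch(const std::vector<T>& in, F f,
                         std::size_t min_parallel_size = 10000) {
    using R = std::decay_t<decltype(f(in.front()))>;
    std::vector<R> out(in.size());

    const std::size_t hw =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    if (in.size() < min_parallel_size || hw == 1) {
        // Sequential path for small inputs or single-core machines.
        std::transform(in.begin(), in.end(), out.begin(), f);
        return out;
    }

    // One contiguous chunk per task; each task writes to a disjoint
    // range of `out`, so no locking is needed.
    const std::size_t chunk = (in.size() + hw - 1) / hw;
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, in.size());
        tasks.push_back(std::async(std::launch::async, [&in, &out, &f, begin, end] {
            std::transform(in.begin() + begin, in.begin() + end,
                           out.begin() + begin, f);
        }));
    }
    for (auto& task : tasks) task.get(); // wait, and rethrow any exception
    return out;
}
```

Because every element is computed independently and written to its own slot, the result has the same order as the input regardless of which thread finishes first.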

### Working with Custom Types

```cpp
#include <cmath>

struct Point3D {
    double x, y, z;
};

// Create a serie of 3D points
df::Serie<Point3D> points{{0,0,0}, {1,1,1}, {2,2,2}};

// Transform points: translate each one by (1,1,1)
auto translated = df::map([](const Point3D& p) {
    return Point3D{p.x + 1, p.y + 1, p.z + 1};
}, points);

// Get the norms, i.e. the distances from the origin (0,0,0)
auto norms = df::map([](const Point3D& p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}, points);
```

### Dataframe Usage

```cpp
// Include the dataframe header for Dataframe here.

// Create a Dataframe
df::Dataframe dataframe;

// Add different types of series
dataframe.add("integers", df::Serie<int>{1, 2, 3, 4, 5});
dataframe.add("doubles", df::Serie<double>{1.1, 2.2, 3.3, 4.4, 5.5});

// Access series with type safety
const auto& ints = dataframe.get<int>("integers");
const auto& dbls = dataframe.get<double>("doubles");

// Iterate over all (name, serie) pairs
for (const auto& [name, serie] : dataframe) {
    // Work with name and serie
}

// Remove a series
dataframe.remove("integers");
```

### 3D Mesh Example

```cpp
#include <array>
// Also include the dataframe headers for Dataframe, Serie, map, norm and normals.

// Define types for clarity
using Point = std::array<double, 3>;
using Triangle = std::array<int, 3>; // vertex indices (index type assumed here)

// Create a simple mesh
df::Dataframe mesh;

// Create vertices
df::Serie<Point> vertices{
    {0.0, 0.0, 0.0},
    {1.0, 0.0, 0.0},
    {0.0, 1.0, 0.0},
    {0.0, 0.0, 1.0}
};

// Create triangles (each entry holds three indices into `vertices`)
df::Serie<Triangle> triangles{
    {0, 1, 2},
    {0, 2, 3},
    {0, 3, 1},
    {1, 3, 2}
};

// Add to DataFrame
mesh.add("vertices", vertices);
mesh.add("triangles", triangles);

// Transform vertices: scale each coordinate by 2
auto transformed_vertices = df::map([](const Point& p) {
    return Point{p[0] * 2.0, p[1] * 2.0, p[2] * 2.0};
}, vertices);
mesh.add("transformed_vertices", transformed_vertices);

// Add attributes at vertices
mesh.add("norm", df::norm(vertices));
mesh.add("normal", df::normals(vertices));
```

## Installation

Header-only library. Simply include the headers that you need in your project.

## Requirements

- C++23 or later
- Modern C++ compiler (GCC, Clang, MSVC)

## License

MIT License - See LICENSE file for details.

## Contact
[email protected]