https://github.com/xaliphostes/dataframe

A minimalist, pandas-like library in pure C++

Host: GitHub
URL: https://github.com/xaliphostes/dataframe
Owner: xaliphostes
License: mit
Created: 2023-02-25T08:41:09.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-09-22T12:05:19.000Z (about 2 months ago)
Last Synced: 2025-10-25T06:46:16.159Z (24 days ago)
Topics: algebra, cplusplus, cpp, cpp23, functional-programming, geometry, mathematics, pandas-dataframe, pandas-python, statistics
Language: C++
Homepage:
Size: 8.97 MB
Stars: 5
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Simple and efficient C++ Dataframe Library (header only)



    





Example of an interpolated scalar field (left: the real field; right: the interpolation with respect to the black points)

```c++
auto scattered = random_uniform(50, Vector2{-1.0, -1.0}, Vector2{1.0, 1.0});
auto values    = map([](const Vector2 &p) {return sin(p[0]*2) * cos(p[1]*2);}, scattered);
auto grid      = from_dims<2>({100, 100}, {0, 0}, {2.0, 2.0});
auto interp    = interpolate_field(grid, scattered, values);
```





    





Example of distance field computation

```c++
auto ref_pts   = random_uniform(10, Vector2{-1.0, -1.0}, Vector2{1.0, 1.0});
auto grid      = from_dims<2>({100, 100}, {0.0, 0.0}, {2.0, 2.0});
auto distances = distance_field<2>(grid, ref_pts);
```
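To make the idea of a distance field concrete, here is a minimal brute-force sketch: for every grid point it keeps the Euclidean distance to the closest reference point. It assumes an unsigned Euclidean distance and is not the library's `distance_field` implementation; `Point2` and `brute_force_distance_field` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical alias for a 2D point; not the library's Vector2.
using Point2 = std::array<double, 2>;

// For every grid point, return the Euclidean distance to the closest
// reference point. Cost is O(grid.size() * refs.size()).
std::vector<double> brute_force_distance_field(const std::vector<Point2>& grid,
                                               const std::vector<Point2>& refs) {
    assert(!refs.empty());
    std::vector<double> out;
    out.reserve(grid.size());
    for (const Point2& g : grid) {
        double best = std::numeric_limits<double>::infinity();
        for (const Point2& r : refs) {
            best = std::min(best, std::hypot(g[0] - r[0], g[1] - r[1]));
        }
        out.push_back(best);
    }
    return out;
}
```

A spatial index (a k-d tree, for example) would avoid the quadratic cost on large grids; the brute-force loop is kept here because it is the easiest version to check by hand.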





    





Example of heat diffusion using HarmonicDiffusion



### ***Work in progress: linear algebra, statistics, and geo (geometry, geology, geophysics, ...) operations***




A modern C++ library for data manipulation with a focus on functional programming patterns and type safety.

Only headers. No linking!

# [Read the doc (in progress...)](https://xaliphostes.github.io/dataframe/)

## Features

- Generic series container (`Serie`) for any data type (similar to a column in an Excel sheet)

- `Dataframe` for managing multiple named series

- Rich functional operations (map, reduce, filter, etc.)

- Parallel processing capabilities

- Type-safe operations with compile-time checks

- Modern C++ design (C++20)

- Uses [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page) for linear algebra (installed automatically)

- Uses [CGAL](https://www.cgal.org/) where needed (see [how to install CGAL](CGAL_INSTALL.md))

- Implements functions such as `chain`, `chunk`, `compose`, `concat`, `filter`, `find`, `flatten`, `format`, `forEach`, `groupBy`, `map_if`, `map`, `memoise`, `merge`, `ones`, `orderBy`, `parallel_map`, `partition`, `pipe` with operator `|`, `print`, `range`, `reduce`, `reject`, `skip`, `slice`, `sort`, `split`, `switch`, `take`, `unique`, `unzip`, `whenAll`, `where`, `zeros`, `zip`.

## Core Concepts

### `Serie`

A type-safe container for sequences of data with functional operations:

- Supports any data type

- Provides functional operations (map, reduce, filter)

- Enables chaining operations using pipe syntax

Unlike an Excel column, which can hold mixed types and empty cells, a `Serie` is strongly typed: all of its elements share one type, which makes it better suited to type-safe data processing.
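The following sketch shows how a strongly typed sequence with `map`, `filter`, and pipe syntax can be put together in a few lines. It is an illustration only: `MiniSerie` is a hypothetical stand-in, not the library's `df::Serie`, and the real class may be designed quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for a typed series; NOT the library's df::Serie.
template <typename T>
class MiniSerie {
public:
    MiniSerie() = default;
    MiniSerie(std::initializer_list<T> values) : data_(values) {}

    std::size_t size() const { return data_.size(); }

    const T& operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

    void push_back(T value) { data_.push_back(std::move(value)); }

    // Apply fn to every element; the element type of the result
    // follows fn's return type.
    template <typename F>
    auto map(F fn) const {
        using R = decltype(fn(std::declval<const T&>()));
        MiniSerie<R> out;
        for (const T& v : data_) out.push_back(fn(v));
        return out;
    }

    // Keep only the elements for which pred returns true.
    template <typename P>
    MiniSerie<T> filter(P pred) const {
        MiniSerie<T> out;
        for (const T& v : data_) {
            if (pred(v)) out.push_back(v);
        }
        return out;
    }

private:
    std::vector<T> data_;
};

// Pipe syntax: `serie | op` simply calls op(serie).
template <typename T, typename Op>
auto operator|(const MiniSerie<T>& s, Op op) {
    return op(s);
}
```

Because the element type is a template parameter, trying to store a `std::string` in a `MiniSerie<int>` is rejected by the compiler rather than discovered at run time.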

### `Dataframe`

A container for managing multiple named series:

- Type-safe storage of different series types

- Named access to series

- Dynamic addition and removal of series
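One common way to keep differently typed columns behind string names is type erasure. The sketch below illustrates that idea with `std::any`. `MiniFrame` and its methods are hypothetical names for illustration, not the library's `df::Dataframe`, and this sketch detects a type mismatch only at run time.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a frame of named, typed columns;
// NOT the library's df::Dataframe.
class MiniFrame {
public:
    // Store a column under a name; std::any remembers the element type.
    template <typename T>
    void add(const std::string& name, std::vector<T> column) {
        columns_[name] = std::move(column);
    }

    // Typed access: throws std::out_of_range if the name is unknown and
    // std::bad_any_cast if T does not match the stored element type.
    template <typename T>
    const std::vector<T>& get(const std::string& name) const {
        auto it = columns_.find(name);
        if (it == columns_.end()) {
            throw std::out_of_range("no column named " + name);
        }
        return std::any_cast<const std::vector<T>&>(it->second);
    }

    bool has(const std::string& name) const { return columns_.count(name) != 0; }

    void remove(const std::string& name) { columns_.erase(name); }

private:
    std::map<std::string, std::any> columns_;
};
```

Asking for `get<double>("integers")` on a column that was added as `int` throws instead of silently reinterpreting the data, which is the property the bullet points above describe.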

## Examples

### Basic Series Operations

```cpp
#include 
#include 
#include 

// Create series with default values
df::Serie<int> ints(5);          // Creates [0,0,0,0,0]
df::Serie<double> doubles(3);    // Creates [0.0,0.0,0.0]

// Create series with specific values
df::Serie<int> ones(4, 1);       // Creates [1,1,1,1]
df::Serie<double> pi(3, 3.14);   // Creates [3.14,3.14,3.14]

// -------------------------------------------

// Create a serie of numbers
df::Serie<int> numbers{1, 2, 3, 4, 5};

// Map operation: double each number
// Note: the "size_t index" parameter is optional
auto doubled = numbers.map([](int n, size_t index) { return n * 2; });

// Filter operation: keep only even numbers
auto evens = numbers | df::bind_filter([](int n) { return n % 2 == 0; });

// Create a reusable pipeline by chaining operations
auto pipeline = df::bind_map([](int n) { return n * 2; }) |
                df::bind_filter([](int n) { return n > 5; });

// Apply the pipeline to the numbers serie
auto result = pipeline(numbers);
```

### Operator overloading

```cpp
#include 

// Create four series of numbers
df::Serie<int> s1{1, 2, 3, 4, 5};
df::Serie<int> s2{1, 2, 3, 4, 5};
df::Serie<int> s3{1, 2, 3, 4, 5};
df::Serie<int> s4{1, 2, 3, 4, 5};

// Operators are applied element by element
auto s = (s1 + s2) * s3 / s4;
```

### Linear algebra

```cpp
#include 
#include 
#include 

// Three symmetric tensors in 3D
// Storage format is {xx, xy, xz, yy, yz, zz}
df::Serie serie({
    {2, 4, 6, 3, 6, 9}, 
    {1, 2, 3, 4, 5, 6},
    {9, 8, 7, 6, 5, 4}
});

auto [values, vectors] = df::eigenSystem(serie);

df::forEach([](const EigenVectorType<3>& v) {
    std::cout << "1st eigen vector: " << v[0] << std::endl ;
    std::cout << "2nd eigen vector: " << v[1] << std::endl ;
    std::cout << "3rd eigen vector: " << v[2] << std::endl ;
}, vectors);
```
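The packed storage format `{xx, xy, xz, yy, yz, zz}` holds the upper triangle of a symmetric 3x3 matrix, row by row. As a small illustration, the sketch below expands six packed components into the full matrix. `SymTensor3`, `Matrix3`, and `to_full_matrix` are hypothetical names and are not part of the library.

```cpp
#include <array>
#include <cassert>

// Hypothetical aliases, used only for this illustration.
using SymTensor3 = std::array<double, 6>;  // {xx, xy, xz, yy, yz, zz}
using Matrix3    = std::array<std::array<double, 3>, 3>;

// Expand the six stored components into the full symmetric 3x3 matrix.
Matrix3 to_full_matrix(const SymTensor3& t) {
    Matrix3 m{};
    m[0] = {t[0], t[1], t[2]};  // xx xy xz
    m[1] = {t[1], t[3], t[4]};  // xy yy yz
    m[2] = {t[2], t[4], t[5]};  // xz yz zz
    return m;
}
```

For the first tensor above, `{2, 4, 6, 3, 6, 9}`, the rows of the full matrix are `(2, 4, 6)`, `(4, 3, 6)`, and `(6, 6, 9)`.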

### Parallel Processing (whenAll)

The library provides several ways to perform parallel computations on Series.

The parallel processing functions are particularly useful for:

- Large datasets where computation can be distributed

- CPU-intensive operations on each element

- Processing multiple series simultaneously

- Operations that can be executed independently

Note that for small datasets, the overhead of parallel execution might outweigh the benefits. Consider using parallel operations when:

- The dataset size is large (typically > 10,000 elements)

- The operation per element is computationally expensive

- The operation doesn't require maintaining order-dependent state

```cpp
#include 

// Process multiple series in parallel with transformation
df::Serie<double> s1{1.0, 2.0, 3.0, ...};
df::Serie<double> s2{4.0, 5.0, 6.0, ...};

auto result = df::whenAll([](const df::Serie<double>& s) { 
    return s.map([](double x) { return x * 2; }); 
}, {s1, s2});

// Parallel processing with tuple results
auto [r1, r2] = df::whenAll(s1, s2);
```
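The size guideline above can be turned into a simple dispatch rule: run small inputs serially and split only large ones across threads. The sketch below uses plain `std::async` from the standard library to show the idea. `map_maybe_parallel` and its `threshold` parameter are hypothetical and do not describe how `df::whenAll` or `parallel_map` are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper, not part of the library.
// Applies fn to every element; inputs larger than `threshold` are split
// into two halves that run concurrently. fn must be safe to call from
// two threads at once.
template <typename F>
std::vector<double> map_maybe_parallel(const std::vector<double>& in, F fn,
                                       std::size_t threshold = 10000) {
    std::vector<double> out(in.size());

    // Each call writes a disjoint index range of `out`.
    auto run = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = fn(in[i]);
    };

    if (in.size() <= threshold) {
        run(0, in.size());  // small input: thread overhead is not worth it
        return out;
    }

    const std::size_t mid = in.size() / 2;
    auto first_half = std::async(std::launch::async, run, std::size_t{0}, mid);
    run(mid, in.size());  // second half on the calling thread
    first_half.get();     // wait for the worker and rethrow any exception
    return out;
}
```

The default of 10,000 elements only mirrors the rule of thumb above; the right threshold depends on how expensive `fn` is, so it is worth measuring on real data.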

### Working with Custom Types

```cpp
#include <cmath>

struct Point3D {
    double x, y, z;
};

// Create a serie of 3D points
df::Serie<Point3D> points{{0,0,0}, {1,1,1}, {2,2,2}};

// Transform points
auto translated = df::map([](const Point3D& p) {
    return Point3D{p.x + 1, p.y + 1, p.z + 1};
}, points);

// Compute each point's distance from the origin (0,0,0)
auto norms = df::map([](const Point3D& p) {
    return std::sqrt(std::pow(p.x, 2) + std::pow(p.y, 2) + std::pow(p.z, 2));
}, points);
```

### Dataframe Usage

```cpp
#include 

// Create a Dataframe
df::Dataframe dataframe;

// Add different types of series
dataframe.add("integers", df::Serie<int>{1, 2, 3, 4, 5});
dataframe.add("doubles", df::Serie<double>{1.1, 2.2, 3.3, 4.4, 5.5});

// Access series with type safety
const auto& ints = dataframe.get<int>("integers");
const auto& dbls = dataframe.get<double>("doubles");

// Iterate over all series
for (const auto& [name, serie] : dataframe) {
    // Work with name and serie
}

// Remove a series
dataframe.remove("integers");
```

### 3D Mesh Example

```cpp
#include 
#include 
#include 
#include 
#include 

// Define types for clarity
using Point    = std::array<double, 3>;
using Triangle = std::array<int, 3>;

// Create a simple mesh
df::Dataframe mesh;

// Create vertices
df::Serie<Point> vertices{
    {0.0, 0.0, 0.0},
    {1.0, 0.0, 0.0},
    {0.0, 1.0, 0.0},
    {0.0, 0.0, 1.0}
};

// Create triangles (each entry holds three vertex indices)
df::Serie<Triangle> triangles{
    {0, 1, 2},
    {0, 2, 3},
    {0, 3, 1},
    {1, 3, 2}
};

// Add to the Dataframe
mesh.add("vertices", vertices);
mesh.add("triangles", triangles);

// Transform vertices
auto transformed_vertices = df::map([](const Point& p) {
    return Point{p[0] * 2.0, p[1] * 2.0, p[2] * 2.0};
}, vertices);
mesh.add("transformed_vertices", transformed_vertices);

// Add attributes at vertices
mesh.add("norm", df::norm(vertices));
mesh.add("normal", df::normals(vertices));
```

## Installation

Header-only library. Simply include the headers that you need in your project.

## Requirements

- C++20 or later

- Modern C++ compiler (GCC, Clang, MSVC)

## License

MIT License - See LICENSE file for details.

## Contact

fmaerten@gmail.com
