https://github.com/aidos-lab/mantra

MANTRA: The Manifold Triangulations Assemblage (A dataset of manifold triangulations)
https://github.com/aidos-lab/mantra

dataset graph-dataset graph-learning iclr2025 topological-data-analysis topological-deep-learning

Last synced: 3 months ago
JSON representation

MANTRA: The Manifold Triangulations Assemblage (A dataset of manifold triangulations)

Host: GitHub
URL: https://github.com/aidos-lab/mantra
Owner: aidos-lab
License: bsd-3-clause
Created: 2024-06-03T13:08:31.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-22T07:49:12.000Z (3 months ago)
Last Synced: 2025-04-22T08:24:29.115Z (3 months ago)
Topics: dataset, graph-dataset, graph-learning, iclr2025, topological-data-analysis, topological-deep-learning
Language: Python
Homepage: https://aidos.group/mantra/
Size: 24.8 MB
Stars: 4
Watchers: 2
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md

Awesome Lists containing this project

README

        # MANTRA: Manifold Triangulations Assembly

[![Maintainability](https://api.codeclimate.com/v1/badges/82f86d7e2f0aae342055/maintainability)](https://codeclimate.com/github/aidos-lab/MANTRA/maintainability) ![GitHub contributors](https://img.shields.io/github/contributors/aidos-lab/MANTRA) ![GitHub](https://img.shields.io/github/license/aidos-lab/MANTRA) 

![image](_static/manifold_triangulation_orbit.gif)

## Getting the Dataset

The raw MANTRA dataset consisting of the $2$ and $3$ manifolds with up to $10$ vertices 

is provided [here](https://github.com/aidos-lab/mantra/releases/latest). 

For machine learning applications and research, we provide a custom [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/stable/) dataset in the form of a python package. 

For installations via pip, run  

The raw datasets, consisting of the 2 and 3 manifolds with up to 10

vertices, can be manually downloaded 

[here](https://github.com/aidos-lab/mantra/releases/latest). 

A pytorch geometric wrapper for the dataset is installable via the following 

command.

```python

pip install mantra-dataset

```

After installation the dataset can be used with the follwing snippet.

```python

from mantra.datasets import ManifoldTriangulations

dataset = ManifoldTriangulations(root="./data", manifold="2", version="latest")

```

## Folder Structure

## Data Format

> This section is mostly *information-oriented* and provides a brief

> overview of the data format, followed by a short [example](#example).

Each dataset consists of a list of triangulations, with each

triangulation having the following attributes:

* `id` (required, `str`): This attribute refers to the original ID of

  the triangulation as used by the creator of the dataset (see

  [below](#acknowledgments)). This facilitates comparisons to the

  original dataset if necessary.

* `triangulation` (required, `list` of `list` of `int`): A doubly-nested

  list of the top-level simplices of the triangulation.

* `n_vertices` (required, `int`): The number of vertices in the

  triangulation. This is **not** the number of simplices.

* `name` (required, `str`): A canonical name of the triangulation, such

  as `S^2` for the two-dimensional [sphere](https://en.wikipedia.org/wiki/N-sphere).

  If no canonical name exists, we store an empty string.

* `betti_numbers` (required, `list` of `int`): A list of the [Betti

  numbers](https://en.wikipedia.org/wiki/Betti_number) of the

  triangulation, computed using $Z$ coefficients. This implies that

  [torsion](https://en.wikipedia.org/wiki/Homology_(mathematics))

  coefficients are stored in another attribute.

* `torsion_coefficients` (required, `list` of `str`): A list of the

  [torsion

  coefficients](https://en.wikipedia.org/wiki/Homology_(mathematics)) of

  the triangulation. An empty string `""` indicates that no torsion

  coefficients are available in that dimension. Otherwise, the original

  spelling of torsion coefficients is retained, so a valid entry might

  be `"Z_2"`. 

* `genus` (optional, `int`): For 2-manifolds, contains the

  [genus](https://en.wikipedia.org/wiki/Genus_(mathematics)) of the

  triangulation.

* `orientable` (optional, `bool`): Specifies whether the triangulation

  is [orientable](https://en.wikipedia.org/wiki/Orientability) or not.

### Example

```json

[

  {

    "id": "manifold_2_4_1",

    "triangulation": [

      [1,2,3],

      [1,2,4],

      [1,3,4],

      [2,3,4]

    ],

    "dimension": 2,

    "n_vertices": 4,

    "betti_numbers": [

      1,

      0,

      1

    ],

    "torsion_coefficients": [

      "",

      "",

      ""

    ],

    "name": "S^2",

    "genus": 0,

    "orientable": true

  },

  {

    "id": "manifold_2_5_1",

    "triangulation": [

      [1,2,3],

      [1,2,4],

      [1,3,5],

      [1,4,5],

      [2,3,4],

      [3,4,5]

    ],

    "dimension": 2,

    "n_vertices": 5,

    "betti_numbers": [

      1,

      0,

      1

    ],

    "torsion_coefficients": [

      "",

      "",

      ""

    ],

    "name": "S^2",

    "genus": 0,

    "orientable": true

  }

]

```

### Design Decisions

> This section is *understanding-oriented* and provides additional

> justifications for our data format.

The datasets are converted from their original (mixed) lexicographical

format. A triangulation in lexicographical format could look like this:

```

manifold_lex_d2_n6_#1=[[1,2,3],[1,2,4],[1,3,4],[2,3,5],[2,4,5],[3,4,6],

  [3,5,6],[4,5,6]]

```

A triangulation in *mixed* lexicographical format could look like this:

```

manifold_2_6_1=[[1,2,3],[1,2,4],[1,3,5],[1,4,6],

  [1,5,6],[2,3,4],[3,4,5],[4,5,6]]

```

This format is **hard to parse**. Moreover, any *additional* information

about the triangulations, including information about homology groups or

orientability, for instance, requires additional files.

We thus decided to use a format that permits us to keep everything in

one place, including any additional attributes for a specific

triangulation. A desirable data format needs to satisfy the following

properties:

1. It should be easy to parse and modify, ideally in a number of

   programming languages.

2. It should be human-readable and `diff`-able in order to permit

   simplified comparisons.

3. It should scale reasonably well to larger triangulations.

After some considerations, we decided to opt for `gzip`-compressed JSON

files. [JSON](https://www.json.org) is well-specified and supported in

virtually all major programming languages out of the box. While the

compressed file is *not* human-readable on its own, the uncompressed

version can easily be used for additional data analysis tasks. This also

greatly simplifies maintenance operations on the dataset. While it can

be argued that there are formats that scale even better, they are

not well-applicable to our use case since each triangulation

typically consists of different numbers of top-level simplices. This

rules out column-based formats like [Parquet](https://parquet.apache.org/).

We are open to revisiting this decision in the future.

As for the *storage* of the data as such, we decided to keep only the

top-level simplices (as is done in the original format) since this

substantially saves disk space. The drawback is that the client has to

supply the remainder of the triangulation. Given that the triangulations

in our dataset are not too large, we deem this to be an acceptable

compromise. Moreover, data structures such as [simplex

trees](https://en.wikipedia.org/wiki/Simplex_tree) can be used to

further improve scalability if necessary.

The decision to keep only top-level simplices is **final**.

Finally, our data format includes, whenever possible and available,

additional information about a triangulation, including the [Betti

numbers](https://en.wikipedia.org/wiki/Betti_number) and a *name*,

i.e., a canonical description, of the topological space described

by the triangulation. We opted to minimize any inconvenience that

would arise from having to perform additional parsing operations.

Please use the following citation for our work:

```bibtex

@misc{ballester2024mantramanifoldtriangulationsassemblage,

      title={ {MANTRA}: {T}he {M}anifold {T}riangulations {A}ssemblage}, 

      author={Rub{\'e}n Ballester and Ernst R{\"o}ell and Daniel Bin Schmid and Mathieu Alain and Sergio Escalera and Carles Casacuberta and Bastian Rieck},

      year={2024},

      eprint={2410.02392},

      archivePrefix={arXiv},

      primaryClass={cs.LG},

      url={https://arxiv.org/abs/2410.02392}, 

}

```

## Acknowledgments

This work is dedicated to [Frank H. Lutz](https://www3.math.tu-berlin.de/IfM/Nachrufe/Frank_Lutz/stellar/),

who passed away unexpectedly on November 10, 2023. May his memory be

a blessing.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/aidos-lab/mantra

Awesome Lists containing this project

README