https://github.com/meyerls/pegasus

[IROS24] Official repository for "PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation"

# PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation

Lukas Meyer*, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae

*This work was conducted during an internship at the National Institute
of Advanced Industrial Science and Technology.


| [Webpage](https://meyerls.github.io/pegasus_web) | [Full Paper](https://arxiv.org/abs/2401.02281) | [Ramen Dataset (~50 GB)](https://zenodo.org/records/12624886) | [PEGASET (~50 GB)](https://zenodo.org/records/12625040) |

![Teaser image](assets/title.png)

*We introduce the Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DoF object pose dataset
generation, a versatile dataset generator based on 3D Gaussian Splatting. Preparation starts with separate scans of
both environments and objects. PEGASUS allows the composition of new scenes by merging the underlying
Gaussian Splatting point cloud of an environment with those of one or multiple objects. Leveraging a physics engine
enables the simulation of natural object placement within a scene by interacting with the objects' and environment's
extracted meshes. Consequently, an extensive amount of new scenes, static or dynamic, can be created by combining
different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB
images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data
generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data.
Moreover, we introduce the CupNoodle dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical
scans that capture images from both object hemispheres and the Gaussian Splatting reconstructions, making them compatible
with PEGASUS.*





## Funding and Acknowledgments

This paper is one of the achievements of joint research with, and is copyrighted
material owned by, the ROBOT Industrial Basic Technology Collaborative
Innovation Partnership. This research has been supported by the New Energy
and Industrial Technology Development Organization (NEDO), under the
project ID JPNP20016.

## Cloning the Repository

The repository contains submodules, so please clone it with:

```shell
git clone https://github.com/meyerls/PEGASUS.git --recursive # HTTPS
git submodule update --init --recursive
```

## Requirements

The code has been tested with the following dependencies:

- Python 3.8
- CUDA 11.6
- PyTorch 1.12.1

## Setup

Our default install method is based on Conda and is provided by the following script. The script has
to be executed in the top-level directory of the repository. Currently, the setup script has only been tested on
Ubuntu 20. An installation on Windows should be possible but is not provided in this repo.

```shell
./setup.sh
```

## Overview

![Teaser image](assets/pegasus.png)

PEGASUS consists of three main components:

- GS Base Environment Reconstruction
- GS Object Reconstruction
- PEGASUS Dataset Extraction

### GS Base Environment Reconstruction

Will be updated soon

### GS Object Reconstruction

Will be updated soon! Not yet complete.

For object reconstruction we provide two different workflows. The first scans objects in the wild by
taking videos from both sides of the object; the second uses a camera rig to scan the object on a turntable.
The first approach uses [XMEM](https://github.com/hkchengrex/XMem/tree/main) to create a segmentation mask of the
selected object. For scanning, one only has to place an ArUco marker in the scene to obtain the correct scale.
The turntable approach uses an arbitrary calibration object (I have used a texture-rich sheet of paper with an ArUco
marker) so that its precomputed camera poses can be reused. A detailed workflow is provided in the following section.

#### In the Wild scanning

The workflow for scanning objects in the wild is:

###### 1. Select Object

Currently, in-the-wild scanning does not work for texture-poor objects; the camera rig is more suitable for those. The
reason is that computing the camera poses and registering the images of the bottom view simply does not work with
COLMAP for such objects. Place the object on a planar surface such as a table and make sure you can move all the way
around it.

###### 2. Aruco Marker

Print out an ArUco marker and place it next to the object. To scale the object later, measure and note down the edge
length of the square ArUco marker. A website for creating ArUco markers can be found [here](https://chev.me/arucogen/).
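
The measured marker size is what ties the reconstruction to metric units. The following is a minimal sketch of that idea, not the code used in PEGASUS: it assumes the four marker corners have already been located in the (unscaled) reconstruction, and the function name `marker_scale` and the corner input are hypothetical.

```python
import math


def marker_scale(corners, marker_size_m):
    """Scale factor that maps reconstruction units to meters.

    corners: four (x, y, z) marker corners in reconstruction units, ordered
             around the square (hypothetical input for this sketch).
    marker_size_m: measured edge length of the printed marker in meters.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four marker corners")
    # Average the four edge lengths to reduce the effect of noisy corners.
    edges = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    mean_edge = sum(edges) / 4.0
    if mean_edge <= 0:
        raise ValueError("degenerate marker")
    return marker_size_m / mean_edge


# A marker that spans 2 units in the reconstruction and measures 3.7 cm in reality.
corners = [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]
scale = marker_scale(corners, marker_size_m=0.037)
```

Multiplying every reconstructed coordinate by `scale` would then express the object in meters.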

###### 3. Scanning

Record two videos with your phone camera or DSLR camera (we used an iPhone 12 in our example). The first video
contains a hemispherical scan of the top view of the object. Try to cover a 360-degree view at 2-3 different height
levels. For the second video, flip the object and repeat this process.

###### 4. Segmentation Mask

For extracting the semantic masks of the video we used [XMEM](https://github.com/hkchengrex/XMem).

XMEM can be started from the root directory of PEGASUS:

```shell
python submodules/XMem/interactive_demo.py --video [path to the video] --num_objects 1 --size -1
```

In the XMEM GUI, select the object you want to extract (the object should be highlighted in red). Afterwards, click the
button *Forward Propagate* (1) to extract the masks. Depending on the video length, this takes
around 1-2 min. Then click on *Export Overlays as Video* (2) to save the detected
binary masks as images. More information on how to use XMEM can be
found [here](https://github.com/hkchengrex/XMem/blob/main/docs/DEMO.md).

Note: Choose the image size according to your GPU memory and the quality you want. ```-1``` keeps the original image
size. Any other value resizes the image so that its shorter side matches that value.
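
As a worked example of the resizing rule in the note above, the sketch below computes the resulting resolution for a given ```--size``` value. The helper `resized_shape` is hypothetical and not part of XMEM; rounding to the nearest pixel is an assumption.

```python
def resized_shape(height, width, size):
    """Return (height, width) after scaling so the shorter side equals `size`.

    size = -1 keeps the original resolution.
    """
    if size == -1:
        return height, width
    if size <= 0:
        raise ValueError("size must be -1 or a positive integer")
    factor = size / min(height, width)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return round(height * factor), round(width * factor)


print(resized_shape(1080, 1920, 480))  # (480, 853)
print(resized_shape(1080, 1920, -1))   # (1080, 1920)
```

A full-HD video processed with ```--size 480``` therefore has well under a quarter of the original pixel count, which is what reduces the GPU memory needed.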

###### 6. Dataset Integration

First, both the extracted images and the masks have to be put into a common folder. This folder should be placed in a
dataset folder in which multiple reconstructed objects can be stored.

```shell
.
└── bouillon
    ├── down
    │   ├── images
    │   └── masks
    └── up
        ├── images
        └── masks
```
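
Before starting a reconstruction it can be useful to confirm that an object folder matches this layout. The following sketch does that with the standard library only. The helper `missing_scan_folders` is hypothetical and not part of PEGASUS, and the demo builds a deliberately incomplete folder in a temporary directory.

```python
import tempfile
from pathlib import Path


def missing_scan_folders(object_dir, sides=("up", "down")):
    """List the sub-folders the layout above expects but that do not exist."""
    object_dir = Path(object_dir)
    expected = [object_dir / side / kind for side in sides for kind in ("images", "masks")]
    return [p.relative_to(object_dir).as_posix() for p in expected if not p.is_dir()]


# Demo: create an incomplete 'bouillon' folder and report what is missing.
with tempfile.TemporaryDirectory() as dataset_root:
    bouillon = Path(dataset_root) / "bouillon"
    for sub in ("up/images", "up/masks", "down/images"):
        (bouillon / sub).mkdir(parents=True)
    missing = missing_scan_folders(bouillon)

print(missing)  # ['down/masks']
```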

To include the scanned object in PEGASUS, define it as a Dataset-Object
in [in_the_wild_dataset.py](src/dataset/in_the_wild_dataset.py). The class name (here *Bouillon*) is the name of the
object.

```python
class Bouillon(InTheWild):
    OBJECT_NAME = 'bouillon'
    ID = 201
    TYPE = 'object'
    RECORDING_TYPE = 'spherical'  # 'spherical' or 'hemispherical'
    ALPHA = 0.3
    DATASET_TYPE = 'wild'
    ARUCO_SIZE = 0.037  # in meter

    def __init__(self, dataset_path):
        super().__init__(dataset_path=Path(dataset_path))
```

- ```OBJECT_NAME```: Folder name of the object. By default it is the video name in the ```./workspace``` folder (this
  folder is generated by XMEM). Please rename it to the object name.
- ```ID```: Unique object ID
- ```TYPE```: Type of the asset (default: object). Environments use a different type.
- ```RECORDING_TYPE```: 'spherical' or 'hemispherical', depending on whether you also scanned the bottom. This is
  recommended if you have texture-less objects. 'spherical': 2 videos (top & bottom). 'hemispherical': 1 video (top only)
- ```ALPHA```: Value for alpha shape reconstruction (default: 0.3)
- ```DATASET_TYPE```: Name for your own dataset (default: wild)
- ```ARUCO_SIZE```: Size of the ArUco marker in meters(!)
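
Two mistakes are easy to make with these attributes: reusing an ```ID``` and entering ```ARUCO_SIZE``` in millimeters. The sketch below illustrates a sanity check for both. It is not part of PEGASUS: `config_problems` is a hypothetical helper, the one-meter threshold is an arbitrary heuristic, and the two classes are stand-ins so that the example runs without the repository.

```python
VALID_RECORDING_TYPES = {"spherical", "hemispherical"}


def config_problems(object_classes):
    """Collect problems in a list of Dataset-Object classes (illustrative only)."""
    problems = []
    seen_ids = {}
    for cls in object_classes:
        if cls.RECORDING_TYPE not in VALID_RECORDING_TYPES:
            problems.append(f"{cls.__name__}: unknown RECORDING_TYPE {cls.RECORDING_TYPE!r}")
        # Heuristic: a printed marker larger than 1 m is most likely a unit mix-up.
        if cls.ARUCO_SIZE <= 0 or cls.ARUCO_SIZE > 1:
            problems.append(f"{cls.__name__}: ARUCO_SIZE {cls.ARUCO_SIZE} does not look like meters")
        if cls.ID in seen_ids:
            problems.append(f"{cls.__name__}: ID {cls.ID} already used by {seen_ids[cls.ID]}")
        seen_ids.setdefault(cls.ID, cls.__name__)
    return problems


# Stand-in classes so the sketch runs without PEGASUS installed.
class Bouillon:
    ID, RECORDING_TYPE, ARUCO_SIZE = 201, "spherical", 0.037


class Mug:  # hypothetical second object: size given in mm and ID reused
    ID, RECORDING_TYPE, ARUCO_SIZE = 201, "spherical", 37


problems = config_problems([Bouillon, Mug])
for problem in problems:
    print(problem)
```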

###### 7. GS Reconstruction

```shell
python src/reconstruction/in_the_wild_object_reconstruction.py
```

###### 8. Integrate into PEGASUS

- Tbd

### Available Objects (Ramen Dataset and PEGASET)

We provide two datasets. Object IDs in the *Ramen* dataset range from 101 to 130. The YCB-V object IDs in PEGASET are identical to the original YCB-V IDs.
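
When both datasets are used together, the ID alone identifies where an object comes from. The helper below is a hypothetical illustration of that mapping. It assumes that the original YCB-V IDs run from 1 to 21, as in the BOP release of YCB-V, and it treats every other ID (such as 201 in the in-the-wild example) as a custom object.

```python
def dataset_for_id(object_id):
    """Map an object ID to its source dataset, following the ID ranges stated above."""
    if 101 <= object_id <= 130:
        return "ramen"
    if 1 <= object_id <= 21:  # assumption: YCB-V IDs as used in the BOP release
        return "ycbv"
    return "custom"


print(dataset_for_id(115))  # ramen
print(dataset_for_id(5))    # ycbv
print(dataset_for_id(201))  # custom
```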

#### *Ramen* Dataset

The [*Ramen* Dataset](https://zenodo.org/records/12624886) consists of 30 cup noodle objects and 9
environments.



```shell
.
└── Dataset
    ├── calibration
    │   └── ...
    ├── environment
    │   └── ...
    ├── object
    │   └── ...
    └── urdf
        └── ...
```

#### PEGASET

The [PEGASET](https://zenodo.org/records/12625040) consists of the 21 well-known YCB-V objects and 9 environments.



```shell
.
└── Dataset
    ├── calibration
    │   └── ...
    ├── environment
    │   └── ...
    ├── object
    │   └── ...
    └── urdf
        └── ...
```

### PEGASUS Dataset Extraction

Before rendering a dataset, the data required by PEGASUS must be downloaded
from the [*Ramen* Dataset](https://zenodo.org/records/12624886) or [PEGASET](https://zenodo.org/records/12625040). If you use both datasets, merge them into one folder structure.
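
Merging the two downloads into one folder structure can be done by hand or with a few lines of Python. The sketch below is one possible way and not a script shipped with PEGASUS: `merge_datasets` is a hypothetical helper, and the folder names in the demo are placeholders created in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def merge_datasets(sources, target):
    """Copy every source dataset into `target`, merging sub-folders of the same name."""
    target = Path(target)
    for source in sources:
        # dirs_exist_ok (Python 3.8+) lets calibration/, environment/, object/
        # and urdf/ from several sources end up in the same target folders.
        shutil.copytree(source, target, dirs_exist_ok=True)
    return sorted(p.name for p in target.iterdir())


# Demo with placeholder folders instead of the real downloads.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ramen" / "object" / "101").mkdir(parents=True)
    (root / "pegaset" / "object" / "1").mkdir(parents=True)
    (root / "pegaset" / "urdf").mkdir()
    top_level = merge_datasets([root / "ramen", root / "pegaset"], root / "merged")
    merged_objects = sorted(p.name for p in (root / "merged" / "object").iterdir())

print(top_level)       # ['object', 'urdf']
print(merged_objects)  # ['1', '101']
```

Files that exist under the same relative path in several sources are overwritten by the later source, so it is worth checking the `environment` and `calibration` folders for name clashes before merging.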

All objects and environments which are relevant for dataset generation should be added to the ```obj_list``` and ```env_list```.

Parameters:

- ```mode: str```: Either ```"dynamic"``` or ```"static"``` rendering of scene
- ```num_scenes: int```: Number of scenes
- ```num_objects: int```: Maximum number of objects which are placed in the scene. A random number between 1 and ```num_objects``` is chosen.
- ```image_height```: Height of the rendered images
- ```image_width```: Width of the rendered images
- ```render_data_points: list```: Types of rendering and data points saved to output. e.g. ```['rgb', 'depth', 'seg_vis', 'seg_sil', 'sem_seg']```
- ```convert_from_scenewise2imagewise: bool```: By default the data is saved per scene. If you need the data in scenewise BOP format, set this to ```True```
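
To make the parameters concrete, the following sketch collects them in a plain dictionary, validates them, and draws the number of objects per scene. Only the parameter names come from the list above. The values, the helpers `validate` and `objects_per_scene`, and the uniform sampling are assumptions, and how the settings are actually passed to PEGASUS is not shown here.

```python
import random

ALLOWED_MODES = {"dynamic", "static"}

# Hypothetical values; only the parameter names come from the list above.
config = {
    "mode": "static",
    "num_scenes": 10,
    "num_objects": 5,
    "image_height": 480,
    "image_width": 640,
    "render_data_points": ["rgb", "depth", "sem_seg"],
    "convert_from_scenewise2imagewise": False,
}


def validate(cfg):
    """Raise ValueError if a setting is outside the documented options."""
    if cfg["mode"] not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    for key in ("num_scenes", "num_objects", "image_height", "image_width"):
        if not isinstance(cfg[key], int) or cfg[key] < 1:
            raise ValueError(f"{key} must be a positive integer")


def objects_per_scene(cfg, rng):
    """Draw an object count for each scene from 1..num_objects (uniform is an assumption)."""
    return [rng.randint(1, cfg["num_objects"]) for _ in range(cfg["num_scenes"])]


validate(config)
counts = objects_per_scene(config, random.Random(0))
print(counts)
```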

## BibTeX

```
@Article{PEGASUS2024,
  author  = {Meyer, Lukas and Erich, Floris and Yoshiyasu, Yusuke and Stamminger, Marc and Ando, Noriaki and Domae, Yukiyasu},
  title   = {PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation},
  journal = {IROS},
  month   = {October},
  year    = {2024},
  url     = {https://meyerls.github.io/pegasus_web}
}
```

Thanks to the authors of [3D Gaussians](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) for their excellent
code. Please consider citing their work as well:

```
@Article{kerbl3Dgaussians,
  author  = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
  title   = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
  journal = {ACM Transactions on Graphics},
  number  = {4},
  volume  = {42},
  month   = {July},
  year    = {2023},
  url     = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}
```

Thanks also to the authors of the BOP Toolkit for their interface to the benchmark for 6D object pose estimation.