https://github.com/zhexuany/coscene-converter

A tool for converting open-x-embodiment dataset to MCAP and upload data to coScene data platform.
https://github.com/zhexuany/coscene-converter

Last synced: 10 months ago
JSON representation

A tool for converting open-x-embodiment dataset to MCAP and upload data to coScene data platform.

Host: GitHub
URL: https://github.com/zhexuany/coscene-converter
Owner: zhexuany
Created: 2025-07-20T11:48:17.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-07-20T11:54:11.000Z (11 months ago)
Last Synced: 2025-07-20T13:22:12.574Z (11 months ago)
Language: Python
Size: 31.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# coScene Converter

Tools for converting robotics datasets to MCAP format for use with coScene and Foxglove.

## Overview

coScene Converter is a Python library that converts robotics datasets, particularly from the Open-X-Embodiment collection, into MCAP format for visualization in coScene. The tool provides a flexible schema-based approach to handle different dataset structures and formats.

## Features

- Conversion of Open-X-Embodiment datasets to MCAP format
- Support for images, depth data, transforms, and robot state
- Integration with coScene for interactive visualization
- Extensible schema system for supporting different dataset formats
- Dataset structure exploration tools

## Installation

```bash
pip install -e .
```

## Usage

### Converting a Dataset

To convert a dataset episode to MCAP format:

```bash
python -m cli --dataset berkeley_autolab_ur5 --episode 1
```

Options:
- `--dataset DATASET`: Dataset name to convert (e.g., berkeley_autolab_ur5, stanford_robocook_converted_externally_to_rlds)
- `--episode EPISODE`: Episode number to convert (default: 1)
- `--batch`: Process multiple episodes in batch mode
- `--start START`: Start episode number for batch mode (default: 1)
- `--end END`: End episode number for batch mode (default: 10)
- `--output-dir OUTPUT_DIR`: Output directory for generated MCAP files (default: mcap_files)
- `--live`: Show live preview during conversion
- `--rate RATE`: Playback rate in Hz for live preview (default: 5.0)
- `--verbose`: Enable verbose output with step information

### Exploring Dataset Structure

To explore the structure of a dataset before conversion:

```bash
python scripts/dataset_structure_explorer.py --dataset stanford_robocook_converted_externally_to_rlds
```

**Important**: The dataset name must exactly match a registered dataset name in the Open-X-Embodiment collection, as these names are used to load datasets from `tensorflow_datasets`. You can verify registered dataset names in the [Open-X-Embodiment Dataset Spreadsheet](https://docs.google.com/spreadsheets/d/1rPBD77tk60AEIGZrGSODwyyzs5FgCU9Uz3h-3_t2A9g/edit?gid=0) under the "Registered Dataset Name" column.

This will generate a JSON file with the dataset structure that can be used to create a new schema. Since the nature of different datasets varies significantly, it's essential to understand the underlying meaning of each dataset's fields and structure to create an appropriate schema.

## Creating a New Dataset Schema

To add support for a new dataset:

1. Run the dataset structure explorer to understand the dataset format:
```bash
python scripts/dataset_structure_explorer.py --dataset your_dataset_name
```
Remember that `your_dataset_name` must exactly match a registered dataset name in the Open-X-Embodiment collection.

2. Analyze the generated JSON structure carefully to understand:
- The semantic meaning of each field in the dataset
- The relationships between different data elements
- How the dataset represents robot state, sensor data, and actions

3. Create a new schema file in `common/dataset_schemas/your_dataset_name.py`

4. Implement the schema class following the pattern in existing schemas like `berkeley_autolab_ur5.py` or `stanford_robocook_converted_externally_to_rlds.py`

5. Your schema class should:
- Inherit from `DefaultSchema` or `DatasetSchema`
- Implement `setup_channels()` to define the channels for your dataset
- Implement `process_step()` to process each step of data
- Optionally implement `print_step_info()` for debugging

## Project Structure

- `cli.py`: Command-line interface for the converter
- `common/`: Common utilities and schema definitions
- `schemas.py`: Base schema classes and common schema definitions
- `dataset_schemas/`: Dataset-specific schema implementations
- `open_x_embodiment/`: Tools for working with Open-X-Embodiment datasets
- `data_loader.py`: Functions for loading datasets
- `converter.py`: Functions for converting datasets to MCAP
- `scripts/`: Utility scripts
- `dataset_structure_explorer.py`: Tool for exploring dataset structures

## Contributing

To add support for a new dataset:

1. Explore the dataset structure using the explorer tool
2. Create a new schema file based on the existing examples
3. Test your schema with the converter

## License

Apache 2 License

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/zhexuany/coscene-converter

Awesome Lists containing this project

README