https://github.com/tri-ml/dgp
ML Dataset Governance Policy for Autonomous Vehicle Datasets
autonomous-driving data-management deep-learning machine-learning pytorch
- Host: GitHub
- URL: https://github.com/tri-ml/dgp
- Owner: TRI-ML
- License: mit
- Created: 2020-05-11T17:14:43.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-08-06T14:59:02.000Z (3 months ago)
- Last Synced: 2024-08-06T17:13:35.450Z (3 months ago)
- Topics: autonomous-driving, data-management, deep-learning, machine-learning, pytorch
- Language: Python
- Homepage: https://tri-ml.github.io/dgp/
- Size: 91.4 MB
- Stars: 93
- Watchers: 6
- Forks: 63
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
# Dataset Governance Policy (DGP)
[![build-docker](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml)
[![license](https://img.shields.io/github/license/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/blob/master/LICENSE)
[![open-issues](https://img.shields.io/github/issues/TRI-ML/dgp.svg)](https://github.com/TRI-ML/dgp/issues)
![coverage badge](./docs/coverage.svg)
[![docs](https://img.shields.io/badge/documentation-beta-red)](https://tri-ml.github.io/dgp/)

To ensure the traceability, reproducibility and standardization of all ML
datasets and models generated and consumed within Toyota Research Institute
(TRI), we developed the Dataset-Governance-Policy (DGP), which codifies the schema
and maintenance of all of TRI's Autonomous Vehicle (AV) datasets.

## Components
- [Schema](dgp/proto/README.md):
[Protobuf](https://developers.google.com/protocol-buffers)-based schemas for
raw data, annotations and dataset management.
- [DataLoaders](dgp/datasets): Universal PyTorch DatasetClass to load all
DGP-compliant datasets.
- [CLI](dgp/README.md): Main CLI for handling DGP datasets and the entrypoint of
visualization tools.

## Getting Started
Please see [Getting Started](docs/GETTING_STARTED.md) for environment setup.
Getting started is as simple as initializing a dataset-class with the relevant
dataset JSON, raw data sensor names, annotation types, and split information.
The example below initializes a PyTorch dataset for multi-modal learning from
2D and 3D bounding boxes.

```python
from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2D and 3D
# bounding box annotations.
dataset = SynchronizedSceneDataset(
    '_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train',
)
```

## Examples
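As one illustration, the arguments from the Getting Started snippet can be gathered and sanity-checked before the dataset is constructed. The sketch below is not part of DGP: `dataset_kwargs`, `load_dataset`, `ASSUMED_SPLITS` and the file name `example_v0.0.json` are hypothetical names introduced here, and the list of split names is an assumption to adjust for your own dataset JSON. The only DGP call used is the `SynchronizedSceneDataset` constructor, with the same arguments as in Getting Started.

```python
from pathlib import Path

# Assumption: illustrative split names only; use the splits that your
# dataset JSON actually defines.
ASSUMED_SPLITS = ("train", "val", "test")


def dataset_kwargs(dataset_json, datum_names, requested_annotations, split="train"):
    """Collect and sanity-check keyword arguments for SynchronizedSceneDataset.

    Hypothetical helper, not part of DGP.
    """
    if not datum_names:
        raise ValueError("datum_names must list at least one sensor")
    if split not in ASSUMED_SPLITS:
        raise ValueError(f"unexpected split: {split!r}")
    path = Path(dataset_json)
    if path.suffix != ".json":
        raise ValueError(f"expected a dataset JSON file, got {path.name!r}")
    return {
        "datum_names": tuple(datum_names),
        "requested_annotations": tuple(requested_annotations),
        "split": split,
    }


def load_dataset(dataset_json, datum_names, requested_annotations, split="train"):
    """Build the dataset with the same call shape as the Getting Started snippet."""
    # Imported lazily so the checks above can run without DGP installed.
    from dgp.datasets import SynchronizedSceneDataset

    kwargs = dataset_kwargs(dataset_json, datum_names, requested_annotations, split)
    return SynchronizedSceneDataset(dataset_json, **kwargs)


if __name__ == "__main__":
    # "example_v0.0.json" is a placeholder file name, not a real dataset.
    print(dataset_kwargs(
        "example_v0.0.json",
        ("camera_01", "lidar"),
        ("bounding_box_2d", "bounding_box_3d"),
    ))
```

Keeping the checks separate from the constructor call means a mistyped split or an empty sensor list fails immediately, before any dataset files are read.
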
A list of starter scripts is provided in the [examples](examples/) directory.
- [examples/load_dataset.py](examples/load_dataset.py): Simple example script to
load a multi-modal dataset based on the **Getting Started** section above.

## Build and run tests
You can build the base Docker image and run the tests within a
[Docker container](docs/GETTING_STARTED.md#markdown-header-develop-within-docker)
via:

```sh
make docker-build
make docker-run-tests
```

## Contributing
We appreciate all contributions to DGP! To learn more about making a
contribution to DGP, please see [Contribution Guidelines](docs/CONTRIBUTING.md).

## CI Ecosystem
| Job | CI | Notes |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| docker-build | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/build-docker.yml) | Docker build and push to [container registry](https://github.com/TRI-ML/dgp/pkgs/container/dgp) |
| pre-merge | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/pre-merge.yml) | Pre-merge testing |
| doc-gen | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/doc-gen.yml) | [GitHub Pages](https://tri-ml.github.io/dgp/) doc generation |
| coverage | [![Build Status](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml/badge.svg)](https://github.com/TRI-ML/dgp/actions/workflows/coverage.yml) | Code coverage metrics and badge generation |

## 💬 Where to file bug reports
| Type | Platforms |
| ----------------------- | ------------------------------------------------------------ |
| 🚨 **Bug Reports** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |
| 🎁 **Feature Requests** | [GitHub Issue Tracker](https://github.com/TRI-ML/dgp/issues) |

## 👩💻 The Team 👨💻
DGP is developed and currently maintained by _Quincy Chen, Arjun Bhargava, Chao
Fang, Chris Ochoa and Kuan-Hui Lee_ from the ML-Engineering team at
[Toyota Research Institute (TRI)](https://www.tri.global/), with contributions
from the ML-Research team at TRI,
[Woven Planet](https://www.woven-planet.global/en) and
[Parallel Domain](https://paralleldomain.com/).