https://github.com/allencellmodeling/quilt3distribute
People commonly work with tabular datasets and want to share their data; this package makes that easier through Quilt3.
- Host: GitHub
- URL: https://github.com/allencellmodeling/quilt3distribute
- Owner: AllenCellModeling
- License: other
- Created: 2019-05-25T00:26:07.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-07-22T14:32:27.000Z (over 3 years ago)
- Last Synced: 2025-07-31T21:49:29.275Z (7 months ago)
- Language: Python
- Homepage:
- Size: 6.47 MB
- Stars: 5
- Watchers: 4
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# quilt3distribute
[Build Status](https://github.com/AllenCellModeling/quilt3distribute/actions)
[Documentation](https://AllenCellModeling.github.io/quilt3distribute)
[Code Coverage](https://codecov.io/gh/AllenCellModeling/quilt3distribute)
[DOI](https://doi.org/10.5281/zenodo.3382259)

People commonly work with tabular datasets and want to share their data; this package makes that easier through Quilt3.
---
## Features
* Automatically determines which files to upload based on the CSV headers. (An explicit override is available.)
* Simple interface for attaching metadata to each file based on the manifest contents.
* Groups metadata for files that are referenced multiple times.
* Validates and runs basic cleaning operations on your dataset manifest CSV.
* Optionally add license details and usage instructions to your dataset README.
* Parses README for any referenced files and packages them up as well.
* Support for adding extra files not contained in the manifest.
* Constructs an "associates" map that is placed into each file's metadata for quick navigation around the package.
* Enforces that the metadata attached to each file is standardized across the package for each file column.
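To make the "groups metadata" feature more concrete, here is a minimal sketch of one way such grouping could work. It is an illustration only, not quilt3distribute's implementation: the `group_metadata` helper, the `3d/shared.tiff` path, and the rule of collapsing a single distinct value to a scalar are all assumptions made for this example.

```python
from collections import defaultdict


def group_metadata(rows, file_column, metadata_columns):
    """Collect metadata values per file path (hypothetical helper).

    A metadata column with one distinct value for a file stays a scalar;
    a column with several distinct values becomes a list, in order of
    first appearance.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column in metadata_columns:
            values = collected[row[file_column]][column]
            if row[column] not in values:
                values.append(row[column])

    return {
        path: {
            column: values[0] if len(values) == 1 else values
            for column, values in columns.items()
        }
        for path, columns in collected.items()
    }


# Hypothetical manifest rows in which two cells share one 3D image
rows = [
    {"CellId": 1, "Structure": "lysosome", "3dReadPath": "3d/shared.tiff"},
    {"CellId": 2, "Structure": "lysosome", "3dReadPath": "3d/shared.tiff"},
    {"CellId": 3, "Structure": "golgi", "3dReadPath": "3d/3.tiff"},
]
grouped = group_metadata(rows, "3dReadPath", ["CellId", "Structure"])
```

In this sketch the shared file ends up with both `CellId` values attached, while its single `Structure` value stays a plain string.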
## Quick Start
Construct a CSV (or pandas DataFrame) dataset manifest ([Example](quilt3distribute/tests/data/example.csv)):
| CellId | Structure | 2dReadPath | 3dReadPath |
|--------|-----------|------------|------------|
| 1 | lysosome | 2d/1.png | 3d/1.tiff |
| 2 | laminb1 | 2d/2.png | 3d/2.tiff |
| 3 | golgi | 2d/3.png | 3d/3.tiff |
| 4 | myosin | 2d/4.png | 3d/4.tiff |
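If the manifest does not exist yet, the table above could be written to disk with the standard library alone. This is a sketch under assumptions: the output filename is chosen to match the `dataset` argument used in the next block, and the image paths are assumed to be relative paths to files that already exist.

```python
import csv

# Rows copied from the example manifest table
rows = [
    {"CellId": 1, "Structure": "lysosome", "2dReadPath": "2d/1.png", "3dReadPath": "3d/1.tiff"},
    {"CellId": 2, "Structure": "laminb1", "2dReadPath": "2d/2.png", "3dReadPath": "3d/2.tiff"},
    {"CellId": 3, "Structure": "golgi", "2dReadPath": "2d/3.png", "3dReadPath": "3d/3.tiff"},
    {"CellId": 4, "Structure": "myosin", "2dReadPath": "2d/4.png", "3dReadPath": "3d/4.tiff"},
]

# newline="" lets the csv module control line endings itself
with open("single_cell_examples.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```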
```python
from quilt3distribute import Dataset

# Create the dataset
ds = Dataset(
    dataset="single_cell_examples.csv",
    name="single_cell_examples",
    package_owner="jacksonb",
    readme_path="single_cell_examples.md"
)

# Optionally add common additional requirements
ds.add_usage_doc("https://docs.quiltdata.com/walkthrough/reading-from-a-package")
ds.add_license("https://www.allencell.org/terms-of-use.html")

# Optionally indicate column values to use for file metadata
ds.set_metadata_columns(["CellId", "Structure"])

# Optionally rename the columns on the package level
ds.set_column_names_map({
    "2dReadPath": "images_2d",
    "3dReadPath": "images_3d"
})

# Distribute
pkg = ds.distribute(push_uri="s3://quilt-jacksonb", message="Initial dataset example")
```
***Returns:***
```
(remote Package)
 └─README.md
 └─images_2d
   └─03cdf019_1.png
   └─148ddc09_2.png
   └─2b2cf361_3.png
   └─312a0367_4.png
 └─images_3d
   └─a0ce6e01_1.tiff
   └─c360072c_2.tiff
   └─d9b55cba_3.tiff
   └─eb29e6b3_4.tiff
 └─metadata.csv
 └─referenced_files
   └─some_file_referenced_by_the_readme.png
```
***Example Metadata:***
```python
pkg["images_2d"]["03cdf019_1.png"].meta
```
```json
{
    "CellId": 1,
    "Structure": "lysosome",
    "associates": {
        "images_2d": "images_2d/03cdf019_1.png",
        "images_3d": "images_3d/a0ce6e01_1.tiff"
    }
}
```
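The "associates" map in the metadata above can be used to find the other files that came from the same manifest row. The sketch below works on a plain dictionary copied from the example, so it runs without a Quilt package; `sibling_keys` is a hypothetical helper written for this illustration, not part of quilt3distribute.

```python
# Metadata copied from the example above
meta = {
    "CellId": 1,
    "Structure": "lysosome",
    "associates": {
        "images_2d": "images_2d/03cdf019_1.png",
        "images_3d": "images_3d/a0ce6e01_1.tiff",
    },
}


def sibling_keys(meta, own_key):
    """Return the package paths of the other files from the same manifest row."""
    return [key for key in meta["associates"].values() if key != own_key]


siblings = sibling_keys(meta, "images_2d/03cdf019_1.png")
```

Each returned value is a path inside the package, matching the entries in the package listing shown earlier.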
## Installation
**Stable Release:** `pip install quilt3distribute`
**Development Head:** `pip install git+https://github.com/AllenCellModeling/quilt3distribute.git`
### Credits
This package was created with Cookiecutter. [Original repository](https://github.com/audreyr/cookiecutter)
***Free software: Allen Institute Software License***