https://github.com/robvanvolt/DALLE-tools

DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets.

dataset-preparation datasets webdataset

# DALLE tools

DALLE-tools is a GitHub repository with useful tools to categorize, annotate, or sanity-check your datasets.

## Installation

Clone this repository into a local folder, then run one of the commands in the sections below.

### WebDataset Annotator

```bash
python annotator.py
```

Use the annotator's keyboard shortcuts to switch to the next page or to change the annotation category, and click an image to add it to the current category; the result is saved in `annotations.json`. Please share your `annotations.json` by opening a pull request that adds it to the `community_annotations` folder, inside the subfolder for the dataset you used (e.g. YFCC100m or LAION400m), so everyone can use the data to build better dataset annotations!
To continue annotating a dataset that someone else has already started, copy its `annotations.json` from the `community_annotations` folder, together with the dataset it was made for, into the repository's root directory and run the annotator.

![Screenshot](screenshot.png)
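The annotator's bookkeeping can be pictured as a load, update, save cycle on `annotations.json`. The sketch below is illustrative only: it assumes the file maps each category name to a list of WebDataset sample keys, which may not match the format `annotator.py` actually writes, and `load_annotations`, `add_to_category` and `save_annotations` are hypothetical helpers, not part of this repository.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: annotations.json maps category name -> list of sample keys.
# This is a hypothetical layout, not necessarily what annotator.py writes.


def load_annotations(path):
    """Return the annotations dict, or an empty dict on the first run."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def add_to_category(annotations, category, sample_key):
    """Record sample_key under category; clicking the same image twice is a no-op."""
    keys = annotations.setdefault(category, [])
    if sample_key not in keys:
        keys.append(sample_key)
    return annotations


def save_annotations(annotations, path):
    """Write the annotations back as indented JSON with sorted category names."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(annotations, f, indent=2, sort_keys=True)


# Demo in a throwaway directory; category names and sample keys are made up.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.json"
    annotations = load_annotations(path)  # {} because the file doesn't exist yet
    add_to_category(annotations, "animals", "000123")
    add_to_category(annotations, "animals", "000123")  # duplicate click, ignored
    add_to_category(annotations, "text", "000456")
    save_annotations(annotations, path)
    reloaded = load_annotations(path)

print(reloaded)  # {'animals': ['000123'], 'text': ['000456']}
```

Writing with `indent=2` and `sort_keys=True` keeps the saved file deterministic, which makes community pull requests easier to diff and review.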

### WebDataset Aligner

```bash
python aligner.py
```

This tool realigns shuffled keys so that the WebDataset library can read your dataset correctly (WebDataset expects all files that belong to one sample to sit next to each other in the tar archive).
You only need to specify the keys you want to look for and keep in your new dataset.
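A minimal sketch of the idea, not the code in `aligner.py`: assuming a single uncompressed tar shard whose members follow WebDataset naming (`<key>.<ext>`), the hypothetical `align_tar` helper groups members by sample key, keeps only the requested extensions, drops samples that are missing any of them, and writes each remaining sample's files consecutively. It holds the whole shard in memory, so it only suits small shards; `split_key`, `align_tar` and `make_tar` are illustrative names.

```python
import io
import tarfile
from collections import defaultdict


def split_key(name):
    """Split 'shard/000123.jpg' into ('shard/000123', 'jpg').

    Like WebDataset, treat everything before the first dot of the basename
    as the sample key and everything after it as the extension.
    """
    dirname, _, basename = name.rpartition("/")
    stem, _, ext = basename.partition(".")
    return (f"{dirname}/{stem}" if dirname else stem), ext


def align_tar(src, dst, keep_exts):
    """Hypothetical helper (not the aligner.py API): rewrite the tar in `src`
    into `dst` so each sample's files are adjacent. Only extensions in
    `keep_exts` are kept; samples missing any of them are dropped.
    Returns the number of samples written.
    """
    samples = defaultdict(dict)  # key -> {ext: (TarInfo, bytes)}
    with tarfile.open(fileobj=src, mode="r") as tin:
        for member in tin:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if ext in keep_exts:
                samples[key][ext] = (member, tin.extractfile(member).read())

    written = 0
    with tarfile.open(fileobj=dst, mode="w") as tout:
        for key in sorted(samples):
            fields = samples[key]
            if any(ext not in fields for ext in keep_exts):
                continue  # incomplete sample: skip it
            for ext in keep_exts:  # same field order for every sample
                member, data = fields[ext]
                tout.addfile(member, io.BytesIO(data))
            written += 1
    return written


def make_tar(files):
    """Build an in-memory tar from (name, bytes) pairs, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


# Demo: files of samples "a" and "b" are interleaved, "c" has no caption,
# and "a.json" has an extension we do not want to keep.
src = make_tar([
    ("a.jpg", b"A-img"), ("b.txt", b"B-cap"), ("a.txt", b"A-cap"),
    ("b.jpg", b"B-img"), ("c.jpg", b"C-img"), ("a.json", b"{}"),
])
dst = io.BytesIO()
count = align_tar(src, dst, ["jpg", "txt"])
dst.seek(0)
with tarfile.open(fileobj=dst, mode="r") as tar:
    aligned_names = tar.getnames()
print(count, aligned_names)  # 2 ['a.jpg', 'a.txt', 'b.jpg', 'b.txt']
```

Sorting the keys and writing extensions in a fixed order is a design choice that makes the output reproducible; a streaming version for large shards would need a different strategy, such as a first pass that only records member offsets.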

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License
[MIT](https://choosealicense.com/licenses/mit/)