Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/robvanvolt/DALLE-tools
DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets.
- Host: GitHub
- URL: https://github.com/robvanvolt/DALLE-tools
- Owner: robvanvolt
- Created: 2021-11-30T19:24:57.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-03-09T00:57:35.000Z (almost 3 years ago)
- Last Synced: 2024-08-04T03:11:07.474Z (5 months ago)
- Topics: dataset-preparation, datasets, webdataset
- Language: Python
- Homepage:
- Size: 3.84 MB
- Stars: 15
- Watchers: 0
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: readme.md
README
# DALLE tools
DALLE-tools is a GitHub repository with useful tools to categorize, annotate, or sanity-check your datasets.
## Installation
Clone this repository into a local folder and run one of the commands from the sections below.
### WebDataset Annotator
```bash
python annotator.py
```
Press to switch to the next page, to change the annotation category, or click on an image to add it to the current category and save it in annotations.json. Please upload your annotations.json by opening a pull request that adds it to the community_annotations folder, inside the subfolder of the dataset you used (e.g. YFCC100m or LAION400m), so everyone can use the data for better dataset annotations!
If you want to continue annotating a dataset that someone else already started, copy the annotations.json for that dataset from the community_annotations folder into the root directory and run the annotator!
![Screenshot](screenshot.png)
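
The copy step described above could be scripted roughly as follows. This is a sketch under assumptions: the layout `community_annotations/<dataset>/annotations.json` is inferred from the description, and `restore_annotations` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path


def restore_annotations(repo_root, dataset_name):
    """Copy community_annotations/<dataset_name>/annotations.json to the repo root.

    Assumed layout; adjust the paths if the repository is organized differently.
    """
    root = Path(repo_root)
    source = root / "community_annotations" / dataset_name / "annotations.json"
    if not source.is_file():
        raise FileNotFoundError(f"no shared annotations found at {source}")

    target = root / "annotations.json"
    if target.exists():
        # Keep a backup so local work is not silently overwritten.
        shutil.copy2(target, target.with_name("annotations.json.bak"))

    shutil.copy2(source, target)
    return target
```

For example, `restore_annotations(".", "YFCC100m")` run from the repository root would place the shared file where the annotator is described as looking for it.
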
### WebDataset Aligner
```bash
python aligner.py
```
This tool helps to align the shuffled keys, so the WebDataset module can read your datasets correctly.
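
As a rough illustration of what aligning means here: WebDataset groups consecutive tar members that share a basename into one sample, so the files of a sample must sit next to each other in the shard. The following is a minimal sketch of that idea using only the standard library, not the code in aligner.py. The names `split_key` and `align_shard`, and the reading of "keys" as file extensions, are assumptions made for illustration. It loads the whole shard into memory, so it only suits small shards.

```python
import io
import tarfile


def split_key(member_name):
    """Split 'dir/000123.jpg' into ('dir/000123', 'jpg').

    The key ends at the first dot of the file name, as WebDataset does.
    """
    directory, _, base = member_name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext.lower()


def align_shard(src_path, dst_path, wanted_exts):
    """Rewrite a tar shard so each sample's files are adjacent and complete."""
    wanted = [e.lower() for e in wanted_exts]
    samples = {}  # key -> {ext: (TarInfo, bytes)}, in first-seen key order

    with tarfile.open(src_path, "r") as src:
        for member in src:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if ext in wanted:
                data = src.extractfile(member).read()
                samples.setdefault(key, {})[ext] = (member, data)

    kept = 0
    with tarfile.open(dst_path, "w") as dst:
        for files in samples.values():
            if not all(ext in files for ext in wanted):
                continue  # drop samples missing one of the wanted files
            for ext in wanted:
                info, data = files[ext]
                dst.addfile(info, io.BytesIO(data))
            kept += 1
    return kept
```

For example, `align_shard("shuffled.tar", "aligned.tar", ["jpg", "txt"])` (hypothetical file names) would write only samples that have both a `.jpg` and a `.txt` file and return how many were kept.
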
You just need to specify the keys you want to look for and keep in your new dataset.
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## License
[MIT](https://choosealicense.com/licenses/mit/)