{"id":13737702,"url":"https://github.com/robvanvolt/DALLE-tools","last_synced_at":"2025-05-08T15:31:02.640Z","repository":{"id":43308014,"uuid":"433558550","full_name":"robvanvolt/DALLE-tools","owner":"robvanvolt","description":"DALLE-tools provides useful dataset utilities to improve your workflow with WebDatasets.","archived":false,"fork":false,"pushed_at":"2022-03-09T00:57:35.000Z","size":4027,"stargazers_count":15,"open_issues_count":1,"forks_count":8,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-11-15T06:32:18.885Z","etag":null,"topics":["dataset-preparation","datasets","webdataset"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robvanvolt.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-30T19:24:57.000Z","updated_at":"2024-04-08T21:10:11.000Z","dependencies_parsed_at":"2022-09-06T16:21:30.825Z","dependency_job_id":null,"html_url":"https://github.com/robvanvolt/DALLE-tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robvanvolt%2FDALLE-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robvanvolt%2FDALLE-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robvanvolt%2FDALLE-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robvanvolt%2FDALLE-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robvanvolt","download_url":"https://codeload.github.com/robvanvolt/DALLE-tools/tar.
gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253095907,"owners_count":21853507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-preparation","datasets","webdataset"],"created_at":"2024-08-03T03:01:57.797Z","updated_at":"2025-05-08T15:31:02.166Z","avatar_url":"https://github.com/robvanvolt.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DALLE tools\n\nDALLE-tools is a GitHub repository with useful tools to categorize, annotate, or sanity-check your datasets.\n\n## Installation\n\nClone this repository into a folder of your choice and run one of the commands from the sections below.\n\n### WebDataset Annotator\n\n```bash\npython annotator.py\n```\n\nPress \u003cspace\u003e to switch to the next page, \u003cc\u003e to change the annotation category, or click on an image to add it to the current category and save it in annotations.json. Please share your annotations.json by opening a pull request that adds it to the community_annotations folder, inside the subfolder for the dataset you used (e.g. 
YFCC100m or LAION400m), so everyone can use the data for better dataset annotations!\nIf you want to continue annotating a dataset that someone else has already started, copy the annotations.json for that dataset from the community_annotations folder into the root directory and run the annotator!\n\n![Screenshot](screenshot.png)\n\n### WebDataset Aligner\n\n```bash\npython aligner.py\n```\n\nThis tool helps align shuffled keys so that the WebDataset module can read your datasets correctly.\nYou only need to specify the keys you want to look for and keep in your new dataset.\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\n## License\n[MIT](https://choosealicense.com/licenses/mit/)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobvanvolt%2FDALLE-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobvanvolt%2FDALLE-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobvanvolt%2FDALLE-tools/lists"}