https://github.com/ZachNagengast/LAION-Dalle-Scraper
Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/ZachNagengast/LAION-Dalle-Scraper
- Owner: ZachNagengast
- License: apache-2.0
- Fork: true (EduardoPach/LAION-Dalle-Scraper)
- Created: 2023-10-05T20:00:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-10T16:00:43.000Z (over 1 year ago)
- Last Synced: 2024-08-01T08:19:07.298Z (6 months ago)
- Language: Python
- Size: 44.9 KB
- Stars: 11
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - ZachNagengast/LAION-Dalle-Scraper - Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel (Python)
README
## ⚠️ Main repo has been moved to [LAION-AI/Discord-Scrapers](https://github.com/LAION-AI/Discord-Scrapers)
This repo is now just for running backups
# LAION-Dalle-Scraper
Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel
This is currently syncing to huggingface here: https://huggingface.co/datasets/laion/dalle-3-dataset
### Environment Setup
#### Environment Variables (add to your github repo secrets)
- `DISCORD_TOKEN` - Discord bot token with read access and "MESSAGE CONTENT INTENT" toggled on
- `HF_DATASET_NAME` - Name of the dataset to sync to on huggingface
- `HF_TOKEN` - Huggingface token with write access to the dataset
#### config.json
- `channel_id` - ID of the discord channel to scrape
- `limit` - Number of messages to scrape per request
- `hf_dataset_name` - Fallback dataset name, used in case the `HF_DATASET_NAME` environment variable is not set
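
The settings above could be resolved along the following lines. This is only a minimal sketch, not the repo's actual code: the `ScraperSettings` class and `load_settings` function are hypothetical names, and the sketch assumes `config.json` is a flat JSON object containing the three keys listed above. It illustrates the fallback rule, where `HF_DATASET_NAME` from the environment wins over `hf_dataset_name` from the config file.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class ScraperSettings:
    """Hypothetical container for the resolved settings (not from the repo)."""
    channel_id: str
    limit: int
    hf_dataset_name: str
    discord_token: Optional[str]
    hf_token: Optional[str]


def load_settings(
    config_path: str = "config.json",
    env: Optional[Mapping[str, str]] = None,
) -> ScraperSettings:
    """Merge config.json with environment variables.

    Assumes config.json is a flat JSON object with the keys
    `channel_id`, `limit`, and `hf_dataset_name`.
    """
    env = os.environ if env is None else env
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))

    # The environment variable takes priority; the config value is the fallback.
    # An empty HF_DATASET_NAME is treated the same as an unset one.
    dataset_name = env.get("HF_DATASET_NAME") or config["hf_dataset_name"]

    return ScraperSettings(
        channel_id=str(config["channel_id"]),
        limit=int(config["limit"]),
        hf_dataset_name=dataset_name,
        discord_token=env.get("DISCORD_TOKEN"),
        hf_token=env.get("HF_TOKEN"),
    )
```

Passing `env` explicitly (instead of always reading `os.environ`) keeps the function easy to test locally, while the default still picks up the secrets injected by GitHub Actions.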