https://github.com/codingforentrepreneurs/web-scraping-with-django-celery
Learn how to schedule regular web scraping, save the data, and more with Django & Celery.
bright-data celery django proxy selenium selenium-python web-scraping
- Host: GitHub
- URL: https://github.com/codingforentrepreneurs/web-scraping-with-django-celery
- Owner: codingforentrepreneurs
- License: mit
- Created: 2024-03-04T18:53:33.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-18T22:18:31.000Z (about 2 years ago)
- Last Synced: 2024-03-18T23:29:39.590Z (about 2 years ago)
- Topics: bright-data, celery, django, proxy, selenium, selenium-python, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 901 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web Scraping on a Schedule with Django & Celery
Learn how to schedule regular web scraping, save the data, and more with Django & Celery.
Topics:
- Django
- Celery
- Selenium
- Scraped Data to Database via Django
- Reliable Web Scraping with Selenium + Bright Data
References:
- [Celery + Redis + Django configuration guide](https://www.codingforentrepreneurs.com/blog/celery-redis-django/)
- Django + Celery + Redis [blank project code](https://github.com/codingforentrepreneurs/Django-Celery-Redis)
- Django + Jupyter Setup Module [short + code](https://www.codingforentrepreneurs.com/shorts/django-setup-for-use-in-jupyter-notebooks/)
Requirements:
- Django experience such as Try Django (on [YouTube](https://www.youtube.com/playlist?list=PLEsfXFp6DpzRMby_cSoWTFw8zaMdTEXgL) or on [CFE](https://www.codingforentrepreneurs.com/topics/try-django/)) or [Your First Django Project](https://www.codingforentrepreneurs.com/courses/your-first-django-project/).
- Redis Instance
  - Setup Redis on Windows [blog post](https://www.codingforentrepreneurs.com/blog/redis-on-windows/)
  - Setup Redis on MacOS or Linux [blog post](https://www.codingforentrepreneurs.com/blog/install-redis-mac-and-linux)
  - Setup Redis on Remote Virtual Machine [blog post](https://www.codingforentrepreneurs.com/blog/remote-redis-servers-for-development/)
  - How I use Redis for new projects [short + code](https://www.codingforentrepreneurs.com/shorts/how-i-use-redis-for-new-projects-with-docker-compose/)
- A Bright Data Account [$25 credit for new accounts](https://brdta.com/justin)
## Getting Started
```bash
git clone https://github.com/codingforentrepreneurs/Django-Celery-Redis
mv Django-Celery-Redis scrape-scheduler
cd scrape-scheduler
```
`macos/linux`
```
python3 -m venv venv
source venv/bin/activate
```
`windows`
```
c:\Python311\python.exe -m venv venv
.\venv\Scripts\activate
```
Install requirements
```bash
python -m pip install pip --upgrade
python -m pip install -r requirements.txt
```
Run a local Redis instance via Docker Compose:
```bash
docker compose -f compose.yaml up -d
```
This makes Redis available at `redis://localhost:6170`.
Create a `.env` file at `src/.env` with:
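Before starting Celery, it can help to confirm that the container is actually answering on that port. The following is a minimal sketch using only the Python standard library; the helper name `redis_is_reachable` is hypothetical and not part of this repository, and it assumes the instance does not require a password.

```python
import socket
from urllib.parse import urlparse


def redis_is_reachable(url: str = "redis://localhost:6170", timeout: float = 2.0) -> bool:
    """Return True if a server at `url` answers a Redis PING with +PONG."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379  # Redis default port when the URL omits one
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts "inline" commands: a bare line ending in CRLF.
            conn.sendall(b"PING\r\n")
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `redis_is_reachable()` from a Python shell should return `True` once `docker compose` has the container running. A password-protected instance would reply with an error instead of `+PONG`, so this sketch would report `False` in that case.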
```bash
CELERY_BROKER_REDIS_URL="redis://localhost:6170"
DEBUG=True
```
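To illustrate how these two values could be consumed, here is a rough sketch of a `.env` reader written with the standard library only. It is an assumption for illustration: the blank project may already load `.env` through a dedicated package, and the helper names `load_env` and `env_bool` are hypothetical.

```python
from pathlib import Path


def load_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def env_bool(value: str) -> bool:
    """Interpret common truthy spellings such as True, true, 1, yes, on."""
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

With the file above, `load_env(Path("src/.env"))["CELERY_BROKER_REDIS_URL"]` would yield the broker URL without its surrounding quotes, and `env_bool` would turn the `DEBUG` string into a boolean.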
Navigate into your Django root:
```bash
cd src/
ls
```
You should see at least `cfehome/` and `manage.py`.
Run your project in 2 terminals:
- `python manage.py runserver`
- `celery -A cfehome worker --beat`
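The `--beat` flag starts Celery's scheduler alongside the worker, so periodic tasks run from the same process. As a rough sketch of what a schedule entry might look like in Django settings: the task path and URL below are placeholders, and the `CELERY_BEAT_SCHEDULE` name assumes the project loads Celery configuration from Django settings with the `CELERY` namespace.

```python
from datetime import timedelta

# Hypothetical task path: replace it with a task defined in your own app.
SCRAPE_TASK = "scraper.tasks.scrape_listing"

# Assumes Celery reads Django settings with namespace="CELERY",
# so this name maps to Celery's `beat_schedule` setting.
CELERY_BEAT_SCHEDULE = {
    "scrape-every-30-minutes": {
        "task": SCRAPE_TASK,
        "schedule": timedelta(minutes=30),  # interval between runs
        "kwargs": {"url": "https://example.com"},  # placeholder argument
    },
}
```

A fixed interval keeps the example short; Celery also accepts `crontab` schedules when a task should run at specific times of day.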
Let's go!