https://github.com/hazembz/mlh-demo
A hybrid websites scraping system.
- Host: GitHub
- URL: https://github.com/hazembz/mlh-demo
- Owner: HazemBZ
- Created: 2023-07-05T10:09:54.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-07T09:57:02.000Z (almost 2 years ago)
- Last Synced: 2025-06-29T18:51:08.260Z (12 months ago)
- Topics: celery, python, redis, scraping, selenium, web-automation
- Language: Python
- Homepage:
- Size: 104 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Architecture
## Components
**Tasks Manager**: Orchestrates Scenario runs through a task queue.
It arranges the run order of scenarios (in sequence or in parallel)
and handles the steps before and after each run (resource cleanup, notifying the web client of progress, etc.)
**Task queue**: Runs jobs sent by the Tasks Manager
**Scenario**: A unit that handles every step required to achieve the intended objective:
- extracting data by scraping or from a received request
- starting/stopping an agent that interacts with a web app
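The relationship between the Tasks Manager and its Scenarios can be sketched in plain Python. This is only an illustration under assumptions: the project itself is tagged with Celery, whereas the sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in for the task queue, and the names `run_in_sequence`, `run_in_parallel`, `manage`, and `on_done` are hypothetical, not taken from the repository.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for a Scenario: a callable that returns a result.
Scenario = Callable[[], str]


def run_in_sequence(scenarios: List[Scenario]) -> List[str]:
    """Run scenarios one after another, preserving order."""
    return [scenario() for scenario in scenarios]


def run_in_parallel(scenarios: List[Scenario], workers: int = 4) -> List[str]:
    """Run scenarios concurrently; results keep the submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scenario) for scenario in scenarios]
        return [future.result() for future in futures]


def manage(
    scenarios: List[Scenario],
    parallel: bool = False,
    on_done: Callable[[str], None] = print,
) -> List[str]:
    """Pick a run order, then perform the post-run step (here: a notification)."""
    if parallel:
        results = run_in_parallel(scenarios)
    else:
        results = run_in_sequence(scenarios)
    # Post-run hook: in the real system this could be cleanup or a progress update.
    on_done(f"{len(results)} scenario(s) finished")
    return results
```

Calling `manage([scrape, publish], parallel=True)` with two hypothetical scenario functions would run both on the thread pool and then fire the post-run hook once. In a Celery-based setup, the two run orders are commonly expressed with the `chain` and `group` primitives, though whether this project does so is not stated here.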
**Agent**: A frontend client that interacts with apps
that dynamically generate their content (single-page apps, etc.)
**Templates**: Data formats (form inputs, selection menu dictionaries, etc.),
reverse-engineered from the targeted website so that data is parsed and transferred to it correctly
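To make the idea of a template more concrete, here is a minimal sketch of how such a data format might translate a record into the fields a target form expects. Everything in it is an assumption made for illustration: the `TEMPLATE` layout, the field names (`titre`, `ville_id`, `prix`), the option codes, and the `to_form_data` helper are hypothetical and do not come from the repository.

```python
from typing import Any, Dict

# Hypothetical template: maps our field names to the target form's input names,
# and our values to the option codes used by the target's selection menus.
TEMPLATE: Dict[str, Any] = {
    "fields": {"title": "titre", "city": "ville_id", "price": "prix"},
    "choices": {"city": {"Tunis": "1", "Sousse": "2"}},
}


def to_form_data(record: Dict[str, Any], template: Dict[str, Any]) -> Dict[str, str]:
    """Translate a record into the key/value pairs the target form expects."""
    form: Dict[str, str] = {}
    for name, value in record.items():
        target = template["fields"].get(name)
        if target is None:
            continue  # field unknown to the target site, so it is skipped
        choices = template["choices"].get(name)
        if choices is not None:
            value = choices[value]  # raises KeyError for an unknown option
        form[target] = str(value)
    return form
```

Keeping the mapping in data rather than in code means that supporting another website would mostly require a new template, not a new parser.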
## Setup
**Run containers**
```
docker-compose up
```
## Testing
For a proof-of-concept run, you can test the system with
one of the automated interactions.
1. Install httpie
2. Send a pre-populated POST request
```
http localhost:8004/announce < request.json
```
3. Log in at "https://www.menzili.tn/connexion" with the email "tilzxivqvzjmubrgsh@cazlp.com" and the password "tilzxivqvzjmubrgsh", count to 10 😉, and refresh.
4. Run `docker-compose logs web` for more detailed logs, and open the Flower dashboard at `localhost:5555` for information about the created tasks.
5. Free resources with `docker-compose down`.
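If httpie is not available, the POST request from step 2 can be approximated with Python's standard library. This is a sketch under assumptions: it takes for granted that the `/announce` endpoint accepts a JSON body with a `Content-Type: application/json` header (httpie's default for this kind of request), and the helper names `build_announce_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_announce_request(
    payload: dict, base_url: str = "http://localhost:8004"
) -> urllib.request.Request:
    """Build a JSON POST request for the /announce endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/announce",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the containers running, loading `request.json` with `json.load`, passing the result to `build_announce_request`, and handing the returned object to `send` should have roughly the same effect as the httpie command.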