Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hanan-nawaz/daraz-price-comparator
Daraz Price Comparator πποΈ Compare prices effortlessly! π Find deals in π΅π° Pakistan, π§π© Bangladesh, π³π΅ Nepal, and π±π° Sri Lanka. Python, Requests, BeautifulSoup, and Airflow power smart shopping. Save time, shop savvy! ππ‘
https://github.com/hanan-nawaz/daraz-price-comparator
beautifulsoup bs4 darazpk python requests-python telemart webscraping
Last synced: about 2 months ago
JSON representation
Daraz Price Comparator πποΈ Compare prices effortlessly! π Find deals in π΅π° Pakistan, π§π© Bangladesh, π³π΅ Nepal, and π±π° Sri Lanka. Python, Requests, BeautifulSoup, and Airflow power smart shopping. Save time, shop savvy! ππ‘
- Host: GitHub
- URL: https://github.com/hanan-nawaz/daraz-price-comparator
- Owner: Hanan-Nawaz
- License: mit
- Created: 2023-09-29T13:11:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-25T17:40:05.000Z (12 months ago)
- Last Synced: 2024-01-25T17:58:25.106Z (12 months ago)
- Topics: beautifulsoup, bs4, darazpk, python, requests-python, telemart, webscraping
- Language: Python
- Homepage: https://hanannawaz.com
- Size: 13.5 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Daraz Price ComparatorπποΈ
Welcome to the Daraz Price Comparator repository! πποΈ This project automates price search and comparison on Daraz across Pakistan, Nepal, Sri Lanka, and Bangladesh. Powered by Python, Requests, BeautifulSoup, and Airflow, it simplifies the task of finding budget-friendly deals. Shop smartly with this efficient tool! π‘π
## Overview
### Core Features
1. **Dynamic E-commerce Search**: After user input, Initiate a Product search on Daraz.
2. **Automated Price Comparison**: The scraper automates Product searches and compares prices across multiple listings.
3. **Airflow Workflow**: Integrated with Apache Airflow for scheduled and automated execution.
4. **Data Storage**: Extracted Product details and prices are stored in a structured format.
## Project Process
This repository houses the code and configuration for the Daraz Price ComparatorπποΈ project. The workflow encompasses initiating a search on Daraz across Pakistan, Nepal, Sri Lanka, and Bangladesh. It extracts details from the first page and efficiently compares prices. The entire process is seamlessly orchestrated using Airflow for automation, ensuring a hassle-free shopping experience. π»ππ
### 1. E-commerce Search
- User enter its desired product name, Initiating a Product search on Daraz (Only first page).
### 2. Web Scraping
- Python scripts leverage Requests and BeautifulSoup for web scraping to extract product details and prices from both platforms.
### 3. Data Comparison (ETL)
- Extracted data is compared to identify the Product with the lowest price.
- We compare data of both platforms, Converting Currency and other relevant parameters and store in MongoDb.### 4. Airflow Integration
- Airflow is configured to automate the entire process at scheduled intervals.
## Technology Stack
- **Web Scraping**: Python (Requests, BeautifulSoup)
- **Automation**: Apache Airflow
- **Data Storage**: Structured format for extracted data (MongoDb)## File Structure
- `ecommerce-product-comparator/`: Contains the code and configuration for the Daraz Price ComparatorπποΈ project.
- `process-imgs/`: Contains helping images related to the project.
- `ecommerce-product-comparator-logo.jpg`: Logo image for the project.
- `ecommerce-product-comparator-process.jpg`: Process/Roadmap image for the project.
- `main.py`: Contains code, generally calling other classes.
- `code/`: Conatins all the code files.
- `daraz_scraper.py`: Conatins code to scrape from Daraz.
- `extract.py`: Contains code to call both scrapers.
- `transform.py`: Contains code to implement Extract, Transform and Load.
- `load_to_mongodb.py`: Conatins code to load data into MongoDb using PyMongo.
- `README.md`: You are here, providing an overview of the project.## Getting Started
To replicate this project, follow these steps:
1. Clone this project.
2. Run the Python scripts for web scraping and data comparison.
3. Optionally, integrate with Airflow for automated and scheduled execution.
4. The naming convention i use is `snaking case`.
Feel free to explore the project directory for more details on implementation and configuration.
## Contributors
- [Abdul Hanan Nawaz](https://www.github.com/Hanan-Nawaz)
## License
This project is licensed under the [MIT License](LICENSE).
## π Links
[![portfolio](https://img.shields.io/badge/my_portfolio-000?style=for-the-badge&logo=ko-fi&logoColor=white)](https://hanannawaz.com/)
[![linkedin](https://img.shields.io/badge/linkedin-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/abdulhanan0/)
[![twitter](https://img.shields.io/badge/twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/HananNawaz0/)