https://github.com/rifkyiqbal52/data-analytics-projects
web scraping online store lazada.co.id, search running shoes
beautifulsoup pandas phyton postgresql scraping scraping-websites selenium
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/rifkyiqbal52/data-analytics-projects
- Owner: rifkyiqbal52
- Created: 2025-06-23T12:56:27.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-23T14:01:14.000Z (about 1 year ago)
- Last Synced: 2025-06-23T14:28:25.128Z (about 1 year ago)
- Topics: beautifulsoup, pandas, phyton, postgresql, scraping, scraping-websites, selenium
- Language: Jupyter Notebook
- Homepage:
- Size: 55.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Web Scraping & Analysis: Running Shoes on Lazada
As part of my training, I was assigned the role of a Data Engineer working on a data pipeline/ETL project. My main task was to extract data from a website, process it, and store it in a PostgreSQL database.
For this project, I built a web scraping tool to gather product data from Lazada, specifically focusing on running shoes, which are currently trending due to the growing interest in running and fitness.
This project helped me understand the real-world workflow of a Data Engineer, from data extraction and cleaning to storage and analysis.
---
## Objectives
- Scrape product data related to running shoes from Lazada.
- Clean and process the collected data.
- Store the structured data in a PostgreSQL database using pgAdmin4.
- Perform basic analysis to understand product distribution and popularity.
## Tools
- Python: Main programming language
- Pandas: Data manipulation and analysis
- BeautifulSoup: HTML parsing for scraping static content
- Selenium: Automating browser actions and scraping dynamic content
- PostgreSQL: Database for storing the cleaned data
- pgAdmin4: GUI for PostgreSQL database management
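To illustrate how Selenium and BeautifulSoup can fit together, here is a minimal sketch of the parsing half. It assumes the rendered page HTML is already available as a string (with Selenium, that would typically come from `driver.page_source`). The markup, the CSS class names, and the `parse_products` helper are all hypothetical, since the real Lazada selectors are not documented in this README; see the notebook for the actual scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are placeholders, not Lazada's real ones.
SAMPLE_HTML = """
<div class="product-card">
  <a class="title">Running Shoes A</a>
  <span class="price">Rp250.000</span>
  <span class="location">Jakarta</span>
</div>
<div class="product-card">
  <a class="title">Running Shoes B</a>
  <span class="price">Rp1.199.000</span>
  <span class="location">Bandung</span>
</div>
"""


def text_of(card, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_products(html):
    """Turn one page of rendered HTML into a list of row dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):
        rows.append(
            {
                "Product_Name": text_of(card, "a.title"),
                "Price": text_of(card, "span.price"),
                "Seller Location": text_of(card, "span.location"),
            }
        )
    return rows


if __name__ == "__main__":
    for row in parse_products(SAMPLE_HTML):
        print(row)
```

Returning `None` for a missing element keeps one incomplete product card from stopping the whole page, and leaves the gap visible for the cleaning step.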
## Collected Data Includes:
I scraped up to 10 pages of search results, which produced 400 rows and 6 columns:
- Product_Name
- Price
- Seller Location
- Sold
- Rating
- Review
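Scraped values such as prices and sold counts usually arrive as text, so the cleaning step has to convert them to numbers. The sketch below shows one way to do that with the standard library. The input formats (`Rp250.000`, `1.2K sold`, `(35)`) and the helper names `parse_price` and `parse_count` are assumptions made for illustration; the raw strings in this project's dataset may look different.

```python
import re


def parse_price(text):
    """Convert a price string like 'Rp250.000' to an integer (250000).

    Assumes dots are thousands separators and the price has no decimal part.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def parse_count(text):
    """Convert a count string like '1.2K sold' or '(35)' to an integer.

    Assumes an optional 'K' suffix means thousands.
    """
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*([kK]?)", text)
    if not match:
        return None
    number = float(match.group(1).replace(",", "."))
    multiplier = 1000 if match.group(2) else 1
    return int(round(number * multiplier))


if __name__ == "__main__":
    print(parse_price("Rp250.000"))
    print(parse_count("1.2K sold"))
    print(parse_count("(35)"))
```

With Pandas, helpers like these could be applied per column, for example `df["Price"].map(parse_price)`.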
## Outcome
By the end of this project, I was able to simulate a real-world ETL (Extract, Transform, Load) process and gain hands-on experience in:
1. Building web scrapers with Selenium & BeautifulSoup
2. Structuring and cleaning data with Pandas
3. Using PostgreSQL for data storage
4. Understanding the workflow of a data engineering project
Check the [notebooks folder](notebooks/) for the Jupyter Notebook.
View the [data folder](data/) for the raw and cleaned datasets.
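The load step of the ETL process could look roughly like the sketch below. It is written against the generic Python DB-API, so the same function could take a PostgreSQL connection (for example one created with `psycopg2.connect(...)`, which uses `%s` placeholders). The demo runs against an in-memory SQLite database purely as a stand-in so it works without a server. The table name `running_shoes`, the column types, and the `load_rows` helper are assumptions, not the schema used in the notebook.

```python
import sqlite3  # stand-in for the demo only; the project itself uses PostgreSQL

COLUMNS = ("product_name", "price", "seller_location", "sold", "rating", "review")

# Hypothetical schema: the types are guesses based on the column names.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS running_shoes (
    product_name TEXT,
    price INTEGER,
    seller_location TEXT,
    sold INTEGER,
    rating REAL,
    review INTEGER
)
"""


def load_rows(conn, rows, placeholder="%s"):
    """Create the table if needed and insert rows through a DB-API connection.

    Use placeholder="%s" for psycopg2 and placeholder="?" for sqlite3.
    """
    marks = ", ".join([placeholder] * len(COLUMNS))
    insert_sql = (
        f"INSERT INTO running_shoes ({', '.join(COLUMNS)}) VALUES ({marks})"
    )
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    cur.executemany(insert_sql, rows)  # values are passed as parameters
    conn.commit()
    cur.close()
    return len(rows)


if __name__ == "__main__":
    demo_conn = sqlite3.connect(":memory:")
    sample = [("Running Shoes A", 250000, "Jakarta", 1200, 4.8, 35)]
    print(load_rows(demo_conn, sample, placeholder="?"))
    print(demo_conn.execute("SELECT * FROM running_shoes").fetchall())
```

Passing values as query parameters rather than formatting them into the SQL string avoids quoting problems with product names that contain apostrophes.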
## Note
This project is for educational purposes only. It complies with Lazada's terms of use and was not used for commercial purposes.