https://github.com/rifkyiqbal52/data-analytics-projects

Web scraping of the online store lazada.co.id, searching for running shoes

# Web Scraping & Analysis: Running Shoes on Lazada
As part of my training, I was assigned the role of a Data Engineer working on a data pipeline/ETL project. My main task was to extract data from a website, process it, and store it in a PostgreSQL database.

For this project, I built a web scraping tool to gather product data from Lazada, specifically focusing on running shoes, which are currently trending due to the growing interest in running and fitness.

This project helped me understand the real-world workflow of a Data Engineer, from data extraction and cleaning to storage and analysis.

---

## 🎯 Objectives
- Scrape product data related to running shoes from Lazada.
- Clean and process the collected data.
- Store the structured data in a PostgreSQL database using pgAdmin4.
- Perform basic analysis to understand product distribution and popularity.
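The analysis objective is left open here. The following is a minimal pandas sketch of one way to look at distribution and popularity. It assumes the six columns listed under Collected Data have already been cleaned so that `Price`, `Sold`, `Rating` and `Review` are numeric. The function names and the choice of metrics (listings, median price and units sold per seller location, plus top listings by units sold) are illustrative and are not taken from the project's notebook:

```python
import pandas as pd


def location_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Listings, median price and total units sold per seller location (distribution)."""
    return (
        df.groupby("Seller Location")
        .agg(
            listings=("Product_Name", "count"),
            median_price=("Price", "median"),
            total_sold=("Sold", "sum"),
        )
        .sort_values("listings", ascending=False)
    )


def top_selling(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The n listings with the highest 'Sold' count, used here as a popularity proxy."""
    return df.nlargest(n, "Sold")[["Product_Name", "Price", "Sold", "Rating", "Review"]]
```

Calling `location_summary(df).head(10)` returns the ten seller locations with the most listings. `top_selling(df)` ranks products by units sold, and `Rating` and `Review` are shown alongside for context.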

## 🛠️ Tools
- Python: Main programming language
- Pandas: Data manipulation and analysis
- BeautifulSoup: HTML parsing for scraping static content
- Selenium: Automating browser actions and scraping dynamic content
- PostgreSQL: Database for storing the cleaned data
- pgAdmin4: GUI for PostgreSQL database management
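Selenium and BeautifulSoup divide the extraction work. Selenium drives a real browser so that dynamically rendered content is present, and BeautifulSoup parses the resulting HTML. The sketch below is not the project's own scraper. It assumes a search URL of the form `/catalog/?q=<query>&page=<n>` and uses made-up CSS selectors (`div.product-card`, `.title`, `.price` and so on). Both are hypothetical and must be checked against the live page in the browser's developer tools:

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

# Assumption: search results are served from /catalog/ with `q` and `page`
# query parameters. Check the URL your browser shows after searching.
BASE_URL = "https://www.lazada.co.id/catalog/"

# Hypothetical selectors: Lazada's real class names differ and change often.
CARD_SELECTOR = "div.product-card"
FIELD_SELECTORS = {
    "Product_Name": ".title",
    "Price": ".price",
    "Seller Location": ".location",
    "Sold": ".sold",
    "Rating": ".rating",
    "Review": ".review",
}


def search_page_urls(query: str, pages: int = 10) -> list[str]:
    """One search-results URL per page, numbered from 1."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'page': page})}"
        for page in range(1, pages + 1)
    ]


def parse_products(html: str) -> list[dict]:
    """Extract the six raw text fields from every product card in a page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(CARD_SELECTOR):
        row = {}
        for column, selector in FIELD_SELECTORS.items():
            node = card.select_one(selector)
            # A field missing from a card (e.g. no reviews yet) becomes None.
            row[column] = node.get_text(strip=True) if node else None
        rows.append(row)
    return rows


def scrape_search(query: str, pages: int = 10, timeout: int = 15) -> list[dict]:
    """Render each results page in headless Chrome and parse its product cards."""
    # Imported here so the parsing code above works without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    rows = []
    try:
        for url in search_page_urls(query, pages):
            driver.get(url)
            # Wait until the JavaScript-rendered product cards are in the DOM.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, CARD_SELECTOR))
            )
            rows.extend(parse_products(driver.page_source))
    finally:
        driver.quit()
    return rows
```

`pd.DataFrame(scrape_search("running shoes"))` would then produce a raw text table with the six columns listed under Collected Data. A single browser session is reused across all pages instead of starting Chrome once per page.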

## 📈 Collected Data
I scraped up to 10 pages of search results, which produced 400 rows and 6 columns:
- Product_Name
- Price
- Seller Location
- Sold
- Rating
- Review
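Values scraped from the page arrive as text, so `Price`, `Sold`, `Rating` and `Review` need to be converted before they can be stored or aggregated as numbers. The following is a minimal pandas sketch of that cleaning step. It assumes raw formats such as `Rp199.000` for price, `1,2RB Terjual` for units sold (RB is short for *ribu*, thousand) and `(85)` for the review count. These formats are assumptions for illustration, and the function names are hypothetical:

```python
import re

import pandas as pd


def parse_int_digits(text):
    """Keep only the digits: 'Rp199.000' -> 199000, '(85)' -> 85; None if no digits."""
    if not isinstance(text, str):
        return None
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def parse_sold(text):
    """'1,2RB Terjual' -> 1200, '150 Terjual' -> 150; None if no number is found."""
    if not isinstance(text, str):
        return None
    match = re.search(r"(\d+(?:,\d+)?)\s*(rb)?", text, flags=re.IGNORECASE)
    if not match:
        return None
    value = float(match.group(1).replace(",", "."))  # comma is the decimal mark
    if match.group(2):  # "RB" (ribu) means thousands
        value *= 1000
    return int(round(value))


def clean_products(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert the scraped text columns to typed columns and drop exact duplicates."""
    df = raw.copy()
    df["Product_Name"] = df["Product_Name"].str.strip()
    df["Price"] = df["Price"].map(parse_int_digits).astype("Int64")
    df["Sold"] = df["Sold"].map(parse_sold).astype("Int64")
    df["Review"] = df["Review"].map(parse_int_digits).astype("Int64")
    df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
    return df.drop_duplicates().reset_index(drop=True)
```

The nullable `Int64` dtype keeps missing counts as `<NA>` instead of converting the whole column to float.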

## 🚀 Outcome
By the end of this project, I was able to simulate a real-world ETL (Extract, Transform, Load) process and gain hands-on experience in:
1. Building web scrapers with Selenium & BeautifulSoup
2. Structuring and cleaning data with Pandas
3. Using PostgreSQL for data storage
4. Understanding the workflow of a data engineering project
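For the load step, pandas can write a DataFrame directly into a database with `DataFrame.to_sql`. The following minimal sketch assumes a hypothetical table name `running_shoes`. It also renames the columns to snake_case, which is a design choice so that PostgreSQL queries do not need quoted identifiers such as `"Seller Location"`:

```python
import pandas as pd


def to_snake_case(columns):
    """'Seller Location' -> 'seller_location', 'Product_Name' -> 'product_name'."""
    return [c.strip().lower().replace(" ", "_") for c in columns]


def load_products(df: pd.DataFrame, con, table: str = "running_shoes") -> None:
    """Write cleaned rows to `table`, replacing the table from any previous run.

    `con` can be a SQLAlchemy engine (for PostgreSQL) or a sqlite3 connection.
    """
    out = df.copy()
    out.columns = to_snake_case(out.columns)
    out.to_sql(table, con, if_exists="replace", index=False)
```

For PostgreSQL, `con` would be a SQLAlchemy engine such as `create_engine("postgresql+psycopg2://user:password@localhost:5432/lazada")`, where the user, password and database name are placeholders. The resulting table can then be inspected in pgAdmin4. With `if_exists="replace"`, each run drops and recreates the table. Use `"append"` to accumulate runs instead.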

📁 Check the [notebooks folder](notebooks/) for the Jupyter Notebook.

📂 View the [data folder](data/) for the raw and cleaned datasets.

## 📌 Note
This project is for educational purposes only. It complies with Lazada’s terms of use and was not used for commercial purposes.