Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vgvr0/dia-supermarket-scraper
A Python script for web scraping various product categories from an online supermarket (dia.es) and saving product details into a CSV file
beautifulsoup chromedriver dia-scraper scraper scraping seleniumbase seleniun-python supermarket-scraper supermarket-scraping supermercado-dia-scraper undetected-chromedriver
A Python script for web scraping various product categories from an online supermarket (dia.es) and saving product details into a CSV file
- Host: GitHub
- URL: https://github.com/vgvr0/dia-supermarket-scraper
- Owner: vgvr0
- Created: 2024-05-02T17:31:04.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-23T15:23:27.000Z (7 months ago)
- Last Synced: 2024-06-23T16:38:40.642Z (7 months ago)
- Topics: beautifulsoup, chromedriver, dia-scraper, scraper, scraping, seleniumbase, seleniun-python, supermarket-scraper, supermarket-scraping, supermercado-dia-scraper, undetected-chromedriver
- Language: Jupyter Notebook
- Homepage:
- Size: 322 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Web Scraping Dia Supermarket
This Python script utilizes web scraping techniques to extract product information from the DIA supermarket website.
![Día Supermarket](SupermercadoDia.png)
---
## Requirements
- Python 3
- Selenium
- BeautifulSoup
- Chrome Web Driver
- SeleniumBase
---
#### Usage
1. Install the required Python packages.
2. Download and install the Chrome Web Driver. Make sure the Chrome Web Driver executable is in your system's PATH.
3. Run the script using Python.

---
#### Code Explanation
1. **Importing Libraries**:
   - The script imports the necessary libraries, such as `os`, `re`, `csv`, `time`, `random`, `sqlite3`, `keyboard`, `funcionesAux` (assumed to be a custom module), `datetime`, `BeautifulSoup`, and `seleniumbase`.
2. **Function Definition**:
   - `dia_csv`: This function saves the scraped data to a CSV file (see the sketch after this list).
3. **Setting Up WebDriver**:
   - The script initializes a Chrome WebDriver instance.
4. **Scraping DIA Website**:
   - The script navigates to the DIA website and accepts cookies.
   - It scrapes product categories and subcategories, storing their URLs for further scraping.
   - For each subcategory, it scrapes product details such as name, image, and link.
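The repository does not document the exact signature of `dia_csv`, so the sketch below only illustrates the behaviour described above (appending product rows to a CSV file); the parameters, file name, and column names are assumptions, not the project's actual code.

```python
import csv
import os

def dia_csv(rows, path="productos_dia.csv"):
    """Append scraped product rows to a CSV file.

    Assumption: `rows` is a list of dicts with the keys
    'nombre', 'imagen' and 'enlace' (product name, image URL, product URL).
    """
    fieldnames = ["nombre", "imagen", "enlace"]
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()  # write the header only on first creation
        writer.writerows(rows)
```
---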
#### Example
Here's an example demonstrating how to use the script:
```python
from seleniumbase import Driver

# Start an undetected Chrome session with a custom user agent
driver = Driver(
    browser="chrome",
    uc=True,
    headless2=False,
    incognito=False,
    agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    do_not_track=True,
    undetectable=True
)
driver.maximize_window()

# Run web scraping script
# (Add web scraping code here)

driver.close()
```
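To complete the example, the sketch below shows one way the parsing step could look once the driver has loaded a subcategory page: the HTML is handed to BeautifulSoup and the product name, image, and link are collected into rows that `dia_csv` can save. The CSS selector and attribute lookups are placeholders, not the selectors used by dia.es or by the original script.

```python
from bs4 import BeautifulSoup

def scrape_subcategory(driver, url):
    """Extract product name, image and link from one subcategory page.

    The selectors below are illustrative placeholders and must be
    replaced with the ones the target pages actually use.
    """
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):  # placeholder selector
        link = card.find("a", href=True)
        img = card.find("img")
        rows.append({
            "nombre": card.get_text(strip=True),
            "imagen": img.get("src", "") if img else "",
            "enlace": link["href"] if link else "",
        })
    return rows

# Usage (after the Driver setup above):
# rows = scrape_subcategory(driver, "https://www.dia.es/...")
# dia_csv(rows)
```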