https://github.com/hugo-hattori/deal_seeker_webscraping
Web Scraping Project utilizing Selenium.
automation jupyter jupyter-notebook pandas pandas-dataframe pandas-python python selenium selenium-python selenium-webdriver time web-scraping webdriver-manager win32com
- Host: GitHub
- URL: https://github.com/hugo-hattori/deal_seeker_webscraping
- Owner: Hugo-Hattori
- License: mit
- Created: 2023-08-22T20:10:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-06T21:57:08.000Z (over 1 year ago)
- Last Synced: 2024-12-28T14:36:41.607Z (4 months ago)
- Topics: automation, jupyter, jupyter-notebook, pandas, pandas-dataframe, pandas-python, python, selenium, selenium-python, selenium-webdriver, time, web-scraping, webdriver-manager, win32com
- Language: Jupyter Notebook
- Homepage:
- Size: 149 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deal Seeker Web Scraping
## Project Scenario Description
The person responsible for researching the best prices for the company's inputs and products must
constantly search suppliers' websites for the available products and their prices, since each supplier
may run promotions at different times and with different values.

For this project we will be using Google Shopping and Buscapé for price research.
### Packages used:
+ selenium
+ webdriver_manager
+ pandas
+ win32com.client
+ time

## Project's Objective
To create a Python script capable of accessing suppliers' websites and evaluating product prices against
previously established price thresholds, finding the cheapest products and updating the data in a spreadsheet.
Finally, the script automatically sends an e-mail containing the best offers.

## Project's Input
This project utilizes an Excel file (buscas.xlsx) as an input. The Excel file contains 4 columns:
+ Product Name;
+ Banned Words (here we can specify words that must not appear in the search results);
+ Minimum Price (minimum threshold);
+ Maximum Price (maximum threshold).

The research can therefore be customized through this Excel file by editing each row to suit the user's needs.

## End Result
The project's output will be an Excel file and an e-mail, each containing a dataframe with 3 columns of information:
Product Name (as advertised on the site), Product Price and URL.
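
As a rough illustration of how the rows of that table might be selected, the sketch below filters already scraped listings against one row of the input spreadsheet. It is not the notebook's actual code: the names `Search`, `is_valid_offer` and `select_offers`, the sample listings and URLs, and the matching rules (banned words stored space-separated in one cell, every term of the product name required in the title) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Search:
    """One row of the input spreadsheet (field names are illustrative)."""
    product_name: str
    banned_words: str  # assumed: space-separated words in a single cell
    min_price: float
    max_price: float


def is_valid_offer(title: str, price: float, search: Search) -> bool:
    """Return True if a scraped listing satisfies the search row."""
    title_lower = title.lower()
    # Reject listings whose title contains any banned word (substring match).
    if any(word in title_lower for word in search.banned_words.lower().split()):
        return False
    # Assumption: every term of the product name must appear in the title.
    if not all(term in title_lower for term in search.product_name.lower().split()):
        return False
    # Keep only prices inside the [minimum, maximum] thresholds.
    return search.min_price <= price <= search.max_price


def select_offers(listings, search):
    """Keep valid listings as (name, price, url) rows, cheapest first."""
    rows = [
        (title, price, url)
        for title, price, url in listings
        if is_valid_offer(title, price, search)
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical search row and scraped listings, for demonstration only.
search = Search("iphone 12 64gb", "mini watch", 3000.0, 3500.0)
listings = [
    ("Apple iPhone 12 64GB Preto", 3299.0, "https://example.com/a"),
    ("iPhone 12 Mini 64GB", 3100.0, "https://example.com/b"),        # banned word
    ("Apple iPhone 12 64GB Azul", 3899.0, "https://example.com/c"),  # above maximum
    ("iPhone 12 64GB Branco", 3050.0, "https://example.com/d"),
]
offers = select_offers(listings, search)
for name, price, url in offers:
    print(f"{name} | {price:.2f} | {url}")
```

The resulting tuples already have the three output columns, so they could be handed to `pandas.DataFrame(offers, columns=[...])` before being written to Excel or embedded in the e-mail body.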
Note: this project was developed for academic purposes; the scenario described above is fictitious.