Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tezcatlipoca0000/trevi-spider
A tailored program that scrapes data from the website and updates my DB
gmail-api pandas python3 scraping-python selenium-python
- Host: GitHub
- URL: https://github.com/tezcatlipoca0000/trevi-spider
- Owner: Tezcatlipoca0000
- License: gpl-3.0
- Created: 2024-02-18T14:12:45.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-09-07T14:16:43.000Z (4 months ago)
- Last Synced: 2024-09-07T15:44:49.274Z (4 months ago)
- Topics: gmail-api, pandas, python3, scraping-python, selenium-python
- Language: Python
- Homepage:
- Size: 28.3 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Trevi Spider
## What it does:
It performs three distinct tasks to populate the local business database with data extracted from the provider's website:
- Scrapes the price data from the provider's website.
- Retrieves the order that was placed with the provider by email.
- Creates a dataframe with the new data when available.
## What technologies does it need:
- Python
- google-api-python-client
- google-auth-httplib2
- google-auth-oauthlib
- numpy
- Selenium
- Pandas
- openpyxl
## What files does it need (stdin):
- To get the placed order:
  - token.json ~ Gmail API **git ignored**
  - credentials.json ~ Gmail API **git ignored**
  - mygmailaccount.inbox.message_with_pedido.xlsx
- To update:
  - Provedores Todos.xlsm ~ Local database **git ignored**
  - pedido.xlsx ~ Placed order to the provider, retrieved from Gmail **git ignored**
  - trevi_full.xlsx ~ Scraped data from the provider's website **git ignored**
## What files does it create (stdout):
- After scraping the data:
  - trevi_full.xlsx ~ Scraped data from the provider's website **git ignored**
- After retrieving the placed order:
  - pedido.xlsx ~ Placed order to the provider, retrieved from Gmail **git ignored**
- After creating an updated dataframe:
  - Final.xlsx ~ Updated dataframe with the necessary information **git ignored**
## What I'm thinking about:
- Uploading all the necessary sample_files.
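
Until sample files are available, the update step listed under "What it does" could look roughly like the sketch below. It is only an illustration, not the repository's actual code: the function name `build_update` and the column names `code`, `price` and `quantity` are assumptions, and small in-memory frames stand in for the three spreadsheets.

```python
import pandas as pd


def build_update(db: pd.DataFrame, scraped: pd.DataFrame, order: pd.DataFrame) -> pd.DataFrame:
    """Return the ordered items with the scraped price next to the stored one.

    Every column name used here ("code", "price", "quantity") is hypothetical.
    """
    # Start from the placed order and attach the freshly scraped price.
    merged = order.merge(scraped, on="code", how="left")
    # Attach the price currently stored in the local database.
    merged = merged.merge(
        db[["code", "price"]], on="code", how="left", suffixes=("_new", "_old")
    )
    # Flag rows whose scraped price differs from the stored one.
    # A missing price on either side also compares as "changed".
    merged["price_changed"] = merged["price_new"] != merged["price_old"]
    return merged


# Stand-ins for Provedores Todos.xlsm, trevi_full.xlsx and pedido.xlsx.
db = pd.DataFrame({"code": ["A1", "B2"], "price": [10.0, 20.0]})
scraped = pd.DataFrame({"code": ["A1", "B2"], "price": [10.0, 22.5]})
order = pd.DataFrame({"code": ["A1", "B2"], "quantity": [3, 1]})

final = build_update(db, scraped, order)
print(final)
```

With the real files, the three frames would presumably be loaded with `pd.read_excel(...)` and the result written out with `final.to_excel("Final.xlsx", index=False)`, both of which rely on openpyxl for these formats.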