Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mousazourob/sahib
Python bot that scrapes Walmart's clearance page for electronics to find the best deals and post them on Twitter automatically
- Host: GitHub
- URL: https://github.com/mousazourob/sahib
- Owner: MousaZourob
- Created: 2020-08-16T20:32:25.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-01-09T20:15:28.000Z (about 4 years ago)
- Last Synced: 2024-11-05T11:16:43.312Z (2 months ago)
- Topics: dnspython, geckodriver, mongodb, python, selenium, webscrapping
- Language: Python
- Homepage: https://twitter.com/SahibBot_
- Size: 1.61 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Sahib
### Overview:
**Python** bot that scrapes Walmart's clearance page for electronics to find the best deals and post them on Twitter automatically. The bot scrapes data using the **Selenium framework** for Python, and the data is then stored in a **MongoDB** collection to be tweeted later. Using Windows Task Scheduler, the script was set up as a scheduled task (the Windows equivalent of a **CRON job**) that tweets 5 deals daily.

### Data Flow:
#### Scraping from Walmart:
**1.** Using **Geckodriver**, a **Webdriver for Firefox** is created that opens Walmart's clearance page for electronics
**2.** The products are then scraped using **Selenium**, and data such as the product's title, new and old price, and when it was scraped is recorded
**3.** Calculations are done to determine the discount percentage and price, and then, using **DNSPython**, products are sent to be stored in a **MongoDB** collection

#### Scraping from Twitter:
**1.** Using **Geckodriver**, a **Webdriver for Firefox** is created that opens Twitter's log-in page
**2.** The script then checks whether **cookies** saved as **JSON** objects from an earlier log-in exist, and if not, logs in normally
**3.** Afterwards a connection with **MongoDB** is established, and the script finds the first 5 postings that haven't been tweeted
**4.** Using **Selenium**, tweets are sent out over a 3-minute period, each with one of 8 template messages to publish new deals, and the posting date is saved in the database so that duplicate postings do not occur
**5.** This script was turned into a **CRON job** using Windows Task Scheduler, and it runs automatically once a day to tweet out 5 deals

## Demo:
* To view the bot in action, click [here](https://twitter.com/SahibBot_):
![Demo](https://user-images.githubusercontent.com/66835262/104045431-8c008080-51ac-11eb-9d31-7537516b84c5.png)

## Libraries and Frameworks Used:
* **Selenium framework for Python:** https://selenium-python.readthedocs.io/
* **Geckodriver:** https://github.com/mozilla/geckodriver/releases
* **MongoDB for Python:** https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb
* **DNSPython:** https://pypi.org/project/dnspython/
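
## Example Sketch:
The calculation and selection steps from the Data Flow section can be sketched in plain Python. This is a minimal sketch and not the repository's actual code: the function names (`build_record`, `pick_untweeted`, `format_tweet`, `mark_tweeted`), the field names, and the two template strings are assumptions made for illustration, and an in-memory list stands in for the **MongoDB** collection. The **Selenium** scraping and tweeting calls are left out.

```python
from datetime import datetime, timezone

# Hypothetical message templates; the README mentions 8, only 2 are sketched here.
TEMPLATES = [
    "Deal alert: {title} is now ${new_price:.2f}, down from ${old_price:.2f} ({discount_percent:.0f}% off)",
    "{discount_percent:.0f}% off: {title} for ${new_price:.2f} (was ${old_price:.2f})",
]


def build_record(title, old_price, new_price, scraped_at):
    """Compute the discount and return a document-like dict for one product."""
    discount = round(old_price - new_price, 2)
    # Guard against a missing or zero original price.
    percent = round(100 * discount / old_price, 1) if old_price > 0 else 0.0
    return {
        "title": title,
        "old_price": old_price,
        "new_price": new_price,
        "discount": discount,
        "discount_percent": percent,
        "scraped_at": scraped_at,
        "tweeted_at": None,  # filled in once the deal has been posted
    }


def pick_untweeted(records, limit=5):
    """Return the first `limit` records that have no posting date yet."""
    return [r for r in records if r["tweeted_at"] is None][:limit]


def format_tweet(record, template_index):
    """Fill one template with the record's fields (unused keys are ignored)."""
    template = TEMPLATES[template_index % len(TEMPLATES)]
    return template.format(**record)


def mark_tweeted(record, when):
    """Save the posting date so the record is skipped on later runs."""
    record["tweeted_at"] = when


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [build_record(f"Item {i}", 100.0, 100.0 - 10 * i, now) for i in range(1, 8)]
    for i, record in enumerate(pick_untweeted(records)):
        print(format_tweet(record, i))
        mark_tweeted(record, now)
```

Running the selection a second time on the same list returns only the records that were not marked, which is how saving the posting date prevents duplicate tweets in this sketch.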