https://github.com/SakshayMahna/WebScraping
Trying out web scraping using different techniques
- Host: GitHub
- URL: https://github.com/SakshayMahna/WebScraping
- Owner: SakshayMahna
- License: mit
- Created: 2021-08-08T09:06:10.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-08-08T10:22:57.000Z (over 3 years ago)
- Last Synced: 2024-05-28T06:56:34.228Z (6 months ago)
- Language: HTML
- Homepage:
- Size: 6 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Project on Web Scraping
This project consists of 3 different web scraping techniques.

## Requirements and Installation
The project was implemented and run on a Windows 10 machine with Python 3.8.8. Install the requirements with:

```bash
pip install -r requirements.txt
```

## Beautiful Soup
`WebScraping-BeautifulSoup` contains web scraping using BeautifulSoup. It scrapes the articles on algorithms published on GeeksForGeeks.

Run the program using:

```bash
python main.py
```

## Scrapy
`WebScraping-Scrapy` contains web scraping using Scrapy. It holds 2 different scraping projects.

### Video Games on Flipkart
Run the program using:

```bash
scrapy crawl flipkart_games
```

### Watches on Amazon
Run the program using:

```bash
scrapy crawl amazon_watches
```

**For scraping Amazon, User-Agents and proxy methods have also been used.**
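
The note on User-Agents and proxies can be illustrated with a small downloader middleware. This is only a sketch of one common approach, not necessarily how this repository does it: the class name `RotatingIdentityMiddleware`, the User-Agent strings, and the proxy URLs are all placeholders made up for illustration. It relies on Scrapy's `process_request` hook and on the `proxy` key in `request.meta`, which Scrapy's built-in proxy middleware reads.

```python
import random

# Placeholder values for illustration only; replace with real ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]
PROXIES = [
    "http://proxy-one.example.com:8080",
    "http://proxy-two.example.com:8080",
]


class RotatingIdentityMiddleware:
    """Hypothetical Scrapy downloader middleware that picks a random
    User-Agent header and a random proxy for every outgoing request."""

    def __init__(self, user_agents=USER_AGENTS, proxies=PROXIES, rng=None):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)
        # A dedicated Random instance makes the choice easy to seed in tests.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Scrapy calls this hook for each request before it is downloaded.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # The built-in HttpProxyMiddleware reads the proxy from request.meta.
        request.meta["proxy"] = self.rng.choice(self.proxies)
        # Returning None lets Scrapy continue handling the request normally.
        return None
```

A middleware like this would be enabled by adding its import path to the `DOWNLOADER_MIDDLEWARES` setting in the Scrapy project's `settings.py`. It should get a priority number lower than the built-in User-Agent and proxy middlewares so that it runs first.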
## Selenium
`WebScraping-Selenium` contains web scraping using Selenium. Videos present on a Gaming Channel (Insym) have been scraped.

Run the program using:
```bash
python main.py
```