Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sarrabenyahia/tuto-webscraping
webscraping course tutorial
api beautifulsoup headers javascript proxy requests scrapy selenium user-agent webscraping
Last synced: 15 days ago
- Host: GitHub
- URL: https://github.com/sarrabenyahia/tuto-webscraping
- Owner: sarrabenyahia
- Created: 2024-10-08T18:56:32.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-05T10:51:14.000Z (3 months ago)
- Last Synced: 2024-11-05T11:45:30.571Z (3 months ago)
- Topics: api, beautifulsoup, headers, javascript, proxy, requests, scrapy, selenium, user-agent, webscraping
- Language: Python
- Homepage:
- Size: 8.93 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Web Scraping Course
Welcome to the Web Scraping Course repository! This repository contains all the materials needed to follow the course: source code, exercises with their solutions, and presentation materials.
## Repository Structure
- `/bs4`: Contains materials related to BeautifulSoup 4, including code examples and exercises.
- `/scrapy`: Includes resources for learning and working with Scrapy, such as spiders and project setups.
- `/selenium`: Stores materials for web scraping with Selenium, including scripts and browser automation examples.
- `/presentations`: Stores PowerPoint files and other course materials.

## Course Content
This course covers fundamental and advanced aspects of web scraping, including:
- Introduction to web scraping and its ethics
- Using Python libraries such as BeautifulSoup and Requests
- Working around website protections: user agents, proxy rotation, IP address management
- Asynchronous scraping with Scrapy
- Scraping dynamic websites with Selenium
- Best practices and optimizations

## Prerequisites
- Python 3.9+
- Familiarity with API requests

## Installation
1. Clone this repository:
```
git clone https://github.com/sarrabenyahia/tuto-webscraping.git
```
2. Install dependencies:
```
pip install -r requirements.txt
```

## Usage
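Before working through the folders, it can help to confirm that the libraries this README names are importable in your environment. The following is a minimal sketch using only the standard library. The module list is an assumption based on the usual import names for BeautifulSoup, Requests, Scrapy and Selenium, and may not match `requirements.txt` exactly. `COURSE_MODULES` and `missing_modules` are hypothetical names chosen for illustration and are not part of the course code.

```python
from importlib.util import find_spec

# Import names for the libraries this README mentions. Whether
# requirements.txt lists exactly these packages is an assumption.
COURSE_MODULES = ["bs4", "requests", "scrapy", "selenium"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(COURSE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All course libraries are importable.")
```

If the script reports missing modules, re-running the `pip install` step inside the same Python environment is the first thing to try.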
- Browse the folders to access different course materials.
- Follow the instructions in each folder to run code examples or complete exercises.

## Contributing
Contributions to improve the course content are welcome. Feel free to open an issue or submit a pull request.
## Your Feedback
We appreciate your feedback! Please share your impressions and suggestions by filling out the following form:
[Give your feedback here](https://docs.google.com/forms/d/e/1FAIpQLSfTRzgjomMKIsf4NJ7N-FHa94DfpNMwSRyRfKyEyyUPUYWLdg/viewform?usp=sf_link)