
https://github.com/morvanzhou/easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

asyncio beautifulsoup crawler crawling distributed-scraper regex requests scraping scrapy urllib

# Web scraping tutorials (Python)

In these tutorials, we build some simple but useful scrapers from scratch. You will learn how to read a web page, select the sections you need, and even download files.
If you understand Chinese, you are in luck: I made Chinese video and text tutorials for all of this content. You can find them at [莫烦Python](https://mofanpy.com/).
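
As a rough illustration of "read a web page and select the sections you need", here is a minimal sketch that uses only the standard library (`urllib` and `re`). The helper names `fetch_html`, `extract_title`, and `extract_links`, and the inline `SAMPLE_HTML` page, are made up for this example and are not taken from the tutorial code. The sample page stands in for a real download so the sketch runs offline.

```python
import re
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it as UTF-8 text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html):
    """Return the text inside the first <title> tag, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html, flags=re.S | re.I)
    return match.group(1).strip() if match else None


def extract_links(html):
    """Return every double-quoted href value, in document order."""
    return re.findall(r'href="(.*?)"', html)


# Assumption: a small inline page replaces a real request to keep this offline.
SAMPLE_HTML = """
<html>
<head><title>Scraping demo</title></head>
<body>
<a href="https://mofanpy.com/">tutorials</a>
<a href="/notebook/1-1-urllib.ipynb">urllib notebook</a>
</body>
</html>
"""

title = extract_title(SAMPLE_HTML)
links = extract_links(SAMPLE_HTML)
print(title)
print(links)
```

To try it on a live page, pass the result of `fetch_html(url)` to the two extract functions. Regular expressions are fine for a quick look, but for anything beyond simple patterns a real HTML parser such as BeautifulSoup is usually the more robust choice.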

**To learn from code, you have two options:**

1. Learn from the [source code](/source_code/)
2. Learn from the [Jupyter notebooks](/notebook/)

## The contents

* Basic concept and package
  * [Urllib](/notebook/1-1-urllib.ipynb)
* BeautifulSoup
  * [Basic](/notebook/2-1-beautifulsoup-basic.ipynb)
  * [CSS](/notebook/2-2-beautifulsoup-css.ipynb)
  * [RegEx](/notebook/2-3-beautifulsoup-regex.ipynb)
  * [Practice random scraping](/notebook/2-4-practice-baidu-baike.ipynb)
* Requests and Download
  * [Requests](/notebook/3-1-requests.ipynb)
  * [Download](/notebook/3-2-download.ipynb)
  * [Practice download image](/notebook/3-3-practice-download-images.ipynb)
* Speed up scraping
  * [Distributed scraping (multiprocessing)](/notebook/4-1-distributed-scraping.ipynb)
  * [Asyncio](/notebook/4-2-asyncio.ipynb)
* Advanced
  * [Selenium](/notebook/5-1-selenium.ipynb)
  * [Scrapy](/notebook/5-2-scrapy.ipynb)
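
To give a feel for the "Speed up scraping" topic, the sketch below shows the general idea behind asyncio-based crawling: start many downloads at once and wait for all of them together. It is not the notebook's code. `fake_fetch` is a hypothetical stand-in that sleeps instead of making a network request, and the `example.com` URLs are placeholders. In a real scraper an asynchronous HTTP client such as aiohttp would take its place.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    """Pretend to download a page: wait, then return a placeholder body."""
    await asyncio.sleep(delay)  # assumption: simulates network latency
    return f"<html>{url}</html>"


async def crawl(urls):
    """Start every download at once and collect results in input order."""
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]
start = time.perf_counter()
pages = asyncio.run(crawl(urls))
elapsed = time.perf_counter() - start
print(f"{len(pages)} pages in {elapsed:.2f}s")
```

Because the waits overlap, the total time should be close to one delay rather than the sum of all five. `asyncio.gather` returns results in the same order as its inputs, so `pages[i]` corresponds to `urls[i]`.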

# Donation

*If these tutorials help you, please consider donating to support better ones. Any contribution is greatly appreciated!*



* Paypal
* Patreon