Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/morvanzhou/easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.
- Host: GitHub
- URL: https://github.com/morvanzhou/easy-scraping-tutorial
- Owner: MorvanZhou
- License: MIT
- Created: 2017-12-29T05:28:17.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-04-07T14:46:24.000Z (10 months ago)
- Last Synced: 2025-01-19T08:09:10.609Z (14 days ago)
- Topics: asyncio, beautifulsoup, crawler, crawling, distributed-scraper, regex, requests, scraping, scrapy, urllib
- Language: Jupyter Notebook
- Homepage: https://morvanzhou.github.io/tutorials/data-manipulation/scraping/
- Size: 3.26 MB
- Stars: 792
- Watchers: 41
- Forks: 549
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web scraping tutorials (Python)
In these tutorials, we will learn to build some simple but useful scrapers from scratch. You will learn how to read a web page, select the sections you need, and even download files.
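
As a rough illustration of "reading a page and selecting the sections you need", here is a minimal sketch that pulls every link out of an HTML string. It is not taken from the tutorials: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages, and `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and visible text of every <a> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # list of (href, text) pairs found so far
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href="..."> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up page content. In practice the string could come from
# urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <h1>Demo page</h1>
  <p>See <a href="/notebook/1-1-urllib.ipynb">Urllib</a> and
     <a href="/notebook/3-1-requests.ipynb">Requests</a>.</p>
</body></html>
"""


def extract_links(html):
    """Return a list of (href, text) pairs for the links in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


for href, text in extract_links(SAMPLE_HTML):
    print(text, "->", href)
```

The same idea carries over to BeautifulSoup, which the notebooks cover and which makes this kind of selection much shorter.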
If you understand Chinese, you are in luck! I made Chinese video and text tutorials for all of this content. You can find them at [莫烦Python](https://mofanpy.com/).

**Learning from code, I made two options for you.**

1. Learn it from the [source code](/source_code/)
2. Learn it from the [Jupyter notebooks](/notebook/)

## The contents
* Basic concept and package
  * [Urllib](/notebook/1-1-urllib.ipynb)
* BeautifulSoup
  * [Basic](/notebook/2-1-beautifulsoup-basic.ipynb)
  * [CSS](/notebook/2-2-beautifulsoup-css.ipynb)
  * [RegEx](/notebook/2-3-beautifulsoup-regex.ipynb)
  * [Practice random scraping](/notebook/2-4-practice-baidu-baike.ipynb)
* Requests and Download
  * [Requests](/notebook/3-1-requests.ipynb)
  * [Download](/notebook/3-2-download.ipynb)
  * [Practice download image](/notebook/3-3-practice-download-images.ipynb)
* Speed up scraping
  * [Distributed scraping (multiprocessing)](/notebook/4-1-distributed-scraping.ipynb)
  * [Asyncio](/notebook/4-2-asyncio.ipynb)
* Advanced
  * [Selenium](/notebook/5-1-selenium.ipynb)
  * [Scrapy](/notebook/5-2-scrapy.ipynb)

# Donation
*If this does help you, please consider donating to support me for better tutorials. Any contribution is greatly appreciated!*