https://github.com/gesistsa/python-web-data-collection-tutorial
Tutorial of Web data collection with Python.
- Host: GitHub
- URL: https://github.com/gesistsa/python-web-data-collection-tutorial
- Owner: gesistsa
- License: apache-2.0
- Created: 2023-07-25T13:53:25.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-23T13:26:51.000Z (over 2 years ago)
- Last Synced: 2025-06-10T19:06:18.564Z (about 1 year ago)
- Topics: beautifulsoup, data-science, python, web-crawling, wikipedia
- Language: Jupyter Notebook
- Homepage:
- Size: 2.58 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Tutorial: Web data collection with Python
[Launch on Binder](https://mybinder.org/v2/gh/gesistsa/python-web-data-collection-tutorial/HEAD)
This tutorial is based on the content of the GESIS fall seminar [Automated Web Data Collection with Python](https://training.gesis.org/?site=pDetails&child=full&pID=0x4693CE99CF9F4C0FB26F47EA79E611BA&subID=0x428CC87C985440C695B86BA777535CB4), held in 2023, and has two parts.
In the first part, we discuss the use of Web APIs as a data source, using the MediaWiki API, which powers Wikipedia, as an example.
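
To give a flavour of what querying the MediaWiki API looks like, here is a minimal sketch that is not taken from the lecture notebooks. It assumes the English Wikipedia endpoint and the `extracts` property (provided by the TextExtracts extension on Wikipedia), and it uses only the Python standard library. The helper names `build_query_url`, `parse_extracts`, and `fetch_intro` and the User-Agent string are hypothetical and chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia endpoint; other wikis use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a URL asking the Action API for the plain-text intro of one page."""
    params = {
        "action": "query",   # read-only query module
        "format": "json",    # ask for a JSON response
        "prop": "extracts",  # page extracts (TextExtracts extension)
        "exintro": 1,        # only the section before the first heading
        "explaintext": 1,    # plain text instead of HTML
        "titles": title,
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def parse_extracts(response):
    """Map page title -> extract text from a decoded query response."""
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page.get("extract", "") for page in pages.values()}


def fetch_intro(title):
    """Send the request and return the parsed extracts (needs network access)."""
    request = urllib.request.Request(
        build_query_url(title),
        # Hypothetical User-Agent; replace it with your own contact details.
        headers={"User-Agent": "web-data-tutorial-example/0.1 (you@example.org)"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return parse_extracts(json.load(reply))
```

Calling `fetch_intro("Web scraping")` on a machine with network access should return a dictionary with one entry whose key is the page title. Keeping URL construction and response parsing in separate functions makes both easy to check without sending a request.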
In the second part we discuss how to collect data from static web pages with Python.
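
As a rough illustration of static scraping, the sketch below extracts all links from an HTML string. It is an assumption-laden example rather than code from the notebooks: the repository topics mention BeautifulSoup, whereas this sketch uses the standard library's `html.parser` so that it runs without extra packages. The class `LinkCollector`, the function `extract_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url):
    """Parse an HTML string and return the links it contains."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded static page.
SAMPLE_HTML = """
<html><body>
  <h1>Seminars</h1>
  <ul>
    <li><a href="/courses/python">Python course</a></li>
    <li><a href="https://example.org/r">R course</a></li>
    <li><a name="anchor">No link here</a></li>
  </ul>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
```

Here `links` holds two pairs: the relative link resolved to `https://example.com/courses/python`, and the absolute link to `https://example.org/r`. The anchor without an `href` attribute is skipped.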
There are lecture units and corresponding exercises with solutions for each part.
## Table of contents
* Part 1 - Wikipedia
  * [Lecture 1 - MediaWiki API](Part%201%20-%20Wikipedia/Lecture%201%20-%20MediaWiki%20API.ipynb)
  * [Lecture 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Lecture%202%20-%20Python%20packages%20for%20Wikipedia.ipynb)
  * [Exercise 1 - MediaWiki API](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API%20-%20solution.ipynb)
  * [Exercise 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia%20-%20solution.ipynb)
* Part 2 - Static web scraping
  * [Lecture 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Lecture%201%20-%20Static%20web%20scraping%201.ipynb)
  * [Lecture 2 - Static web scraping 2](Part%202%20-%20Static%20web%20scraping/Lecture%202%20-%20Static%20web%20scraping%202.ipynb)
  * [Lecture 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Lecture%203%20-%20Static%20web%20scraping%203.ipynb)
  * [Exercise 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201%20-%20solution.ipynb)
  * [Exercise 2 - Static web scraping 2](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202%20-%20solution.ipynb)
  * [Exercise 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203%20-%20solution.ipynb)