https://github.com/gesistsa/python-web-data-collection-tutorial

Tutorial of Web data collection with Python.

beautifulsoup data-science python web-crawling wikipedia


# Tutorial: Web data collection with Python

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/gesistsa/python-web-data-collection-tutorial/HEAD)

This tutorial is based on the 2023 GESIS fall seminar [Automated Web Data Collection with Python](https://training.gesis.org/?site=pDetails&child=full&pID=0x4693CE99CF9F4C0FB26F47EA79E611BA&subID=0x428CC87C985440C695B86BA777535CB4) and has two parts.
In the first part we discuss Web APIs as a data source, using the MediaWiki API, which powers Wikipedia, as an example.
In the second part we discuss how to collect data from static web pages with Python.
Each part has lecture units and corresponding exercises with solutions.
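
To give a flavour of the first part, here is a minimal sketch of a MediaWiki API request that asks for the plain-text introduction of one Wikipedia page. It is not taken from the lecture notebooks: the helper names (`build_params`, `parse_extracts`, `fetch_intro`), the `User-Agent` string, and the use of the standard library's `urllib` instead of a third-party HTTP package are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the English Wikipedia MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(title):
    """Return query parameters requesting the plain-text intro of one page."""
    return {
        "action": "query",
        "prop": "extracts",     # page extracts (TextExtracts extension)
        "exintro": 1,           # only the section before the first heading
        "explaintext": 1,       # plain text instead of HTML
        "titles": title,
        "format": "json",
        "formatversion": 2,     # "pages" is returned as a list
    }


def parse_extracts(payload):
    """Map page titles to extracts in a decoded JSON response."""
    pages = payload.get("query", {}).get("pages", [])
    return {page["title"]: page.get("extract", "") for page in pages}


def fetch_intro(title):
    """Send the request and return {title: intro}. Needs network access."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(title))
    # Hypothetical User-Agent; replace it with one that identifies your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "web-data-tutorial-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_extracts(json.load(response))
```

Calling `fetch_intro("Web scraping")` sends a single request. Building the parameters and parsing the response are kept in separate functions so that both can be checked without network access.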

## Table of contents

* Part 1 - Wikipedia
    * [Lecture 1 - MediaWiki API](Part%201%20-%20Wikipedia/Lecture%201%20-%20MediaWiki%20API.ipynb)
    * [Lecture 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Lecture%202%20-%20Python%20packages%20for%20Wikipedia.ipynb)
    * [Exercise 1 - MediaWiki API](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API%20-%20solution.ipynb)
    * [Exercise 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia%20-%20solution.ipynb)
* Part 2 - Static web scraping
    * [Lecture 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Lecture%201%20-%20Static%20web%20scraping%201.ipynb)
    * [Lecture 2 - Static web scraping 2](Part%202%20-%20Static%20web%20scraping/Lecture%202%20-%20Static%20web%20scraping%202.ipynb)
    * [Lecture 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Lecture%203%20-%20Static%20web%20scraping%203.ipynb)
    * [Exercise 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201%20-%20solution.ipynb)
    * [Exercise 2 - Static web scraping 2](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202%20-%20solution.ipynb)
    * [Exercise 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203%20-%20solution.ipynb)
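
As a rough illustration of what static web scraping in Part 2 involves, the sketch below extracts every link and its text from an HTML string. It is not taken from the notebooks. To run without any installation it uses the standard library's `html.parser` rather than a third-party parser such as Beautiful Soup; the `LinkCollector` class, the `extract_links` helper, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return a list of (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded static page.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <ul>
    <li><a href="/lecture-1">Lecture 1</a></li>
    <li><a href="/lecture-2">Lecture <em>2</em></a></li>
  </ul>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

In practice the HTML string would come from an HTTP response instead of a literal, and a dedicated parsing library usually makes the selection logic shorter.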