{"id":27879387,"url":"https://github.com/gesistsa/python-web-data-collection-tutorial","last_synced_at":"2026-04-29T14:31:45.652Z","repository":{"id":183934698,"uuid":"670639142","full_name":"gesistsa/python-web-data-collection-tutorial","owner":"gesistsa","description":"Tutorial of Web data collection with Python.","archived":false,"fork":false,"pushed_at":"2023-10-23T13:26:51.000Z","size":2705,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-10T19:06:18.564Z","etag":null,"topics":["beautifulsoup","data-science","python","web-crawling","wikipedia"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gesistsa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-25T13:53:25.000Z","updated_at":"2023-11-15T15:34:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"ae4e342d-3f06-4698-9758-3c9566225318","html_url":"https://github.com/gesistsa/python-web-data-collection-tutorial","commit_stats":null,"previous_names":["yfiua/python-web-data-collection-tutorial","gesistsa/python-web-data-collection-tutorial"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gesistsa/python-web-data-collection-tutorial","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2Fpython-web-data-collection-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2Fpython-web-data-collection-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesi
stsa%2Fpython-web-data-collection-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2Fpython-web-data-collection-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gesistsa","download_url":"https://codeload.github.com/gesistsa/python-web-data-collection-tutorial/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gesistsa%2Fpython-web-data-collection-tutorial/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32429097,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T13:34:34.882Z","status":"ssl_error","status_checked_at":"2026-04-29T13:34:29.830Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","data-science","python","web-crawling","wikipedia"],"created_at":"2025-05-05T03:21:20.333Z","updated_at":"2026-04-29T14:31:45.632Z","avatar_url":"https://github.com/gesistsa.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tutorial: Web data collection with Python\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/gesistsa/python-web-data-collection-tutorial/HEAD)\n\nThis tutorial is based on the content in the GESIS fall seminar [Automated Web Data Collection with 
Python](https://training.gesis.org/?site=pDetails\u0026child=full\u0026pID=0x4693CE99CF9F4C0FB26F47EA79E611BA\u0026subID=0x428CC87C985440C695B86BA777535CB4) in 2023 and has two parts.\nIn the first part we discuss the use of Web APIs as a data source, using the MediaWiki API, which powers Wikipedia, as an example.\nIn the second part we discuss how to collect data from static web pages with Python.\nThere are lecture units and corresponding exercises with solutions for each part.\n\n## Table of contents\n\n* Part 1 - Wikipedia\n\n  * [Lecture 1 - MediaWiki API](Part%201%20-%20Wikipedia/Lecture%201%20-%20MediaWiki%20API.ipynb)\n  * [Lecture 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Lecture%202%20-%20Python%20packages%20for%20Wikipedia.ipynb)\n  * [Exercise 1 - MediaWiki API](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%201%20-%20MediaWiki%20API%20-%20solution.ipynb)\n  * [Exercise 2 - Python packages for Wikipedia](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia.ipynb), [solution](Part%201%20-%20Wikipedia/Exercise%202%20-%20Python%20packages%20for%20Wikipedia%20-%20solution.ipynb)\n\n* Part 2 - Static web scraping\n\n  * [Lecture 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Lecture%201%20-%20Static%20web%20scraping%201.ipynb)\n  * [Lecture 2 - Static web scraping 2](Part%202%20-%20Static%20web%20scraping/Lecture%202%20-%20Static%20web%20scraping%202.ipynb)\n  * [Lecture 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Lecture%203%20-%20Static%20web%20scraping%203.ipynb)\n  * [Exercise 1 - Static web scraping 1](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%201%20-%20Static%20web%20scraping%201%20-%20solution.ipynb)\n  * [Exercise 2 - Static web scraping 
2](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%202%20-%20Static%20web%20scraping%202%20-%20solution.ipynb)\n  * [Exercise 3 - Static web scraping 3](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203.ipynb), [solution](Part%202%20-%20Static%20web%20scraping/Exercise%203%20-%20Static%20web%20scraping%203%20-%20solution.ipynb)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgesistsa%2Fpython-web-data-collection-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgesistsa%2Fpython-web-data-collection-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgesistsa%2Fpython-web-data-collection-tutorial/lists"}