https://github.com/wikidata/purdue-data-mine-2024

Program materials for WMDE's 2024 Purdue Data Mine project
analytics data-analysis data-quality data-science etl open-data python wikidata wikimedia


# WMDE x The Data Mine

This repository contains the program materials and student work for Wikimedia Deutschland's project in the [2024 Purdue Data Mine](https://datamine.purdue.edu/). Students compare data from [Wikidata](https://www.wikidata.org/) with external data sources, then derive and report mismatches for the [Wikidata Mismatch Finder](https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder). The Wikidata community's corrections of these mismatches will then improve Wikidata's data and all downstream projects, including [Wikipedia](https://www.wikipedia.org/).

**Note**: The final blog post for the project is still a work in progress ([WIP](WIP)).

## Contents

- [mismatch_generation](https://github.com/Wikidata/Purdue-Data-Mine-2024/tree/main/mismatch_generation)
  - Student work to derive mismatches between Wikidata and external sources
- [notebooks](https://github.com/Wikidata/Purdue-Data-Mine-2024/tree/main/notebooks)
  - Program materials to introduce Python, Jupyter, Wikidata data access and more

## Mismatch Process

You can see the process to generate mismatches at the following [Mermaid live editor link](https://mermaid.live/edit#pako:eNqFWNtu5DYS_RWiH3Y8gG2sL5kB_LCAb7Mw4Ak8tpN5iPaBktjdXEuklqJsd4IA-ZDdn8uXpKpIiqRa3euHGbVEFuty6lQVf1tUuhaLi8XK8G7Nnm8K1Q-l-_GzMHK5KRSDv1oaUVmpFXu-cm_w7-agWHzRhol33naNOGRyyfRgWLXWvVCs5paXvBdM9uzr0MvqynCpfj2MAt4Ee9NDU7NecFOt2RKE_fnHfx8ukuWMq5otJfzTGd0JY6Xoo4RGvgjckm4wohF47MrooWN3N-zg4fzs08c___gfs2tuYY96SURw9l2-SFSW8aYcWmY1vMs0wNfF4mPccw2Wfxes4ipVHX6JdyuM4g0cG1d7xTes3DC76aRasYeL-Fkq0EuwJydIu1-jTm-i7KUV2fFXcLwLj7NoPFXWQlm5lML0W8eD36L9UT7IbpPFb9KuSYExeuJd9haVDFtIFaHqBCt3Ucg8VL6Axo-DAsf-ZxBmg1bSw_GbF3qszSour_QAdoCf1vqNtVxtnJpszV8FahdXJg6Pfl7C9jrz2O3EY5w12jK93OkHbgQ5StSIB9ozeoT3hFyubFxPuIdcwtWlYEMvlkPjUBGOgtAYBO-2-06mrrt_jJLTPMS_u1zRS7Dseq016hVVJIXXPMsUQD1qN1oMz5VuO7A0xSKo3HJSgq8A_b2d0fdRONTvi7hbAcp5XH-48br9yFvBLh_uPiCmeMgfoVZS5Sg3_pQTxI5_RoqRZBlmG0iBjXWnZRqKA85-erzHEGF-Dj3FhFeV6Hty0MfslG4oG1nBEV85UEk_GAIYiQbecl-bxP1OkCwbwQ4UBLbjss4lDkovl7KSvAGpz15WyzeIC1AI2CquwG9ATEn-rQVo0OtWlLrekKF9BU7vEIg-LRPUGbSvRrG8oUSwEjKkFXatCbgrkTiGor40umWRViYwgzPAam_80B1ZfQS7gMLVLEYOWW9l0wCLm5f-kKgaVQchKslmAiPGAhyAWS9RYwt28UbMoOt0XzZgJcCygwUBrACvxuBmxBBS4YMLOujlEZGyDKEfAUW0FzLjQ5_FM0c7eWAJHpqg0lLgwGuQhUPbTax_Qy_xpgE6GygLa_2mGs3rdJ1AigCrE6YB4lCVIMcC2OLilsCqHTc-3j49Q7aAa3ubqy6XP2qn6r3WLxA5OBkKGxArZw8bAIkCXigNN5uZaNXosdLBL4snFd0rwQcoNUByT1BmZ8J4IyooR_sooi1Bsdv3i6zY9lZD4gP_Wiw76ImkYfBBpJ6hJvmzLJZsTguXi7L7OMkKWJUokfnQnYMk5CyCFJUQckehm26HBqFpaWUPqVKtwSTImhk3XVqoCuVg93pKvO_2FOKGCl4WI8paKDrIDM4bqNLoiVqnXRQVNACogwYQVC1AI5KBXdPBw8nfP30-y1mOWxvy0DmEB0Pg2BzVs67a9sTZvrxP0URlL7otl_IsQnDnXflmgPawfcP_WeeSIFRuoEvC_wQdzi9oSsiy7bTBUg9lp2fUCGqkA8_DmdsQFKfBcSgR-hYXJsx9u5m0hVnb4XYo1jVDEmlpXV67qodq-PqVlQnXduFXWh9YKlXNgmEIc_Qg9DMWvUFNmFaC2iOqHduhJeWW0sAuq-PnFwVkh_2b9ytS1UzUr9PUmQ8ZQubE11Ij7GAUFBBd_huWunK2ZXArV2uL5YYs9UxXS1gPOZjwGKjca2oR8OMvgPwXsQHI_2sEgwbLMy-hMiN88OAcO9MKPernaKRO0iR1VihFWR2a8db5vhxxcAPl_pmoMmqwXRoTXHukRn3cCMdL_Zo3ZTG9aA4Kwcu1_HZ30-8nfkyB72HUSEktzJIzQ2KJKMMuB5V2SMaZIM9XmhAmA6EnQD8Jnn_MGi
7Mqp2MhyuzWiANKH6nmMtvaHtqaHuWkQTzMSUZ8k6O2e32lBZHltDsAkLg_wa-TSoyTW3ZZEgpScNJzMO46fSYYb5EuI1HBSTqpBW6fRcVLuLs6eHy8dt9HNMy5Ixnf6PPT8K8SmhPIBbf7sUr2EOdDTWRmd8w0xJcUuLppbMggmXMlzii0KyWjmI76AejGLw-WpqTs828AagiudAW5b0qJrISBsvn_63dP-zLRuCDxOR8zpyAdjpszsXTs780bOfwtjW7-943C8XE3ZRQVI_wIVc172O9D3vhTQinpUqCbE8qjnPnrwrQufvoQbolpy7VBHd5ZWjiyFMqeNcVUVdBw33KeDMQa1Tg4iwYWcr81Isg0Dvc8TttLQX4gLie2u9wl-FDhkNTWl3Qe2ldcNqFcWRa3eO-s6jF3NFBXISyN2vSKuwec3D3K28GkV3qjMvPj9kdjTObuLIniOK4C_2PhFdgDON17XuiRrxyZdPBMMFtia0E1RifTu6CbfyFpy8NXgqM2ikhkKCctHygcdAYC_BD3r_5zyDE6GGVTGwEIr10zTJBXdS7735C8-7Gugnj8rxhd9OYYwlvAE_aElcoxihv58OnfRziQ3fqiQQVC9HcbjV24X6LIIJ-WZ-RXC_F_HS9VA_HB0h-DWHDBhYsAyygDOCGvDmE2bDpjzufjTAWG9FpDJHfICYUuMELtBlMwM5KG_TNjO8-7_edgoqEHHztnkj_DIjxGILk9dPPcXtyHaTBp50GAT6DENDQsKMD3uGYZmiz2xHWQZzogqpmQ4dDPm70LDHxX35xRIvR2W5Xqmt6b3jFjo7-wa4LdU0PN4W6pYcvhbqkh3Bl6O90jvCdvzYcL9VoYbjY8NcY-MrdeLG_JddYhaJswK_xLmS8-zg6IurDr3Qvkn9SOvlKlxGF8hM1vWzLQkFG0bN4L5SfsfFnMuLRaEIvaX5yCp3SC5rnCoVpQb9pWCiUvwLDN65lpamG_BbKJnUh-AbwA0pBZXUKgXgkWfzhiiX-6w9H5ihUoBmyyhcrpwImK_Pudml0l6yFYBbKQ8LpSsAslLMNZLjNMAmf0QO0--f0AJ3GD_QAdPGJHj4vDhetAIKU9eJi8RuiqFgAYlogmQt4rLl5KRaF-h3W8cHqp42qFhfWDOJwMXSYgTeSQxq17uXvfwE2cWie). The resulting diagram is further displayed below:


Mismatch generation process
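
The core comparison step, checking Wikidata values against an external source and keeping only the disagreements, can be sketched roughly as follows. This is a minimal illustration and not the code the students used. The item IDs, dates, and `https://example.org/dataset` URL are made-up placeholders, and `find_mismatches` and `to_csv` are hypothetical helpers. The CSV column names are illustrative assumptions as well; check the Mismatch Finder documentation for the import format it actually expects.

```python
import csv
import io

# Placeholder data (not real Wikidata content): item ID -> value of one property.
wikidata_values = {
    "Q1001": "1952-03-11",
    "Q1002": "1815-12-10",
    "Q1003": "1912-06-23",
}
external_values = {
    "Q1001": "1952-03-11",
    "Q1002": "1815-12-11",  # disagrees with the Wikidata value
    "Q1004": "1906-12-09",  # has no counterpart in the Wikidata sample
}

# Illustrative column names; the real import format may differ.
FIELDNAMES = [
    "item_id",
    "property_id",
    "wikidata_value",
    "external_value",
    "external_url",
]


def find_mismatches(wikidata, external, property_id, external_url):
    """Return one row per item present in both sources whose values differ."""
    rows = []
    # Only items found in both sources can be compared.
    for item_id in sorted(wikidata.keys() & external.keys()):
        if wikidata[item_id] != external[item_id]:
            rows.append(
                {
                    "item_id": item_id,
                    "property_id": property_id,
                    "wikidata_value": wikidata[item_id],
                    "external_value": external[item_id],
                    "external_url": external_url,
                }
            )
    return rows


def to_csv(rows):
    """Serialize mismatch rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# P569 (date of birth) is used here purely as an example property.
mismatches = find_mismatches(
    wikidata_values, external_values, "P569", "https://example.org/dataset"
)
csv_text = to_csv(mismatches)
print(csv_text)
```

Items that appear in only one source are skipped here, because a missing value is not the same as a conflicting one. A real pipeline would likely also normalize values (date precision, units, string casing) before comparing them.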