https://github.com/wikidata/purdue-data-mine-2024
Program materials for WMDE's 2024 Purdue Data Mine project
- Host: GitHub
- URL: https://github.com/wikidata/purdue-data-mine-2024
- Owner: Wikidata
- License: bsd-3-clause
- Created: 2024-01-08T19:12:01.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-08-14T08:25:30.000Z (4 months ago)
- Last Synced: 2024-08-14T09:46:15.087Z (4 months ago)
- Topics: analytics, data-analysis, data-quality, data-science, etl, open-data, python, wikidata, wikimedia
- Language: Jupyter Notebook
- Homepage:
- Size: 114 MB
- Stars: 0
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# WMDE x The Data Mine
This repository contains the program materials and student work for Wikimedia Deutschland's project in the [2024 Purdue Data Mine](https://datamine.purdue.edu/). Students will focus on comparing data from [Wikidata](https://www.wikidata.org/) with external data sources and then derive and report mismatches for the [Wikidata Mismatch Finder](https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder). The corrections of these mismatches by the Wikidata community will then serve to improve Wikidata's data and all downstream projects including [Wikipedia](https://www.wikipedia.org/).
**Note**: The final blogpost for the project can be found [WIP](WIP).
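As a rough illustration of the comparison step described above, the following is a minimal Python sketch, not the project's actual pipeline. It assumes the Wikidata values and the external values have already been fetched into plain Python structures. The column names are an assumption modeled on the Mismatch Finder import format and should be checked against its documentation. The item IDs, statement GUIDs, values, and `example.org` URL are made up for the example.

```python
import csv
import io

# Assumed column layout, modeled on the Mismatch Finder import format.
# Verify against the Mismatch Finder documentation before uploading anything.
MISMATCH_COLUMNS = [
    "item_id", "statement_guid", "property_id", "wikidata_value",
    "meta_wikidata_value", "external_value", "external_url", "type",
]


def find_mismatches(wikidata_rows, external_values, external_url_template):
    """Return one row per statement whose value differs from the external source."""
    mismatches = []
    for row in wikidata_rows:
        external = external_values.get(row["item_id"])
        if external is None:
            continue  # no external record to compare against
        if row["value"].strip() == external.strip():
            continue  # values agree, nothing to report
        mismatches.append({
            "item_id": row["item_id"],
            "statement_guid": row["statement_guid"],
            "property_id": row["property_id"],
            "wikidata_value": row["value"],
            "meta_wikidata_value": "",
            "external_value": external,
            "external_url": external_url_template.format(item_id=row["item_id"]),
            "type": "statement",
        })
    return mismatches


def to_csv(mismatches):
    """Serialize mismatch rows to CSV text using the assumed column layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=MISMATCH_COLUMNS)
    writer.writeheader()
    writer.writerows(mismatches)
    return buffer.getvalue()


# Hypothetical sample data: IDs, GUIDs, and dates are invented for illustration.
sample_wikidata = [
    {"item_id": "Q100000001", "property_id": "P569", "value": "1950-01-01",
     "statement_guid": "Q100000001$00000000-0000-0000-0000-000000000001"},
    {"item_id": "Q100000002", "property_id": "P569", "value": "1960-02-02",
     "statement_guid": "Q100000002$00000000-0000-0000-0000-000000000002"},
]
sample_external = {"Q100000001": "1950-01-01", "Q100000002": "1961-02-02"}

rows = find_mismatches(sample_wikidata, sample_external,
                       "https://example.org/records/{item_id}")
print(to_csv(rows))
```

The comparison here is a plain string equality check. Real data usually needs normalization (date precision, units, name variants) before two values can be called a mismatch.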
## **Contents**
- [mismatch_generation](https://github.com/Wikidata/Purdue-Data-Mine-2024/tree/main/mismatch_generation)
  - Student work to derive mismatches between Wikidata and external sources
- [notebooks](https://github.com/Wikidata/Purdue-Data-Mine-2024/tree/main/notebooks)
  - Program materials to introduce Python, Jupyter, Wikidata data access and more

## Mismatch Process
You can see the process to generate mismatches at the following [Mermaid live editor link](https://mermaid.live/edit#pako:eNqFWNtu5DYS_RWiH3Y8gG2sL5kB_LCAb7Mw4Ak8tpN5iPaBktjdXEuklqJsd4IA-ZDdn8uXpKpIiqRa3euHGbVEFuty6lQVf1tUuhaLi8XK8G7Nnm8K1Q-l-_GzMHK5KRSDv1oaUVmpFXu-cm_w7-agWHzRhol33naNOGRyyfRgWLXWvVCs5paXvBdM9uzr0MvqynCpfj2MAt4Ee9NDU7NecFOt2RKE_fnHfx8ukuWMq5otJfzTGd0JY6Xoo4RGvgjckm4wohF47MrooWN3N-zg4fzs08c___gfs2tuYY96SURw9l2-SFSW8aYcWmY1vMs0wNfF4mPccw2Wfxes4ipVHX6JdyuM4g0cG1d7xTes3DC76aRasYeL-Fkq0EuwJydIu1-jTm-i7KUV2fFXcLwLj7NoPFXWQlm5lML0W8eD36L9UT7IbpPFb9KuSYExeuJd9haVDFtIFaHqBCt3Ucg8VL6Axo-DAsf-ZxBmg1bSw_GbF3qszSour_QAdoCf1vqNtVxtnJpszV8FahdXJg6Pfl7C9jrz2O3EY5w12jK93OkHbgQ5StSIB9ozeoT3hFyubFxPuIdcwtWlYEMvlkPjUBGOgtAYBO-2-06mrrt_jJLTPMS_u1zRS7Dseq016hVVJIXXPMsUQD1qN1oMz5VuO7A0xSKo3HJSgq8A_b2d0fdRONTvi7hbAcp5XH-48br9yFvBLh_uPiCmeMgfoVZS5Sg3_pQTxI5_RoqRZBlmG0iBjXWnZRqKA85-erzHEGF-Dj3FhFeV6Hty0MfslG4oG1nBEV85UEk_GAIYiQbecl-bxP1OkCwbwQ4UBLbjss4lDkovl7KSvAGpz15WyzeIC1AI2CquwG9ATEn-rQVo0OtWlLrekKF9BU7vEIg-LRPUGbSvRrG8oUSwEjKkFXatCbgrkTiGor40umWRViYwgzPAam_80B1ZfQS7gMLVLEYOWW9l0wCLm5f-kKgaVQchKslmAiPGAhyAWS9RYwt28UbMoOt0XzZgJcCygwUBrACvxuBmxBBS4YMLOujlEZGyDKEfAUW0FzLjQ5_FM0c7eWAJHpqg0lLgwGuQhUPbTax_Qy_xpgE6GygLa_2mGs3rdJ1AigCrE6YB4lCVIMcC2OLilsCqHTc-3j49Q7aAa3ubqy6XP2qn6r3WLxA5OBkKGxArZw8bAIkCXigNN5uZaNXosdLBL4snFd0rwQcoNUByT1BmZ8J4IyooR_sooi1Bsdv3i6zY9lZD4gP_Wiw76ImkYfBBpJ6hJvmzLJZsTguXi7L7OMkKWJUokfnQnYMk5CyCFJUQckehm26HBqFpaWUPqVKtwSTImhk3XVqoCuVg93pKvO_2FOKGCl4WI8paKDrIDM4bqNLoiVqnXRQVNACogwYQVC1AI5KBXdPBw8nfP30-y1mOWxvy0DmEB0Pg2BzVs67a9sTZvrxP0URlL7otl_IsQnDnXflmgPawfcP_WeeSIFRuoEvC_wQdzi9oSsiy7bTBUg9lp2fUCGqkA8_DmdsQFKfBcSgR-hYXJsx9u5m0hVnb4XYo1jVDEmlpXV67qodq-PqVlQnXduFXWh9YKlXNgmEIc_Qg9DMWvUFNmFaC2iOqHduhJeWW0sAuq-PnFwVkh_2b9ytS1UzUr9PUmQ8ZQubE11Ij7GAUFBBd_huWunK2ZXArV2uL5YYs9UxXS1gPOZjwGKjca2oR8OMvgPwXsQHI_2sEgwbLMy-hMiN88OAcO9MKPernaKRO0iR1VihFWR2a8db5vhxxcAPl_pmoMmqwXRoTXHukRn3cCMdL_Zo3ZTG9aA4Kwcu1_HZ30-8nfkyB72HUSEktzJIzQ2KJKMMuB5V2SMaZIM9XmhAmA6EnQD8Jnn_MGi
7Mqp2MhyuzWiANKH6nmMtvaHtqaHuWkQTzMSUZ8k6O2e32lBZHltDsAkLg_wa-TSoyTW3ZZEgpScNJzMO46fSYYb5EuI1HBSTqpBW6fRcVLuLs6eHy8dt9HNMy5Ixnf6PPT8K8SmhPIBbf7sUr2EOdDTWRmd8w0xJcUuLppbMggmXMlzii0KyWjmI76AejGLw-WpqTs828AagiudAW5b0qJrISBsvn_63dP-zLRuCDxOR8zpyAdjpszsXTs780bOfwtjW7-943C8XE3ZRQVI_wIVc172O9D3vhTQinpUqCbE8qjnPnrwrQufvoQbolpy7VBHd5ZWjiyFMqeNcVUVdBw33KeDMQa1Tg4iwYWcr81Isg0Dvc8TttLQX4gLie2u9wl-FDhkNTWl3Qe2ldcNqFcWRa3eO-s6jF3NFBXISyN2vSKuwec3D3K28GkV3qjMvPj9kdjTObuLIniOK4C_2PhFdgDON17XuiRrxyZdPBMMFtia0E1RifTu6CbfyFpy8NXgqM2ikhkKCctHygcdAYC_BD3r_5zyDE6GGVTGwEIr10zTJBXdS7735C8-7Gugnj8rxhd9OYYwlvAE_aElcoxihv58OnfRziQ3fqiQQVC9HcbjV24X6LIIJ-WZ-RXC_F_HS9VA_HB0h-DWHDBhYsAyygDOCGvDmE2bDpjzufjTAWG9FpDJHfICYUuMELtBlMwM5KG_TNjO8-7_edgoqEHHztnkj_DIjxGILk9dPPcXtyHaTBp50GAT6DENDQsKMD3uGYZmiz2xHWQZzogqpmQ4dDPm70LDHxX35xRIvR2W5Xqmt6b3jFjo7-wa4LdU0PN4W6pYcvhbqkh3Bl6O90jvCdvzYcL9VoYbjY8NcY-MrdeLG_JddYhaJswK_xLmS8-zg6IurDr3Qvkn9SOvlKlxGF8hM1vWzLQkFG0bN4L5SfsfFnMuLRaEIvaX5yCp3SC5rnCoVpQb9pWCiUvwLDN65lpamG_BbKJnUh-AbwA0pBZXUKgXgkWfzhiiX-6w9H5ihUoBmyyhcrpwImK_Pudml0l6yFYBbKQ8LpSsAslLMNZLjNMAmf0QO0--f0AJ3GD_QAdPGJHj4vDhetAIKU9eJi8RuiqFgAYlogmQt4rLl5KRaF-h3W8cHqp42qFhfWDOJwMXSYgTeSQxq17uXvfwE2cWie). The resulting diagram is further displayed below: