{"id":23228378,"url":"https://github.com/openethicsai/oedp","last_synced_at":"2025-08-19T15:30:46.443Z","repository":{"id":89916640,"uuid":"388838903","full_name":"OpenEthicsAI/OEDP","owner":"OpenEthicsAI","description":"Open Ethics Data Passport","archived":false,"fork":false,"pushed_at":"2025-04-17T08:14:40.000Z","size":75,"stargazers_count":4,"open_issues_count":5,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-17T22:58:16.933Z","etag":null,"topics":["data-governance","datasets","json-schemas","model-management","oedp","open-ethics","training-data"],"latest_commit_sha":null,"homepage":"https://openethics.ai/oedp/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenEthicsAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-23T15:03:49.000Z","updated_at":"2025-02-23T09:06:45.000Z","dependencies_parsed_at":"2024-11-05T08:18:41.512Z","dependency_job_id":"e3276855-1f40-4c95-8e12-5711b80c978e","html_url":"https://github.com/OpenEthicsAI/OEDP","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/OpenEthicsAI/OEDP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenEthicsAI%2FOEDP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenEthicsAI%2FOEDP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenEthicsAI%2FOEDP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenEthics
AI%2FOEDP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenEthicsAI","download_url":"https://codeload.github.com/OpenEthicsAI/OEDP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenEthicsAI%2FOEDP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271176103,"owners_count":24712327,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-governance","datasets","json-schemas","model-management","oedp","open-ethics","training-data"],"created_at":"2024-12-19T01:13:34.488Z","updated_at":"2025-08-19T15:30:46.145Z","avatar_url":"https://github.com/OpenEthicsAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open Ethics Data Passport (OEDP)\n\n[![DOI](https://zenodo.org/badge/388838903.svg)](https://zenodo.org/badge/latestdoi/388838903)\n\n## This repository\nThis repository scopes the Open Ethics Data Passport (OEDP) and stores its JSON schema.\n\n## Contributors\n- Alice Pavaloiu\n- Nikita Lukianets\n- Abhiram Ravikumar\n\n## Key concepts\n- Data labeler\n- Supervised machine learning\n- Input-output mapping\n- Training data\n- Test data\n- Data model\n- Prediction\n- Bias\n- Fairness\n- Ethics vector\n- Data source\n\n## 
Problem\nSupervised learning remains one of the most widely used approaches in machine learning. It requires data annotated by subject-matter experts to train machine learning algorithms. Part of the bias and algorithmic unfairness in a model is inherited from historically labeled data. This is a socio-technological phenomenon: the people who label the data, or who decide how inputs map to outputs, unconsciously carry biases of their own (we are humans). Who labeled the data, and how the labeler’s profile (their expertise, personality, and value hierarchies) affects the fairness and accuracy of the resulting machine learning models, remains unknown.\n\n## Purpose\nBring transparency to the systemic properties of AI models by developing a Data Passport. The Data Passport aims to depict the origins of training datasets through a standardized way of conveying information about data annotation processes, data labelers’ profiles, and the correct scoping of the labeler’s job.\n\n## Objectives\n1. Establish a baseline self-disclosure approach to depict the origins of the training data\n2. Develop an evaluation methodology to assess data labelers’ profiles\n3. Develop a tool that applies this evaluation methodology\n4. Make this tool available to market players in data labeling\n5. Combine these approaches into a machine-readable Data Passport\n6. Make the Data Passport an integral part of the Open Ethics Transparency Protocol\n\n## OEDP Structure\n### 1. Product Profile\n    1.1. Data labeling approach – internal (in-house) / outsourcing / crowdsourcing / data labeling service (specialized outsourcing) / synthetic labeling / data programming (scripts)\n    1.2. Tools used (if any), automation\n    1.3. Quality assurance – data accuracy, data quality\n    1.4. Process iteration\n    1.5. 
Hiring, training, and management of data labelers \u0026 integration of new members\n    1.6. Project management (planning, process operationalization, and measurement of success)\n    1.7. Scaling\n    1.8. Outsourcing\n    1.9. Security\n    1.10. Tracking performance\n    1.11. Product Scoping\n\n### 2. Labeler Profile\n    2.1. Identity [name, email]\n    2.2. Details on current project (if disclosable – description, duration, task) [text]\n    2.3. Type of contract (employee, contractor, 3rd party platform, data-labeling service provider, etc.) \u0026 position in the company if applicable [select]\n    2.4. Previous expertise \u0026 skills in data science (or only in data labeling if we isolate it) [list]\n    2.5. Training provided by the company [list]\n    2.6. Tools used [list]\n\n## Taxonomy of labeling approaches\n\n| Approach | Description |\n|---|---|\n| Internal labeling | Assignment of tasks to an in-house data science team |\n| Outsourcing | Recruitment of temporary employees via freelance platforms or by 
posting vacancies on social media and job search sites |\n| Crowdsourcing | Cooperation with freelancers from crowdsourcing platforms |\n| Specialized outsourcing companies | Hiring an external team for a specific project |\n| Synthetic labeling | Generating data with the same statistical properties and attributes as real data |\n| Data programming | Using scripts that programmatically label datasets |\n\n## Roadmap and deliverables\n1. Machine-readable Data Passport [JSON Schema]\n2. Visualization for the JSON schema [web-based]\n3. Labeler Profile structure [JSON Schema]\n4. Profile assessment tool with identity management and labeling history [web-based]\n5. Profile management tool [web-based]\n6. Integration into the Open Ethics Transparency Protocol https://openethics.ai/oetp/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenethicsai%2Foedp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenethicsai%2Foedp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenethicsai%2Foedp/lists"}