{"id":16826772,"url":"https://github.com/jdkato/openpdi","last_synced_at":"2025-10-24T13:09:19.491Z","repository":{"id":57449164,"uuid":"153943607","full_name":"jdkato/openpdi","owner":"jdkato","description":"A Python 3 library for decentralized aggregation of data from the Police Data Initiative (PDI).","archived":false,"fork":false,"pushed_at":"2020-12-18T01:35:50.000Z","size":128,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-14T03:43:07.694Z","etag":null,"topics":["data-science","machine-learning","nlp","open-data","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jdkato.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION","codeowners":null,"security":null,"support":null}},"created_at":"2018-10-20T20:02:35.000Z","updated_at":"2021-09-29T17:29:47.000Z","dependencies_parsed_at":"2022-09-14T07:32:20.561Z","dependency_job_id":null,"html_url":"https://github.com/jdkato/openpdi","commit_stats":null,"previous_names":["openpdi/openpdi"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdkato%2Fopenpdi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdkato%2Fopenpdi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdkato%2Fopenpdi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdkato%2Fopenpdi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jdkato","download_url":"https://codeload.github.com/jdkato/openpdi/tar.gz/refs/heads/master","
host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244101955,"owners_count":20398378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","nlp","open-data","python3"],"created_at":"2024-10-13T11:18:22.629Z","updated_at":"2025-10-24T13:09:14.461Z","avatar_url":"https://github.com/jdkato.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenPDI [![Build Status](https://travis-ci.org/OpenPDI/openpdi.svg?branch=master)](https://travis-ci.org/OpenPDI/openpdi) [![code style](https://img.shields.io/badge/code%20style-black-%23000.svg)](https://github.com/OpenPDI/openpdi) [![DOI](https://zenodo.org/badge/153943607.svg)](https://zenodo.org/badge/latestdoi/153943607) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/openpdi)](https://pypi.org/project/openpdi/)\n\nOpenPDI is an unofficial effort to document and standardize data submitted to\nthe [Police Data Initiative][3] (PDI). 
The goal is to make the data more accessible\nby addressing a number of issues related to a lack of\nstandardization\u0026mdash;namely,\n\n- **File types**: While some agencies make use of the\n  [Socrata Open Data API](https://dev.socrata.com/), many provide their data\n  in raw `.csv`, `.xlsx`, or `.xls` files of varying structures.\n- **Column names**: Many columns that represent the same data (e.g., `race`) are named differently across departments, cities, and states.\n- **Value formats**: Dates, times, and other comparable fields are submitted in\n  many different formats.\n- **Column availability**: It's currently very difficult to identify data\n  sources that contain certain columns\u0026mdash;e.g., *Use of Force* data\n  specifying the hire date of the involved officer(s).\n\n## Getting Started\n\n###### Installation\n\n```shell\n$ pip install openpdi\n```\n\n###### Usage\n\n| Dataset           | ID    | Source                                                      |\n|-------------------|-------|-------------------------------------------------------------|\n| [Use of Force][1] | `uof` | https://www.policedatainitiative.org/datasets/use-of-force/ |\n\n```python\nimport csv\nimport openpdi\n\n# The library has a single entry point:\ndataset = openpdi.Dataset(\n    # The dataset ID (see the table above).\n    \"uof\",\n    # Limit the data sources to a specific state using its two-letter code.\n    #\n    # Default: `scope=[]`.\n    scope=[\"TX\"],\n    # A list of columns that must be provided in every data source included in\n    # this dataset. 
See `openpdi/meta/{ID}/schema.json` for the available\n    # columns.\n    #\n    # Default: `columns=[]`.\n    columns=[\"reason\"],\n    # If `True`, only return the user-specified columns -- i.e., those listed\n    # in the `columns` parameter.\n    #\n    # Default: `strict=False`.\n    strict=False)\n\n# The names of the agencies included in this dataset:\nprint(dataset.agencies)\n\n# The URLs of the external data sources included in this dataset:\nprint(dataset.sources)\n\n# `gen` is a generator object for iterating over the CSV-formatted dataset.\ngen = dataset.download()\n\n# Write to a CSV file (the `csv` module expects files opened with `newline=\"\"`):\nwith open(\"dataset.csv\", \"w\", newline=\"\") as f:\n    writer = csv.writer(f, delimiter=\",\", quoting=csv.QUOTE_ALL)\n    writer.writerows(gen)\n```\n\n## Datasets\n\nTo avoid bloating this repository with gigabytes of data, we don't store any PDI data in it. Instead, we store small, JSON-formatted descriptions of externally hosted datasets\u0026mdash;for example, [`uof/CA/meta.json`](https://github.com/OpenPDI/openpdi/blob/master/openpdi/meta/uof/CA/meta.json):\n\n```json\n[\n    {\n        \"url\": \"https://www.norwichct.org/Archive.aspx?AMID=61\u0026Type=Recent\",\n        \"type\": \"csv\",\n        \"start\": 1,\n        \"columns\": {\n            \"date\": {\n                \"index\": 0,\n                \"specifier\": \"%m/%d/%Y\"\n            },\n            \"city\": {\n                \"raw\": \"Richmond\"\n            },\n            \"state\": {\n                \"raw\": \"CA\"\n            },\n            \"service_type\": {\n                \"index\": 1\n            },\n            \"force_type\": {\n                \"index\": 10\n            },\n            \"light_conditions\": {\n                \"index\": 8\n            },\n            \"weather_conditions\": {\n                \"index\": 7\n            },\n            \"reason\": {\n                \"index\": 2\n            },\n            \"officer_injured\": {\n              
  \"index\": 6\n            },\n            \"officer_race\": {\n                \"index\": 9\n            },\n            \"subject_injured\": {\n                \"index\": 5\n            },\n            \"aggravating_factors\": {\n                \"index\": 3\n            },\n            \"arrested\": {\n                \"index\": 4\n            }\n        }\n    }\n]\n```\n\nThis file describes a Use of Force (`uof`) dataset from Richmond, CA. Each entry in the `columns` object maps either a column of the externally hosted data (by its position, `index`) or a fixed value (`raw`) to a column in the dataset's schema file ([`uof/schema.json`](https://github.com/OpenPDI/openpdi/blob/master/openpdi/meta/uof/schema.json)).\n\n![flow][4]\n\nThe `schema.json` file assigns a `format` to every possible column in a dataset. Each format corresponds to a Python function that standardizes the raw values of that column (see [`openpdi/validators.py`](https://github.com/OpenPDI/openpdi/blob/master/openpdi/validators.py)).\n\n[1]: https://github.com/jdkato/OpenPDI/tree/master/openpdi/meta/uof\n[2]: https://www.policedatainitiative.org/datasets/use-of-force/\n[3]: https://www.policedatainitiative.org/\n[4]: https://user-images.githubusercontent.com/8785025/49119503-6975ac80-f25d-11e8-9310-802492815b39.png\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdkato%2Fopenpdi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjdkato%2Fopenpdi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdkato%2Fopenpdi/lists"}