{"id":18813229,"url":"https://github.com/nellore/deidentify","last_synced_at":"2025-10-27T18:37:21.279Z","repository":{"id":152814458,"uuid":"76735198","full_name":"nellore/deidentify","owner":"nellore","description":"Deidentifies LABS consortium data","archived":false,"fork":false,"pushed_at":"2016-12-23T06:46:15.000Z","size":32,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-30T00:24:52.891Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nellore.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-17T16:42:01.000Z","updated_at":"2016-12-17T17:39:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"62432ca9-6d8e-4369-9def-861bda6d9821","html_url":"https://github.com/nellore/deidentify","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nellore%2Fdeidentify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nellore%2Fdeidentify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nellore%2Fdeidentify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nellore%2Fdeidentify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nellore","download_url":"https://codeload.github.com/nellore/deidentify/tar.gz/refs/heads/master","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239748297,"owners_count":19690237,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T23:36:48.285Z","updated_at":"2025-10-27T18:37:16.245Z","avatar_url":"https://github.com/nellore.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# deidentify\n\nThis repo contains tools for deidentifying LABS consortium data.\n\n`date_eliminator.py` can strip months and days from date fields, leaving only years, and can eliminate all fields that are detected to contain days, months, and years from a directory of LABS 2 CSV files according to user input. We applied this script to LABS consortium data (in particular, the ASCII subdirectories from the LABS 2 CD) to remove date-related PHI as characterized in [this](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/) document. On a first run, the script looks in each CSV for every column that either (1) appears to participate in a date occurring in three consecutive columns or (2) contains a date in `mm/dd/yyyy` or `dd/mm/yyyy` format; it then asks the user whether, respectively, (1) the column should be eliminated or (2) the month and day should be removed from every date in the column. 
Its output is a new directory of CSVs with adjusted and removed date fields as well as a configuration file that allows reproducing the run.\n\nWe used [PyPy](https://bitbucket.org/pypy/pypy) 5.6.0 to run `date_eliminator.py`.\n\nUsage:\n```\npypy date_eliminator.py -i /path/to/input/directory -o /path/to/output/directory\n```\n\nMost of our deidentification is reproducible. To perform reproducible steps, `cat` the configuration file [`date_eliminator.conf`](date_eliminator.conf) into the script, as in\n```\ncat date_eliminator.conf | pypy date_eliminator.py \\\n    -i \"/path/to/Longitudinal Assessment of Bariatric Surgery (LABS-2) Preliminary/ASCII Database\" \\\n    -o /path/to/output/directory\n```\nAll months and days were removed except for `FORMV` fields, where dates simply identified form versions.\n\nWe handled `SW_MINUTE.csv` and `SW_SUMMARY.csv` separately. In particular, in `SW_MINUTE.csv`, we preserved days since some first date in the `CPTRDATE` field so users can recover time series. To reproduce our deidentification of these files, run\n```\npypy sw_edit.py -i /path/to/input/directory -o /path/to/output/directory\n```\nusing the same input and output directories as for `date_eliminator.py`.\n\nAfter running both `date_eliminator.py` and `sw_edit.py`, we navigated to `/path/to/output/directory` and ran\n```\nfor i in $(ls | grep -v SW_); do echo $i; echo '*****'; cut -d',' -f2- $i \\\n    | grep \"[0-9][0-9]*\\-[0-9][0-9]*\"; done | less\n```\nand\n```\nfor i in $(ls | grep -v SW_); do echo $i; echo '*****'; cut -d',' -f2- $i \\\n    | grep \"[0-9][0-9]*/[0-9][0-9]*\"; done | less\n```\nto search for residual expressions of the form `[NUMBER]/[NUMBER]` and `[NUMBER]-[NUMBER]` in all fields besides `FORMV`. We uncovered many instances corresponding to dates in free text fields, and we used [Sublime Text 3](https://www.sublimetext.com/) to replace them with the text \"[REDACTED]\". 
We also manually inspected `DIB.csv` and `RSI.csv`, using Sublime Text to replace potentially identifying keywords from occupations in the `EMPS` field and study withdrawal reasons in the `*REAS*` fields with the text \"[REDACTED]\". Including scripts to reproduce these replacements would have required putting identifying information in this repo, which explains why our results are only partially reproducible.\n\n# License\nThis software is licensed under the MIT License. See [`LICENSE`](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnellore%2Fdeidentify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnellore%2Fdeidentify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnellore%2Fdeidentify/lists"}