https://github.com/planeshifter/deidentify
De-identification of Protected Health Information according to HIPAA Privacy Rule
- Host: GitHub
- URL: https://github.com/planeshifter/deidentify
- Owner: Planeshifter
- License: gpl-2.0
- Created: 2015-09-09T22:46:39.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2024-05-30T20:29:48.000Z (about 2 years ago)
- Last Synced: 2025-02-27T16:17:30.228Z (over 1 year ago)
- Language: JavaScript
- Size: 612 KB
- Stars: 40
- Watchers: 5
- Forks: 8
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# deidentify
#### De-Identification of Free-Text Medical Record Data
*Screenshots: File Processing, Batch Processing*
## Table of Contents
- [About](#about)
- [Features](#features)
- [Made using](#made-using)
- [Prerequisites](#prerequisites)
- [Install](#install)
- [Build from source](#build-from-source)
- [Run](#run)
- [License](#license)
- [Copyright](#copyright)
## About
> *deidentify* is a tool to remove personal identifiers from free-text medical record data. Detected identifiers are replaced by randomly generated substitutes. Consistency is preserved: the same name, phone number, or location is always mapped to the same replacement.
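
The consistency guarantee can be illustrated with a small in-memory sketch. The `Pseudonymizer` class and its `replace(original, category)` method are hypothetical and not part of *deidentify*; to keep the example deterministic, replacements come from a simple per-category counter rather than the random generator the tool actually uses.

```javascript
// Hypothetical sketch of consistent replacement; not deidentify's API.
// Replacements are numbered per category instead of being random.
class Pseudonymizer {
  constructor() {
    this.mapping = new Map();  // "CATEGORY:original" -> replacement
    this.counters = new Map(); // CATEGORY -> number of replacements issued
  }

  replace(original, category) {
    const key = `${category}:${original}`;
    if (!this.mapping.has(key)) {
      // First time this identifier is seen: issue a new replacement
      const n = (this.counters.get(category) || 0) + 1;
      this.counters.set(category, n);
      this.mapping.set(key, `${category}-${n}`);
    }
    // Every later occurrence gets the same replacement
    return this.mapping.get(key);
  }
}

const p = new Pseudonymizer();
console.log(p.replace("John Smith", "PERSON")); // PERSON-1
console.log(p.replace("Jane Doe", "PERSON"));   // PERSON-2
console.log(p.replace("John Smith", "PERSON")); // PERSON-1
```

Matching here is exact, so "John Smith" and "john smith" would receive different replacements unless the input is normalized first.
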
## Features
- Facilities to remove all relevant identifiers of individuals from medical record information to comply with the [HIPAA "Safe Harbor" rule](http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html)
- Single file and batch processing
- Customizable options
- Persistent data store ensures consistency of mappings and allows re-identification of the de-identified data
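
Re-identification works because the store keeps both directions of each mapping. The following is a minimal sketch under assumptions: the `MappingStore` class and its methods are hypothetical, and a JSON string stands in for the persistent store (the tool itself uses NeDB).

```javascript
// Hypothetical mapping store supporting re-identification.
// A JSON string stands in for deidentify's on-disk NeDB store.
class MappingStore {
  constructor(entries = []) {
    this.forward = new Map(); // original -> replacement
    this.reverse = new Map(); // replacement -> original
    for (const [original, replacement] of entries) {
      this.add(original, replacement);
    }
  }

  add(original, replacement) {
    this.forward.set(original, replacement);
    this.reverse.set(replacement, original);
  }

  lookup(original) {
    return this.forward.get(original);
  }

  // Re-identification: recover the original value from its replacement
  reidentify(replacement) {
    return this.reverse.get(replacement);
  }

  // Serialize the mappings so they can be written to disk and reloaded
  serialize() {
    return JSON.stringify([...this.forward]);
  }

  static deserialize(json) {
    return new MappingStore(JSON.parse(json));
  }
}

const store = new MappingStore();
store.add("John Smith", "Carl Jensen");
const saved = store.serialize(); // e.g. write this string to a file
const reloaded = MappingStore.deserialize(saved);
console.log(reloaded.lookup("John Smith"));      // Carl Jensen
console.log(reloaded.reidentify("Carl Jensen")); // John Smith
```

The reverse lookup is only unambiguous if every replacement is unique, so a random generator feeding such a store would need to retry when it produces a value that is already in use.
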
## Made using
The *deidentify* tool uses several open-source projects.
The desktop application was created with [nw.js](http://nwjs.io/), formerly called *node-webkit*, and is entirely written in JavaScript.
Our de-identification procedure combines hand-crafted regular expressions with the named entity recognizer (NER) developed by the Stanford Natural Language Processing Group, whose Conditional Random Field (CRF) model detects the three entity classes PERSON, ORGANIZATION, and LOCATION.
Reference:
> Finkel, J. R., Grenager, T., & Manning, C. (2005). Incorporating non-local information into information extraction systems by Gibbs sampling. In *Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)* (pp. 363–370). doi:10.3115/1219840.1219885
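
The regular-expression side handles structured identifiers that a NER model for PERSON, ORGANIZATION, and LOCATION does not cover. The sketch below shows the general approach only: the `PATTERNS` table and `findIdentifiers` function are hypothetical, and the patterns (US-style phone numbers, SSN-shaped numbers, MM/DD/YYYY dates) are illustrative assumptions, not the expressions *deidentify* ships with.

```javascript
// Hypothetical patterns for structured identifiers; deidentify's
// actual regular expressions may differ. All use the global flag.
const PATTERNS = {
  PHONE: /\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b/g, // (617) 555-0100
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,                    // 123-45-6789
  DATE: /\b\d{1,2}\/\d{1,2}\/\d{4}\b/g,             // 03/14/2015
};

function findIdentifiers(text) {
  const hits = [];
  for (const [type, pattern] of Object.entries(PATTERNS)) {
    // matchAll requires the global flag, which every pattern above sets
    for (const m of text.matchAll(pattern)) {
      hits.push({ type, value: m[0], index: m.index });
    }
  }
  // Report hits in the order they appear in the text
  return hits.sort((a, b) => a.index - b.index);
}

const note = "Seen on 03/14/2015. Call (617) 555-0100. SSN 123-45-6789.";
console.log(findIdentifiers(note));
```

In a combined pipeline, these hits would be merged with the entities returned by the NER model before replacements are generated.
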
To generate random replacements for detected identifiers, the [chance.js](http://chancejs.com/) library is used. [NeDB](https://github.com/louischatriot/nedb) is used as a data store, keeping track of the mappings from original identifiers to replacements.
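
A random substitute for a structured identifier should keep the original's shape so that the de-identified text still reads naturally. The following sketch swaps in a dependency-free technique for illustration: the well-known mulberry32 PRNG replaces chance.js, and the hypothetical `randomLike` function replaces each digit and letter while keeping punctuation. Names would need a name generator instead, which is what chance.js provides in the actual tool.

```javascript
// Stand-in for chance.js: mulberry32, a small seeded PRNG that
// returns floats in [0, 1). Seeding makes the output reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical format-preserving replacement: each digit becomes a
// random digit, each letter a random letter of the same case, and
// punctuation and spaces are kept as-is.
function randomLike(value, rand) {
  const pick = (chars) => chars[Math.floor(rand() * chars.length)];
  return value.replace(/[0-9A-Za-z]/g, (c) => {
    if (/[0-9]/.test(c)) return pick("0123456789");
    if (/[a-z]/.test(c)) return pick("abcdefghijklmnopqrstuvwxyz");
    return pick("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
  });
}

const rand = mulberry32(42);
console.log(randomLike("(617) 555-0100", rand)); // still shaped like a phone number
console.log(randomLike("MRN AB-12345", rand));   // still shaped like a record number
```
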
Other libraries used include:
- [async](https://github.com/caolan/async)
- [bootstrap](http://getbootstrap.com/)
- [jQuery](http://jquery.com/)
## Prerequisites
This software uses the Stanford NER tool, which requires [Java 1.8](https://java.com/en/) or later.
## Install
Installers for Windows, macOS, and Linux can be downloaded from the [releases page](https://github.com/Planeshifter/deidentify/releases).
## Build from source
Clone the repository via the following command:
``` bash
git clone --recursive https://github.com/Planeshifter/deidentify.git
```
Change into the newly created directory, install npm dependencies and run the `init` script:
``` bash
cd deidentify
npm install
npm run init
```
## Run
Start the program by executing the following command from the project directory:
``` bash
npm start
```
---
## License
This project is licensed under the [GNU General Public License v2.0](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html).
## Copyright
Copyright © 2015-2018. Philipp Burckhardt.