https://github.com/norskregnesentral/neuraltextsanitizer

Neural models for detecting and masking personal information from texts

Host: GitHub
URL: https://github.com/norskregnesentral/neuraltextsanitizer
Owner: NorskRegnesentral
License: mit
Created: 2022-06-01T17:13:49.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-11-25T13:07:21.000Z (almost 3 years ago)
Last Synced: 2025-04-05T20:04:41.633Z (6 months ago)
Language: Python
Size: 4.2 MB
Stars: 15
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# NeuralTextSanitizer

Text sanitization with explicit measures of privacy risk.

For ```python>=3.7```:

* Download the three models from [this link](https://drive.google.com/drive/folders/1p9znczAIruZKvUxY0hLRy5YXyj0SfOYk?usp=sharing) and place them in the SampleData folder

* ```python -m pip install -r requirements.txt```

The input should be a file containing the text(s) to be sanitized. See *sample2.json* and *sample.json* in the SampleData folder for an example input.

| Field | Description | Required? |
| ------------- | ------------- | ------------- |
| text | The text to be sanitized | required |
| target | The individual to be protected in the text | required |
| annotations | Manually annotated start and end offsets and semantic labels of PII in the text | optional |
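
As a rough illustration of the fields above, the following sketch checks a record for the two required fields and serializes it. The record contents, the helper names `missing_fields` and `to_json`, and the output filename `my_input.json` are all hypothetical, and the top-level layout (a list of records) is an assumption; *sample.json* in the SampleData folder is the authoritative reference for the real structure.

```python
import json

# The two fields marked "required" in the table above.
REQUIRED_FIELDS = ("text", "target")


def missing_fields(record):
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def to_json(records):
    """Serialize records after checking that each has the required fields.

    Assumption: the input file holds a list of records. Verify this
    against SampleData/sample.json before relying on it.
    """
    for index, record in enumerate(records):
        problems = missing_fields(record)
        if problems:
            raise ValueError(f"record {index} is missing: {problems}")
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical record; the text and target are invented for illustration.
# "annotations" is optional and omitted here because its exact layout
# is not described in this README.
example = {
    "text": "Jane Doe moved to Oslo in 2015 to work as a statistician.",
    "target": "Jane Doe",
}

if __name__ == "__main__":
    with open("my_input.json", "w", encoding="utf-8") as handle:
        handle.write(to_json([example]))
```

If the layout matches what the pipeline expects, the resulting file could be passed to `sanitize.py` in the same way as the sample files.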

To run the whole pipeline, provide the path to an input file as follows:

* ```python sanitize.py SampleData/sample2.json```

The output is a JSON file containing the masking decisions of each module of the pipeline. More specifically:

| Field | Description |
| ------------- | ------------- |
| opt_decision | The masking decisions after the Optimization Algorithm |
| PII | Personally Identifiable Information in the text |
| blacklist1 | The masking decisions of the Language Model |
| blacklist2 | The masking decisions of the Web Query model |
| blacklist3 | The masking decisions of the Mask Classifier model |
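
A minimal sketch for inspecting the output might look like the following. It only reports which of the documented fields are present and makes no assumption about the shape of their values. The helper names `load_result` and `summarize_fields` are hypothetical, the output path has to be supplied by the caller because this README does not state it, and the sketch assumes the fields sit at the top level of a JSON object, which may not hold if results are grouped per text.

```python
import json

# Field names taken from the output table above.
DOCUMENTED_FIELDS = ("opt_decision", "PII", "blacklist1", "blacklist2", "blacklist3")


def load_result(path):
    """Load the pipeline's JSON output from a caller-supplied path."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_fields(result):
    """Map each documented field to whether it is present in `result`.

    Assumption: `result` is a dict with the documented fields as
    top-level keys.
    """
    return {name: name in result for name in DOCUMENTED_FIELDS}
```

Comparing `blacklist1`, `blacklist2`, and `blacklist3` against `opt_decision` for the same text is one way to see how the individual modules contribute to the final masking decision.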
