https://github.com/insightsengineering/presidio-cli
CLI tool that analyzes text for PII entities with the Microsoft Presidio framework.
- Host: GitHub
- URL: https://github.com/insightsengineering/presidio-cli
- Owner: insightsengineering
- License: other
- Created: 2021-12-13T11:43:45.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-04-29T16:22:15.000Z (about 2 years ago)
- Last Synced: 2025-03-23T08:17:23.632Z (about 1 year ago)
- Topics: pii, presidio, python
- Language: Python
- Homepage: https://github.com/insightsengineering/presidio-cli
- Size: 137 KB
- Stars: 5
- Watchers: 3
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# Presidio CLI
CLI tool that analyzes text for PII Entities with [Microsoft Presidio framework](https://github.com/microsoft/presidio).
## Prerequisites
- `Python` version: 3.8, 3.9 or 3.10
- `pipenv` installed:
```shell
# check if app is installed
pipenv --version
# install, if not available
pip install pipenv
```
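The prerequisites can also be checked from a short Python script. This is a minimal sketch: `check_prerequisites` and `LISTED_VERSIONS` are hypothetical names, and the version set only mirrors the versions listed above, not a tested compatibility matrix.

```python
import shutil
import sys
from typing import List, Tuple

# Python versions listed in this README's prerequisites (assumption: nothing else is supported)
LISTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def check_prerequisites(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version) not in LISTED_VERSIONS:
        problems.append(f"Python {version[0]}.{version[1]} is not one of the listed versions (3.8, 3.9, 3.10)")
    # pipenv must be callable from the shell for the install steps below
    if shutil.which("pipenv") is None:
        problems.append("pipenv not found on PATH; install it with `pip install pipenv`")
    return problems


for problem in check_prerequisites():
    print(problem)
```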
## Install `presidio-cli` in a virtual env
### Install from Python Package Index
Install in the current Python environment:
```shell
python -m pip install presidio-cli
```
Install the required packages and `presidio-cli` in a virtual environment:
```shell
pipenv install presidio-cli
```
### Install from source
```shell
# clone from git
git clone https://github.com/insightsengineering/presidio-cli
cd presidio-cli
# install required apps and presidio-cli
pipenv install --deploy --dev
```
## Install language models for `spaCy`
Download the models for the English (`en`) language with the commands below. For more information, see the spaCy [models](https://spacy.io/models/en) page.
```shell
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
```
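To confirm that the downloads worked, note that `python -m spacy download` installs each model as a regular Python package, so its import name can be looked up directly. A minimal sketch; `missing_models` is a hypothetical helper, not part of `presidio-cli` or spaCy.

```python
import importlib.util
from typing import List, Sequence


def missing_models(names: Sequence[str] = ("en_core_web_sm", "en_core_web_lg")) -> List[str]:
    """Return the model packages that cannot be found in the current environment."""
    # find_spec returns None when no top-level module/package with that name is installed
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_models())  # an empty list means both English models are installed
```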
## Configuration file syntax
The default configuration is read from the `.presidiocli` file in the current directory.
The configuration file is written in YAML and supports the following parameters:
- `language` - by default, only models and recognizers for `en` are available.
  The list of [languages](https://microsoft.github.io/presidio/analyzer/languages/) can be extended.
- `entities` - restricts detection to the listed entity types. The list is passed directly to the Presidio framework.
  See the list of [supported entities](https://microsoft.github.io/presidio/supported_entities/).
- `ignore` - patterns of files and directories to skip. It is recommended to ignore version control files, for example `.git`.

Note: the file must set at least one parameter.

An example YAML configuration file:
```yaml
---
language: en
ignore: |
  .git
  *.cfg
entities:
- PERSON
- CREDIT_CARD
- EMAIL_ADDRESS
```
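Once the YAML has been parsed (by a YAML library, not shown), the rules above could be applied as in this minimal sketch. `normalize_config` and `KNOWN_KEYS` are hypothetical names; this illustrates the documented rules and is not `presidio-cli`'s internal code. Treating a missing `entities` list as "no restriction" is an assumption.

```python
from typing import Any, Dict, List, Optional

# Parameters documented for the `.presidiocli` configuration file
KNOWN_KEYS = {"language", "entities", "ignore"}


def normalize_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented rules to an already-parsed YAML mapping (hypothetical helper)."""
    if not raw or not KNOWN_KEYS & set(raw):
        raise ValueError("configuration must set at least one of: entities, ignore, language")
    entities: Optional[List[str]] = list(raw["entities"]) if raw.get("entities") else None
    return {
        # only `en` models and recognizers are available by default
        "language": raw.get("language") or "en",
        # assumption: None means the entity list is not restricted
        "entities": entities,
        # the `ignore: |` block scalar arrives as one string, one pattern per line
        "ignore": [line.strip() for line in (raw.get("ignore") or "").splitlines() if line.strip()],
    }


# The YAML example above, as a YAML parser would return it
example = {
    "language": "en",
    "ignore": ".git\n*.cfg\n",
    "entities": ["PERSON", "CREDIT_CARD", "EMAIL_ADDRESS"],
}
print(normalize_config(example)["ignore"])  # ['.git', '*.cfg']
```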
## Run the Presidio CLI
The Presidio CLI runs the [Presidio Analyzer](https://microsoft.github.io/presidio/analyzer/)
with the specified configuration (language, threshold, entities), skipping the files and paths configured to be ignored.
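The example outputs further down suggest that findings are reported per line: the `standard` format shows column 58 where the `parsable` format shows `"start": 57`. Assuming the analyzer is run on one line at a time, converting its 0-based character offsets into `line:column` positions could look like this sketch. `analyze_lines` is a hypothetical helper, and the `analyze` callable stands in for the Presidio Analyzer so the sketch runs without models installed.

```python
from typing import Any, Callable, Iterable, List, Tuple

# (line, column, score, entity type): the fields shown by `-f standard`
Finding = Tuple[int, int, float, str]


def analyze_lines(lines: Iterable[str], analyze: Callable[[str], Iterable[Any]]) -> List[Finding]:
    """Analyze each line separately and report 1-based line:column positions.

    `analyze` must return objects with `start`, `score` and `entity_type`
    attributes, as presidio_analyzer's RecognizerResult does.
    """
    findings: List[Finding] = []
    for line_no, text in enumerate(lines, start=1):
        for result in analyze(text):
            # result.start is a 0-based offset into `text`; columns are 1-based
            findings.append((line_no, result.start + 1, result.score, result.entity_type))
    return findings


# Wiring to the real analyzer (needs presidio-analyzer and a spaCy model installed):
# from presidio_analyzer import AnalyzerEngine
# engine = AnalyzerEngine()
# analyze = lambda text: engine.analyze(text=text, language="en", entities=["PERSON"])
# print(analyze_lines(open("tests/conftest.py").read().splitlines(), analyze))
```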
### Configuration from a file
Examples of running the CLI with configuration from a file.
There are two example `.yaml` configuration files in the [`conf`](presidio_cli/conf) directory:
- [default.yaml](presidio_cli/conf/default.yaml) - ignores the `.git` directory
- [limited.yaml](presidio_cli/conf/limited.yaml) - limits detection to three entity types and ignores the `.git` directory and `.cfg` files.
```shell
# run with default configuration (file `.presidiocli`) in the current directory
presidio .
# run with configuration limited.yaml in the "tests" directory
presidio -c presidio_cli/conf/limited.yaml tests/
# run with configuration limited.yaml on a single file, tests/test_analyzer.py
presidio -c presidio_cli/conf/limited.yaml tests/test_analyzer.py
```
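The path argument can be a directory or a single file. The sketch below shows one way a directory walk could skip the `ignore` patterns. It assumes directories are walked recursively and, as a simplification, matches each pattern with `fnmatch` against the whole path and against each path component; the pattern rules `presidio-cli` actually applies may differ. `is_ignored` and `collect_files` are hypothetical names.

```python
import fnmatch
import os
from pathlib import Path
from typing import List, Sequence


def is_ignored(path: Path, patterns: Sequence[str]) -> bool:
    """True if a pattern matches the whole path or any single component of it."""
    text = path.as_posix()
    return any(
        fnmatch.fnmatch(text, pattern) or any(fnmatch.fnmatch(part, pattern) for part in path.parts)
        for pattern in patterns
    )


def collect_files(target: str, patterns: Sequence[str]) -> List[Path]:
    """Return the files under `target` (a file or a directory) that are not ignored."""
    root = Path(target)
    if root.is_file():
        return [] if is_ignored(root, patterns) else [root]
    found: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        base = Path(dirpath)
        # prune ignored directories in place so os.walk does not descend into them
        dirnames[:] = sorted(d for d in dirnames if not is_ignored((base / d).relative_to(root), patterns))
        for name in sorted(filenames):
            if not is_ignored((base / name).relative_to(root), patterns):
                found.append(base / name)
    return found
```

For example, `collect_files("tests/", [".git", "*.cfg"])` would list every file under `tests/` except those inside a `.git` directory or ending in `.cfg`.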
### Configuration as a parameter
Examples of passing the configuration inline with the `-d` parameter:
```shell
# ignore paths .git and *.cfg
presidio -d "ignore: |
  .git
  *.cfg" tests/
# limit list of entities to CREDIT_CARD
presidio -d "entities:
  - CREDIT_CARD" tests/
# equivalent to using the -c parameter
presidio -d "$(cat presidio_cli/conf/limited.yaml)" tests/
```
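When the CLI is called from a Python script, passing the arguments as a list avoids the shell quoting that multi-line YAML in `-d` otherwise needs. A minimal sketch; `build_presidio_command` is a hypothetical helper, and only the `-d` and `-f` options documented in this README are used.

```python
import shlex
from typing import List, Optional


def build_presidio_command(config_yaml: str, target: str, fmt: Optional[str] = None) -> List[str]:
    """Build an argv list for `presidio` with inline YAML configuration."""
    cmd = ["presidio", "-d", config_yaml]
    if fmt is not None:
        cmd += ["-f", fmt]
    cmd.append(target)  # the file or directory to analyze comes last, as in the examples
    return cmd


config = "ignore: |\n  .git\n  *.cfg\n"
cmd = build_presidio_command(config, "tests/")
# Show the equivalent, correctly quoted shell command
print(shlex.join(cmd))
# To execute it (requires presidio-cli on PATH): subprocess.run(cmd, check=False)
```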
### Formatting output
Output can be formatted using `-f` or `--format` parameter. The default format is `auto`.
Available formats:
- standard - standard output format
```shell
presidio -d "entities:
- PERSON" -f standard tests/conftest.py
# result
tests/conftest.py
34:58 0.85 PERSON
37:33 0.85 PERSON
```
- github - GitHub Actions workflow-command style, with findings grouped per file (`::group::` / `::endgroup::`)
```shell
presidio -d "entities:
- PERSON" -f github tests/conftest.py
# result
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
```
- colored - standard output format, but with colors
- parsable - one JSON object per finding, easy to parse automatically
```shell
presidio -d "entities:
- PERSON" -f parsable tests/conftest.py
# result
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
```
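Because the `parsable` format prints one JSON object per finding, its output can be post-processed with the standard library. A minimal sketch; `parse_findings` is a hypothetical helper. The records shown above carry no file path or line number, so the sketch only works with what is in each record. Reading from `sys.stdin` instead of `sample` would let it consume `presidio ... -f parsable` output through a pipe.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def parse_findings(lines: Iterable[str], min_score: float = 0.0) -> List[Dict[str, Any]]:
    """Parse `-f parsable` output (one JSON object per line) and keep findings >= min_score."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a JSON object
        record = json.loads(line)
        if record["score"] >= min_score:
            findings.append(record)
    return findings


# The two records from the example above
sample = """\
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
"""
findings = parse_findings(sample.splitlines(), min_score=0.8)
print(Counter(f["entity_type"] for f in findings))  # Counter({'PERSON': 2})
```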
- auto - default format; switches automatically between two modes:
  - github, when running on GitHub Actions (the environment variables `GITHUB_ACTIONS` and `GITHUB_WORKFLOW` are set)
  - colored, otherwise
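The `auto` rule above can be expressed in a few lines. This sketches only the documented behavior; `resolve_format` is a hypothetical name, and the real CLI may differ in details.

```python
import os
from typing import Mapping, Optional


def resolve_format(requested: str = "auto", env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the output format following the documented `auto` rule."""
    if requested != "auto":
        return requested  # an explicit -f/--format value is used as-is
    env = os.environ if env is None else env
    # both variables must be present, as described for GitHub Actions runs
    if "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env:
        return "github"
    return "colored"


print(resolve_format())  # github inside a GitHub Actions job, colored elsewhere
```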
### List of all parameters
Run the following to list all available CLI options:
```shell
presidio --help
```