https://github.com/insightsengineering/presidio-action
GitHub Action that analyzes text for PII entities with the Microsoft Presidio framework.
actions pii presidio python
- Host: GitHub
- URL: https://github.com/insightsengineering/presidio-action
- Owner: insightsengineering
- License: apache-2.0
- Created: 2021-12-02T11:53:38.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-21T16:39:45.000Z (over 1 year ago)
- Last Synced: 2024-02-22T15:20:19.897Z (9 months ago)
- Topics: actions, pii, presidio, python
- Homepage: https://github.com/marketplace/actions/presidio-action
- Size: 31.3 KB
- Stars: 2
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# Presidio Action
GitHub Action that analyzes text for PII entities with [Microsoft's Presidio framework](https://microsoft.github.io/presidio/).
## Author
Insights Engineering
## Inputs
* `path`:
_Description_: Path to verify
_Required_: `false`
_Default_: "."
* `configuration-file`:
_Description_: Path to custom configuration file
_Required_: `false`
_Default_: "default"
* `configuration-data`:
_Description_: Configuration data as an inline YAML configuration
_Required_: `false`
_Default_: ""
* `output`:
_Description_: Format of output
_Required_: `false`
_Default_: "auto"
* `publish`:
_Description_: Publish result as a PR comment
_Required_: `false`
_Default_: "true"
* `upload`:
_Description_: Upload results as an artifact
_Required_: `false`
_Default_: "true"
* `presidio-cli-version`:
_Description_: Presidio CLI version
_Required_: `false`
_Default_: "latest"
* `lang-models`:
_Description_: List of additional language models to install
_Required_: `false`
_Default_: ""
* `only-changed-files`:
_Description_: Only run checks for changed files
_Required_: `false`
_Default_: `false`
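
The interplay between `configuration-file` and `configuration-data` can be easier to follow as code. The sketch below is only an illustration in Python, not the action's actual implementation: the function name `resolve_configuration` and the returned labels are hypothetical, while the predefined names `default` and `limited` and their `conf/*.yaml` paths are taken from the usage notes in this README.

```python
from __future__ import annotations

# Predefined configuration names mentioned in this README, mapped to the
# files they refer to inside the action repository.
PREDEFINED = {"default": "conf/default.yaml", "limited": "conf/limited.yaml"}


def resolve_configuration(configuration_file: str = "default",
                          configuration_data: str = "") -> tuple[str, str]:
    """Return (source, value) describing which configuration would be used.

    Hypothetical helper: it only mirrors the precedence described in the
    README, it is not code from the action itself.
    """
    # Any non-empty inline data blocks the use of a configuration file.
    if configuration_data.strip():
        return ("inline", configuration_data)
    # Predefined names map to files shipped with the action.
    if configuration_file in PREDEFINED:
        return ("predefined", PREDEFINED[configuration_file])
    # Anything else is treated as a path to a custom configuration file.
    return ("file", configuration_file)


print(resolve_configuration())
print(resolve_configuration("my-project/conf/my-presidio-config.yaml"))
print(resolve_configuration("limited", "entities:\n  - PERSON\nthreshold: 0.9\n"))
```

The third call shows the inline data taking precedence even though a predefined file name is also given.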
## Outputs
The output format depends on the `output` parameter.
The default format is `auto`.
Available formats:
* standard - standard output format
```shell
tests/conftest.py
34:58 0.85 PERSON
37:33 0.85 PERSON
```
* github - similar to the diff function on GitHub
```shell
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
```
* colored - standard output format but with colors
* parsable - easy to parse automatically
```shell
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
```
* auto - default format, switches automatically between these two modes:
  * github, if run on GitHub - environment variables `GITHUB_ACTIONS` and `GITHUB_WORKFLOW` are set
  * colored, otherwise
## How it works
Presidio Action uses [presidio-cli](https://pypi.org/project/presidio-cli/),
which is based on presidio-analyzer from the [Microsoft Presidio framework](https://github.com/microsoft/presidio),
to check an application's code for undesirable types of data such as `EMAIL_ADDRESS` or `PHONE_NUMBER`. For more information, see the full [list of supported entities](https://microsoft.github.io/presidio/supported_entities/).
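
The `auto` output format described under Outputs chooses between the `github` and `colored` modes based on two environment variables. A minimal Python sketch of that decision follows. It is an assumption about how such a check could look, not the code presidio-cli actually runs, and the function name `pick_output_format` is hypothetical.

```python
import os
from typing import Mapping, Optional


def pick_output_format(requested: str = "auto",
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the effective output format (hypothetical helper)."""
    # Allow the environment to be injected so the logic is easy to test.
    env = os.environ if env is None else env
    # An explicitly requested format is used unchanged.
    if requested != "auto":
        return requested
    # "auto": use the github format only when both variables are set.
    on_github = "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env
    return "github" if on_github else "colored"


print(pick_output_format(env={"GITHUB_ACTIONS": "true", "GITHUB_WORKFLOW": "Presidio check"}))
print(pick_output_format(env={}))
print(pick_output_format("parsable", env={}))
```

Passing the environment as an argument is a design choice for the sketch only. It keeps the example independent of the machine it runs on.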
## Usage
Example usage:
```yaml
---
name: Presidio check

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  presidio-action:
    runs-on: ubuntu-latest
    name: Presidio check
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
        with:
          # 0 fetch-depth is needed if you set `only-changed-files` to true
          # and if you are configuring this check to run on push events
          fetch-depth: 0

      - name: Produce the presidio report
        uses: insightsengineering/presidio-action@v1
        # all parameters below are optional
        with:
          # path to project.
          # if project does not have a specific 'my-project' path,
          # '.' - current folder is a default value
          path: "my-project"
          # configuration-file - path to file with specific configuration
          # or use one of predefined files:
          # - default - `conf/default.yaml` file from action repository, check default list of entities
          #             and ignore content of `.git` folder
          # - limited - `conf/limited.yaml` file from action repository, check only PERSON, EMAIL_ADDRESS and CREDIT_CARD
          #             and ignore `.git` folder and *.cfg files
          configuration-file: "my-project/conf/my-presidio-config.yaml"
          # configuration-data - content of configuration in raw yaml format.
          # Gives the possibility to prepare your own configuration without adding a file to the project
          # any value in this field will block usage of configuration file
          configuration-data: |
            entities:
              - PERSON
            threshold: 0.9
          # output - specify one of output formats
          output: "parsable"
          # only-changed-files - only run the check for files that were changed
          # NOTE: You must set fetch-depth: 0 in the actions/checkout@v3 step
          # for push events while this parameter is set to true
          only-changed-files: true
```
Example of a comment added to the PR:
![Screenshot with PR comment example](example.png)
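
When `output` is set to `parsable`, each finding is printed as one JSON object per line, so the results can be post-processed with a few lines of Python. The sketch below is only an illustration: the helper names `load_findings` and `summarize` are hypothetical, and it assumes every non-empty line is a standalone JSON object with the `entity_type` and `score` keys shown in the sample output of this README.

```python
import json
from collections import Counter
from typing import Iterable, List


def load_findings(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON object per non-empty line (assumed parsable format)."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(findings: List[dict], threshold: float = 0.0) -> Counter:
    """Count findings per entity type, keeping scores at or above threshold."""
    return Counter(f["entity_type"] for f in findings if f["score"] >= threshold)


# Sample lines copied from the `parsable` example in this README.
sample = """\
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
"""

findings = load_findings(sample.splitlines())
print(summarize(findings))                 # Counter({'PERSON': 2})
print(summarize(findings, threshold=0.9))  # Counter()
```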