https://github.com/daedalus/pii-safe
Redact PII from text
openai personal-identifiable-information pii pii-redaction pii-safety
- Host: GitHub
- URL: https://github.com/daedalus/pii-safe
- Owner: daedalus
- License: mit
- Created: 2026-04-23T17:48:00.000Z (2 months ago)
- Default Branch: master
- Last Pushed: 2026-05-03T02:08:55.000Z (2 months ago)
- Last Synced: 2026-05-03T04:13:30.735Z (2 months ago)
- Topics: openai, personal-identifiable-information, pii, pii-redaction, pii-safety
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Agents: AGENTS.md
# pii-safe — Redact PII from text
Uses the [OpenAI Privacy Filter](https://openai.com/index/introducing-openai-privacy-filter/) model to detect and redact personally identifiable information (PII) from text.
## Why hash-based redaction?
Plain `[REDACTED]` placeholders lose the information about which redacted values refer to the same PII. Using `hash(salt | pii_data)` instead (where `|` denotes concatenation):
- **Consistent identifiers**: The same PII always maps to the same hash, enabling cross-document correlation (e.g., "how many documents mention the same person?")
- **Checkable with the salt**: Hashes cannot be inverted, but anyone holding the salt can hash a candidate value and compare it with the redacted output to identify the original PII if needed
- **Salt prevents rainbow table attacks**: Without a salt, hashes could be precomputed for common names/emails to reverse-identify PII from redacted text
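The README does not say which hash function pii-safe uses or exactly how the salt and value are combined. As a minimal sketch, assuming SHA-256 over the salt concatenated with the PII string, the three properties above look like this (`pii_hash` and `matches` are hypothetical helpers, not part of the library):

```python
import hashlib
import hmac


def pii_hash(pii: str, salt: str) -> str:
    """Hex SHA-256 digest of salt | pii (assumed: plain string concatenation)."""
    return hashlib.sha256((salt + pii).encode("utf-8")).hexdigest()


def matches(candidate: str, salt: str, observed: str) -> bool:
    """Check a guessed PII value against a hash seen in redacted text."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(pii_hash(candidate, salt), observed)


salt = "my_secret_salt"
h = pii_hash("Dario Clavijo", salt)

# Consistent: same PII and same salt always give the same hash
assert pii_hash("Dario Clavijo", salt) == h
# Salted: a different salt gives an unrelated hash, defeating precomputed tables
assert pii_hash("Dario Clavijo", "other_salt") != h
# Checkable: with the salt, a candidate can be confirmed or ruled out
print(matches("Dario Clavijo", salt, h))  # True
print(matches("Someone Else", salt, h))   # False
```

Plain concatenation is ambiguous at the salt/value boundary; if you build something similar yourself, a keyed construction such as HMAC-SHA256 (`hmac.new(key, msg, hashlib.sha256)`) is the more conventional choice.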
## Install
```bash
pip install pii-safe
```
## Usage
```python
from pii_safe import redact_text
text = "mi nombre es Dario Clavijo"  # Spanish for "my name is Dario Clavijo"
redacted = redact_text(text)
print(redacted)  # mi nombre es[REDACTED_<hash>] (the hash changes per run unless a salt is set)
```
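The model's job is to locate PII spans; turning those spans into placeholders is plain string manipulation. The sketch below hard-codes one span in place of real model output and assumes a `[REDACTED_<hash>]` placeholder built from a SHA-256 hash truncated to 16 hex characters. `redact_spans`, the span offsets, and the truncation length are all hypothetical, not pii-safe's API.

```python
import hashlib


def redact_spans(text: str, spans: list[tuple[int, int]], salt: str) -> str:
    """Replace each (start, end) character span with [REDACTED_<hash>].

    Assumes spans do not overlap.
    """
    pieces: list[str] = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        digest = hashlib.sha256((salt + text[start:end]).encode("utf-8")).hexdigest()
        pieces.append(f"[REDACTED_{digest[:16]}]")  # assumed truncation to 16 hex chars
        cursor = end
    pieces.append(text[cursor:])  # untouched tail after the last span
    return "".join(pieces)


text = "mi nombre es Dario Clavijo"
# Hypothetical detector output covering " Dario Clavijo" (characters 12 to 26),
# leading space included
spans = [(12, 26)]
print(redact_spans(text, spans, salt="my_secret_salt"))
```

A detected span that includes the leading space would explain both the missing space before the placeholder in the output above and the `' Dario Clavijo'` key returned by `get_hash_map()` in the `Redacter` example.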
### Salt for hashing
By default, a random 64-character salt is generated at startup, so the same PII hashes differently in each run. To get consistent hashes across runs, specify your own salt:
```python
from pii_safe import redact_text, set_salt
# Option 1: Pass salt to redact_text
redacted = redact_text("mi nombre es Dario Clavijo", salt="my_secret_salt")
# Option 2: Set salt globally
set_salt("my_secret_salt")
redacted = redact_text("mi nombre es Dario Clavijo")
```
### Using the Redacter class
```python
from pii_safe import Redacter
redacter = Redacter(salt="my_secret_salt")
result1 = redacter.redact("mi nombre es Dario Clavijo")  # "my name is Dario Clavijo"
result2 = redacter.redact("el es Dario Clavijo")  # "he is Dario Clavijo"
# The same PII gets the same hash in both results within this instance
hash_map = redacter.get_hash_map()
print(hash_map)  # {' Dario Clavijo': '<hash>'}
```
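The "how many documents mention the same person?" question from the top of this README becomes a counting problem once documents are redacted with a shared salt. A minimal sketch, assuming placeholders look like `[REDACTED_<hex hash>]`; the regex, the `documents_per_hash` function, and the sample strings (with made-up hashes) are all hypothetical:

```python
import re
from collections import Counter

# Assumed placeholder format: [REDACTED_ followed by a lowercase hex hash]
PLACEHOLDER = re.compile(r"\[REDACTED_([0-9a-f]+)\]")


def documents_per_hash(redacted_docs: list[str]) -> Counter[str]:
    """Count how many documents contain each PII hash."""
    counts: Counter[str] = Counter()
    for doc in redacted_docs:
        # set() so a hash repeated within one document counts only once
        counts.update(set(PLACEHOLDER.findall(doc)))
    return counts


docs = [
    "my name is[REDACTED_ab12]",
    "he is[REDACTED_ab12]",
    "call[REDACTED_cd34] or email[REDACTED_cd34]",
]
print(documents_per_hash(docs))  # Counter({'ab12': 2, 'cd34': 1})
```

Counting per document rather than per occurrence matches the correlation question; drop the `set()` call if you want raw mention counts instead.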
## CLI
```bash
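# redact input.txt
pii-safe input.txt
# write the redacted text to output.txt
pii-safe input.txt -o output.txt
# use a fixed salt for reproducible hashes
pii-safe input.txt --salt my_secret_salt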
```
## Development
```bash
git clone https://github.com/daedalus/pii-safe.git
cd pii-safe
pip install -e ".[test]"
# run tests
pytest
# format
ruff format src/ tests/
# lint
ruff check src/ tests/
# type check
mypy src/
```
## API
### `redact_text(text: str, salt: str | None = None) -> str`
Redacts PII from text using the openai/privacy-filter model.
### `set_salt(salt: str) -> None`
Set the salt for hashing PII in the default redacter.
### `class Redacter`
Context manager for consistent PII-to-hash mapping across calls.
- `__init__(salt: str | None = None)`: Initialize with optional salt
- `redact(text: str) -> str`: Redact PII from text
- `get_hash_map() -> dict[str, str]`: Get PII-to-hash mapping