{"id":23110131,"url":"https://github.com/yevh/anonymizer","last_synced_at":"2025-07-07T12:06:20.095Z","repository":{"id":188705469,"uuid":"679257436","full_name":"yevh/anonymizer","owner":"yevh","description":"Anonymize sensitive data in your datasets.","archived":false,"fork":false,"pushed_at":"2023-08-17T05:06:10.000Z","size":1213,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-17T01:48:01.062Z","etag":null,"topics":["anonymize","anonymized","anonymizer","crypto","cryptography","data-anonymization","data-anonymized","data-science","data-security","dataset","datasets","datasets-csv","datasets-preparation","python","python3","security","sensitive","sensitive-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yevh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-16T12:46:25.000Z","updated_at":"2024-09-11T11:12:42.000Z","dependencies_parsed_at":"2023-08-16T15:54:00.368Z","dependency_job_id":null,"html_url":"https://github.com/yevh/anonymizer","commit_stats":null,"previous_names":["yevh/anonymizer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yevh%2Fanonymizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yevh%2Fanonymizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yevh%2Fanonymizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yevh%2Fanonymizer/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yevh","download_url":"https://codeload.github.com/yevh/anonymizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237939202,"owners_count":19390763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymize","anonymized","anonymizer","crypto","cryptography","data-anonymization","data-anonymized","data-science","data-security","dataset","datasets","datasets-csv","datasets-preparation","python","python3","security","sensitive","sensitive-data"],"created_at":"2024-12-17T01:48:11.886Z","updated_at":"2025-02-09T11:22:21.102Z","avatar_url":"https://github.com/yevh.png","language":"Python","readme":"# Data Anonymizer\n\nAnonymize sensitive data in your datasets. It generates pseudonyms for specified columns in a CSV file using a salted SHA-256 hashing function. Integrity checks with HMAC. 
The script can revert the data to its original form using encrypted mapping files generated during anonymization.\n\n## How It Works\n\n- Load or Generate Secret Key (if one does not exist)\n  - The key should be 32 bytes (256 bits) long, base64-encoded\n- Process Input Data File\n- Data Pseudonymization or Reversion\n- Encrypted Mapping Files\n  - During the ```anonymize``` operation, for each specified column, the script creates an encrypted file that maps the pseudonyms back to the original data.\n  - These mapping files are encrypted using the Fernet symmetric encryption scheme, and an HMAC is appended to ensure data integrity.\n- Data Integrity\n  - When reverting data, the script first checks the integrity of the encrypted mapping files by comparing a stored HMAC with a computed HMAC.\n- Output\n\n## Install dependencies\n\n```bash\npip3 install -r requirements.txt\n```\n\n## How to Use\n\n```bash\npython3 anonymizer.py file_path operation --cols column_names --key_path secret_key_path\n```\n- **file_path:** Path to the data file (CSV format)\n- **operation:** ```anonymize``` or ```revert```\n- **--cols:** Specific columns to anonymize or revert (all columns by default)\n- **--key_path:** Path to the secret key file (required)\n\n## Usage Example\n\n1. Generate an example dataset\n\n```bash\npython3 data.py\n```\n![data!](data.png)\n\n2. Anonymize\n\n```bash\npython3 anonymizer.py data.csv anonymize --key_path secret_key.key\n```\n\n![anonymized!](anonymized.png)\n\n3. Revert\n\n```bash\npython3 anonymizer.py data.csv revert --key_path secret_key.key\n```\n\n![reverted!](reverted.png)\n\n## Security\n\n- **Secret Key Storage** - ensure the secret key file is stored securely. If it is compromised, an attacker could decrypt the pseudonym mappings and de-anonymize the data.\n- **Encrypted Mapping Files** - ensure that these files are stored in a secure location with restricted access. 
Access to these files and the secret key allows data de-anonymization. \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyevh%2Fanonymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyevh%2Fanonymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyevh%2Fanonymizer/lists"}