{"id":21425858,"url":"https://github.com/levitation-opensource/dataanonymiser","last_synced_at":"2025-06-22T11:08:36.217Z","repository":{"id":239842193,"uuid":"800492235","full_name":"levitation-opensource/DataAnonymiser","owner":"levitation-opensource","description":"Anonymises data inside text files and in sheet files. It recognises and removes various sorts of personally identifiable information (PII). Each removed part is replaced with a suitable generic text, depending on the type of removed data. Currently English and Russian languages are supported. Russian works both with Cyrillic and Latin characters.","archived":false,"fork":false,"pushed_at":"2024-09-12T00:33:01.000Z","size":345,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-22T11:08:27.628Z","etag":null,"topics":["anonymisation","anonymity","anonymity-enhancement","anonymization","anonymization-technique","anonymize","anonymize-strings","anonymized-data","anonymizer","csv","data-filtering","named-entity-recognition","ner","personally-identifiable-information","privacy","privacy-enhancing-technologies","privacy-protection","privacy-tool","privacy-tools","text-filter"],"latest_commit_sha":null,"homepage":"https://www.simplify.ee/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/levitation-opensource.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-14T12:43:22.000Z","updated_at":"2024-09-24T15:48:41.000Z","dependencies_parsed_at":"2024-05-15T18:48:27.506Z","de
pendency_job_id":"46d313a3-dec5-4455-a8bc-b0ccba5c8893","html_url":"https://github.com/levitation-opensource/DataAnonymiser","commit_stats":null,"previous_names":["levitation-opensource/dataanonymiser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/levitation-opensource/DataAnonymiser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levitation-opensource%2FDataAnonymiser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levitation-opensource%2FDataAnonymiser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levitation-opensource%2FDataAnonymiser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levitation-opensource%2FDataAnonymiser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/levitation-opensource","download_url":"https://codeload.github.com/levitation-opensource/DataAnonymiser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levitation-opensource%2FDataAnonymiser/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261282320,"owners_count":23134940,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymisation","anonymity","anonymity-enhancement","anonymization","anonymization-technique","anonymize","anonymize-strings","anonymized-data","anonymizer","csv","data-filtering","named-entity-recognition","ner","personally-identifiable-information","privacy","privacy-enhancing-technologies","privacy-protection","priv
acy-tool","privacy-tools","text-filter"],"created_at":"2024-11-22T21:38:25.340Z","updated_at":"2025-06-22T11:08:31.194Z","avatar_url":"https://github.com/levitation-opensource.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Anonymiser\r\n\r\nThis software anonymises data inside text files and in sheet files. It recognises and removes various sorts of personally identifiable information (PII). Each removed part is replaced / obfuscated with a suitable generic text, depending on the type of removed data. \r\n\r\nCurrently English and Russian languages are supported. Russian works both with Cyrillic and Latin characters. \r\n\r\nThe language is automatically detected. In case of sheet files, the language of each cell is detected separately. Therefore multi-language sheet files are supported as well.\r\n\r\nCurrently supported sheet file formats are CSV-files, TSV-files, Excel files (XLSX only), and OpenDocument Sheet files (ODS).\r\n\r\n\r\n## Example input and output files\r\n\r\nExample input and output copied to an annotated PDF file: \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/Anonymisation example 1.pdf\"\u003e\u003cu\u003eAnonymisation example 1.pdf\u003c/u\u003e\u003c/a\u003e\r\n\r\nExample input and output file pairs for TXT and CSV file formats in English language, and TXT file format in Russian language with Cyrillic and Latin alphabet:\r\n* \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/test_input_en.txt\"\u003e\u003cu\u003edata/test_input_en.txt\u003c/u\u003e\u003c/a\u003e - \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/example_output_en.txt\"\u003e\u003cu\u003edata/example_output_en.txt\u003c/u\u003e\u003c/a\u003e\r\n* \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/test_input_en.csv\"\u003e\u003cu\u003edata/test_input_en.csv\u003c/u\u003e\u003c/a\u003e - 
\u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/example_output_en.csv\"\u003e\u003cu\u003edata/example_output_en.csv\u003c/u\u003e\u003c/a\u003e\r\n* \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/test_input_ru_cyr.txt\"\u003e\u003cu\u003edata/test_input_ru_cyr.txt\u003c/u\u003e\u003c/a\u003e - \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/example_output_ru_cyr.txt\"\u003e\u003cu\u003edata/example_output_ru_cyr.txt\u003c/u\u003e\u003c/a\u003e\r\n* \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/test_input_ru_lat.txt\"\u003e\u003cu\u003edata/test_input_ru_lat.txt\u003c/u\u003e\u003c/a\u003e - \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/data/example_output_ru_lat.txt\"\u003e\u003cu\u003edata/example_output_ru_lat.txt\u003c/u\u003e\u003c/a\u003e\r\n\r\n\r\n## How it works\r\n\r\nThis software uses a combination of Named Entity Recognition (NER) and regular expressions to perform its function.\r\n\r\n\r\n## Usage\r\n\r\nThe configuration options can be found in the file \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/Anonymiser.ini\"\u003e\u003cu\u003eAnonymiser.ini\u003c/u\u003e\u003c/a\u003e\r\n\r\n`python Anonymiser.py \"input_file.txt\"|\"input_file.csv\"|\"input_file.tsv\"|\"input_file.xlsx\"|\"input_file.ods\" [\"output_file.txt\"|\"output_file.csv\"|\"output_file.tsv\"|\"output_file.xlsx\"|\"output_file.ods\"]`\r\n\r\nThe user provided files are expected to be in the same folder as the main Python script, unless an absolute path is provided. If run without arguments then sample files in the `data` folder are used. 
If the user provides an input file name but no output file name, the output file name is derived as `input filename` + `_anonymised` + `.input filename extension`.\r\n\r\nIf CSV parsing fails or the CSV output appears to have the wrong structure, check and adjust the CSV parsing settings in \u003ca href=\"https://github.com/levitation-opensource/DataAnonymiser/blob/main/Anonymiser.ini\"\u003e\u003cu\u003eAnonymiser.ini\u003c/u\u003e\u003c/a\u003e. In particular, the `CsvDelimiter`, `CsvQuoteChar`, `CsvDoubleQuote`, and `CsvEscapeChar` parameters may need adjustment.\r\n\r\n\r\n## Current project state\r\nReady to use and under active development.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevitation-opensource%2Fdataanonymiser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flevitation-opensource%2Fdataanonymiser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevitation-opensource%2Fdataanonymiser/lists"}