{"id":22050314,"url":"https://github.com/klezvirus/faceless","last_synced_at":"2025-05-08T23:22:37.604Z","repository":{"id":112318193,"uuid":"282538514","full_name":"klezVirus/faceless","owner":"klezVirus","description":"Faceless - Simple Tool for Text-File Anonymization","archived":false,"fork":false,"pushed_at":"2020-07-26T20:17:03.000Z","size":23,"stargazers_count":7,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-31T19:21:14.965Z","etag":null,"topics":["anonymization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/klezVirus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-25T22:58:46.000Z","updated_at":"2024-08-12T20:04:06.000Z","dependencies_parsed_at":"2023-05-12T22:15:26.474Z","dependency_job_id":null,"html_url":"https://github.com/klezVirus/faceless","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klezVirus%2Ffaceless","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klezVirus%2Ffaceless/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klezVirus%2Ffaceless/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klezVirus%2Ffaceless/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/klezVirus","download_url":"https://codeload.github.com/klezVirus/faceless/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253161532,"owners_count":21863756,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization"],"created_at":"2024-11-30T14:21:06.900Z","updated_at":"2025-05-08T23:22:37.584Z","avatar_url":"https://github.com/klezVirus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# faceless\n\nFaceless is a script that anonymizes files containing specific types of potentially sensitive information, such as IP addresses, email addresses, domain names and others.\n\n## Overview\n\nThe idea behind anonymization is simple: replace any data that holds sensitive (or potentially sensitive) information with other data that has similar properties but no informational value in the context where it is used.\n\nA crude but effective approach is to replace certain values with random ones. While this works in many circumstances, its main drawback is that the original data is lost in the process.\n\nAnother approach is to replace sensitive values with hashes and build a map that can later be used to restore the data. While this approach is better, placing hashes in a file that contains other data might compromise both its semantics and its syntax.\n\nFaceless uses a hybrid approach: random data is generated to replace the original values while respecting the original format (ip -\u003e ip, email -\u003e email). 
In addition, a map is generated to allow full data recovery at a later stage.\n\n## Usage\n\nUsing faceless is not immediately straightforward, as the help below shows, but it becomes easy once its internals are understood.\n\n```\n$ python faceless.py -h\nusage: faceless.py [-h] [-c CHECK] [-u] [-g] [-m MAPPING] [-d] -i FILE [-f FILTERS]\n\nFaceless: A script to anonymize/deanonimize various files \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c CHECK, --check CHECK\n                        Check common regex\n  -u, --unique          Print just first occurrence\n  -g, --generate        Attempt to automatically generate a mapping\n  -m MAPPING, --mapping MAPPING\n                        Mapping File\n  -d, --debug           Enable debug output\n  -i FILE, --file FILE  File to anonymize\n  -f FILTERS, --filters FILTERS\n                        Filter set in string form\n  -r, --restore         Apply a reverse mapping\n```\n\n#### Key arguments\n\n###### File (-i)\nThe only required argument is the input file, i.e. the file to be anonymized.\n\n###### Check (-c)\nIf the `-c` flag is set, faceless checks the file for common regex patterns and prints all matches to the console. The `-c` flag must be given together with the regex types to check for. The currently supported regex types are:\n\n* Windows path (winpath)\n* Domain Names (domain)\n* IP Addresses (ip)\n* Email Addresses (email)\n* URLs (url)\n* All (*)\n\nMultiple regex types can be checked at once by joining them with the separator \"::\", as shown below:\n\n```\n# Search all IP, Domain Names and Email Addresses\npython faceless.py -i file.xml -c ip::domain::email\n```\n\nThis is useful to see which matches are genuine and to prepare a filter that excludes false positives. 
A small example is provided further on.\n\n###### Unique (-u)\nIf the `-u` flag is set, only unique matches are printed to the console.\n\n###### Generate (-g)\nIf the `-g` flag is set, faceless tries to generate a mapping file. The mapping file keeps track of the data changed during the anonymization process, so that the changes can be reverted. For technical details about mapping files, see below.\n\n###### Restore (-r)\nIf the `-r` flag is set, faceless tries to restore the original file by applying a reverse mapping (i.e. the mapping file read in the opposite direction).\n\n###### Filters (-f)\nThe filter flag takes a stringified filter set. A filter set is a data structure that represents, at a basic level, exclusion and inclusion filters.\n\nThe basic syntax used by the filter string is detailed further on.\n\n#### Filter syntax\n\nA filter set can be represented by a string using the following syntax:\n\n```\nkey_1::[--|++]filter_1_1..[--|++]filter_1_2§§key_2::[--|++]filter_2_1..[--|++]filter_2_n\n```\n\nWhere:\n\n* `key_i`: A value in [\"ip\", \"domain\", \"winpath\", \"url\", \"email\"]\n* `filter_i_j`: A Python regex/string\n* `++`: Includes lines that match the filter\n* `--`: Excludes lines that match the filter\n\n###### Filter ordering\n\nThe filter chain for each regex type is applied according to the type of its first filter, using the following rationale:\n\n* `(Filter 1 == ++ filter)`: Include only matches to the ++ filters and exclude every -- selected\n* `(Filter 1 == -- filter)`: Exclude only matches to the -- filters and include every ++ selected\n\n**Example**\n\n`ip::--10.0.0.*..++10.0.0.1`\n\nWhen parsed, the above filter excludes all IP addresses in the range 10.0.0.*, except 10.0.0.1.\n\nThe filter logic was implemented to filter out values that match one of the regexes but are not genuine instances of that type, for example:\n\n* Numbered paragraphs match the IP regex (e.g. 
1.1.1.2)\n* Executable names match the domain regex (e.g. my.nasty.exe)\n\n#### Mapping files\n\nInformally, mapping files are text files used to track the changes made during anonymization, associating each string replaced in a file with its anonymized value.\n\nA bit more formally, let W be the set of all possible text strings; a mapping is then a function:\n```\nm: W -\u003e W\n```\n\nA mapping m can be represented as a list of pairs (x,y), where x is a word in W and y is a randomly generated replacement for x.\n\n###### Anonymization and mappings\n\nThe algorithm generates a mapping before attempting a full anonymization, giving the user the chance to review and manually adjust the values. After that, it simply applies the generated mapping.\n\nThis mechanism allows users to fully restore the data, as long as they hold the correct mapping for the anonymized file.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklezvirus%2Ffaceless","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fklezvirus%2Ffaceless","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklezvirus%2Ffaceless/lists"}