{"id":19852559,"url":"https://github.com/rosette-api-community/identity-masker","last_synced_at":"2026-06-04T21:31:31.672Z","repository":{"id":84699349,"uuid":"95594961","full_name":"rosette-api-community/identity-masker","owner":"rosette-api-community","description":"Mask entities that potentially reveal personally identifying information.","archived":false,"fork":false,"pushed_at":"2019-04-22T17:48:44.000Z","size":10,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-11T13:27:50.852Z","etag":null,"topics":["identity","identity-masker","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rosette-api-community.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-27T19:50:56.000Z","updated_at":"2021-04-30T00:38:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"ec8177aa-fb1f-4880-b25e-f437ee472e01","html_url":"https://github.com/rosette-api-community/identity-masker","commit_stats":{"total_commits":7,"total_committers":2,"mean_commits":3.5,"dds":0.2857142857142857,"last_synced_commit":"a1030f7c042af226cac6626f43feb794b9e57f2e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fidentity-masker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fidentity-masker/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fidentity-masker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fidentity-masker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rosette-api-community","download_url":"https://codeload.github.com/rosette-api-community/identity-masker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241241407,"owners_count":19932741,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["identity","identity-masker","python"],"created_at":"2024-11-12T14:03:31.211Z","updated_at":"2026-06-04T21:31:31.667Z","avatar_url":"https://github.com/rosette-api-community.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Identity Masker\n\nThis repository includes Python code demonstrating how to use Rosette API entity extraction results to mask personally identifying information in text.\n\n## Setup\n\n### Installing Dependencies with Virtualenv\nThe script is written for Python 3.  
If you are alright with installing external Python packages globally, you may skip this section.\n\nYou can install the dependencies using `virtualenv` so that you don't alter your global site packages.\n\nThe process for installing the dependencies using `virtualenv` is as follows for `bash` or similar shells:\n\nEnsure your `virtualenv` is up to date.\n\n    $ pip install -U virtualenv\n\n**Note**: You may need to use `pip3` depending on your Python installation.\n\n`cd` into the repository directory (where this `README.md` file is located) and create a Python 3 virtual environment with:\n\n    $ python3 $(which virtualenv) .\n\nActivate the virtual environment:\n\n    $ source bin/activate\n\nOnce you've activated the virtual environment, you can proceed to install the requirements safely without affecting your global site packages.\n\n### Installing the Dependencies\nYou can install the dependencies via `pip` (or `pip3` depending on your installation of Python 3) as follows using the provided `requirements.txt`:\n\n    $ pip install -r requirements.txt\n\n## Running `mask_identities.py`\nYou can use the script from the command line as follows:\n\n    $ ./mask_identities.py -h\n    usage: mask_identities.py [-h] [-i INPUT] [-u] [-k KEY] [-a API_URL]\n                          [-l LANGUAGE] [-t TYPE [TYPE ...]]\n    \n    Mask entities that potentially reveal personally identifying information.\n    Note: You should NOT assume that the results are pefect nor that all\n    personally identifying language has been removed!\n    \n    optional arguments:\n      -h, --help            show this help message and exit\n      -i INPUT, --input INPUT\n                            Path to a file containing input data (if not specified\n                            data is read from stdin) (default: None)\n      -u, --content-uri     Specify that the input is a URI (otherwise load text\n                            from file) (default: False)\n      -k KEY, --key KEY     
Rosette API Key (default: None)\n      -a API_URL, --api-url API_URL\n                            Alternative Rosette API URL (default:\n                            https://api.rosette.com/rest/v1/)\n      -l LANGUAGE, --language LANGUAGE\n                            A three-letter (ISO 639-2 T) code that will override\n                            automatic language detection (default: None)\n      -t TYPE [TYPE ...], --entity-types TYPE [TYPE ...]\n                            A list of named entity types to mask (refer to\n                            https://developer.rosette.com/features-and-\n                            functions#entity-extraction-entity-types for a full\n                            description of supported entity types) (default:\n                            ['ORGANIZATION', 'PERSON',\n                            'IDENTIFIER:CREDIT_CARD_NUM', 'IDENTIFIER:EMAIL',\n                            'IDENTIFIER:MONEY', 'IDENTIFIER:PERSONAL_ID_NUM',\n                            'IDENTIFIER:PHONE_NUMBER', 'TEMPORAL:DATE',\n                            'TEMPORAL:TIME', 'IDENTIFIER:LATITUDE_LONGITUDE'])\n\n**Note**: If you prefer not to enter your Rosette API key every time you run the script, you can set the environment variable `$ROSETTE_USER_KEY`.\n\n**Note**: This script is only for demonstration purposes.  You should **NOT** assume that the results are perfect nor that all personally identifying information has been removed!\n\n### Examples\nThe simplest way to use the script is to pipe in a string:\n\n    $ echo 'John Smith is accused of stealing $1,000,000.' | ./mask_identities.py\n    Extracting entities via Rosette API ...\n    Done!\n    PERSON1 is accused of stealing IDENTIFIER:MONEY.\n\nIf there are multiple mentions of any of the following entity types, they will be indexed so that distinct entities can still be distinguished, even if they can't be identified:\n\n1. `LOCATION`\n2. `ORGANIZATION`\n3. `PERSON`\n4. `PRODUCT`\n5. `TITLE`\n6. 
`NATIONALITY`\n7. `RELIGION`\n\n\nE.g.:\n\n    $ echo \"John Smith is accused of stealing \\$1,000,000.  Jane Smith was John's accomplice.\" | ./mask_identities.py\n    Extracting entities via Rosette API ...\n    Done!\n    PERSON1 is accused of stealing IDENTIFIER:MONEY.  PERSON2 was PERSON1's accomplice.\n\nYou can also read the input from a file:\n\n    $ echo \"John Smith is accused of stealing \\$1,000,000.  Jane Smith was John's accomplice.\" \u003e stealing.txt\n    $ ./mask_identities.py -i stealing.txt\n    Extracting entities via Rosette API ...\n    Done!\n    PERSON1 is accused of stealing IDENTIFIER:MONEY.  PERSON2 was PERSON1's accomplice.\n\nAdditionally, you can rely on Rosette API to extract content from a web page by supplying a URL and using the `-u/--content-uri` option:\n\n\n    $ ./mask_identities.py -u -i 'https://www.reuters.com/article/cyber-attack-stgobain-idUSP6N1JF03T' \u003e| news.txt\n    Extracting entities via Rosette API ...\n    Done!\n    $ cat news.txt \n    French company ORGANIZATION1 says has been victim of cyberattack\n    Market News | TEMPORAL:DATE | TEMPORAL:TIME\n    French company ORGANIZATION1 says has been victim of cyberattack\n    PARIS, TEMPORAL:DATE French construction materials company ORGANIZATION2 said on TEMPORAL:DATE that it had been a victim of a cyberattack, and it had isolated its computer systems in order to protect data.\n    \"Along with other big companies, ORGANIZATION1 has been the victim of a cyberattack. As a security measure and in order to protect our data, we have isolated our computer systems,\" said a company spokesman.\n    The spokesman added ORGANIZATION1 was in the process of trying to fix the problem. 
(Reporting by PERSON1; Writing by PERSON2; Editing by PERSON3)\n\nIf you want to remove additional identifying information from the article by masking `LOCATION` and `NATIONALITY` types you can do so:\n\n    $ ./mask_identities.py -u -i 'https://www.reuters.com/article/cyber-attack-stgobain-idUSP6N1JF03T' -t ORGANIZATION PERSON TEMPORAL:DATE TEMPORAL:TIME LOCATION NATIONALITY \u003e| news.txt\n    Extracting entities via Rosette API ...\n    Done!\n    $ cat news.txt \n    NATIONALITY1 company ORGANIZATION1 says has been victim of cyberattack\n    Market News | TEMPORAL:DATE | TEMPORAL:TIME\n    NATIONALITY1 company ORGANIZATION1 says has been victim of cyberattack\n    LOCATION1, TEMPORAL:DATE NATIONALITY1 construction materials company ORGANIZATION2 said on TEMPORAL:DATE that it had been a victim of a cyberattack, and it had isolated its computer systems in order to protect data.\n    \"Along with other big companies, ORGANIZATION1 has been the victim of a cyberattack. As a security measure and in order to protect our data, we have isolated our computer systems,\" said a company spokesman.\n    The spokesman added ORGANIZATION1 was in the process of trying to fix the problem. (Reporting by PERSON1; Writing by PERSON2; Editing by PERSON3)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Fidentity-masker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frosette-api-community%2Fidentity-masker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Fidentity-masker/lists"}