{"id":25305719,"url":"https://github.com/philterd/phileas","last_synced_at":"2025-10-18T22:54:41.645Z","repository":{"id":144159332,"uuid":"501256788","full_name":"philterd/phileas","owner":"philterd","description":"The open source PII and PHI redaction engine","archived":false,"fork":false,"pushed_at":"2025-02-02T14:02:16.000Z","size":35655,"stargazers_count":29,"open_issues_count":52,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-02T15:18:49.620Z","etag":null,"topics":["anonymize","deidentification","deidentify","java","nppi","personally-identifiable-information","phi","pii","protected-health-information","redact","redaction"],"latest_commit_sha":null,"homepage":"https://www.philterd.ai","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philterd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-08T13:11:35.000Z","updated_at":"2025-02-02T14:02:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"7125c385-b31f-4e97-bff6-29775f5509d7","html_url":"https://github.com/philterd/phileas","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philterd%2Fphileas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philterd%2Fphileas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philterd%2Fphileas/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/philterd%2Fphileas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philterd","download_url":"https://codeload.github.com/philterd/phileas/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238633978,"owners_count":19504763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymize","deidentification","deidentify","java","nppi","personally-identifiable-information","phi","pii","protected-health-information","redact","redaction"],"created_at":"2025-02-13T09:53:45.737Z","updated_at":"2025-10-18T22:54:41.638Z","avatar_url":"https://github.com/philterd.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Phileas\n\n[![CodeFactor Quality](https://img.shields.io/codefactor/grade/github/philterd/phileas)](https://www.codefactor.io/repository/github/philterd/phileas)\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=philterd_phileas\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=philterd_phileas)\n\n**For a hosted document redaction service, please visit [Philterd Data Services](https://www.philterd.ai/data-services).**\n\nPhileas is a Java library to deidentify and redact PII, PHI, and other sensitive information from text. Given text or documents (PDF), Phileas analyzes the text searching for sensitive information such as persons' names, ages, addresses, and many other types of information. 
Phileas is highly configurable through its settings and policies.\n\nWhen sensitive information is identified, Phileas can manipulate it in a variety of ways. The information can be replaced, encrypted, anonymized, and more. The user chooses how to manipulate each type of sensitive information. We refer to these methods collectively as \"redaction.\"\n\nInformation can be redacted based on the content of the information and other attributes. For example, Phileas can redact only certain persons' names, only zip codes meeting some qualification, or only IP addresses that match a given pattern.\n\nFor Phileas' documentation, please see https://philterd.github.io/phileas/.\n\n## What Phileas Can Do\n\n* Phileas can identify and redact over 30 types of sensitive information (see list below).\n* Phileas can evaluate conditions when redacting (only zip codes with population less than some value, only ages \u003e 30, only when sentiment is a certain value, etc.).\n* Phileas can perform sentiment and offensiveness classification.\n* Phileas can redact, encrypt, and anonymize sensitive information.\n* Phileas can replace persons' names with random names, dates with similar but random dates, etc.\n* Phileas can disambiguate types of sensitive information (e.g., SSN vs. phone number).\n* Phileas can deidentify text consistently (\"John Smith\" is replaced consistently in certain documents).\n* Phileas can shift dates or replace dates with approximate representations (e.g., \"3 months ago\").\n* Phileas uses policies to define what sensitive information to find and how to redact it.\n\n## Supported PII, PHI, and Other Sensitive Information\n\nThis list might be outdated. Please check the individual filter classes for details.\n\n### Persons\n\n* Person's Names - Multiple methods, e.g. 
NER, dictionary, census data\n* Physician Names\n* First Names\n* Surnames\n\n### Common\n\n* Ages\n* Bank Account Numbers\n* Bitcoin Addresses\n* Credit Cards\n* Currency (USD)\n* Dates (in addition to birthdates and deathdates)\n* (US) Driver's License Numbers\n* Email Addresses\n* IBAN Codes\n* IP Addresses (IPv4 and IPv6)\n* MAC Addresses\n* (US) Passport Numbers\n* Phone Numbers\n* Phone Number Extensions\n* Sections (of a document)\n* SSNs and TINs\n* Tracking Numbers (UPS / FedEx / USPS)\n* URLs\n* VINs\n* Zip Codes\n\n### (US) Locations\n\n* Cities\n* Counties\n* Hospitals\n* Hospital Abbreviations\n* States\n* State Abbreviations\n\n### Custom Filters\n\n* Dictionary\n* Identifier\n\n## Building Phileas\n\nAfter cloning, run `git lfs pull` to download models needed for unit tests. Phileas can then be built with `mvn clean install`.\n\n## Using Phileas\n\nPhileas snapshots and releases as of version 2.12.1 are available in Maven Central. Previous versions were in the [Philterd repository](https://artifacts.philterd.ai).\n\nAdd the Phileas dependency to your project:\n\n```\n\u003cdependency\u003e\n  \u003cgroupId\u003eai.philterd\u003c/groupId\u003e\n  \u003cartifactId\u003ephileas\u003c/artifactId\u003e\n  \u003cversion\u003e3.0.0-SNAPSHOT\u003c/version\u003e\n\u003c/dependency\u003e\n\n```\n\n### Finding and Manipulating Sensitive Information in Text\n\nCreate a `FilterService`, using a `PhileasConfiguration`, and call `filter()` on the service:\n\n```\nProperties properties = new Properties();\nPhileasConfiguration phileasConfiguration = new PhileasConfiguration(properties);\n\nFilterService filterService = new PhileasFilterService(phileasConfiguration);\n\nFilterResponse response = filterService.filter(policies, context, body, MimeType.TEXT_PLAIN);\n```\n\nThe `policies` is a list of `Policy` classes. (See below for more about Policies.) 
Lastly, we specify that the data is plain text.\n\nThe `response` contains information about the identified sensitive information along with the filtered text.\n\n#### Usage Examples\n\nThe [PhileasFilterServiceTest](https://github.com/philterd/phileas/blob/main/phileas-core/src/test/java/io/philterd/test/phileas/services/PhileasFilterServiceTest.java) and [EndToEndTests](https://github.com/philterd/phileas/blob/main/phileas-core/src/test/java/io/philterd/test/phileas/services/EndToEndTests.java) test classes have examples of how to configure Phileas and filter text.\n\n### Finding and Redacting Sensitive Information in a PDF Document\n\nCreate a `FilterService`, using a `PhileasConfiguration`, and call `filter()` on the service:\n\n```\nPhileasConfiguration phileasConfiguration = ConfigFactory.create(PhileasConfiguration.class);\n\nFilterService filterService = new PhileasFilterService(phileasConfiguration);\n\nBinaryDocumentFilterResponse response = filterService.filter(policies, context, body, MimeType.APPLICATION_PDF, MimeType.IMAGE_JPEG);\n```\n\nThe `policies` argument is a list of `Policy` objects, which can be created by deserializing a policy from JSON. (See below for more about Policies.) The `body` is the PDF document you are filtering. Lastly, we specify that the input is a PDF document (`MimeType.APPLICATION_PDF`) and that the redacted output is JPEG images (`MimeType.IMAGE_JPEG`).\n\nThe `response` contains a zip file of the images generated by redacting the PDF document.\n\n### Policies\n\nA policy is an instance of the `Policy` class that tells Phileas the types of sensitive information to identify and what to do with the sensitive information when found. A policy describes the entire filtering process, from which filters to apply to which terms to ignore, and everything in between. Phileas can apply one or more policies when `filter()` is called. 
The policies are applied in the order in which they were added to the list.\n\nFor examples of creating a policy, look at [EndToEndTestsHelper](https://github.com/philterd/phileas/blob/main/phileas-core/src/test/java/io/philterd/test/phileas/services/EndToEndTestsHelper.java). The [PhileasFilterServiceTest](https://github.com/philterd/phileas/blob/main/phileas-core/src/test/java/io/philterd/test/phileas/services/PhileasFilterServiceTest.java) and [EndToEndTests](https://github.com/philterd/phileas/blob/main/phileas-core/src/test/java/io/philterd/test/phileas/services/EndToEndTests.java) test classes have examples of how to configure Phileas and filter text.\n\nPolicies can be serialized to and deserialized from JSON. Here is a basic (but valid) policy that identifies and redacts ages:\n\n```\n{\n  \"name\": \"default\",\n  \"ignored\": [],\n  \"identifiers\": {\n    \"age\": {\n      \"ageFilterStrategies\": [{\n        \"strategy\": \"REDACT\",\n        \"redactionFormat\": \"{{{REDACTED-%t}}}\"\n      }]\n    }\n  }\n}\n```\n\nThere is a long list of `identifiers` that can be applied, and each identifier has several possible `strategy` values. In this case, when an age is found, it is redacted by being replaced with the text `{{{REDACTED-age}}}`. The `%t` is a placeholder for the type of filter, which here is the literal text `age`.\n\n## Powered by Phileas\n\nPhileas is the underlying core of [Philter](https://www.philterd.ai/philter/), a turnkey text redaction engine which is built on top of Phileas and provides an API for redacting text. Philter runs entirely within your cloud and never transmits data outside of your cloud. Custom AI models are available for domains like healthcare, legal, and news. 
Philter is also [open source](https://github.com/philterd/philter).\n\n* [Philter on the AWS Marketplace](https://aws.amazon.com/marketplace/pp/B07YVB8FFT?ref=_ptnr_philterd)\n* [Philter on the Google Cloud Marketplace](https://console.cloud.google.com/marketplace/product/philterd-public/philter)\n* [Philter on the Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/philterdllc1687189098111.philter?tab=Overview)\n* On-prem deployments by contacting us at [https://www.philterd.ai/](https://www.philterd.ai). \n\nPhileas also powers [Airlock](https://www.philterd.ai/airlock), an AI policy layer to prevent the disclosure of sensitive information, such as PII and PHI, in your AI applications.\n\n* [Airlock on the AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-inkh5a3kbhtf2)\n* [Airlock on the Google Cloud Marketplace](https://console.cloud.google.com/marketplace/product/philterd-public/airlock)\n* [Airlock on the Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/philterdllc1687189098111.airlock?tab=Overview)\n* On-prem deployments by contacting us at [https://www.philterd.ai/](https://www.philterd.ai). \n\n## License\n\nAs of Phileas 2.2.1, Phileas is licensed under the Apache License, version 2.0. Previous versions were under a proprietary license.\n\nCopyright 2024-2025 Philterd, LLC.\n\nCopyright 2018-2023 Mountain Fog, Inc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilterd%2Fphileas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilterd%2Fphileas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilterd%2Fphileas/lists"}