{"id":18736754,"url":"https://github.com/digitalslidearchive/imagedephi","last_synced_at":"2025-04-12T19:31:49.911Z","repository":{"id":64881590,"uuid":"564450230","full_name":"DigitalSlideArchive/ImageDePHI","owner":"DigitalSlideArchive","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-24T12:15:11.000Z","size":2728,"stargazers_count":9,"open_issues_count":15,"forks_count":0,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-10T16:15:53.994Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DigitalSlideArchive.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-10T18:41:30.000Z","updated_at":"2025-03-21T19:15:33.000Z","dependencies_parsed_at":"2024-03-19T15:00:20.437Z","dependency_job_id":"00aa19dc-1d89-4370-8c4c-d4082d844a78","html_url":"https://github.com/DigitalSlideArchive/ImageDePHI","commit_stats":null,"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FImageDePHI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FImageDePHI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FImageDePHI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DigitalSlideArchive%2FImageDePHI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/DigitalSlideArchive","download_url":"https://codeload.github.com/DigitalSlideArchive/ImageDePHI/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248621135,"owners_count":21134753,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T15:22:20.998Z","updated_at":"2025-04-12T19:31:49.896Z","avatar_url":"https://github.com/DigitalSlideArchive.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ImageDePHI\nImageDePHI is an application to redact protected health information (PHI) from whole slide images (WSIs).\n\n\u003e This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. 
75N91022C00033\n\n## Installation\n* Download the [latest ImageDePHI release](https://github.com/DigitalSlideArchive/ImageDePHI/releases/latest).\n\n* Unzip the downloaded file, which will extract the executable named `imagedephi` (or `imagedephi.exe` on Windows).\n\n* Please note that on Linux, only Ubuntu 20.04+ is supported.\n\n## Usage\nFor an in-depth walkthrough, check out the [demo](./docs/demo.md).\n\nFrom a command line, execute the application to get full usage help.\n\nAlternatively, **on Windows only**, open `imagedephi.exe` directly in Windows Explorer to launch the ImageDePHI GUI.\n\nIf running on macOS, you may need to [add the executable to the list of trusted software](https://support.apple.com/guide/mac-help/apple-cant-check-app-for-malicious-software-mchleab3a043/mac) before you can launch ImageDePHI the same way you would any other registered app.\n\n# Rules\nImage redaction is determined by a set of rules. By default, the base set of rules is used. These rules are provided by the `imagedephi` package and can be found [here](https://github.com/DigitalSlideArchive/ImageDePHI/blob/main/imagedephi/base_rules.yaml).\n\n## Rule Application\nAll runs of `imagedephi` use the provided base set of rules as a foundation. End users can use the ruleset framework to build custom rulesets that handle additional or custom metadata not covered by the base rules, or that override the behavior of the base rule set.\n\nOverride rule sets can be specified with the `-R my_ruleset.yaml` or `--override-rules my_ruleset.yaml` option. This option is available for both the `imagedephi run` and `imagedephi plan` commands. Override rule sets are not provided by `imagedephi` and must be defined by the end user.\n\nWhen `imagedephi` determines the steps to redact a file, it checks each piece of metadata in the file. For each piece of metadata found this way, it first consults the override rule set, if present, for an applicable rule. 
If the override rule set does not contain a rule for that piece of metadata, the program checks the base rule set.\n\nIf neither the override rule set nor the base rule set covers a piece of metadata, redaction will fail and the program will list the metadata that it could not redact. There is no default behavior for unknown metadata.\n\n### Redaction Profiles\n\n#### Strict Redaction\nFor whole slide image formats based on the tiff standard, `imagedephi` supports a strict redaction mode, enabled by passing the `--profile strict` option when calling `imagedephi` from the CLI. In this mode, only tags strictly required by the tiff standard will remain, and all other metadata will be stripped from the images. For a full list of metadata tags that will remain after strict redaction, see the [minimum rules file](https://github.com/DigitalSlideArchive/ImageDePHI/blob/main/imagedephi/minimum_rules.yaml).\n\n#### Fuzzing Dates and Times\nUsing the `--profile dates` option will replace dates, times, datetimes, and UTC offsets with values of the same kind but with less precision than the original. Dates will preserve the year, but the month and day will be set to January 1st. Times will be set to midnight and UTC offsets to +0000. Rules for this profile can be found in [modify_dates_rules.yaml](https://github.com/DigitalSlideArchive/ImageDePHI/blob/main/imagedephi/modify_dates_rules.yaml). For DICOM images, the [Attribute Confidentiality Profiles](https://dicom.nema.org/dicom/2013/output/chtml/part15/chapter_E.html) were used to determine which tags should be modified according to this profile.\n\n## Ruleset Format Overview\nTo read the base rules and build your own custom rule sets, you need to understand the format in which rulesets are specified. 
Each ruleset is defined in its own `.yaml` file as a dictionary with the following top-level keys: `name`, `description`, `output_file_name`, `tiff`, `svs`, and `dicom`.\n\n### Generic Properties\nThe following three properties describe the ruleset itself and don't influence redaction behavior.\n\n#### `name`\nProvide a name for a ruleset. The `imagedephi plan` command uses this name to report which ruleset is used to redact a particular piece of metadata.\n\n#### `description`\nYou can add a description to your custom rulesets. This is not used by the program, but can help communicate which cases your custom rulesets are designed for.\n\n#### `output_file_name`\nSpecify how output files should be named. The base ruleset uses the value `study_slide`. In this case, if the input slides are named `john_smith_lung.svs` and `john_smith_pancreas.svs`, the redacted output images will be named `study_slide_1.svs` and `study_slide_2.svs`.\n\n### Other Top-level Properties\n\n#### `strict`\nThe `strict` property of a ruleset denotes that ALL unspecified tags should be deleted. This is supported for `tiff` and `svs` files. An example of the `strict` flag can be seen in the `minimum_rules.yaml` rule set.\n\n### File Format Rules\nRedaction behavior is specified per file type. Currently, plain `tiff` files, Aperio (`.svs`) files, and DICOM files are supported. Each image type has its own groups of data that can be redacted. For example, Aperio images have `tiff` metadata, certain associated images, and additional metadata stored in the `ImageDescription` tag. `svs` rulesets take the following shape:\n\n```yaml\nsvs:\n    associated_images:\n        ...\n    metadata:\n        ...\n    image_description:\n        ...\n```\n\nEach group is a dictionary whose keys identify a specific piece of metadata or a specific associated image, and whose values are dictionaries that define redaction behavior. 
Each entry (key-value pair) in the dictionary is a \"rule.\" Take the following `associated_images` rule from the base ruleset:\n\n```yaml\nsvs:\n    ...\n    associated_images:\n        label:\n            action: replace\n            replace_with: blank_image\n    ...\n```\n\nThis describes how `imagedephi` handles `label` images for Aperio files by default. Since label images frequently contain PHI but are required by the Aperio (`.svs`) format, they are replaced with a black image of the same size.\n\n#### Image Rules\n\nImage rules take the following form:\n\n```yaml\n\u003cimage_key\u003e:\n    action:\n```\n\nWhere `image_key` identifies a particular associated image. For a catch-all rule, use the key `default`.\n\nImage rules can have the following actions:\n\n* `replace`: Replace the image with another. A value for `replace_with` must also be provided.\n* `keep`: Do nothing. The associated image matching this key will be included in the output file.\n* `delete`: The image will not be included in the output file.\n\nFor image rules, the only supported value of `replace_with` is `blank_image`.\n\n#### Metadata Rules\n\nMetadata rules take the following form:\n\n```yaml\n\u003cmetadata_key\u003e:\n    action:\n```\n\nWhere `metadata_key` identifies a piece of metadata. Possible values for this key depend on the type of metadata being redacted. For example, rules listed under\n\n```yaml\ntiff:\n    metadata:\n```\nuse tiff tag names (e.g. `ImageDescription`, `ImageWidth`) as their `metadata_key`.\n\nAvailable actions for metadata rules are:\n\n* `delete`: The metadata will not appear in the output file.\n* `keep`: The metadata will appear unchanged in the output file.\n* `replace`: Replace the metadata with a specified value. If this is the `action`, additional fields are required.\n* `check_type`: This will keep the metadata if its type matches the expected type, and delete it otherwise. 
Requires additional fields.\n* `modify_date`: This will fuzz dates, times, datetimes, and time zone offsets. See the \"Redaction Profiles\" section for more details.\n\n##### `replace` rules\nThese rules require the additional property `replace_with`. The value of `replace_with` will be used to override the metadata in the output image.\n\n##### `check_type` rules\nThese rules use the following additional properties:\n* `expected_type`: one of `integer`, `number`, `text`, or `rational`.\n* `expected_count` (optional): if the piece of metadata can contain multiple values, use this property to specify how many are expected. Defaults to `1`. If `expected_type` is `rational`, this is the expected number of rationals, not of integers: each rational is stored as two integers (a numerator and a denominator), so an `expected_count` of 1 matches 2 integer values in the metadata.\n\n### Supported Formats\nCurrently, `imagedephi` supports redaction of the following types of files:\n* TIFF\n* Aperio (a tiff-like format that typically uses the extension `.svs`)\n* DICOM\n\n#### Tiff\nTiff rules have the following shape:\n\n```yaml\ntiff:\n    associated_images:\n        ...\n    metadata:\n        ...\n```\n\nThe keys for the `metadata` rules are the names of tiff tags defined by the tiff standard.\n\n#### Aperio\nAperio format rules have the following shape:\n\n```yaml\nsvs:\n    associated_images:\n        ...\n    metadata:\n        ...\n    image_description:\n        ...\n```\n\nThe keys for the `metadata` rules are the names of tiff tags defined by the tiff standard. Names are case-insensitive and common variations are accepted (e.g. both `GrayResponseUnit` and `GreyResponseUnit`).\n\nFor Aperio files, additional metadata is stored as key-value pairs in the `ImageDescription` tag. See the [OpenSlide documentation for the Aperio format](https://openslide.org/formats/aperio/) for more information. 
Each key in the `image_description` section is a key found in this `ImageDescription` string.\n\n#### DICOM\nDICOM format rules are much the same:\n\n```yaml\ndicom:\n    associated_images:\n        ...\n    custom_metadata_action: ...\n    metadata:\n        ...\n```\n\nNote that DICOM rules have an explicit format-level setting, `custom_metadata_action`, for dealing with custom metadata. Any tag with an odd group number (a private tag) is considered custom metadata. This setting can be `keep`, `delete`, or `use_rule`.\n\n* `keep`: Retain the custom metadata value after redaction. Rules for custom tags specified in the `metadata` section take precedence over this setting.\n* `delete`: Delete the custom metadata tag from the image. Rules for custom tags specified in the `metadata` section take precedence over this setting.\n* `use_rule`: Fall back to the rules specified for each piece of custom metadata in the `metadata` section of the rule set. If a custom metadata tag with no corresponding rule is encountered, the image will not be redacted, because the redaction plan would be incomplete.\n\nDICOM redaction also supports the following additional actions:\n\n* `empty`: Replace the tag's value with `None`.\n* `replace_dummy`: Replace the tag's value with a dummy value that depends on the original value type. For example, if the tag's value is a string, the dummy value is the empty string; if the tag's value is an integer, the dummy value is 0.\n* `replace_uid`: If the tag's value is a UID, it will be replaced with a randomly generated UID of the form `\"2.25.\u003cuuid\u003e\"`, where `\u003cuuid\u003e` is a UUID generated at run time. The new UID is stored by ImageDePHI and used to replace any other UIDs that share the same original value. 
This way, if a UID is used in several tags within an image, they all get the same replacement value.\n\n## Related Projects\n\nOther efforts related to anonymizing medical images include:\n\n- [`dicom-anonymizer`](https://github.com/KitwareMedical/dicom-anonymizer): A Python tool for anonymizing DICOM files.\n- [WSI DeID](https://github.com/DigitalSlideArchive/DSA-WSI-DeID): A workflow built on the [Digital Slide Archive](https://github.com/DigitalSlideArchive/digital_slide_archive/?tab=readme-ov-file#digital-slide-archive) for redacting medical images.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalslidearchive%2Fimagedephi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitalslidearchive%2Fimagedephi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalslidearchive%2Fimagedephi/lists"}