{"id":16759540,"url":"https://github.com/cyrildever/redacted","last_synced_at":"2025-04-10T17:14:55.230Z","repository":{"id":153441988,"uuid":"352397031","full_name":"cyrildever/redacted","owner":"cyrildever","description":"Redacting classified documents","archived":false,"fork":false,"pushed_at":"2025-01-31T08:26:36.000Z","size":1205,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-02T16:50:16.598Z","etag":null,"topics":["classified","data-masking","documents","executables","golang","javascript","library","python","redacted","typescript"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyrildever.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-28T17:44:36.000Z","updated_at":"2025-01-31T08:26:40.000Z","dependencies_parsed_at":"2023-05-31T17:00:19.346Z","dependency_job_id":"fef20b83-f4be-42b7-a72f-14b3305fb656","html_url":"https://github.com/cyrildever/redacted","commit_stats":{"total_commits":101,"total_committers":1,"mean_commits":101.0,"dds":0.0,"last_synced_commit":"a813c8f286587f0d92cb82df02ee00cc7945c659"},"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Fredacted","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Fredacted/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Fredacted/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyrildever%2Fredacted/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cyrildever","download_url":"https://codeload.github.com/cyrildever/redacted/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248261916,"owners_count":21074225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classified","data-masking","documents","executables","golang","javascript","library","python","redacted","typescript"],"created_at":"2024-10-13T04:08:23.364Z","updated_at":"2025-04-10T17:14:55.207Z","avatar_url":"https://github.com/cyrildever.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# redacted\n_Redacting classified documents_\n\n![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/cyrildever/redacted)\n![GitHub last commit](https://img.shields.io/github/last-commit/cyrildever/redacted)\n![GitHub issues](https://img.shields.io/github/issues/cyrildever/redacted)\n![npm](https://img.shields.io/npm/dw/redacted-ts)\n![NPM](https://img.shields.io/npm/l/redacted-ts)\n![PyPI - Version](https://img.shields.io/pypi/pyversion/redacted-py)\n\nThis repository holds the code base for my `redacted` libraries and executables.\nIt is mainly based off my [Feistel cipher for Format-Preserving Encryption](https://github.com/cyrildever/feistel) to which I added a few tools to handle document, database and file manipulation to ease out the operation.\n\n### Motivation\n\nIn some fields 
(like healthcare for instance), protecting the privacy of data whilst being able to conduct in-depth studies is both vital and mandatory. Redacting documents and databases is therefore an obligatory step.\nWith `redacted`, I provide a simple yet secure tool to help redact documents based on either a dictionary, a record layout or a tag to decide which parts should actually be redacted.\n\nAs of the latest version, this repository comes with five different flavours:\n* Executables (to use on either Linux, MacOS or Windows environments);\n* A Go library;\n* A Python library;\n* A Scala library to use in the JVM (which is not yet available on Maven Central Repository);\n* A TypeScript library (which is also available on [NPM](https://www.npmjs.com/package/redacted-ts)).\n\n\n### Usage\n\nYou can use either a dictionary or a tag (or both) to identify the words you want to redact in a document.\nThe tag should be placed before any word that should be redacted. The default tag is the tilde character (`~`).\n\nFor example, in the following sentence only the word `tagged` will be redacted: `\"This is a ~tagged sentence\"`.\n\n#### 1. 
Executables\n\n```\nUsage of ./redacted:\n  -b    add to use both dictionary and tag\n  -d string\n        the optional path to the dictionary of words to redact\n  -h string\n        the hash engine for the round function (default \"sha-256\")\n  -i string\n        the path to the document to be redacted\n  -k string\n        the optional key for the FPE scheme (leave it empty to use default)\n  -o string\n        the name of the output file\n  -r int\n        the number of rounds for the Feistel cipher (default 10)\n  -t string\n        the optional tag that prefixes words to redact (default \"~\")\n  -x    add to expand a redacted document\n```\nThe dictionary file must contain a list of words separated by spaces.\n\nDownload the version for the platform of your choice, then execute the following command:\n```console\n$ ./redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b\n```\n\n@also Installation procedure [here](go/INSTALL.md)\n\n__IMPORTANT: Do not use with input texts having lines longer than 65536 characters.__\n\n##### \u003cu\u003eAlternative using Java and the redacted JAR\u003c/u\u003e\n\n```console\n$ java -cp path/to/redacted.jar com.cyrildever.redacted.Main -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b\n```\n\n##### \u003cu\u003eAlternative using the TypeScript CLI\u003c/u\u003e\n\n```console\n$ redacted -i myFile.txt -o myRedactedFile.txt -d myDictionary.txt -b\n```\n\n@see Installation procedure [here](ts/cli/README.md)\n\n\n##### \u003cu\u003eAlternative using Python\u003c/u\u003e\n\n```console\n$ python3 -m redacted -i=myFile.txt -o=myRedactedFile.txt -d=myDictionary.txt -b\n```\n\n#### 2. 
Libraries\n\n\u003cu\u003eGo\u003c/u\u003e\n\n```console\n$ go get github.com/cyrildever/redacted/go\n```\n\n```golang\nimport (\n    \"github.com/cyrildever/feistel\"\n    \"github.com/cyrildever/redacted/go/core\"\n    \"github.com/cyrildever/redacted/go/model\"\n)\n\n// Load dictionary\ndic, err := model.FileToDictionary(\"/path/to/dictionary.txt\")\nif err != nil {\n    // handle the error\n}\n\n// Prepare FPE cipher\ncipher := feistel.NewFPECipher(hashEngine, key, rounds)\n\n// Instantiate redactor\nredactor := core.NewRedactorWithDictionary(dic, cipher)\n\n// Redact a line\nredacted := redactor.Redact(line)\nfmt.Println(redacted)\n\n// Expand a redacted line\nassert.Equal(t, redactor.Expand(redacted), line)\n```\nSee the [`Dictionary`](model/dictionary.go) and the [`Redactor`](core/redactor.go) implementations to use other kinds of dictionaries (as a slice or from a string) and/or redactors (with or without tag and dictionary).\n\nNB: You may use any other kind of Format-Preserving Encryption library as long as it implements the following interface:\n```golang\ntype FPE interface {\n    Decrypt(base256.Readable) (string, error)\n    Encrypt(string) (base256.Readable, error)\n}\n```\n_See my implementation of the `base256.Readable` string type alias in its [module](https://github.com/cyrildever/feistel/common/utils/base256)._\n\nTo build 64-bit binaries (after cloning the repository and assuming you are on MacOS):\n\n_(for MacOS)_\n```console\n$ cd go\n$ GOOS=darwin GOARCH=amd64 go build -o bin/redacted main.go\n```\n\n_(for Linux)_\n```console\n$ brew install FiloSottile/musl-cross/musl-cross --with-arm\n$ git clone https://github.com/cyrildever/redacted.git \u0026\u0026 cd redacted/go\n$ CGO_ENABLED=1 GOOS=linux GOARCH=amd64 CC=\"x86_64-linux-musl-gcc\" go build -o bin/redacted-linux --ldflags '-w -linkmode external -extldflags \"-static\"' main.go\n```\n\u0026ensp;\u0026ensp;\u0026ensp;_@see [https://github.com/FiloSottile/homebrew-musl-cross](https://github.com/FiloSottile/homebrew-musl-cross)_\n\n_(for 
Windows)_\n```console\n$ brew install mingw-w64\n$ git clone https://github.com/cyrildever/redacted.git \u0026\u0026 cd redacted/go\n$ CGO_ENABLED=1 GOOS=windows GOARCH=amd64 CC=\"x86_64-w64-mingw32-gcc\" go build -o bin/redacted.exe main.go\n```\n\n\u003cu\u003ePython\u003c/u\u003e\n\n```console\n$ pip install redacted-py\n```\n\n```python\nfrom redacted import DefaultRedactor, Dictionary\nfrom feistel import FPECipher, SHA_256\n\nsource = \"Some text ~tagged or using words in a dictionary\"\n\ncipher = FPECipher(SHA_256, key, 10)\nredactor = DefaultRedactor(cipher)\nredacted = redactor.redact(source)\n\nexpanded = redactor.expand(redacted)\nassert expanded == source, \"Original data should equal ciphered then deciphered data\"\n\ncleansed = redactor.clean(expanded)\nassert cleansed == \"Some text tagged or using words in a dictionary\", \"Cleaning should remove any tag mark\"\n```\n\n\n\u003cu\u003eScala\u003c/u\u003e\n\nIn a Scala 2.12 project:\n```sbt\nlibraryDependencies ++= Seq(\n    \"com.cyrildever\" %% \"feistel-jar\" % \"1.5.7\",\n    \"com.cyrildever\" %% \"redacted\" % \"1.0.8\"\n)\n```\n\n```scala\nimport com.cyrildever.feistel.common.utils.hash.Engine._\nimport com.cyrildever.feistel.Feistel\nimport com.cyrildever.redacted.core.Redactor\n\nval source = \"Some text ~tagged or using words in a dictionary\"\n\nval cipher = Feistel.FPECipher(SHA_256, key, 10)\nval redactor = Redactor(dictionary, tag, cipher, true)\nval redacted = redactor.redact(source)\n\nval expanded = redactor.expand(redacted)\nassert(expanded == source)\n```\n\n_NB: You might need to provide the expected BouncyCastle JAR file, eg. 
`bcprov-jdk15to18-1.80.jar`._\n\n\n\u003cu\u003eTypeScript/JavaScript\u003c/u\u003e\n\n```console\n$ npm install redacted-ts\n```\n\n```typescript\nimport { DefaultRedactor, Dictionary } from 'redacted-ts'\nimport { FPECipher, SHA_256 } from 'feistel-cipher'\n\nconst source = 'Some text ~tagged or using words in a dictionary'\n\nconst cipher = new FPECipher(SHA_256, key, 10)\nconst redactor = DefaultRedactor(cipher)\nconst redacted = redactor.redact(source)\n\nconst expanded = redactor.expand(redacted)\nassert(expanded === source)\n\nconst cleansed = redactor.clean(expanded)\nassert(cleansed === 'Some text tagged or using words in a dictionary')\n```\n\n### License\n\nUse of the `redacted` libraries and executables is subject to fees for commercial purposes and to compliance with the [BSD-2-Clause-Patent license](LICENSE). \\\nPlease [contact me](mailto:cdever@pep-s.com) to get further information.\n\n\n\u003chr /\u003e\n\u0026copy; 2021-2025 Cyril Dever. All rights reserved.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrildever%2Fredacted","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyrildever%2Fredacted","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrildever%2Fredacted/lists"}