{"id":25042352,"url":"https://github.com/sftcd/anonymise-column1","last_synced_at":"2025-10-24T17:51:09.894Z","repository":{"id":222562033,"uuid":"757734419","full_name":"sftcd/anonymise-column1","owner":"sftcd","description":"Run a keyed hash over column1 of a CSV file","archived":false,"fork":false,"pushed_at":"2024-02-17T03:12:00.000Z","size":8,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-30T22:42:14.655Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sftcd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-02-14T21:36:36.000Z","updated_at":"2024-02-14T22:44:30.000Z","dependencies_parsed_at":"2024-02-14T23:46:37.249Z","dependency_job_id":null,"html_url":"https://github.com/sftcd/anonymise-column1","commit_stats":null,"previous_names":["sftcd/anonymise-column1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sftcd/anonymise-column1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sftcd%2Fanonymise-column1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sftcd%2Fanonymise-column1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sftcd%2Fanonymise-column1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sftcd%2Fanon
ymise-column1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sftcd","download_url":"https://codeload.github.com/sftcd/anonymise-column1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sftcd%2Fanonymise-column1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280841004,"owners_count":26400397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-24T02:00:06.418Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-06T04:14:50.552Z","updated_at":"2025-10-24T17:51:09.876Z","avatar_url":"https://github.com/sftcd.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Run a keyed hash over column1 of a CSV file\n\nA colleague wanted to anonymise student numbers to do some privacy-friendly\nstatistics. This is my suggestion, done as a bash script that requires an\n``openssl`` install. (You also need whatever is the right package for ``xxd``.)\n\nExample:\n\n```bash\n$ head -3 input.csv\nID,col2,col3\n10334051,x,1\n11313330,y,2\n$ cat input.csv | AC1_SECRET=foo ./ac1.sh\nID,col2,col3\n2ac0e32e,x,1\n126145ad,y,2\n04a285f1,z,3\n```\n\nUsage:\n    $ ./ac1.sh [csv-file-name]\n\nThe CSV file can be provided as a command line argument. 
If none is provided\nthen the script will read from stdin.\n\nYou have to set a secret value to use for the key in the keyed hash.  You can\ndo that by setting a value for ``$AC1_SECRET`` in the environment. If no such\nvalue is set, then the script will prompt the user to enter the secret.\n\nUnder the hood, we do an HMAC-SHA256 using the secret as the key and we select\nthe first 8 ASCII-hex output characters of that as the replacement for column 1\nof the input.\n\nIt should be easy enough to change the fixed values, so we'll not bother making\nthat more generic.\n\nIn case it helps, and though I'm not sure of the provenance, [this web\npage](https://www.i-scoop.eu/gdpr/pseudonymization/) does recommend this\npseudonymization technique. In any case, I recall related discussions when the\nGDPR was still \"fresh\" and this is what I recall folks recommending.\n\n## Handling the secret...\n\nAs there are only about 100k real student numbers in question,\na plain (un-keyed) hash of those could be easily reversed via\na brute-force attack. So we should pick a secret that is not\nvulnerable to such an attack, or a dictionary attack.\n\nOne way to do that would be to use ``openssl`` to generate\na random number, then set ``$AC1_SECRET`` using that, e.g.:\n\n```bash\n$ export AC1_SECRET=`openssl rand -hex 32`\n$ echo $AC1_SECRET \u003esecret-file\n$ cat secret-file\n9e1f97a7254951d1c357aa8c71618a45ee6dccc4e5eca28734c166cbfc1c6137\n```\n\nIf we want to generate the same pseudonymized identifier for\nthe same input student number multiple times, e.g. each year\nwhen regenerating statistics, then we'll need to have stored\nthat secret somewhere in the meantime. 
If we did the above\nthen to re-use that secret:\n\n```bash\n$ export AC1_SECRET=`cat secret-file`\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsftcd%2Fanonymise-column1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsftcd%2Fanonymise-column1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsftcd%2Fanonymise-column1/lists"}