{"id":25536293,"url":"https://github.com/bhdicaire/datalossprevention","last_synced_at":"2026-01-27T14:03:17.136Z","repository":{"id":42614838,"uuid":"490367038","full_name":"bhdicaire/dataLossPrevention","owner":"bhdicaire","description":"Data Loss Prevention (DLP) Sample Data Files","archived":false,"fork":false,"pushed_at":"2024-07-28T23:06:01.000Z","size":13764,"stargazers_count":43,"open_issues_count":0,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-11T11:41:57.832Z","etag":null,"topics":["data-exfiltration","data-generator","data-loss-prevention","data-structures","dlp","fake","fake-data","faker","generator","mock-data","mock-data-generator","test-data"],"latest_commit_sha":null,"homepage":"","language":"PostScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bhdicaire.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-09T16:45:06.000Z","updated_at":"2025-09-28T03:08:01.000Z","dependencies_parsed_at":"2024-07-29T00:59:15.572Z","dependency_job_id":null,"html_url":"https://github.com/bhdicaire/dataLossPrevention","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bhdicaire/dataLossPrevention","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhdicaire%2FdataLossPrevention","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhdicaire%2FdataLossPrevention/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhdicaire%2FdataLossPrevention/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhdicaire%2FdataLossPrevention/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bhdicaire","download_url":"https://codeload.github.com/bhdicaire/dataLossPrevention/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bhdicaire%2FdataLossPrevention/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28814310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T12:25:15.069Z","status":"ssl_error","status_checked_at":"2026-01-27T12:25:05.297Z","response_time":168,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-exfiltration","data-generator","data-loss-prevention","data-structures","dlp","fake","fake-data","faker","generator","mock-data","mock-data-generator","test-data"],"created_at":"2025-02-20T04:31:44.818Z","updated_at":"2026-01-27T14:03:17.123Z","avatar_url":"https://github.com/bhdicaire.png","language":"PostScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"![dataLossPrevention logo](https://github.com/bhdicaire/dataLossPrevention/raw/main/logo.png)\n\nYou’ve been there too — setting up a data loss prevention solution might be a damn long project (DLP), if you need to _support multiple languages_ and don’t have [adequate data sources](dataSets/source.md).\n\nThis 
repository consolidates Data Loss/Leak Prevention insight and [sample files (e.g., datasets)](dataSets/README.md) that I have collected and used over the years. Your quality assurance library does not have to be unique; everyone strives for consistency.\n\nFork this repository and improve _your_ library. Even better, send me an update :laughing:.\n\n\u003e **A DLP solution is a set of enterprise processes, [tools, and techniques](techniques/README.md) that monitor sensitive information and prevent data exfiltration.**\n\n## What problem does it solve and why is it useful?\n\nI wasn't happy with the provided bundle of mock files to test my DLP policies and [demonstrate compliance](techniques/compliance.md). They were either [too simple or not localized for my use case](dataSets/README.md).\n\nFriends don’t let friends test the effectiveness of a DLP solution with production data. You need _realistic test data[^1]_ in several formats such as CSV, JSON, SQL, TXT, and Excel to make sure your DLP policies are working correctly, especially after a significant change.\n\n_dataLossPrevention_ by Benoît H. Dicaire is shared under the [Unlicense](LICENSE). 
For more information, please refer to [unlicense.org](https://unlicense.org).\n\n[^1]: Refer to the [sensitive information type entity definitions provided by Microsoft](https://docs.microsoft.com/en-us/microsoft-365/compliance/sensitive-information-type-entity-definitions?view=o365-worldwide) for more information about the required structure.\n\n### Fake sensitive information generators\n\n \n| Name | Cybersecurity | Finance | Legal | Personal | Technology |\n| :-- | :--: | :--: | :--: | :--: | :--: |\n|[DLP Test](https://dlptest.com/)| X | X | X | X | X |\n|[Fake Person Generator](https://www.fakepersongenerator.com/)| X | X | X | X | X |\n|[Fake Generator](https://fakegenerator.net/)| X | X | X | X | X |\n|[GenerateData.com](https://generatedata.com/)[^2]| X | X | X | X | X |\n|[Get Fake Data](https://getfakedata.com/)| X | X | X | X | X |\n|[Get Bored Human](https://boredhumans.com/faces.php)| X | X | X | X | X |\n|[Mockaroo](https://www.mockaroo.com/)| X | X | X | X | X |\n|[Mock Turtle](https://mockturtle.net/)| X | X | X | X | X |\n|[Venkom](https://cloud.venkon.us/)| X | X | X | X | X |\n\n[^2]: Source code is available [on GitHub/benkeen/generatedata](https://github.com/benkeen/generatedata)\n\nYou can also search on GitHub for library code and C tools related to [data-generator](https://github.com/topics/data-generator), [fake-data](https://github.com/topics/fake-data), [mock-data](https://github.com/topics/mock-data), [mock-data-generator](https://github.com/topics/mock-data-generator), and [test-data](https://github.com/topics/test-data).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbhdicaire%2Fdatalossprevention","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbhdicaire%2Fdatalossprevention","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbhdicaire%2Fdatalossprevention/lists"}