{"id":20619745,"url":"https://github.com/borgbackup/backupdata","last_synced_at":"2025-04-15T12:12:13.995Z","repository":{"id":95311274,"uuid":"43649042","full_name":"borgbackup/backupdata","owner":"borgbackup","description":"create lots of data for backup scalability testing","archived":false,"fork":false,"pushed_at":"2021-10-09T21:07:24.000Z","size":128,"stargazers_count":12,"open_issues_count":1,"forks_count":6,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-10-30T05:42:04.231Z","etag":null,"topics":["backup","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/borgbackup.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-10-04T19:14:48.000Z","updated_at":"2024-04-25T21:17:42.000Z","dependencies_parsed_at":"2023-09-25T06:47:50.067Z","dependency_job_id":"37b52961-0603-4240-8f0b-7e1ea4d880e0","html_url":"https://github.com/borgbackup/backupdata","commit_stats":{"total_commits":5,"total_committers":2,"mean_commits":2.5,"dds":0.4,"last_synced_commit":"5f173f5582d5faf0302e23d81c1abbe66799904f"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fbackupdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fbackupdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fbackupdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fbackupdata/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/borgbackup","download_url":"https://codeload.github.com/borgbackup/backupdata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224911733,"owners_count":17390845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backup","python"],"created_at":"2024-11-16T12:12:24.807Z","updated_at":"2024-11-16T12:12:25.259Z","avatar_url":"https://github.com/borgbackup.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Tools for backup scalability testing\n====================================\n\nI made this to test scalability for borgbackup, but maybe you will find it\nuseful for testing other stuff, too.\n\nmkdata.py\n---------\n\nRealistically testing deduplicating backup software with a lot of data isn't\neasy if you do not have a lot of such data.\n\nIf you need to create such data, you can't just duplicate existing data (the\nbackup tool would just deduplicate it and not create a lot of output data).\nAlso, just reading data from /dev/urandom is rather slow (and the data is not\nat all \"realistic\", because it is too random).\n\nThe solution is to start from a set of real files (e.g. 1-2 GB in total), but\nto modify each copy slightly by inserting some bytes derived from a counter.\nThe insertion is repeated throughout each file, so that no long duplicate\nchunks remain inside the files.\n\nBecause of this, all output files are \"corrupt\" copies: they are only\nintended as test data and are expected to be thrown away after the test.\nThe input files are not modified on disk.\n\nThis tool expects some input data in the SRC directory. It could look like\nthis, for example (test data is not included, please use your own data):\n\n::\n\n    234M  testdata/bin     # Linux executable binaries\n    245M  testdata/jpg     # photos\n    101M  testdata/ogg     # music\n    4.0K  testdata/sparse  # 1x 1GB empty sparse file, name must be \"sparse\"\n    259M  testdata/src_txt # source code, lots of text files\n    151M  testdata/tgz     # 1x tar.gz file\n\n\nMake sure all the SRC data fits into memory, because it is read into RAM and\nkept there for better performance.\n\nThe tool creates N (modified) copies of this data set in directories named\n0 .. N inside the DST directory.\n\nCopies of the empty \"sparse\" file are also created as empty sparse files\nand are not modified. This can be used to test extreme deduplication (or the\nhandling of sparse input files) by the backup tool under test.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborgbackup%2Fbackupdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fborgbackup%2Fbackupdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborgbackup%2Fbackupdata/lists"}