{"id":13585151,"url":"https://github.com/jbd/msrsync","last_synced_at":"2025-04-07T06:32:43.661Z","repository":{"id":32201710,"uuid":"35775378","full_name":"jbd/msrsync","owner":"jbd","description":"Multi-stream rsync wrapper","archived":false,"fork":false,"pushed_at":"2022-10-19T10:03:41.000Z","size":135,"stargazers_count":476,"open_issues_count":10,"forks_count":75,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-11-06T02:38:48.794Z","etag":null,"topics":["multi-stream-rsync","parallel","python","rsync"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jbd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-17T17:24:42.000Z","updated_at":"2024-10-24T13:58:20.000Z","dependencies_parsed_at":"2023-01-14T20:44:31.503Z","dependency_job_id":null,"html_url":"https://github.com/jbd/msrsync","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbd%2Fmsrsync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbd%2Fmsrsync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbd%2Fmsrsync/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbd%2Fmsrsync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jbd","download_url":"https://codeload.github.com/jbd/msrsync/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247607556,"owners_count":20965943,"icon_u
rl":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["multi-stream-rsync","parallel","python","rsync"],"created_at":"2024-08-01T15:04:46.107Z","updated_at":"2025-04-07T06:32:43.652Z","avatar_url":"https://github.com/jbd.png","language":"Python","readme":"This project is not actively developed. Please have a look at the alternatives in the Motivation section.\n\n# msrsync: maximize rsync bandwidth usage\n\n`msrsync` (multi-stream rsync) is a python wrapper around `rsync`. It only depends on `python \u003e= 2.6` and `rsync`.\n\nIt splits the transfer into multiple buckets while the source is being scanned and will hopefully help maximize the usage of the available bandwidth by running a configurable number of `rsync` processes in parallel. The main limitation is that it does not handle remote source or target directories: they must be locally accessible (local disk, NFS/CIFS/other mount point). I hope to address this in the near future.\n\n## Quick example\n\n```bash\n$ msrsync -p 4 /source /destination # you can also use -P/--progress and --stats options\n```\n\nThis will copy the /source directory into the /destination directory (same trailing-slash handling as `rsync`) using 4 `rsync` processes (with `\"-aS --numeric-ids\"` as the default options, which can be overridden with the `--rsync` option). `msrsync` splits the file and directory list into buckets of at most 1G or 1000 files (see the `--size` and `--files` options) before feeding them to the `rsync` processes in parallel through the `--files-from` option. 
As long as the source and the destination can cope with parallel I/O (think big boring \"enterprise grade\" NAS), it should be faster than a single `rsync`.\n\n\u003e `msrsync` shares the same spirit as [fpart](https://github.com/martymac/fpart) (and its associated [fpsync](https://github.com/martymac/fpart/blob/master/tools/fpsync) tool) by [Ganaël Laplanche](https://github.com/martymac) or [parsync](http://moo.nac.uci.edu/~hjm/parsync/) by [Harry Mangalam](https://github.com/hjmangalam). Those are two fantastic, much more complete tools used in the field to do real work. Please check them out; they might be what you're looking for.\n\nYou can also check out [fcp](https://github.com/olcf/pcircle) from the pcircle project. It looks very powerful. See the [associated publication](https://cug.org/proceedings/cug2016_proceedings/includes/files/pap142s2-file1.pdf).\n\n## Motivation\n\nWhy write `msrsync` if tools like [fpart](https://github.com/martymac/fpart), [parsync](http://moo.nac.uci.edu/~hjm/parsync/) or [pftool](https://github.com/pftool/pftool) exist? While reasonable, their dependencies can be a point of friction given the constraints of a particular system. When you're lucky, you can use your package manager ([fpart](https://github.com/martymac/fpart) seems to be well supported among various GNU/Linux and FreeBSD distributions: [FreeBSD](http://www.freshports.org/sysutils/fpart), [Debian](http://packages.debian.org/fpart), [Ubuntu](http://packages.ubuntu.com/fpart), [Archlinux](https://aur.archlinux.org/packages/fpart/), [OBS](https://build.opensuse.org/package/show/home:mgoppold/fpart)) to deal with the requirements, but more often than not, I find myself struggling with the sad state of the machine I'm working with.\n\nThat's why the only dependencies of `msrsync` are [python](https://www.python.org/) \u003e= 2.6 and [rsync](https://rsync.samba.org/). Why python 2.6? 
I'm targeting RHEL6-like distributions as a minimum requirement here, so I'm stuck with python 2.6. I miss some cool features, but that's part of the project.\n\nThe devil is in the details. If you need a starting point to think about data migration, this overview by Jeff Layton is very informative: [Moving Your Data – It’s Not Always Pleasant](http://www.admin-magazine.com/HPC/Articles/Moving-Your-Data-It-s-Not-Always-Pleasant).\n\nThe \"[How to transfer large amounts of data via network](http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html)\" article by the `parsync` author is updated regularly and is also worth a read.\n\nIf you can read French, I co-wrote an article with [Ganaël Laplanche](https://github.com/martymac) about [fpart](https://github.com/martymac/fpart): [Parallélisez vos transferts de fichiers](http://connect.ed-diamond.com/GNU-Linux-Magazine/GLMF-164/Parallelisez-vos-transferts-de-fichiers) (\"Parallelize your file transfers\").\n\nYou might also be interested in this Intel whitepaper on data migration: [Data Migration with Intel® Enterprise Edition for Lustre* Software](http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/data-migration-enterprise-edition-for-lustre-software-white-paper.pdf), which mentions all of those tools (but not `msrsync`).\n\n## Requirements\n\n[python](https://www.python.org/) \u003e= 2.6 and [rsync](https://rsync.samba.org/)\n\n## Installation\n\n`msrsync` is a single python file; you just have to download it. Or, if you prefer, you can clone the repository and use the provided Makefile:\n\n```bash\n$ wget https://raw.githubusercontent.com/jbd/msrsync/master/msrsync \u0026\u0026 chmod +x msrsync\n```\nor\n```bash\n$ git clone https://github.com/jbd/msrsync \u0026\u0026 cd msrsync \u0026\u0026 sudo make install\n```\n\n## Usage\n\n```\n$ msrsync --help\nusage: msrsync [options] [--rsync \"rsync-options-string\"] SRCDIR [SRCDIR2...] DESTDIR\n   or: msrsync --selftest\n\nmsrsync options:\n    -p, --processes ...   
number of rsync processes to use [1]\n    -f, --files ...       limit buckets to \u003cfiles\u003e files number [1000]\n    -s, --size ...        limit partitions to BYTES size (1024 suffixes: K, M, G, T, P, E, Z, Y) [1G]\n    -b, --buckets ...     where to put the buckets files (default: auto temporary directory)\n    -k, --keep            do not remove buckets directory at the end\n    -j, --show            show bucket directory\n    -P, --progress        show progress\n    --stats               show additional stats\n    -d, --dry-run         do not run rsync processes\n    -v, --version         print version\n\nrsync options:\n    -r, --rsync ...       MUST be last option. rsync options as a quoted string [\"-aS --numeric-ids\"]. The \"--from0 --files-from=... --quiet --verbose --stats --log-file=...\" options will ALWAYS be added, no\n                            matter what. Be aware that this will affect all rsync *from/filter files if you want to use them. See rsync(1) manpage for details.\n\nself-test options:\n    -t, --selftest        run the integrated unit and functional tests\n    -e, --bench           run benchmarks\n    -g, --benchshm        run benchmarks in /dev/shm or the directory in $SHM environment variable\n```\n\nIf you want to use specific options for the rsync processes, use the `--rsync` option.\n\n```bash\n$ msrsync -p4 --rsync \"-a --numeric-ids --inplace\" source destination\n```\n\nSome examples:\n```\n$ msrsync -p 8 /usr/share/doc/ /tmp/doc/\n```\n```\n$ msrsync -P -p 8 /usr/share/doc/ /tmp/doc/\n[33491/33491 entries] [602.1 M/602.1 M transferred] [3378 entries/s] [60.7 M/s bw] [monq 1] [jq 1]\n```\n```\n$ msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/\n[33491/33491 entries] [602.1 M/602.1 M transferred] [3533 entries/s] [63.5 M/s bw] [monq 1] [jq 1]\nStatus: SUCCESS\nWorking directory: /home/jbdenis/Code/msrsync\nCommand line: ./msrsync -P -p 8 --stats /usr/share/doc/ /tmp/doc/\nTotal size: 602.1 M\nTotal entries: 33491\nBuckets 
number: 34\nMean entries per bucket: 985\nMean size per bucket: 17.7 M\nEntries per second: 3533\nSpeed: 63.5 M/s\nRsync workers: 8\nTotal rsync's processes (34) cumulative runtime: 73.0s\nCrawl time: 0.4s (4.3% of total runtime)\nTotal time: 9.5s\n```\n\n## Performance\n\nYou can launch a benchmark using the `--bench` option or `make bench`. It is only for testing purposes. The benchmarks compare the performance of vanilla `rsync` and `msrsync` with various options. Since I'm just creating a huge fake file tree of empty files, you won't see much `msrsync` benefit here unless you try with a very large number of files. The benchmarks need to be run as root since I'm dropping the disk cache between runs.\n\n```\n$ sudo make bench # or sudo msrsync --bench\nBenchmarks with 100000 entries (95% of files):\nrsync -a --numeric-ids took 14.05 seconds (speedup x1)\nmsrsync --processes 1 --files 1000 --size 1G took 18.58 seconds (speedup x0.76)\nmsrsync --processes 2 --files 1000 --size 1G took 10.61 seconds (speedup x1.32)\nmsrsync --processes 4 --files 1000 --size 1G took 6.60 seconds (speedup x2.13)\nmsrsync --processes 8 --files 1000 --size 1G took 6.58 seconds (speedup x2.14)\nmsrsync --processes 16 --files 1000 --size 1G took 6.66 seconds (speedup x2.11)\n```\n\nPlease test on real data instead =). 
There is also a `--benchshm` option that will perform the benchmark in `/dev/shm`.\n\nHere is a real test on a big NAS box (not known for handling small files well) on a 1G network (as you'll see, the network is far from being the bottleneck because of the I/O overhead), using the decompressed [linux 4.0.4](https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.0.4.tar.xz) kernel source copied 21 times into different folders:\n\n```\n$ ls /mnt/nfs/linux-src/\n0  1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9\n$ du -s --apparent-size --bytes /mnt/nfs/linux-src\n11688149821     /mnt/nfs/linux-src\n$ du -s --apparent-size --human /mnt/nfs/linux-src\n11G     /mnt/nfs/linux-src\n$ find /mnt/nfs/linux-src -type f | wc -l\n1027908\n$ find /mnt/nfs/linux-src -type d | wc -l\n66360\n```\n\nThe source and the destination are both on an NFS mount.\n\nLet's run `rsync` and `msrsync` with various numbers of processes:\n\n```\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ time rsync -a --numeric-ids /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    136m10.406s\nuser    1m54.939s\nsys     7m31.188s\n\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ msrsync -p 1 /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    144m8.954s\nuser    2m20.426s\nsys     8m4.127s\n\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ msrsync -p 2 /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    73m57.312s\nuser    2m27.543s\nsys     7m56.484s\n\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ msrsync -p 4 /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    42m31.105s\nuser    2m24.196s\nsys     7m46.568s\n\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ msrsync -p 8 /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    36m55.141s\nuser    2m27.149s\nsys     7m40.392s\n\n$ rm -rf /mnt/nfs/dest\n$ echo 3 | sudo tee /proc/sys/vm/drop_caches \u003e /dev/null\n$ 
msrsync -p 16 /mnt/nfs/linux-src /mnt/nfs/dest\n\nreal    33m0.976s\nuser    2m35.848s\nsys     7m40.623s\n```\n\nThe rates are ridiculously low because of the small size of each file and the I/O overhead (NFS + network), but that's a real use case and we get a nice speedup without much thinking: just use `msrsync` and you're good to go. That's exactly what I wanted. Here is a summary of the previous results:\n\n| Command       |  Time      | Entries per second   | Bandwidth (MiB/s)    | Speedup |\n| --------      |:----------:|:--------------------:|:--------------------:|:-------:|\n| rsync         | 136m10s    |       133            |      1.36            |   x1    |\n| msrsync -p 1  | 144m9s     |       126            |      1.28            |   x0.94 |\n| msrsync -p 2  | 73m57s     |       246            |      2.51            |   x1.84 |\n| msrsync -p 4  | 42m31s     |       428            |      4.36            |   x3.20 |\n| msrsync -p 8  | 36m55s     |       494            |      5.03            |   x3.68 |\n| msrsync -p 16 | 33m0s      |       552            |      5.62            |   x4.12 |\n\nAstute readers will notice the slight overhead of `msrsync` over the equivalent `rsync` in the single-process case. This overhead becomes negligible (but still exists) as the number of processes increases.\n\n## Notes\n\n- The `rsync` processes are always run with the `--from0 --files-from=... --quiet --verbose --stats --log-file=...` options, no matter what. The `--from0` option affects `--exclude-from`, `--include-from`, `--files-from`, and any merged files specified in a `--filter` rule.\n\n- This may seem obvious, but if the source or the destination of the copy cannot handle parallel I/O well, you won't see any benefit from `msrsync` (quite the opposite, in fact).\n\n## Development\n\nI'm targeting python 2.6 without external dependencies besides rsync. 
The provided Makefile is just a helper around the embedded tests and coverage.py:\n\n```\n$ make help\nPlease use `make \u003ctarget\u003e' where \u003ctarget\u003e is one of\n  clean         =\u003e clean all generated files\n  cov           =\u003e coverage report using /usr/bin/python-coverage (use COVERAGE env to change that)\n  covhtml       =\u003e coverage html report\n  man           =\u003e build manpage\n  test          =\u003e run embedded tests\n  install       =\u003e install msrsync in /usr/bin (use DESTDIR env to change that)\n  lint          =\u003e run pylint\n  bench         =\u003e run benchmarks (linux only. Need root to drop buffer cache between run)\n  benchshm      =\u003e run benchmarks using /dev/shm (linux only. Need root to drop buffer cache between run)\n\n```\n\nThere is an integrated test suite (`--selftest` option, or `make test`). Since I'm using `unittest` from the python 2.6 standard library, I cannot capture the output of the tests (the `buffer` attribute of `TestResult` appeared in python 2.7).\n\n```\n$ make test # or msrsync --selftest\ntest_get_human_size (__main__.TestHelpers)\nconvert bytes to human readable string ... ok\ntest_get_human_size2 (__main__.TestHelpers)\nconvert bytes to human readable string ... ok\ntest_human_size (__main__.TestHelpers)\nconvert human readable size to bytes ... ok\n...\ntest simple msrsync synchronisation ... ok\ntest_msrsync_cli_2_processes (__main__.TestSyncCLI)\ntest simple msrsync synchronisation ... ok\ntest_msrsync_cli_4_processes (__main__.TestSyncCLI)\ntest simple msrsync synchronisation ... ok\ntest_msrsync_cli_8_processes (__main__.TestSyncCLI)\ntest simple msrsync synchronisation ... ok\ntest_simple_msrsync_cli (__main__.TestSyncCLI)\ntest simple msrsync synchronisation ... ok\ntest_simple_rsync (__main__.TestSyncCLI)\ntest simple rsync synchronisation ... 
ok\n\n----------------------------------------------------------------------\nRan 29 tests in 3.320s\n\nOK\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbd%2Fmsrsync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjbd%2Fmsrsync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbd%2Fmsrsync/lists"}