{"id":13586052,"url":"https://github.com/shirosaidev/saisoku","last_synced_at":"2025-04-14T04:32:25.793Z","repository":{"id":128791067,"uuid":"171464720","full_name":"shirosaidev/saisoku","owner":"shirosaidev","description":"Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.","archived":false,"fork":false,"pushed_at":"2020-09-12T05:38:26.000Z","size":124,"stargazers_count":44,"open_issues_count":2,"forks_count":3,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-02-13T21:21:54.852Z","etag":null,"topics":["data-pipeline","data-synchronization","data-transfer","directory-transfer","file-transfer","luigi","luigi-pipeline","orchestration-framework","pipeline","python","rclone","s3","scheduling","sync","sync-directories","tornado","transfer-files","transfer-server"],"latest_commit_sha":null,"homepage":"https://shirosaidev.github.io/saisoku/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shirosaidev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.MD","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-19T11:50:13.000Z","updated_at":"2024-08-01T16:31:59.310Z","dependencies_parsed_at":"2023-04-25T15:31:24.166Z","dependency_job_id":null,"html_url":"https://github.com/shirosaidev/saisoku","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shirosaidev%2Fsaisoku","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shirosaidev%2Fsaiso
ku/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shirosaidev%2Fsaisoku/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shirosaidev%2Fsaisoku/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shirosaidev","download_url":"https://codeload.github.com/shirosaidev/saisoku/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248822024,"owners_count":21166993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipeline","data-synchronization","data-transfer","directory-transfer","file-transfer","luigi","luigi-pipeline","orchestration-framework","pipeline","python","rclone","s3","scheduling","sync","sync-directories","tornado","transfer-files","transfer-server"],"created_at":"2024-08-01T15:05:17.947Z","updated_at":"2025-04-14T04:32:24.978Z","avatar_url":"https://github.com/shirosaidev.png","language":"Python","funding_links":["https://www.patreon.com/shirosaidev","https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=CLF223XAS4W72"],"categories":["Python"],"sub_categories":[],"readme":"# saisoku - Fast file transfer orchestration pipeline\n\n\u003cimg align=\"left\" width=\"226\" height=\"200\" src=\"docs/saisoku.png?raw=true\" hspace=\"5\" vspace=\"5\" alt=\"saisoku\"\u003e\n\nSaisoku is a Python (2.7, 3.6 tested) package that helps you build complex pipelines of batch file/directory transfer/sync jobs. It supports threaded transferring of files locally, over network mounts, or HTTP. 
With Saisoku you can also transfer files to and from AWS S3 buckets, sync directories using Rclone, and keep directories in sync \"real-time\" with Watchdog.\n\nSaisoku includes a Transfer Server and Client which support copying over TCP sockets.\n\nSaisoku uses Luigi for task management and its web UI. To learn more about Luigi, see its [github](https://github.com/spotify/luigi) or [readthedocs](https://luigi.readthedocs.io/en/stable/index.html).\n\n\n[![License](https://img.shields.io/github/license/shirosaidev/saisoku.svg?label=License\u0026maxAge=86400)](./LICENSE)\n[![Release](https://img.shields.io/github/release/shirosaidev/saisoku.svg?label=Release\u0026maxAge=60)](https://github.com/shirosaidev/saisoku/releases/latest)\n[![Sponsor Patreon](https://img.shields.io/badge/Sponsor%20%24-Patreon-brightgreen.svg)](https://www.patreon.com/shirosaidev)\n[![Donate PayPal](https://img.shields.io/badge/Donate%20%24-PayPal-brightgreen.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=CLF223XAS4W72)\n\n\n## Requirements\n- luigi\n- tornado\n- scandir\n- pyfastcopy\n- tqdm\n- requests\n- beautifulsoup4\n- boto3\n- watchdog\n\nInstall the above Python modules using pip\n\n```sh\n$ pip install -r requirements.txt\n```\n\n## Download\n\n```shell\n$ git clone https://github.com/shirosaidev/saisoku.git\n$ cd saisoku\n```\n[Download latest version](https://github.com/shirosaidev/saisoku/releases/latest)\n\n\n## How to use\n\n## Start Luigi\n\nCreate a directory for the Luigi state file\n```sh\n$ mkdir /usr/local/var/luigi-server\n```\nStart the Luigi scheduler daemon in the foreground with\n```sh\n$ luigid --state-path=/usr/local/var/luigi-server/state.pickle\n```\nor in the background with\n```sh\n$ luigid --background --state-path=/usr/local/var/luigi-server/state.pickle --logdir=/usr/local/var/log\n```\nThe scheduler defaults to port 8082, so you can point your browser to http://localhost:8082 to access the web UI.\n\n## Configure Boto 3\n\nIf you are going to use 
the S3 copy Luigi tasks, start by setting up Boto 3 (the AWS SDK for Python) using the quick start instructions at [boto 3 github](https://github.com/boto/boto3).\n\n## Usage - Luigi tasks\n\n### Local/network mount copy\n\nWith the Luigi centralized scheduler running, we can send a copy files task to Luigi\n```sh\n$ python run_luigi.py CopyFiles --src /source/path --dst /dest/path\n```\n\nSee below for the different [parameters](#using-saisoku-module-in-python) for each Luigi task.\n\n### Tarball package copy\nTo run a copy package task, which creates a tar.gz (gzipped tarball) file containing all files at src and copies the tar.gz to dst\n```sh\n$ python run_luigi.py CopyFilesPackage --src /source/path --dst /dest/path\n```\n\n### HTTP copy\n\nStart up two Saisoku HTTP servers; GET requests from Saisoku clients will be load balanced across them.\n```sh\n$ python saisoku_server.py --httpserver -p 5005 -d /src/dir\n$ python saisoku_server.py --httpserver -p 5006 -d /src/dir\n```\nThis will create an index.html file on http://localhost:5005 serving up the files in /src/dir.\n\nTo send an HTTP copy files task to Luigi\n```sh\n$ python run_luigi.py CopyFilesHTTP --src http://localhost --dst /dest/path --ports [5005,5006] --threads 2\n```\n\n### S3 copy\n\nTo copy a local file to an S3 bucket\n```sh\n$ python run_luigi.py CopyLocalFileToS3 --src /source/file --dst s3://bucket/foo/bar\n```\n\nTo copy an S3 bucket object to a local file\n```sh\n$ python run_luigi.py CopyS3lFileToLocal --src s3://bucket/foo/bar --dst /dest/file\n```\n\n### Rclone sync\n\nSaisoku can use Rclone to sync directories, etc. 
First, make sure you have [Rclone](https://rclone.org/) installed and in your PATH.\n\nTo do a dry-run sync from source to dest using Rclone:\n```sh\n$ python run_luigi.py SyncDirsRclone --src /source/path --dst /dest/path\n```\n\nTo sync from source to dest using Rclone\n```sh\n$ python run_luigi.py SyncDirsRclone --src /source/path --dst /dest/path --cmdargs '[\"-vv\"]'\n```\n\nTo change the subcommand that Rclone uses (default is sync)\n```sh\n$ python run_luigi.py SyncDirsRclone --src /source/path --dst /dest/path --command 'subcommand'\n```\n\n### Watchdog directory sync\n\nSaisoku can use Watchdog to keep directories synced in \"real-time\". First, make sure you have rsync installed and in your PATH.\n\nTo keep directories in sync from source to dest using Watchdog\n```sh\n$ python run_luigi.py SyncDirsWatchdog --src /source/path --dst /dest/path\n```\n\n## Usage - Server -\u003e Client transfer\n\nStart the Saisoku Transfer server listening on all interfaces on port 5005 (default)\n```sh\n$ python saisoku_server.py --host 0.0.0.0 -p 5005\n```\nRun the client to download a file from the server\n```sh\n$ python saisoku_client.py --host 192.168.2.3 -p 5005 /path/to/file\n```\n\n\n## Log file\n\nSaisoku output is logged to the `saisoku.log` file in the directory given by the TEMP/TMPDIR environment variable.\n\n\n## Using saisoku module in Python\n\n### ThreadedCopy\n\nSaisoku's `ThreadedCopy` class requires two parameters:\n\n`src` source directory containing files you want to copy\n\n`dst` destination directory of where you want the files to go (directory will be created if not there already)\n\nOptional parameters:\n\n`filelist` optional txt file containing one filename per line of files in src directory (not full path)\n\n`ignore` optional ignore files list, example `['*.pyc', 'tmp*']`\n\n`threads` number of worker copy threads (default 16)\n\n`symlinks` copy symlinks (default False)\n\n`copymeta` copy file stat info (default True)\n\n\n```\n\u003e\u003e\u003e from saisoku import 
ThreadedCopy\n\n\u003e\u003e\u003e ThreadedCopy(src='/source/dir', dst='/dest/dir', filelist='filelist.txt')\ncalculating total file size..\n100%|██████████████████████████████████████████████████████████| 173/173 [00:00\u003c00:00, 54146.30files/s]\ncopying 173 files..\n100%|██████████████████████████████████████████████| 552M/552M [00:06\u003c00:00, 97.6MB/s, file=dk-9.4.zip]\n```\n\n### ThreadedHTTPCopy\n\nSaisoku's `ThreadedHTTPCopy` class requires two parameters:\n\n`src` source http tornado server (tserv) serving a directory of files you want to copy\n\n`dst` destination directory of where you want the files to go (directory will be created if not there already)\n\nOptional parameters:\n\n`threads` number of worker copy threads (default 1)\n\n`ports` tornado server (tserv) ports, these ports will be load balanced (default [5000])\n\n`fetchmode` file get mode, either requests or urlretrieve (default urlretrieve)\n\n`chunksize` chunk size for requests fetchmode (default 8192)\n\n```\n\u003e\u003e\u003e from saisoku import ThreadedHTTPCopy\n\n\u003e\u003e\u003e ThreadedHTTPCopy('http://localhost', '/dest/dir')\n```\n\n### Rclone\n\nSaisoku's `Rclone` class requires two parameters:\n\n`src` source directory of files you want to sync\n\n`dst` destination directory of where you want the files to go\n\nOptional parameters:\n\n def __init__(self, src, dst, flags=[], command='sync', cmdargs=[]):\n\n`flags` a list of Rclone flags (default [])\n\n`command` subcommand you want Rclone to use (default sync)\n\n`cmdargs` a list of command args to use (default ['--dry-run', '-vv'])\n\n```\n\u003e\u003e\u003e from saisoku import Rclone\n\n\u003e\u003e\u003e Rclone('/src/dir', '/dest/dir')\n```\n\n### Watchdog\n\nSaisoku's `Watchdog` class requires two parameters:\n\n`src` source directory of files you want to sync\n\n`dst` destination directory of where you want the files to go\n\nOptional parameters:\n\n def __init__(self, src, dst, recursive, patterns, ignore_patterns, 
ignore_directories, case_sensitive)\n\n`recursive` bool used for recursively checking all sub directories for changes (default True)\n\n`patterns` file name patterns to use when checking for changes (default *)\n\n`ignore_patterns` file name patterns to ignore when checking for changes (default *)\n\n`ignore_directories` bool used for ignoring directories (default False)\n\n`case_sensitive` bool used for being case sensitive (default True)\n\n\n```\n\u003e\u003e\u003e from saisoku import Watchdog\n\n\u003e\u003e\u003e Watchdog('/src/dir', '/dest/dir')\n```\n\n## Patreon\nIf you are a fan of the project or using Saisoku in production, please consider becoming a [Patron](https://www.patreon.com/shirosaidev) to help advance the project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshirosaidev%2Fsaisoku","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshirosaidev%2Fsaisoku","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshirosaidev%2Fsaisoku/lists"}