{"id":23776160,"url":"https://github.com/cea-list/scratch_manager","last_synced_at":"2025-04-12T19:36:03.203Z","repository":{"id":43634569,"uuid":"511473075","full_name":"CEA-LIST/scratch_manager","owner":"CEA-LIST","description":"A daemon to automate caching of read-only datasets between slow and fast storage locations.","archived":false,"fork":false,"pushed_at":"2023-03-30T09:47:05.000Z","size":127,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-26T13:53:48.154Z","etag":null,"topics":["cache","cache-storage","cluster","datasets","deep-learning","filesystem","hpc","io","linux","machine-learning","mlops","nfs","scratch","squashfs","ssd","storage"],"latest_commit_sha":null,"homepage":"https://cea-list.github.io/scratch_manager/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CEA-LIST.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-07T09:56:15.000Z","updated_at":"2023-11-21T13:18:04.000Z","dependencies_parsed_at":"2025-02-21T06:36:35.308Z","dependency_job_id":null,"html_url":"https://github.com/CEA-LIST/scratch_manager","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CEA-LIST%2Fscratch_manager","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CEA-LIST%2Fscratch_manager/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CEA-LIST%2Fscratch_ma
nager/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CEA-LIST%2Fscratch_manager/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CEA-LIST","download_url":"https://codeload.github.com/CEA-LIST/scratch_manager/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248623205,"owners_count":21135204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","cache-storage","cluster","datasets","deep-learning","filesystem","hpc","io","linux","machine-learning","mlops","nfs","scratch","squashfs","ssd","storage"],"created_at":"2025-01-01T07:13:12.265Z","updated_at":"2025-04-12T19:36:03.177Z","avatar_url":"https://github.com/CEA-LIST.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scratch manager\n\nScratch manager is a daemon which automates caching of read-only datasets on a system with a slow and a fast storage device.\nIn a typical HPC environment, the slow storage is a shared network filesystem and the fast one is an SSD on the compute node.\nThe scratch manager monitors read throughput on a list of datasets stored on the large but slow storage and copies the most active ones to a faster but limited cache storage.\nThe current implementation uses squashfs images to bundle together the files from each dataset.\nThis provides the following advantages:\n\n- Large datasets with many small files are bundled into one container file on the underlying storage.\n- The cache is shared across users and 
jobs\n- Once mounted, dataset content is accessible via the familiar POSIX filesystem API.\n- Cached images are mounted over the mountpoint of the non-cached version.\n  Caching happens live and transparently for users, even if files are currently open.\n- The daemon relies on readily available Linux I/O statistics and does not require dependencies other than Python 3.\n\n\n## Principle\n\nThe general logic of the code is an infinite loop of the following steps:\n\n1. Check and handle added and deleted datasets\n2. Update throughput statistics\n3. Compute the optimal combination of datasets to cache or drop\n4. Unmount and delete datasets to drop from cache\n5. Copy and mount datasets to cache\n\n![scratch manager schema](scratch_manager.svg)\n\n\n## Installation\n\nClone the repository and move into it:\n\n```sh\ngit clone https://github.com/CEA-LIST/scratch_manager.git\ncd scratch_manager\n```\n\nBuild and install the package for RPM-based distributions:\n\n```sh\n# Install build dependencies\nsudo dnf install git python3-rpm-macros python3-wheel \\\n    python3-setuptools python3-setuptools_scm dnf-utils rpmdevtools\n\n# Generate rpm package\nmkdir -p rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}\nsed \"s/Version:.*/Version:        $(python3 -m setuptools_scm)/g\" \\\n    packaging/scratch_manager.spec \u003e rpmbuild/SPECS/scratch_manager.spec\npython3 setup.py sdist --dist-dir rpmbuild/SOURCES\nrpmbuild --define \"_topdir $PWD/rpmbuild\" -bb rpmbuild/SPECS/scratch_manager.spec\n\n# Remove build dependencies if desired\n# sudo dnf remove git python3-rpm-macros python3-wheel \\\n#     python3-setuptools python3-setuptools_scm dnf-utils rpmdevtools\n\n# Install package\nsudo dnf install rpmbuild/RPMS/noarch/*.rpm\n```\n\nBuild and install the package for DEB-based distributions:\n\n```sh\n# Install build dependencies\nsudo apt install -y git python3-setuptools-scm python3-pip\n\n# Generate deb package\npython3 -m pip install -U virtualenv build installer\npython3 -m 
build --wheel\npython3 -m installer dist/*.whl --prefix=/usr --destdir=debroot\nmkdir -p debroot/DEBIAN\nsed \"s/Version:.*/Version: $(python3 -m setuptools_scm)-1/g\" \\\n    packaging/control \u003e debroot/DEBIAN/control\ninstall -o root -m 0644 -D scratch_manager.service \\\n    debroot/usr/lib/systemd/system/scratch_manager.service\ndpkg-deb --root-owner-group --build debroot/ \\\n    scratch-manager_$(python3 -m setuptools_scm)-1_all.deb\n\n# Remove build dependencies if desired\n# sudo apt autoremove -y git python3-setuptools-scm python3-pip\n\n# Install package\nsudo dpkg -i scratch-manager_*.deb\n```\n\nEdit `/etc/scratch_manager.conf` to set the daemon configuration:\n\n```ini\n; Directory containing squashfs image files.\ndatadir = /data/scratch_manager\n\n; Directory to use for caching image files.\ncachedir = /scratch/scratch_manager\n\n; Directory under which dataset image files will be mounted; the\n; daemon will create a subdirectory for each image.\nmountdir = /media\n\n; Maximum allowed cache utilisation (specify the unit from 'GB', 'TB' or '%').\ncapacity = 25%\n\n; Sliding window in seconds over which to aggregate throughput stats.\nperiod = 600\n```\n\nEnsure the daemon starts after the filesystems it depends on:\n\n```sh\nsystemctl edit scratch_manager.service\n```\n\n```ini\n[Unit]\nRequires=home.mount\nAfter=home.mount\n```\n\nEnable and start the daemon:\n\n```sh\nsystemctl enable --now scratch_manager.service\n```\n\n\n## Adding datasets\n\nYou can generate a squashfs image from the content of a directory using the [mksquashfs](https://manpages.debian.org/testing/squashfs-tools/mksquashfs.1.en.html) command.\n\nBefore generating the image, make sure the permissions are correct, for example with:\n\n```sh\nfind . -type d -exec chmod 0755 {} \\;\nfind . -type f -exec chmod 0644 {} \\;\n```\n\nThen generate the image file:\n\n```sh\nmksquashfs . 
../dataset_name.squashfs\n```\n\nThe following arguments for mksquashfs may be relevant in many situations:\n- `-all-root`: set owner and group of all files to root\n- `-progress -info`: show progress bar\n- `-comp lzo -noD -noF`: disable compression of data blocks and fragments and use lzo compression elsewhere\n- `-no-duplicates`: don't try to detect and deduplicate files\n- `-no-xattrs`: disable support for extended attributes\n- `-no-exports`: disable support for the image being re-exported via NFS\n- `-no-sparse`: don't try to detect and optimize sparse files\n\nOnce the image is ready, move it inside the directory specified by `--datadir`; it should be detected and mounted by the daemon automatically.\nMake sure the image file is already on the same filesystem so that the move is atomic.\nOtherwise, the daemon might try and fail to mount the image while it is being transferred.\n\n\n## Removing a dataset\n\nSimply remove the dataset image from the datadir; the daemon will detect it and clean up the caches and mount points.\n\n\n## License\n\nThis program is distributed under the CeCILL-C license.\n\n\n## Known issues\n\n- Older Linux kernel versions do not record I/O stats for disk images, so caching will not work.\n- The daemon leaves inactive images in cache so long as they do not exceed the capacity allowed.\n  This reduces the free space available for other programs.\n  It is safe to manually delete cached images but the daemon will reallocate the free space anyway.\n- Pinning a dataset to cache is not yet implemented.\n- It may take a little while until a non-cached dataset gets selected for caching, transferred and mounted.\n  During that time, data will be read from the image stored on the slow storage.\n- The daemon is currently single-threaded.\n  It will hang while dataset images are being transferred.\n- Packaging is broken on Ubuntu 
(https://github.com/pypa/installer/issues/176)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcea-list%2Fscratch_manager","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcea-list%2Fscratch_manager","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcea-list%2Fscratch_manager/lists"}