{"id":20566164,"url":"https://github.com/aces/sing-squashfs-support","last_synced_at":"2025-06-30T06:34:10.810Z","repository":{"id":48095796,"uuid":"224311670","full_name":"aces/sing-squashfs-support","owner":"aces","description":"Code and information about using squashfs with Singularity","archived":false,"fork":false,"pushed_at":"2023-10-11T18:26:44.000Z","size":1337,"stargazers_count":9,"open_issues_count":1,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-14T15:50:53.052Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aces.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-27T00:32:29.000Z","updated_at":"2024-11-20T15:45:33.000Z","dependencies_parsed_at":"2023-10-12T00:55:29.093Z","dependency_job_id":null,"html_url":"https://github.com/aces/sing-squashfs-support","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aces/sing-squashfs-support","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aces%2Fsing-squashfs-support","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aces%2Fsing-squashfs-support/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aces%2Fsing-squashfs-support/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aces%2Fsing-squashfs-support/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/aces","download_url":"https://codeload.github.com/aces/sing-squashfs-support/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aces%2Fsing-squashfs-support/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262724239,"owners_count":23354206,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T04:40:17.856Z","updated_at":"2025-06-30T06:34:10.750Z","avatar_url":"https://github.com/aces.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Important note, October 2023!\n\nAlthough still valid, the utilities and documentation in this\nrepository have been updated to refer to `Apptainer` in a brand new\nrepository, [apptainer-squashfs-support](https://github.com/aces/apptainer-squashfs-support).\n\n# SquashFS through Singularity : Hints And Tips\n\nThis repository contains code, examples, hints and other documentation\nrelated to using singularity containers as access methods to overlay\nfiles (typically, squashfs files).\n\nThe data organization addressed here generally consists of\n\n1. Having one or several `.squashfs` files that contain the data;\n2. Having a singularity container image with specific capabilities (`rsync`, `openssh` etc);\n3. 
Combining the two, along with other scripts, to build a system that can seamlessly access the data files.\n\n## What this repo contains\n\n* The directory `build_data` contains code and instructions to make squashfs files;\n* The directory `build_simg` contains code and instructions to build a singularity container;\n* The directory `bin` contains some utility scripts (e.g. `sing_sftpd`);\n* The directory `doc_examples` contains sample README files to install with your data, for helping users access it;\n* The directory `images` contains a PDF of technical diagrams, and its source in OmniGraffle format;\n* The rest of this README here contains hints and code snippets on accessing the data files.\n\n## Accessing the data files\n\nFor the examples below, let's assume we have a data distribution\ndirectory called `/data/HCPsquash` containing two SquashFS filesystem\nfiles, and a singularity container image file:\n\n```bash\nunix% ls -l /data/HCPsquash\ntotal 83068518941\n-rw-r--r-- 1 prioux rpp-aevans-ab 1508677619712 Aug  1 16:13 hcp1200-00-100206-103414.squashfs\n-rw-r--r-- 1 prioux rpp-aevans-ab 1532533477376 Aug  1 20:33 hcp1200-01-103515-108020.squashfs\n-rwxr-xr-x 1 prioux rpp-aevans-ab     147062784 Dec  3 16:51 sing_squashfs.simg\n```\n\n(This example is taken as a subset of a real dataset, and more\ninformation about it can be found by reading the file\n[README.txt](doc_examples/hcp_1200_README.txt) that was provided to its\nusers)\n\nThe two squashfs files store a bunch of data files inside them under\nthe root path `/HCP_1200_data`. The first file contains 20\nsubdirectories named `100206` ... `103414`, and the second file\ncontains 20 subdirectories named `103515` ... 
`108020`.\n\n**Important note:** Singularity versions 3.5.0 to 3.5.2 are known to\nrequire a suffix consisting of the three characters `:ro` after the\nnames of the overlays; the commands below would, for instance,\nrequire all overlay options to be in the form of `--overlay=abc.squashfs:ro`.\n\n### a) Connecting interactively (low-level, directly)\n\nThis will allow you to have a look at the files, with only the first\nsquashfs file mounted:\n\n```bash\ncd /data/HCPsquash\nsingularity shell --overlay=hcp1200-00-100206-103414.squashfs sing_squashfs.simg\n```\n\nYou can then `cd /HCP_1200_data` and `ls` the files. Use `exit` to\nexit the container!\n\nTo get both squashfs files:\n\n```bash\nsingularity shell --overlay=hcp1200-00-100206-103414.squashfs --overlay=hcp1200-01-103515-108020.squashfs sing_squashfs.simg\n```\n\nNotice that `/HCP_1200_data` now has 40\nsubdirectories instead of just 20.\n\nTo connect with all `.squashfs` files, no matter how many:\n\n```bash\nsingularity shell $(ls -1 | grep '\\.squashfs$' | sed -e 's/^/--overlay /') sing_squashfs.simg\n```\n\nTo disable the messages about the squashfs not being a writable\nfilesystem, use the `-s` option of singularity:\n\n```bash\nsingularity -s shell ...\n```\n\n### b) Running a command (low-level, directly)\n\nThis is just like in a) above, but instead of running `singularity shell`\nwe run `singularity exec`:\n\n```bash\nsingularity -s exec --overlay=hcp1200-00-100206-103414.squashfs sing_squashfs.simg ls -l /HCP_1200_data\n```\n\n### c) Running a command or a shell (with utility wrapper)\n\nIn the [bin](bin) directory of this repo, you will find a set of\nutility wrapper scripts. In fact, it's a single script with multiple\nnames. 
It has many features and options allowing you to choose which\nsquashfs files to access and which singularity image file to run,\nbut the simplest use scenario is to copy the one called `sing_command_here`\ninto the same directory `/data/HCPsquash` as the squashfs files and\nsingularity image:\n\n```\nunix% ls -l /data/HCPsquash\ntotal 83068518941\n-rw-r--r-- 1 prioux rpp-aevans-ab 1508677619712 Aug  1 16:13 hcp1200-00-100206-103414.squashfs\n-rw-r--r-- 1 prioux rpp-aevans-ab 1532533477376 Aug  1 20:33 hcp1200-01-103515-108020.squashfs\n-rwxr-xr-x 3 prioux rpp-aevans-ab          7542 Dec  3 16:57 sing_command_here\n-rwxr-xr-x 1 prioux rpp-aevans-ab     147062784 Dec  3 16:51 sing_squashfs.simg\n```\n\nWhen invoked, it will automatically detect those files around it,\nand run a `singularity exec` command with all the appropriate\noverlays. Now you can run the same command as in example b) above,\nbut in a simpler way:\n\n```bash\n# Run on all squashfs files:\n./sing_command_here ls -l /HCP_1200_data\n\n# Run on just one squashfs file:\n./sing_command_here -O hcp1200-00-100206-103414.squashfs ls -l /HCP_1200_data\n\n# Connect interactively:\n./sing_shell_here -O hcp1200-00-100206-103414.squashfs # with one data file\n./sing_shell_here                                      # with all files\n```\n\n### d) Mounting the data files using sshfs\n\nRunning programs from within the container is the most efficient\noption for accessing the data files. 
If that option is not available,\nor if the data files need to be accessed remotely, then it is also\npossible to mount the data directory using sshfs, from elsewhere.\n\nNote that mounting the data with sshfs imposes a significant performance\npenalty, as encryption and decryption of the SFTP traffic (which\ntransports the mountpoint's data files) will occur at all times.\n\nThe first thing to realize is that we can't simply use a normal\nsshfs mount command, because the container is not initially started.\nEven if the container is started, it:\n\n* doesn't run an sshd daemon;\n* is not even addressable with a network address.\n\nIn [Diagram #2](images/diag_02_SSHFS.png) we can see that a normal\nsshfs mount results in the program `sftp-server` being launched on\nthe remote site. This program is connected through its stdin and\nstdout channels to the FUSE client on the local site. Filesystem\noperations within the mount point (open, read, seek etc) are\ntranslated into SFTP operations sent through the ssh connection to\nthat remote `sftp-server` program.\n\nWhat we need to do is tell the sshfs mounter to launch, on the\nremote site, its client `sftp-server` **inside** the container. It will\nrun as a standalone normal process, even though it will still talk\nto its launcher sshd program through stdin and stdout. A very\ncomplicated way of doing this would be to mount the filesystem with:\n\n```bash\n# Complicated example; do NOT DO this!\n\n# Mountpoint: an empty dir\nmkdir mymountpoint\n\n# See how complicated the sftp_server command is, and we're\n# using just ONE of the overlays too!\nsshfs -o sftp_server=\"singularity -s exec --overlay=/data/HCPsquash/hcp1200-00-100206-103414.squashfs /data/HCPsquash/sing_squashfs.simg /usr/libexec/openssh/sftp-server\" user@computer2:/HCP_1200_data mymountpoint\n\nls mymountpoint\nfusermount -u mymountpoint\n```\n\nThe sshfs option `-o sftp_server=` is rare and unusual. 
It is not\nnormally required with scp or sshfs, as there are no real alternative\ncompatible sftp servers other than the one that comes with the\nOpenSSH package.\n\nA better way of performing the same thing without having to provide\na long command to the `-o sftp_server=` option is to first pack\nthat long command into a separate bash script. Let's call it\n`example1.sh`:\n\n```bash\n#!/bin/bash\n\n# Content of example1.sh\n\nsingularity -s exec \\\n  --overlay=/data/HCPsquash/hcp1200-00-100206-103414.squashfs \\\n  --overlay=/data/HCPsquash/hcp1200-01-103515-108020.squashfs \\\n  /data/HCPsquash/sing_squashfs.simg                          \\\n  /usr/libexec/openssh/sftp-server\n```\n\nThen the mount command becomes much simpler:\n\n```bash\nsshfs -o sftp_server=\"/path/to/example1.sh\" user@computer2:/HCP_1200_data mymountpoint\n```\n\nWe can have another look at this solution in [Diagram #3](images/diag_03_SSHFS_Setup.png)\nand [Diagram #4](images/diag_04_SSHFS_SINGSFTPD.png).\n\nThis will work fine as long as the content of `example1.sh` is\nupdated appropriately whenever the singularity container is changed,\nor the set of overlays is changed.\n\nA better solution would be to create a new shell wrapper that\nworks like `example1.sh` but in a more generic way. The [bin](bin)\ndirectory in this repo contains such programs, `sing_sftpd` and\n`sing_sftpd_here`.  They come with full documentation; just run\nthem with the `-h` option. But in essence, installing `sing_sftpd_here`\nin the `/data/HCPsquash` directory will make it automatically detect\nall the `.squashfs` files and the `.simg` file there, and allow you to\nmount the data files with:\n\n```bash\nsshfs -o sftp_server=\"/data/HCPsquash/sing_sftpd_here\" user@computer2:/HCP_1200_data mymountpoint\n```\n\n### e) Copying the data files using scp\n\nJust like for sshfs above, it is possible to run the scp command's\nserver-side program with an alternative SFTP server. 
The option\nis in fact exactly the same:\n\n```bash\nscp -o sftp_server=\"singularity -s exec --overlay=/data/HCPsquash/hcp1200-00-100206-103414.squashfs /data/HCPsquash/sing_squashfs.simg /usr/libexec/openssh/sftp-server\" user@computer2:/HCP_1200_data/remote_file.txt localfile.txt\n```\n\nor more simply using the same type of wrapper described for sshfs:\n\n```bash\nscp -o sftp_server=\"/data/HCPsquash/sing_sftpd_here\" user@computer2:/HCP_1200_data/remote_file.txt localfile.txt\n```\n\n### f) Extracting data using rsync\n\nJust like for sshfs in section d) above, we can't simply rsync\nthe data files out of the `.squashfs` files from the outside if the\nrsync program runs on the host where these squashfs files reside.\n[Diagram #5](images/diag_05_RSYNC.png) shows the architecture of a\nstandard rsync session. The rsync program running on computer2 would\nbe outside of a proper container.\n\nBut just like for sshfs, we can also fix that. The rsync program\nsupports an option `--rsync-path=/abc/def/prog` and so if we provide\nsome `/abc/def/prog` that acts like an rsync program, the architecture\nis respected. The way to do that is once again to create a bash\nwrapper `example2.sh`:\n\n```bash\n#!/bin/bash\n\n# Content of example2.sh\n\nsingularity -s exec \\\n  --overlay=/data/HCPsquash/hcp1200-00-100206-103414.squashfs \\\n  --overlay=/data/HCPsquash/hcp1200-01-103515-108020.squashfs \\\n  /data/HCPsquash/sing_squashfs.simg                          \\\n  rsync \"$@\"\n```\n\nIt is now possible to rsync data out of the `.squashfs` files in\nthis way:\n\n```bash\nrsync -a --rsync-path=\"/path/to/example2.sh\" user@computer2:/HCP_1200_data/123456 ./123456_copy\n```\n\nThis solution is shown in [Diagram #6](images/diag_06_RSYNC_Setup.png)\nand [Diagram #7](images/diag_07_RSYNC_SINGRSYNC.png).\n\nAgain, a more general solution is provided in the [bin](bin) directory\nof this repo, where you can find two utilities named `sing_rsync`\nand `sing_rsync_here`. 
These can be deployed alongside the `.squashfs`\nfilesystem files and the singularity image to make the process of\nrecognizing them and booting the singularity container transparent.\nRunning `sing_rsync` with the `-h` option will provide more\ninformation about these utilities.\n\n## Other tricks and tips\n\n### Writable overlay\n\nFiles in `.squashfs` format encode *read-only* filesystems. They\nare perfect for large static datasets, as they are very fast and\ntremendously reduce the inode requirements on the host filesystem.\n\nWhile a process is running inside a singularity container, that\nprocess can write files only on externally mounted writable\nfilesystems; singularity normally provides /tmp and the $HOME\ndirectory of the user who runs the singularity command. Other mount\npoints can be provided by adding explicit `-B` options to the singularity\ncommand line.\n\nThe utility programs included in [bin](bin) will launch singularity\ncontainers with not only all the `.squashfs` files that they can find,\nbut also any file with a `.ext3` extension. These can be built as\nformatted EXT3 filesystems, and singularity will make them writable.\nFor more information about building such files, consult the repo\nfor the utility [withoverlay](https://github.com/prioux/withoverlay).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faces%2Fsing-squashfs-support","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faces%2Fsing-squashfs-support","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faces%2Fsing-squashfs-support/lists"}