{"id":19402779,"url":"https://github.com/cscfi/hpc-container-wrapper","last_synced_at":"2026-04-09T08:21:05.284Z","repository":{"id":39887078,"uuid":"438591678","full_name":"CSCfi/hpc-container-wrapper","owner":"CSCfi","description":"Tool to wrap installations into a container designed for use on HPC systems","archived":false,"fork":false,"pushed_at":"2023-10-12T09:46:27.000Z","size":170,"stargazers_count":20,"open_issues_count":4,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-10-13T00:51:00.022Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CSCfi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-15T10:42:49.000Z","updated_at":"2024-05-30T08:17:32.292Z","dependencies_parsed_at":"2024-05-30T08:17:31.450Z","dependency_job_id":"fb930fe9-a55e-44d1-8358-d90ab674f6f6","html_url":"https://github.com/CSCfi/hpc-container-wrapper","commit_stats":null,"previous_names":[],"tags_count":11,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSCfi%2Fhpc-container-wrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSCfi%2Fhpc-container-wrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSCfi%2Fhpc-container-wrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CSCfi%2Fhpc-container-wrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/CSCfi","download_url":"https://codeload.github.com/CSCfi/hpc-container-wrapper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223945428,"owners_count":17229626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T11:25:48.595Z","updated_at":"2026-04-09T08:21:05.276Z","avatar_url":"https://github.com/CSCfi.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tykky\n\n\nTykky is a set of tools which wrap installations inside \nan Apptainer/Singularity container to improve startup times, \nreduce IO load, and lessen the number of files on large parallel filesystems.\n\n\n\n## Intro\n\nThis is a tool to create installations using existing containers.\nThe basic idea is to install software through a container,\nconvert this into a filesystem image and mount this filesystem image\nwhen running the container. \n\nThe main goal is to reduce the number of files on disk,\nand reduce the IO load when installations are started. If you \nare not running on a parallel filesystem with a lot of users and load,\nthe points might not be that relevant. 
The tool has only been tested and developed\non Lustre, so benefits might be different on other parallel filesystems.\n\nThe tool originally started as a way to package conda \ninstallations using a container, as they cause a significant load on the filesystem.\nThe idea is that using the tool should be very simple\nand as similar as possible to an un-containerized installation (a drop-in replacement for the majority of cases). \nThis means that we try to hide the container as much as possible \nfrom the end-user. \n\nIt's singularity based, but nothing inherently prohibits usage\nof some other runtime (granted, singularity is quite hardcoded at the moment).\nAll that is needed is the ability to mount filesystem images and control over bind mounts.\n\n### Design choices\n\nContainers are used for two main things:\n- Automatic mounting and unmounting of the filesystem image \n- Per-process private (mount) namespace \n\nFrom the point of view of the parallel filesystem, the image\njust looks like one single file -\u003e much less load on the parallel filesystem.\n(I'm not a Lustre expert so I don't know if it's more the OST, OSS or MDT being saved)\nThe image could be mounted using other tools, but then we would have to keep \ntrack of unmounting it and all kinds of error handling -\u003e things we get for free using a container. The private namespaces mean that we don't have to worry about \nconflicts between multiple users, finding folders where to mount or breaking\nsoftware which does not want to be moved. \n\nExisting containers are used as this provides an easier way \nto interface with the host software environment and a user does not have to\nhave singularity build access on the HPC machine (user namespaces might be temporarily disabled on a system due to security reasons). \n\nThe tool generates a lot of wrappers with some relatively nasty tricks. 
This\nis so that most things which should work without a container work within the container\nand the installation looks like a normal installation to the end-user. \n\nThe tool is also explicitly meant to allow intertwining with the host\nsoftware environment. For this there are two basic modes of operation:\n\n1. Mount everything from the host\n - All host paths will be mounted; used when it makes sense and compatibility can be assumed\n - Internally there are some additional variables to exclude paths \n - Note! Due to this the default container should always be the same as the host system. \n2. Mount specified defaults\n - Defaults are set in the config\n\n\nThe current version of the tools is written using bash + python. \nAt some point it could be worthwhile to rewrite the whole thing\nin some language which can be statically compiled for maximum robustness,\ne.g. Go/Rust/C++ or whatever. \n\n### Limitations\n\nWhat things break / work differently when compared to a normal installation:\n\n- ssh commands will drop you out of the container. There is a fix for this, but then\nsome pre commands have to be run to start any required ssh services \n- You can't start other containers (singularity can not be nested). An ugly hack is\nto ssh to `localhost`, but that makes environment management tricky and requires sshd to be running on the current node. \n- Resolving binary paths will result in paths which do not exist outside the container, \nas the image is mounted on a directory which is not present on the host. Bind mounts\nare always applied after the image mount, which means an image mount can not mask\na directory on disk without us dropping the whole preceding path from the mount point list. 
\nThere is a fix for this as a PR for apptainer (as of 1.2.2022) but we do not rely on this\nas it is unclear whether singularityCE or apptainer will become dominant.\nThe workaround is to mount all directories on the same level as each component\nof the image mount path -\u003e possibly very expensive -\u003e not done automatically. \n- A bit untested, but running one container per core when you have 128 of them\ncan lead to the compute node feeling a bit unwell. \n\n\n## Basic program structure\n\nStarting from command invocation.\nUsers will use commands under `bin`.\n\n1. `bin` files are symlinks to `frontends/containerize` \n2. `containerize` is the main script which runs all the steps and is responsible for cleanup  \n3. Based on the symlink used, a corresponding python script is called from `frontends`\n4. This frontend parses the user input and sets a lot of tool specific defaults. \n    - The python interpreter used is hardcoded during the tool's installation.\n    - A user config is created\n5. The user config and the default config are both passed to `construct.py` \n    - The default config has been defined during installation  \n    - Some values are overridable, others are set.\n    - The construct will produce a `_vars.sh` file which will be sourced by subsequent steps\n    - Current handling of environment variables in config is not standardized. `pre_install`, `post_install` and `extra_envs` will not be expanded in any way. `build_tmpdir_base` will be expanded and checked to be a valid directory during config construction. The rest will be expanded using `os.path.expandvars`; note that this will leave unset variables as is.\n    - All non-special variables from the yaml will be uppercased, prefixed with `CW_` and dumped to `_vars.sh`; arrays are turned into bash arrays.  \n6. `containerize` makes sure that the installation dir exists.\n7. **pre.sh** Fetch container either by downloading or copying from disk. 
When modifying installations, this step will also copy the squashfs image\n8. **create_inst** Run installation script through the container based on some template in `templates`, after which the installation is compressed into a squashfs image. \n    - The installation can be isolated or mount the complete host filesystem. \n    - When modifying an existing installation, the whole installation has to be copied, which might take a while\n9. **generate_wrappers** (This is where most tricks live) Generate wrappers for the installation so that they can be used as normal installations. The wrappers:\n    - Define common variables such as image name, container name  \n    - Define runtime bind mounts\n    - Unset singularity envs if not actually inside a container (e.g. srun called from container -\u003e some SINGULARITY variables are still active) \n    - Add an extra symlink layer in `_bin` to trick the likes of dask into generating valid executable paths\n    - Copy the venv definition when wrapping a python venv\n    - Activate conda if a conda env is wrapped\n    - Set executable argv0 to enable python virtual envs. \n10. **post.sh**\n    - Copy build files to the final installation\n    - Save used build files to `\u003cinstall_dir\u003e/share`\n\n\n**NOTE** \n\nPermissions for installation files \nare determined by the current umask and permissions for the \ntarget installation folder! Files inside the squashfs are set to\nworld readable to allow copying when installations are updated by others\nthan the original creator. Limit access by setting correct permissions\non the sqfs image itself. 
This behavior can be disabled by setting `CW_NO_FIX_PERM`\n\n**BASH MAGIC**\n\n```\n/usr/bin/singularity --silent exec -B $DIR/../$SQFS_IMAGE:$INSTALLATION_PATH:image-src=/ $DIR/../$CONTAINER_IMAGE  bash -c \"eval \\\"\\$(/CSC_CONTAINER/miniforge/bin/conda shell.bash hook )\\\"  \u0026\u0026 conda activate env1 \u0026\u003e/dev/null \u0026\u0026  exec -a $_O_SOURCE $DIR/python $( test $# -eq 0 || printf \" %q\" \"$@\" )\"\n```\n\n- `exec -a` is there to enable creating virtual environments as otherwise the venv symlink -\u003e tykky wrapper -\u003e executable in container chain is broken and python does not consider itself to be a venv. \n- `bash -c` is required when conda is involved as we have to initialize the environment for some commands to work properly. Also partially due to exec being a builtin instead of a binary. \n- `$( test $# -eq 0 || printf \" %q\" \"$@\" )` without printf special characters will not be escaped correctly and the command will error out. printf will print an empty string '' if no arguments so we need to be quiet in that case.\n\n\n## Implemented Frontends\n\n- `conda-containerize`\n - Wrap new conda installation or edit existing\n - requires a conda YML file as input\n- `pip-containerize`\n - Wrap new venv installation or edit existing\n - Will by default use currently available python\n - Option to use uv to manage python and venv\n - Option to also use slim container image  (will then not mount full host)\n- `wrap-container`\n - Generate wrappers for existing container. Mainly\n so that applications in existing containers can be used \"almost\" as a normal installation.\n - Full host will not be mounted\n- `wrap-install`\n - Wrap an installation on disk to a container.\n - Useful for containerizing existing installations which can not be re-installed\n - Option to mount in exact place, so that external and internal paths are identical. 
This\n will however require dropping the top parent path mount, so it only works when no dependencies are required. E.g.\n`wrap-install --mask -w bin /appl/soft/prog --prefix Dir` will not mount `/appl` at all.\nUnderstand the implications of this before using this frontend. For a manual workaround\nand more explanation see the [limitations section](#limitations).\n\n\nAll tools support `-h/--help` for displaying info;\nsome have subcommands. \n\n## Examples\n\n- `conda-containerize new --prefix /path/to_install conda_env.yaml`\n    - Where `conda_env.yaml` contains\n    ```\n    channels:\n      - conda-forge\n    dependencies:\n      - numpy\n    ```\n\n- `conda-containerize update --post-install post.sh /path/to_install`\n    - Where `post.sh` contains\n    ```\n    conda install scipy  --channel conda-forge\n    pip install pyyaml \n    ```\n- `pip-containerize new --prefix /path/to_install req.txt`\n   - Where `req.txt` contains\n   ```\n   numpy \n   ```\n\n- `wrap-container --wrapper-paths /opt/prog/bin --prefix /path/to_install /path/to/container` \n- `wrap-install --wrapper-paths bin --mask --prefix /path/to/install /program/on/disk`\n\n\n## Installation\n\nPreferably use system python + pip\nand run install.sh with the desired config as argument.\nAvailable configs are in the `configs` folder.\n\n```\nbash install.sh \u003cconfig\u003e\n```\n\nThis will symlink the config to `default_config`,\ninstall pyyaml locally in the repository and\nhardcode the used python interpreter. This\nis so that the tool can be used to construct environments\nwhich use a completely different python. \n\nIf you are not satisfied with the config you can simply edit \nthe config under `default_config` or supply \na custom config file path using `CW_GLOBAL_YAML`. \n\n## Special vars\n\nThese can be set before starting the tool\n\n`CW_GLOBAL_YAML`\nPath to the config to use\n\n`CW_ENABLE_CONDARC`\nEnable user condarc during installation\n\n`CW_DEBUG_KEEP_FILES`\nDon't delete build files when failing. 
\n\n`CW_FORCE_CONDA_ACTIVATE`\nForce conda to be activated even if called through a virtual environment. \n\n`CW_LOG_LEVEL`\nHow verbosely to report program actions\n\n- 0 only errors\n- 1 only warnings\n- 2 general (default)\n- `\u003e2` debug\n\n## Activating and Deactivating wrapped environments\nA `tykky` convenience shell function is provided to mimic the conda activate and\ndeactivate helpers. To use it, you must load it into your shell with:\n\n`source etc/profile.d/tykky`\n\nThen you may activate a tykky installation with\n\n`tykky activate \u003cenv_dir\u003e`\n\nIf you define the environment variable `TYKKY_PATH` as a colon-separated list of paths, environments may also be\nfound by name inside those paths.\n\nYou may deactivate the tykky environment in your shell with:\n\n`tykky deactivate`\n\nAlternatively, you may also manually add `\u003cenv_dir\u003e/bin` to your `$PATH`.\n\n## License \n\nYou may get errors like `CondaToSNonInteractiveError: Terms of Service have not been accepted for the following channels. Please accept or remove them before proceeding:`\n\nNewer versions of conda have finer-grained management of channel licenses, which might stop the installation if using miniconda (currently tykky versions \u003c 0.4). Set `CONDA_PLUGINS_AUTO_ACCEPT_TOS=true` to mitigate this ( https://www.anaconda.com/docs/getting-started/tos-plugin#docker-in-ci%2Fcd-environments ). Understand that this means accepting the channel TOS; make sure you have read and understood them and what they imply for your use case and organization. \n\n## Misc feature ideas\n\n- Keep container name based on src name\n\n## Notes (This is most likely no longer the case for both apptainer and singularity but needs validation)\n`SINGULARITY_BIND` is handled after `-B`;\nordering within both matters! 
-\u003e nested bind mounts possible.\nNote that while loop devices can be mounted on bind mounts,\nany extra bind mounts will be applied after extra loop device (image mounts) \nso to mask dirs on disk with an image mount, the path can not be bind mounted.\n(exception is a default $HOME mount, which is applied before loop device mounts).\nNote that this has been fixed in apptainer and you can now apply image mounts over bind mounts. \n\n\n## Convoluted path modifications in py scripts\n\nThe idea is that everything works even if a completely different python\nenvironment is active, we also avoid having any extra envs set while parsing\nthe conf to allow for very \"creative\" usages of the tool\n\n## wrap-install\n\nTechnically updating masked disk installations\nis not an issue, but let's not do that until there is a specific\nrequest. The tool now drops the full path leading to the target\nfrom the bind list, if more binds are needed a yaml input needs to be constructed. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcscfi%2Fhpc-container-wrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcscfi%2Fhpc-container-wrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcscfi%2Fhpc-container-wrapper/lists"}