{"id":17030558,"url":"https://github.com/vsoch/forward","last_synced_at":"2025-04-12T12:12:12.481Z","repository":{"id":41249119,"uuid":"140033882","full_name":"vsoch/forward","owner":"vsoch","description":"Port Forwarding Utility","archived":false,"fork":false,"pushed_at":"2023-11-29T23:24:41.000Z","size":143,"stargazers_count":53,"open_issues_count":8,"forks_count":28,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-12T12:12:07.348Z","etag":null,"topics":["forwarding","hpc","jupyter","notebook","python","sherlock","singularity","ssh","ssh-forwarding"],"latest_commit_sha":null,"homepage":"https://vsoch.github.io/lessons/sherlock-singularity/","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vsoch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-06T22:08:40.000Z","updated_at":"2024-10-11T06:09:36.000Z","dependencies_parsed_at":"2025-01-13T16:30:47.365Z","dependency_job_id":"6250d659-6280-409c-b28f-d5f7395adb24","html_url":"https://github.com/vsoch/forward","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fforward","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fforward/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fforward/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fforward/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vsoch","download_url":"https://codeload.github.com/vsoch/forward/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248565078,"owners_count":21125417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["forwarding","hpc","jupyter","notebook","python","sherlock","singularity","ssh","ssh-forwarding"],"created_at":"2024-10-14T08:07:31.260Z","updated_at":"2025-04-12T12:12:12.459Z","avatar_url":"https://github.com/vsoch.png","language":"Shell","readme":"# forward\n\n## What is this?\nForward sets up an sbatch script on your cluster resource and port forwards it back to your local machine! \nUseful for jupyter notebook and tensorboard, amongst other things.\n\n - **start.sh** is intended for submitting a job and setting up ssh forwarding\n - **start-node.sh** will submit the job and give you a command to ssh to the node, without port forwarding\n\nThe folder [sbatches](sbatches) contains scripts, organized by cluster resource, that are intended\nfor use and submission. It's up to you to decide if you want a port forwarded (e.g., for a jupyter notebook)\nor just an instruction for how to connect to a running node with your application.\n\n## Tiny Tutorials\nHere we will provide some \"tiny tutorials\" to go along with helping to use the software. These are tiny because there\nare many possible use cases!\n\n - [Jupyter with GPU on Sherlock](examples/recipe-sherlock-gpu.md): A recipe for interactive computing using custom Jupyter kernels on Stanford's Sherlock.\n - [Using sherlock/py3-jupyter](https://gist.github.com/vsoch/f2034e2ff768de7eb14d42fef92cc43e) and copying notebook first from your host to use a notebook module (python 3) on the Sherlock cluster at Stanford [Version 0.0.1](https://github.com/vsoch/forward/releases/tag/0.0.1).\n - [Running an R Kernel in a Jupyter Notebook](https://vsoch.github.io/lessons/sherlock-juputer-r/)\n - [Using containershare with repo2docker-julia](https://vsoch.github.io/lessons/containershare) a repo2docker-julia Singularity container deployed on Sherlock using [Version 0.0.1](https://github.com/vsoch/forward/releases/tag/0.0.1)\n\n## Setup\nFor interested users, a few tutorials are provided on the [Research Computing Lessons](https://vsoch.github.io/lessons) site.\nBrief instructions are also documented in this README.\n\nFor farmshare - please navigate to the README located in sbatches/farmshare/README.md.  \n\n### Clone the Repository\nClone this repository to your local machine.\n\nYou will then need to create a parameter file.  To do so, follow the prompts at:\n\n```bash\nbash setup.sh\n```\n\nYou can always edit `params.sh` later to change these configuration options. \n\n#### Parameters\n\n - **RESOURCE** should refer to an identifier for your cluster resource that will be recorded in your ssh configuration, and then referenced in the scripts to interact with the resource (e.g., `ssh sherlock`).\n - **PARTITION** If you intend to use a GPU (e.g., [sbatches/py2-tensorflow.sbatch](sbatches/py2-tensorflow.sbatch) the name of the PARTITION variable should be \"gpu.\"\n - **CONTAINERSHARE** (optional) is a location on your cluster resource (typically world readable) where you might find containers (named by a hash of the container name in the [library]() that are ready to go! If you are at Stanford, leave this to be default. If you aren't, then ask your cluster admin about [setting up a containershare](https://www.github.com/vsoch/containershare)\n - **CONNECTION_WAIT_SECONDS** refers to how many seconds the start.sh script waits before setting up port forwarding. If your cluster runs slow, or is particularly busy, this can be set at 30. \n\nIf you want to modify the partition flag to have a different gpu setup (other than `--partition gpu --gres gpu:1`) then you should set this **entire** string for the partition variable.\n\n### SSH config\n\nYou will also need to at the minimum configure your ssh to recognize your cluster (e.g., sherlock) as\na valid host.  We have provided a [hosts folder](hosts)  for helper scripts that will generate\nrecommended ssh configuration snippets to put in your `~/.ssh/config` file. Based\non the name of the folder, you can intuit that the configuration depends on the cluster\nhost. Here is how you can generate this configuration for Sherlock:\n\n```bash\nbash hosts/sherlock_ssh.sh\n```\n```\nHost sherlock\n    User put_your_username_here\n    Hostname sh-ln01.stanford.edu\n    GSSAPIDelegateCredentials yes\n    GSSAPIAuthentication yes\n    ControlMaster auto\n    ControlPersist yes\n    ControlPath ~/.ssh/%l%r@%h:%p\n```\n\nUsing these options can reduce the number of times you need to authenticate. If you\ndon't have a file in the location `~/.ssh/config` then you can generate it programatically:\n\n```bash\nbash hosts/sherlock_ssh.sh \u003e\u003e ~/.ssh/config\n```\n\nDo not run this command if there is content in the file that you might overwrite! \nOne downside is that you will be foregoing sherlock's load\nbalancing since you need to be connecting to the same login machine at each\nstep.\n\n### SSH Port Forwarding Considerations\n\nDepending on your cluster, you will need to identify whether the compute nodes (not the login nodes) are isolated from the outside world or not (i.e can be ssh'd into directly). For Sherlock, they are isolated. For FarmShare they are not. This is important when we are setting up the ssh command to port forward from the local machine to the compute node. \n\nFor HPC's where the compute node is isolated from the outside world (as is the case with sherlock), the ssh command basically establishes a tunnel to the login node, and then from the login node establishes another tunnel to the compute node. \nIn this case we write a command where we port forward to the login node, and then the compute node, which is accessible from the login node. The entire command might look like this:\n\n```bash\n$ ssh -L $PORT:localhost:$PORT ${RESOURCE} ssh -L $PORT:localhost:$PORT -N \"$MACHINE\" \u0026\n```\n\nIn the command above, the first half is executed on the local machine `ssh -L $PORT:localhost:$PORT ${RESOURCE}`, which establishes a port forwarding to the login node. The next line `ssh -L $PORT:localhost:$PORT -N \"$MACHINE\" \u0026` is run from the login node, and port forwards it to the compute node, since you can only access the compute node from the login nodes.\n\n\nFor HPC's where the compute node is not isolated from the outside world (as is the case with Farmshare) the ssh command for port forwarding first establishes a connection the login node, but then continues to pass on the login credentials to the compute node to establish a tunnel between the localhost and the port on the compute node. \nThe ssh command in this case utilizes the flag `-K` which forwards the login credentials to the compute node:\n```bash\n$ ssh \"$DOMAINNAME\" -l $FORWARD_USERNAME -K -L  $PORT:$MACHINE:$PORT -N  \u0026\n```\nThe drawback of this method is that when the start.sh script is run, you will have to authenticate twice (once at the beginning to check if a job is running on the HPC, and when the port forwarding is setup). This is the case for FarmShare. \n\nIn the setup.sh file, we have added an option `$ISOLATECOMPUTENODE`, which is a boolean operator. For users of FarmShare, and Sherlock, this value is set automatically. For your own default cluster, you will be prompted whether the compute node is isolated or not, please write true or false (case sensitive) for your resource depending on its properties. You may have to consult the documentation or ask the HPC manager. \n\n\n# Notebooks\n\nNotebooks have associated sbatch scripts that are intended to start a jupyter (or similar)\nnotebook, and then forward the port back to your machine. If you just want to submit a job,\n(without port forwarding) see [the job submission](#job-submission) section. For \nnotebook job submission, you will want to use the [start.sh](start.sh) script.\n\n## Notebook password\n\nIf you have not set up notebook authentication before, you will need to set a\npassword via `jupyter notebook password` on your cluster resource.  \nMake sure to pick a secure password!\n\n\n# Job Submission\nJob submission can mean executing a command to a container, running a container, or \nwriting your own sbatch script (and submitting from your local machine). For \nstandard job submission, you will want to use the [start-node.sh](start-node.sh) script.\nIf your cluster has a containershare, you can use the `containershare-notebook`\nset of scripts to have a faster deployment (without needing to pull).\n\n## Usage\n\n```bash\n# Choose a containershare notebook, and launch it! On Sherlock, the containers are already in the share\nbash start.sh sherlock/containershare-notebook docker://vanessa/repo2docker-julia\n\n# Run a Singularity container that already exists on your resource (recommended)\nbash start-node.sh singularity-run /scratch/users/vsochat/share/pytorch-dev.simg\n\n# Execute a custom command to the same Singularity container\nbash start-node.sh singularity-exec /scratch/users/vsochat/share/pytorch-dev.simg echo \"Hello World\"\n\n# Run a Singularity container from a url, `docker://ubuntu`\nbash start-node.sh singularity-run docker://ubuntu\n\n# Execute a custom command to the same container\nbash start-node.sh singularity-exec docker://ubuntu echo \"Hello World\"\n\n# Execute your own custom sbatch script\ncp myscript.job sbatches/\nbash start-node.sh myscript\n```\n\nAs a service for Stanford users, @vsoch provides a [containershare](https://vsoch.github.io/containershare)\nof ready to go containers to use on Sherlock! The majority of these deploy interactive notebooks, \nhowever can also be run without (use start-node.sh instead of [start.sh](start.sh)). If you\nwant to build your own container for containershare (or request a container) see the\n[README](https://www.github.com/vsoch/containershare) in the repository that serves it.\n\n```bash\n# Run a containershare container with a notebook\nbash start.sh sherlock/containershare-notebook docker://vanessa/repo2docker-julia\n```\n\nIf you would like to request a custom notebook, please [reach out](https://www.github.com/vsoch/containershare/issues).\n\n## Usage\n\n\n```bash\n# To start a jupyter notebook in a specific directory ON the cluster resource\nbash start.sh jupyter \u003ccluster-dir\u003e\n\n# If you don't specify a path on the cluster, it defaults to your ${SCRATCH}\nbash start.sh jupyter /scratch/users/\u003cusername\u003e\n\n# To start a jupyter notebook with tensorflow in a specific directory\nbash start.sh py2-tensorflow \u003ccluster-dir\u003e\n\n# If you want a GPU node, make sure your partition is set to \"gpu.\"\n# To start a jupyter notebook (via a Singularity container!) in a specific directory\nbash start.sh singularity-jupyter \u003ccluster-dir\u003e\n```\n\nWant to create your own Singularity jupyter container? Use [repo2docker](https://www.github.com/jupyter/repo2docker) and then specify the container URI at the end.\n\n```bash\nbash start.sh singularity.jupyter \u003ccluster-dir\u003e \u003ccontainer\u003e\n\n# You can also run a general singularity container!\nbash start.sh singularity \u003ccluster-dir\u003e \u003ccontainer\u003e\n\n# To start tensorboard in a specific directory (careful here and not recommended, as is not password protected)\nbash start.sh start \u003ccluster-dir\u003e\n\n# To stop the running jupyter notebook server\nbash end.sh jupyter\n```\n\nIf the sbatch job is still running, but your port forwarding stopped (e.g. if\nyour computer went to sleep), you can resume with:\n\n```bash\nbash resume.sh jupyter`\n```\n\n# Debugging\n\nAlong with some good debugging notes [here](https://vsoch.github.io/lessons/jupyter-tensorflow#debugging), common errors are below.\n\n### Connection refused after start.sh finished\n\nSometimes you can get connection refused messages after the script has started\nup.  Just wait up to a minute and then refresh the opened web page, and this\nshould fix the issue.\n\n### Terminal Hangs when after start.sh\n\nSometimes when you have changes in your network, you would need to reauthenticate.\nIn the same way you might get a login issue here, usually opening a new shell resolves \nthe hangup.\n\n### Terminal Hangs on \"== Checking for previous notebook ==\"\n\nThis is the same bug as above - this command specifically is capturing output into\na variable, so if it hangs longer than 5-10 seconds, it's likely hit the password \nprompt and would hang indefinitely. If you issue a standard command that will\nre-prompt for your password in the terminal session, you should fix the issue.\n\n```bash\n$ ssh sherlock pwd\n```\n\n### slurm_load_jobs error: Socket timed out on send/recv operation \n\n[This error](https://www.rc.fas.harvard.edu/resources/faq/slurm-errors-socket-timed-out) is basically\nsaying something to the effect of \"slurm is busy, try again later.\" It's not an issue with submitting\nthe job, but rather a ping to slurm to perform the check. In the case that the next ping continues, you should be ok. However, if the script is terminate, while you can't control the \"busyness\" of slurm, you **can**\ncontrol how likely it is to be allocated a node, or the frequency of checking. Thus, you can do either of the\nfollowing to mitigate this issue:\n\n**choose a partition that is more readily available**\n\nIn your params.sh file, choose a partition that is likely to be allocated sooner, thus reducing the \nqueries to slurm, and the chance of the error.\n\n**offset the checks by changing the timeout between attempts**\n\nThe script looks for an exported variable, `TIMEOUT` and sets it to be 1 (1 second) if\nnot defined. Thus, to change the timeout, you can export this variable:\n\n```bash\nexport TIMEOUT=3\n```\n\nWhile the forward tool cannot control the busyness of slurm, these two strategies should help a bit.\n\n### mux_client_forward: forwarding request failed: Port forwarding failed \n\n\nSimilarly, if your cluster is slow, you may get this error after \"== Setting up port forwarding ==\". To fix this, increase your CONNECTION_WAIT_SECONDS. \n\n### I ended a script, but can't start\n\nAs you would kill a job on Sherlock and see some delay for the node to come down, the\nsame can be try here! Try waiting 20-30 seconds to give the node time to exit, and try again.\n\n\n## How do I contribute?\n\nFirst, please read the [contributing docs](CONTRIBUTING.md). Generally, you will want to:\n\n - fork the repository to your username\n - clone your fork\n - checkout a new branch for your feature, commit and push\n - add your name to the CONTRIBUTORS.md\n - issue a pull request!\n\n## Adding new sbatch scripts\n\nYou can add more sbatch scripts by putting them in the sbatches directory.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fforward","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvsoch%2Fforward","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fforward/lists"}