{"id":20153485,"url":"https://github.com/nextflow-io/nf-hack17-tutorial","last_synced_at":"2025-05-06T22:31:06.881Z","repository":{"id":68691277,"uuid":"100017737","full_name":"nextflow-io/nf-hack17-tutorial","owner":"nextflow-io","description":"Nextflow basic tutorial for newbie users ","archived":true,"fork":false,"pushed_at":"2018-06-17T14:38:59.000Z","size":449,"stargazers_count":33,"open_issues_count":0,"forks_count":18,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-07T07:11:57.137Z","etag":null,"topics":["bioinformatics","docker","genomics","nextflow","singularity","tutorial"],"latest_commit_sha":null,"homepage":"","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nextflow-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-11T09:50:06.000Z","updated_at":"2025-02-13T16:51:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"67e3ac91-1274-4d75-85e1-726197835b05","html_url":"https://github.com/nextflow-io/nf-hack17-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextflow-io%2Fnf-hack17-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextflow-io%2Fnf-hack17-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextflow-io%2Fnf-hack17-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextflow-io%2Fnf-hack17-tutorial/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nextflow-io","download_url":"https://codeload.github.com/nextflow-io/nf-hack17-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252778960,"owners_count":21802858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","docker","genomics","nextflow","singularity","tutorial"],"created_at":"2024-11-13T23:19:12.579Z","updated_at":"2025-05-06T22:31:06.868Z","avatar_url":"https://github.com/nextflow-io.png","language":"Nextflow","readme":"# Nextflow tutorial \n\nThis repository contains the tutorial material for the [Nextflow workshop](https://www.nextflow.io/blog/2017/nextflow-workshop.html). \n\n## Prerequisites\n\n* Java 8 or later \n* Docker engine 1.10.x (or higher) \n* Singularity 2.3.x (optional)\n\n## Set up the AWS account \n\n1. SSH into the login node: \n\n     `ssh login@34.252.30.18`\n    \n2. Launch your AWS instance: \n\n   `curl -L https://goo.gl/1C3abb | bash`\n\n   (if you need more storage or a different instance type, specify it on the command line, e.g. `curl .. | bash -s t2.large 200`)\n   \n3. Open a new shell terminal and SSH to the new instance: \n\n   `ssh \u003cinstance name printed by the previous command\u003e`\n   \n## Installation \n\nInstall Nextflow by using the following command: \n\n```\ncurl -fsSL get.nextflow.io | bash\n```\n    \nThe above snippet creates the `nextflow` launcher in the current directory. 
\nComplete the installation by moving it into a directory on your `PATH`, e.g.: \n\n```\nmv nextflow $HOME/bin\n``` \n   \nFinally, clone this repository with the following command: \n\n```\ngit clone https://github.com/nextflow-io/hack17-tutorial.git \u0026\u0026 cd hack17-tutorial\n```\n\n## Nextflow hands-on \n\nDuring this tutorial you will implement a proof of concept of an RNA-Seq pipeline which: \n\n1. Indexes a transcriptome file.\n2. Performs quality control.\n3. Performs quantification.\n4. Creates a MultiQC report. \n\n## Step 1 - Define the pipeline parameters \n\nThe script `script1.nf` defines the pipeline input parameters. Run it by using the \nfollowing command: \n\n```\nnextflow run script1.nf\n```\n\nTry to specify a different input parameter, for example: \n\n```\nnextflow run script1.nf --reads this/and/that\n```\n\n#### Exercise 1.1 \n\nModify `script1.nf` by adding a fourth parameter named `outdir` and set it to a default path\nthat will be used as the pipeline output directory. \n\n#### Exercise 1.2 \n\nModify `script1.nf` to print all the pipeline parameters by using a single `println` \ncommand and a [multiline string](https://www.nextflow.io/docs/latest/script.html#multi-line-strings)\nstatement.  \n\nTip: see an example [here](https://github.com/nextflow-io/rnaseq-nf/blob/3b5b49f/main.nf#L41-L48).\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to define parameters in your pipeline script\n2. How to pass parameters by using the command line\n3. The use of `$var` and `${var}` variable placeholders \n4. How to use multiline strings \n\n\n### Step 2 - Create transcriptome index file\n\nNextflow allows the execution of any command or user script by using a `process` definition. 
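\n\nAs an illustration only (a hypothetical `sayHello` process, not one of the tutorial scripts, written in the DSL1 syntax used throughout this course), a complete process declaration with an input, an output and a command script looks like this: \n\n```\nprocess sayHello {\n    input:\n    val name from Channel.from('world')\n\n    output:\n    stdout into result_ch\n\n    script:\n    \"\"\"\n    echo Hello $name\n    \"\"\"\n}\n```\n\nThe `script` block is a shell command template in which Nextflow variables such as `$name` are interpolated before execution.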
\n\nA process is defined by providing three main declarations: \nthe process [inputs](https://www.nextflow.io/docs/latest/process.html#inputs), \nthe process [outputs](https://www.nextflow.io/docs/latest/process.html#outputs)\nand finally the command [script](https://www.nextflow.io/docs/latest/process.html#script). \n\nThe second example adds the `index` process. Open it to see how the process is defined. \n\nIt takes the transcriptome file as input and creates the transcriptome index by using the `salmon` tool. \n\nNote how the input declaration defines a `transcriptome` variable in the process context \nthat is used in the command script to reference that file in the Salmon command line.\n\nTry to run it by using the command: \n\n```\nnextflow run script2.nf\n```\n\nThe execution will fail because Salmon is not installed in your environment. \n\nAdd the command line option `-with-docker` to launch the execution through a Docker container\nas shown below: \n\n```\nnextflow run script2.nf -with-docker\n```\n\nThis time it works because it uses the Docker container `nextflow/rnaseq-nf` defined in the \n`nextflow.config` file. \n\nTo avoid adding the option `-with-docker` on every run, add the following line to the `nextflow.config` file: \n\n```\ndocker.enabled = true\n```\n\n#### Exercise 2.1 \n\nEnable Docker execution by default by adding the above setting to the `nextflow.config` file.\n\n#### Exercise 2.2 \n\nPrint the output of the `index_ch` channel by using the [println](https://www.nextflow.io/docs/latest/operator.html#println)\noperator (do not confuse it with the `println` statement seen previously).\n\n#### Exercise 2.3 \n\nUse the command `tree -a work` to see how Nextflow organises the process work directory. \n \n#### Recap \n\nIn this step you have learned: \n\n1. How to define a process executing a custom command\n2. How process inputs are declared \n3. How process outputs are declared\n4. How to access the number of available CPUs\n5. 
How to print the content of a channel\n\n\n### Step 3 - Collect read files by pairs\n\nThis step shows how to match *read* files into pairs, so they can be mapped by *Salmon*. \n\nEdit the script `script3.nf` and add the following statement as the last line: \n\n```\nread_pairs_ch.println()\n```\n\nSave it and execute it with the following command: \n\n```\nnextflow run script3.nf\n```\n\nIt will print an output similar to the one shown below:\n\n```\n[ggal_gut, [/../data/ggal/gut_1.fq, /../data/ggal/gut_2.fq]]\n```\n\nThe above example shows how the `read_pairs_ch` channel emits tuples composed of \ntwo elements, where the first is the read pair prefix and the second is a list \nrepresenting the actual files. \n\nTry it again specifying different read files by using a glob pattern:\n\n```\nnextflow run script3.nf --reads 'data/ggal/*_{1,2}.fq'\n```\n\n#### Exercise 3.1 \n\nUse the [set](https://www.nextflow.io/docs/latest/operator.html#set) operator in place \nof the `=` assignment to define the `read_pairs_ch` channel. \n\n#### Exercise 3.2 \n\nUse the [ifEmpty](https://www.nextflow.io/docs/latest/operator.html#ifempty) operator \nto check whether the `read_pairs_ch` channel contains at least one item. \n\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to use `fromFilePairs` to handle read pair files\n2. How to use the `set` operator to define a new channel variable \n3. How to use the `ifEmpty` operator to check if a channel is empty\n\n\n### Step 4 - Perform expression quantification \n\nThe script `script4.nf` adds the `quantification` process. \n\nIn this script, note how the `index_ch` channel, declared as an output in the `index` process, \nis now used as a channel in the input section.  
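\n\nFor orientation, the shape of such a process might look like the following sketch (DSL1 syntax; the actual code is in `script4.nf`, so treat this as an illustration rather than the exact implementation): \n\n```\nprocess quantification {\n    input:\n    file index from index_ch\n    set pair_id, file(reads) from read_pairs_ch\n\n    output:\n    file(pair_id) into quant_ch\n\n    script:\n    \"\"\"\n    salmon quant --threads $task.cpus --libType=U \\\n      -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id\n    \"\"\"\n}\n```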
\n\nAlso note how the second input is declared as a `set` composed of two elements, \nthe `pair_id` and the `reads`, in order to match the structure of the items emitted \nby the `read_pairs_ch` channel.\n\n\nExecute it by using the following command: \n\n```\nnextflow run script4.nf \n```\n\nYou will see the execution of a `quantification` process. \n\nExecute it again adding the `-resume` option as shown below: \n\n```\nnextflow run script4.nf -resume \n```\n\nThe `-resume` option skips the execution of any step that has been processed in a previous \nexecution. \n\nTry to execute it with more read files as shown below: \n\n```\nnextflow run script4.nf -resume --reads 'data/ggal/*_{1,2}.fq'\n```\n\nYou will notice that the `quantification` process is executed more than \nonce. \n\nNextflow parallelizes the execution of your pipeline simply by providing multiple input data\nto your script.\n\n\n#### Exercise 4.1 \n\nAdd a [tag](https://www.nextflow.io/docs/latest/process.html#tag) directive to the \n`quantification` process to provide a more readable execution log.\n\n\n#### Exercise 4.2 \n\nAdd a [publishDir](https://www.nextflow.io/docs/latest/process.html#publishdir) directive \nto the `quantification` process to store the process results in a directory of your \nchoice. \n\n#### Recap \n\nIn this step you have learned: \n \n1. How to connect two processes by using the channel declarations\n2. How to resume the script execution skipping already computed steps \n3. How to use the `tag` directive to provide a more readable execution output\n4. How to use the `publishDir` directive to store process results in a path of your choice \n\n\n### Step 5 - Quality control \n\nThis step implements quality control for your input reads. 
The inputs are the same \nread pairs that are provided to the `quantification` step.\n\nYou can run it by using the following command: \n\n```\nnextflow run script5.nf -resume \n``` \n\nThe script will report the following error message: \n\n```\nChannel `read_pairs_ch` has been used twice as an input by process `fastqc` and process `quantification`\n```\n\n\n#### Exercise 5.1 \n\nModify the creation of the `read_pairs_ch` channel by using an [into](https://www.nextflow.io/docs/latest/operator.html#into) \noperator in place of a `set`.  \n\nTip: see an example [here](https://github.com/nextflow-io/rnaseq-nf/blob/3b5b49f/main.nf#L58).\n\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to use the `into` operator to create multiple copies of the same channel\n\n\n### Step 6 - MultiQC report \n\nThis step collects the outputs from the `quantification` and `fastqc` steps to create \na final report by using the [MultiQC](http://multiqc.info/) tool.\n \n\nExecute the script with the following command: \n\n```\nnextflow run script6.nf -resume --reads 'data/ggal/*_{1,2}.fq' \n```\n\nIt creates the final report in the `results` folder in the current work directory. \n\nIn this script note the use of the [mix](https://www.nextflow.io/docs/latest/operator.html#mix) \nand [collect](https://www.nextflow.io/docs/latest/operator.html#collect) operators chained \ntogether to get all the outputs of the `quantification` and `fastqc` processes as a single\ninput. \n\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to collect many outputs to a single input with the `collect` operator \n2. How to `mix` two channels into a single channel \n3. How to chain two or more operators together \n\n\n\n### Step 7 - Handle completion event\n\nThis step shows how to execute an action when the pipeline completes the execution. \n\nNote that Nextflow processes define the execution of *asynchronous* tasks, i.e. 
they are not \nexecuted one after another in the order they are written in the pipeline script, as would happen in a \ncommon *imperative* programming language.\n\nThe script uses the `workflow.onComplete` event handler to print a confirmation message \nwhen the script completes. \n\nTry to run it by using the following command: \n\n```\nnextflow run script7.nf -resume --reads 'data/ggal/*_{1,2}.fq'\n```\n\n\n### Step 8 - Custom scripts\n\n\nReal world pipelines use a lot of custom user scripts (BASH, R, Python, etc.). Nextflow \nallows you to use and manage all these scripts in a consistent manner. Simply put them \nin a directory named `bin` in the pipeline project root. They will be automatically added \nto the pipeline execution `PATH`. \n\nFor example, create a file named `fastqc.sh` with the following content: \n\n```\n#!/bin/bash \nset -e \nset -u\n\nsample_id=${1}\nreads=${2}\n\nmkdir fastqc_${sample_id}_logs\nfastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}\n```\n\nSave it, grant it execute permission and move it into the `bin` directory as shown below: \n\n```\nchmod +x fastqc.sh\nmkdir -p bin \nmv fastqc.sh bin\n```\n\nThen, open the `script7.nf` file and replace the `fastqc` process' script with  \nthe following code: \n\n```\n  script:\n    \"\"\"\n    fastqc.sh \"$sample_id\" \"$reads\"\n    \"\"\"  \n```\n\n\nRun it as before: \n\n```\nnextflow run script7.nf -resume --reads 'data/ggal/*_{1,2}.fq'\n```\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to write or use existing custom scripts in your Nextflow pipeline.\n2. How to avoid the use of absolute paths by keeping your scripts in the `bin/` project folder.\n\n\n\n### Step 9 - Executors  \n\nReal world genomic applications can spawn the execution of thousands of jobs. In this \nscenario a batch scheduler is commonly used to deploy a pipeline in a computing cluster, \nallowing the execution of many jobs in parallel across many computing nodes. 
\n\nNextflow has built-in support for the most commonly used batch schedulers, such as Univa Grid Engine \nand SLURM, among [others](https://www.nextflow.io/docs/latest/executor.html).  \n\nTo run your pipeline with a batch scheduler, modify the `nextflow.config` file specifying \nthe target executor and, if needed, the required computing resources. For example: \n\n```\nprocess.executor = 'slurm'\nprocess.queue = 'short'\nprocess.memory = '10 GB' \nprocess.time = '30 min'\nprocess.cpus = 8 \n```\n\nThe above configuration specifies the use of the SLURM batch scheduler to run the \njobs spawned by your pipeline script. It also specifies the `short` queue (partition), \n10 gigabytes of memory and 8 CPUs per job, and that each job can run for no more than 30 minutes. \n\nNote: the pipeline must be executed on a shared file system accessible to all the computing \nnodes. \n\n#### Exercise 9.1\n\nPrint the head of the `.command.run` script generated by Nextflow in the task work directory \nand verify it contains the SLURM `#SBATCH` directives for the requested resources.\n\n#### Exercise 9.2 \n\nModify the configuration file to specify a different resource request for\nthe `quantification` process. \n\nTip: see the [process](https://www.nextflow.io/docs/latest/config.html#scope-process) documentation for an example. \n\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to deploy a pipeline in a computing cluster. \n2. How to specify different computing resources for different pipeline processes. \n\n\n### Step 10 - Use configuration profiles \n\nThe Nextflow configuration file can be organised into different profiles \nto allow the specification of separate settings depending on the target execution environment. 
\n\nFor the sake of this tutorial, modify the `nextflow.config` as shown below: \n\n\n```\nprofiles {\n  standard {\n    process.container = 'nextflow/rnaseq-nf'\n    docker.enabled = true\n  }\n  \n  cluster {\n    process.executor = 'slurm'\n    process.queue = 'short'\n    process.memory = '10 GB' \n    process.time = '30 min'\n    process.cpus = 8     \n  }\n} \n```\n\nThe above configuration defines two profiles: `standard` and `cluster`. The name of the \nprofile to use can be specified when running the pipeline script by using the `-profile` \noption. For example: \n\n```\nnextflow run script7.nf -profile cluster \n```\n\nThe profile `standard` is used by default if no other profile is specified by the user. \n\n\n#### Recap \n\nIn this step you have learned: \n\n1. How to organise your pipeline configuration in separate profiles\n\n\n### Step 11 - Run a pipeline from a GitHub repository \n\nNextflow allows the execution of a pipeline project directly from a GitHub \nrepository (or similar services, e.g. Bitbucket and GitLab). \n\nThis simplifies the sharing and the deployment of complex projects and allows tracking changes in \na consistent manner. \n\nFor the sake of this tutorial, consider the example project published at the following URL: \n\nhttps://github.com/nextflow-io/hello\n\nYou can run it by specifying the project name as shown below: \n\n```\nnextflow run nextflow-io/hello\n```\n\nNextflow automatically downloads the project and stores it in the `$HOME/.nextflow` folder. \n\nUse the command `info` to show the project information, e.g.: \n\n```\nnextflow info nextflow-io/hello\n```\n\nNextflow allows the execution of a specific *revision* of your project by using the `-r` \ncommand line option. For example: \n\n```\nnextflow run -r v1.2 nextflow-io/hello\n```\n\nRevisions are defined by using Git tags or branches defined in the project repository. \n\nThis allows a precise control of the changes in your project files and dependencies over time. 
\n\n \n### Step 12 (bonus) - Deposit your pipeline scripts in a GitHub repository \n\nCreate a new repository in your GitHub account and upload the pipeline scripts \nof this tutorial there. Then execute the pipeline by specifying the repository name on the Nextflow command line.  \n\nTip: see the [documentation](https://www.nextflow.io/docs/latest/sharing.html#publishing-your-pipeline) \nfor further details.  \n \n\n## Docker hands-on \n\nGet practice with basic Docker commands to pull, run and build your own containers.\n \nA container is a ready-to-run Linux environment which can be executed in isolation \nfrom the hosting system. It has its own copy of the file system, process space,\nmemory management, etc. \n \nContainers rely on a Linux kernel feature known as *Control Groups* or [cgroups](https://en.wikipedia.org/wiki/Cgroups),\nintroduced with kernel 2.6. \n\nDocker adds to this concept a handy management tool to build, run and share container images. \n\nThese images can be uploaded and published in a centralised repository known as \n[Docker Hub](https://hub.docker.com), or hosted by other parties such as [Quay](https://quay.io).\n\n\n### Step 1 - Run a container \n\nRunning a container is as easy as using the following command: \n\n```\ndocker run \u003ccontainer-name\u003e \n```\n\nFor example: \n\n```\ndocker run hello-world  \n```\n\n### Step 2 - Pull a container \n\nThe pull command allows you to download a Docker image without running it. For example: \n\n```\ndocker pull debian:wheezy \n```\n\nThe above command downloads a Debian Linux image.\n\n\n### Step 3 - Run a container in interactive mode \n\nLaunching a BASH shell in the container allows you to operate in an interactive mode \nin the containerised operating system. For example: \n\n```\ndocker run -it debian:wheezy bash \n``` \n\nOnce the container is launched you will notice that it's running as root (!). 
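\n\nYou can verify this directly from the host shell, assuming the `debian:wheezy` image pulled above; the command prints `root`: \n\n```\ndocker run debian:wheezy whoami\n```\n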
\nUse the usual commands to navigate the file system.\n\nTo exit from the container, stop the BASH session with the `exit` command.\n\n### Step 4 - Your first Dockerfile\n\nDocker images are created by using a so-called `Dockerfile`, i.e. a simple text file \ncontaining a list of commands to be executed to assemble and configure the image\nwith the software packages required.    \n\nIn this step you will create a Docker image to which the Salmon tool will later be added.\n\n\nWarning: the Docker build process automatically copies all files that are located in the \ncurrent directory to the Docker daemon in order to create the image. This can take \na long time when big or many files exist. For this reason it's important to *always* work in \na directory containing only the files you really need to include in your Docker image. \nAlternatively you can use the `.dockerignore` file to select the paths to exclude from the build. \n\nUse your favourite editor, e.g. `vim`, to create a file named `Dockerfile` and copy the \nfollowing content: \n\n```\nFROM debian:wheezy \n\nMAINTAINER \u003cyour name\u003e\n\nRUN apt-get update \u0026\u0026 apt-get install -y curl cowsay \n\nENV PATH=$PATH:/usr/games/\n   \n```\n\nWhen done, save the file. \n\n\n### Step 5 - Build the image  \n\nBuild the Docker image by using the following command: \n\n```\ndocker build -t my-image .\n```\n\nNote: don't miss the dot in the above command. 
When it completes, verify that the image \nhas been created by listing all available images: \n\n```\ndocker images\n```\n\nYou can try your new container by running this command: \n\n```\ndocker run my-image cowsay Hello Docker!\n```\n\n\n### Step 6 - Add a software package to the image\n\nAdd the Salmon package to the Docker image by adding the following snippet to the `Dockerfile`: \n\n```\nRUN curl -sSL https://github.com/COMBINE-lab/salmon/releases/download/v0.8.2/Salmon-0.8.2_linux_x86_64.tar.gz | tar xz \\\n \u0026\u0026 mv /Salmon-*/bin/* /usr/bin/ \\\n \u0026\u0026 mv /Salmon-*/lib/* /usr/lib/\n```\n\nSave the file and build the image again with the same command as before: \n\n```\ndocker build -t my-image .\n```\n\nYou will notice that it creates a new Docker image with the same name *but* with a \ndifferent image ID. \n\n### Step 7 - Run Salmon in the container \n\nCheck that everything is fine by running Salmon in the container as shown below: \n\n```\ndocker run my-image salmon --version\n```\n\nYou can even launch a container in interactive mode by using the following command: \n\n```\ndocker run -it my-image bash\n```\n\nUse the `exit` command to terminate the interactive session. \n\n\n### Step 8 - File system mounts\n\nCreate a genome index file by running Salmon in the container. \n\nTry to run Salmon in the container with the following command: \n\n```\ndocker run my-image \\\n  salmon index -t $PWD/data/ggal/transcriptome.fa -i index\n```\n\nThe above command fails because Salmon cannot access the input file.\n\nThis happens because the container runs in a completely separate file system and \ncannot access the hosting file system by default. \n\nYou will need to use the `--volume` command line option to mount the input file(s), e.g. 
\n\n```\ndocker run --volume $PWD/data/ggal/transcriptome.fa:/transcriptome.fa my-image \\\n  salmon index -t /transcriptome.fa -i index \n```\n\nAn easier way is to mount a parent directory to an identical path in the container; \nthis allows you to use the same paths when running commands in the container, e.g. \n\n```\ndocker run --volume $HOME:$HOME --workdir $PWD my-image \\\n  salmon index -t $PWD/data/ggal/transcriptome.fa -i index\n```\n\n### Step 9 - Upload the container to the Docker Hub (bonus)\n\nPublish your container in the Docker Hub to share it with other people. \n\nCreate an account on the https://hub.docker.com web site. Then from your shell terminal run \nthe following command, entering the user name and password you specified when registering in the Hub: \n\n```\ndocker login \n``` \n\nTag the image with your Docker user name: \n\n```\ndocker tag my-image \u003cuser-name\u003e/my-image \n```\n\nFinally push it to the Docker Hub:\n\n```\ndocker push \u003cuser-name\u003e/my-image \n```\n\nAfter that anyone will be able to download it by using the command: \n\n```\ndocker pull \u003cuser-name\u003e/my-image \n```\n\nNote how after a pull and push operation, Docker prints the container digest number, e.g. \n\n```\nDigest: sha256:aeacbd7ea1154f263cda972a96920fb228b2033544c2641476350b9317dab266\nStatus: Downloaded newer image for nextflow/rnaseq-nf:latest\n```\n\nThis is a unique and immutable identifier that can be used to reference a container image \nin an unambiguous manner. For example: \n\n```\ndocker pull nextflow/rnaseq-nf@sha256:aeacbd7ea1154f263cda972a96920fb228b2033544c2641476350b9317dab266\n```\n\n## Singularity \n\n[Singularity](http://singularity.lbl.gov) is a container runtime designed to work in HPC data centers, where the usage\nof Docker is generally not allowed due to security constraints. 
\n\nSingularity implements the container execution model similarly to Docker, however using \na completely different implementation design.\n\nA Singularity container image is archived in a plain file that can be stored on a shared \nfile system and accessed by many computing nodes managed by a batch scheduler.\n\n### Create a Singularity image \n\nSingularity images are created using a `Singularityfile` in a similar manner to Docker, \nthough using a different syntax. \n\n```\nBootstrap: docker\nFrom: debian:wheezy \n\n%environment\nexport PATH=$PATH:/usr/games/\n\n\n%labels\nAUTHOR \u003cyour name\u003e \n\n%post\n\napt-get update \u0026\u0026 apt-get install -y locales-all curl cowsay \ncurl -sSL https://github.com/COMBINE-lab/salmon/releases/download/v0.8.2/Salmon-0.8.2_linux_x86_64.tar.gz | tar xz \\\n \u0026\u0026 mv /Salmon-*/bin/* /usr/bin/ \\\n \u0026\u0026 mv /Salmon-*/lib/* /usr/lib/\n```\n\nOnce you have saved the `Singularityfile`, create the image with these commands: \n\n```\nsingularity create my-image.img\nsudo singularity bootstrap my-image.img Singularityfile \n```\n\nSingularity requires two commands to build an image: the first creates the image file \nand allocates the required space on the storage; the second builds the real container\nimage. \n\nNote: the `bootstrap` command requires sudo permissions.\n\n### Running a container \n\nOnce done, you can run your container with the following command: \n\n\n```  \nsingularity exec my-image.img cowsay Hello Singularity\n```\n\nBy using the `shell` command you can enter the container in interactive mode. \nFor example: \n\n```\nsingularity shell my-image.img \n```\n\n### Import a Docker image \n\nAn easier way to create a Singularity container, without requiring sudo permission and \nboosting container interoperability, is to import a Docker container image by \npulling it directly from a Docker registry. 
For example: \n\n```\nsingularity pull docker://debian:wheezy \n```\n\nThe above command automatically downloads the Debian Docker image and converts it to \na Singularity image stored in the current directory with the name `debian-wheezy.img`.\n\n \n### Run a Nextflow script using a Singularity container \n \nNextflow allows the transparent usage of Singularity containers as easily as with \nDocker ones. \n \nIt only requires enabling the use of the Singularity engine in place of Docker in the \nNextflow configuration file. \n \nTo run your previous script with Singularity, add the following profile \nto the `nextflow.config` file in the `$HOME/hack17-course` directory: \n\n\n```\nprofiles {\n\n  foo {\n    process.container = 'docker://nextflow/rnaseq-nf'\n    singularity.enabled = true\n    singularity.cacheDir = \"$PWD\"\n  }\n}\n```\n\nThe above configuration instructs Nextflow to use the Singularity engine to run \nyour script processes. The container is pulled from the Docker registry and cached \nin the current directory to be used for further runs. \n\nTry to run the script as shown below: \n\n```\nnextflow run script7.nf -profile foo \n```\n  \nNote: Nextflow will pull the container image automatically; this may take a few seconds \ndepending on the network connection speed.    \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnextflow-io%2Fnf-hack17-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnextflow-io%2Fnf-hack17-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnextflow-io%2Fnf-hack17-tutorial/lists"}