{"id":13703724,"url":"https://github.com/fmalmeida/ngs-preprocess","last_synced_at":"2026-01-22T21:55:34.842Z","repository":{"id":43728498,"uuid":"209628536","full_name":"fmalmeida/ngs-preprocess","owner":"fmalmeida","description":"A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies","archived":false,"fork":false,"pushed_at":"2024-06-30T13:46:44.000Z","size":5530,"stargazers_count":34,"open_issues_count":4,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-04T22:48:03.132Z","etag":null,"topics":["bax2bam","bioinformatics","illumina","nanopack","nextflow","ngs","ngs-preprocess","pacbio","pacbio-ccs","pipeline","porechop","reproducible-research","reproducible-science","trimgalore","workflow"],"latest_commit_sha":null,"homepage":"https://ngs-preprocess.readthedocs.io/","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fmalmeida.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2019-09-19T19:01:22.000Z","updated_at":"2025-04-16T13:45:33.000Z","dependencies_parsed_at":"2024-01-03T06:46:30.400Z","dependency_job_id":"a636a008-5b0a-45ce-bd42-2edfc86ee4fb","html_url":"https://github.com/fmalmeida/ngs-preprocess","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/fmalmeida/ngs-preprocess","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmalmeida%2Fngs-preprocess","tags_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/fmalmeida%2Fngs-preprocess/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmalmeida%2Fngs-preprocess/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmalmeida%2Fngs-preprocess/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fmalmeida","download_url":"https://codeload.github.com/fmalmeida/ngs-preprocess/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fmalmeida%2Fngs-preprocess/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28672095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T20:48:19.482Z","status":"ssl_error","status_checked_at":"2026-01-22T20:48:14.968Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bax2bam","bioinformatics","illumina","nanopack","nextflow","ngs","ngs-preprocess","pacbio","pacbio-ccs","pipeline","porechop","reproducible-research","reproducible-science","trimgalore","workflow"],"created_at":"2024-08-02T21:00:59.303Z","updated_at":"2026-01-22T21:55:34.817Z","avatar_url":"https://github.com/fmalmeida.png","language":"Nextflow","funding_links":[],"categories":["Next Generation Sequencing"],"sub_categories":["Pipelines"],"readme":"\u003cimg 
src=\"images/lOGO_3.png\" width=\"300px\"\u003e\n\n[![F1000 Paper](https://img.shields.io/badge/Citation%20F1000-10.12688/f1000research.139488.1-orange)](https://doi.org/10.12688/f1000research.139488.1)\n[![Releases](https://img.shields.io/github/v/release/fmalmeida/ngs-preprocess)](https://github.com/fmalmeida/ngs-preprocess/releases)\n[![Documentation](https://img.shields.io/badge/Documentation-readthedocs-brightgreen)](https://ngs-preprocess.readthedocs.io/en/latest/?badge=latest)\n[![Dockerhub](https://img.shields.io/badge/Docker-fmalmeida/ngs--preprocess-informational)](https://hub.docker.com/r/fmalmeida/ngs-preprocess)\n[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000\u0026logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000\u0026logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Follow on Twitter](http://img.shields.io/badge/twitter-%40fmarquesalmeida-1DA1F2?labelColor=000000\u0026logo=twitter)](https://twitter.com/fmarquesalmeida)\n[![License](https://img.shields.io/badge/License-GPL%203-black)](https://github.com/fmalmeida/ngs-preprocess/blob/master/LICENSE)\n[![Zenodo Archive](https://img.shields.io/badge/Zenodo-Archive-blue)](https://doi.org/10.5281/zenodo.3451405)\n\n\u003cp align=\"center\"\u003e\n  \u003c!-- \u003ca href=\"https://github.com/othneildrew/Best-README-Template\"\u003e\n    \u003cimg src=\"images/logo.png\" alt=\"Logo\" width=\"80\" height=\"80\"\u003e\n  \u003c/a\u003e --\u003e\n\n  \u003ch1 align=\"center\"\u003engs-preprocess pipeline\u003c/h1\u003e\n\n  \u003cp align=\"center\"\u003e\n    \u003ch3 align=\"center\"\u003eA pipeline for preprocessing short 
and long sequencing reads\u003c/h3\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://ngs-preprocess.readthedocs.io/en/latest/index.html\"\u003e\u003cstrong\u003eSee the documentation »\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://github.com/fmalmeida/ngs-preprocess/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/fmalmeida/ngs-preprocess/issues\"\u003eRequest Feature\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n## About\n\nngs-preprocess is built using [Nextflow](https://www.nextflow.io), a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. It is an easy-to-use pipeline that applies state-of-the-art software to quality-check and pre-process NGS reads from Illumina, PacBio and Oxford Nanopore Technologies.\n\nIt wraps the following software:\n\n| Step | Tools |\n| :--- | :---- |\n| SRA NCBI fetch | [Entrez-direct](https://anaconda.org/bioconda/entrez-direct) \u0026 [sra-tools](https://github.com/ncbi/sra-tools) |\n| Illumina pre-processing | [Fastp](https://github.com/OpenGene/fastp) |\n| Nanopore pre-processing | [Porechop](https://github.com/rrwick/Porechop), [Porechop ABI](https://github.com/bonsai-team/Porechop_ABI), [pycoQC](https://github.com/tleonardi/pycoQC), [NanoPack](https://github.com/wdecoster/nanopack) |\n| PacBio pre-processing | [bam2fastx](https://github.com/PacificBiosciences/pbtk#bam2fastx), [bax2bam](https://anaconda.org/bioconda/bax2bam), [lima](https://github.com/PacificBiosciences/barcoding), [pacbio ccs](https://ccs.how/) |\n\n## Further reading\n\nThis pipeline has two complementary pipelines (also written in Nextflow) for [genome assembly](https://github.com/fmalmeida/mpgap) and [prokaryotic genome annotation](https://github.com/fmalmeida/bacannot) that can give the user a complete workflow 
for bacterial genomics analyses.\n\n## Quickstart\n\n1. Install Nextflow:\n    \n    ```bash\n    curl -s https://get.nextflow.io | bash\n    ```\n    \n2. Give it a try:\n    \n    ```bash\n    nextflow run fmalmeida/ngs-preprocess --help\n    ```\n\n3. Download required tools\n\n    * for docker\n\n        ```bash\n        # for docker\n        docker pull fmalmeida/ngs-preprocess:v2.7\n\n        # run\n        nextflow run fmalmeida/ngs-preprocess -profile docker [options]\n        ```\n\n    * for singularity\n\n        ```bash\n        # for singularity\n        # remember to properly set NXF_SINGULARITY_LIBRARYDIR\n        # read more at https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub\n        export NXF_SINGULARITY_LIBRARYDIR=MY_SINGULARITY_IMAGES    # your singularity storage dir\n        export NXF_SINGULARITY_CACHEDIR=MY_SINGULARITY_CACHE       # your singularity cache dir\n        singularity pull \\\n            --dir $NXF_SINGULARITY_LIBRARYDIR \\\n            fmalmeida-ngs-preprocess-v2.7.img docker://fmalmeida/ngs-preprocess:v2.7\n        \n        # run\n        nextflow run fmalmeida/ngs-preprocess -profile singularity [options]\n        ```\n    \n    * for conda\n    \n        ```bash\n        # for conda\n        # it is better to create envs with mamba for faster solving\n        wget https://github.com/fmalmeida/ngs-preprocess/raw/master/environment.yml\n        conda env create -f environment.yml   # advice: use mamba\n\n        # must be executed from the base environment\n        # This tells nextflow to load the available ngs-preprocess environment when required\n        nextflow run fmalmeida/ngs-preprocess -profile conda [options]\n        ```\n    \n4. 
Start running your analysis\n    \n    ```bash\n    nextflow run fmalmeida/ngs-preprocess -profile \u003cdocker/singularity/conda\u003e\n    ```\n\n:fire: Please read the documentation below on [selecting between conda, docker or singularity](https://github.com/fmalmeida/ngs-preprocess/tree/master#selecting-between-profiles) profiles, since the tools are made available differently depending on the chosen profile.\n\n## Documentation\n\n### Selecting between profiles\n\nNextflow profiles are sets of \"sensible defaults\" for the resource requirements of each step in the workflow, and can be enabled with the command-line flag `-profile`. You can learn more about Nextflow profiles at:\n\n+ https://nf-co.re/usage/configuration#basic-configuration-profiles\n+ https://www.nextflow.io/docs/latest/config.html#config-profiles\n\nThe pipeline has \"standard profiles\" set to run the workflows with either conda, docker or singularity using the [local executor](https://www.nextflow.io/docs/latest/executor.html), which is Nextflow's default and runs the pipeline processes on the computer where Nextflow is launched. If you need to run the pipeline using another executor such as sge, lsf or slurm, you can take a look at [Nextflow's manual page](https://www.nextflow.io/docs/latest/executor.html) to properly configure one in a new custom profile set in your personal copy of the [ngs-preprocess config file](https://github.com/fmalmeida/ngs-preprocess/blob/master/nextflow.config). Nextflow allows multiple profiles to be used at once, e.g. `-profile conda,sge`.\n\nBy default, if no profile is chosen, the pipeline will try to load tools from the local machine's $PATH. 
The available pre-set profiles for this pipeline are `docker`, `conda` and `singularity`. You can choose between them as follows:\n\n* conda\n\n    ```bash\n    # must be executed from the base environment\n    # This tells nextflow to load the available ngs-preprocess environment when required\n    nextflow run fmalmeida/ngs-preprocess -profile conda [options]\n    ```\n\n* docker\n    \n    ```bash\n    nextflow run fmalmeida/ngs-preprocess -profile docker [options]\n    ```\n\n* singularity\n    \n    ```bash\n    nextflow run fmalmeida/ngs-preprocess -profile singularity [options]\n    ```\n\n:book: Please use conda as a last resort: since the packages will not be \"frozen and pre-installed\", problems may arise.\n\n### Usage\n\nTo understand pipeline usage and configuration, users must read the \u003ca href=\"https://ngs-preprocess.readthedocs.io/en/latest/index.html\"\u003e\u003cstrong\u003ecomplete online documentation »\u003c/strong\u003e\u003c/a\u003e\n\n### Using a configuration file\n\nAll the parameters shown above can be, and are advised to be, set through the configuration file. When a configuration file is set, the pipeline is run by simply executing:\n\n```bash\nnextflow run fmalmeida/ngs-preprocess -c ./configuration-file\n```\n\nYour configuration file tells the pipeline the type of data you have and which processes to execute. Therefore, it needs to be correctly set up.\n\nCreate a configuration file in your working directory:\n\n```bash\nnextflow run fmalmeida/ngs-preprocess [ --get_config ]\n```\n\n### Interactive graphical configuration and execution\n\n#### Via NF tower launchpad (good for cloud env execution)\n\nNextflow has an awesome feature called [NF tower](https://tower.nf). It allows users to quickly customise and set up the execution and configuration of cloud environments to execute any Nextflow pipeline from nf-core, GitHub (this one included), Bitbucket, etc. 
Because the pipeline has a compliant JSON schema for its configuration, setting parameters in NF tower is easier: the system renders an input form.\n\nCheck out more about this feature at: https://seqera.io/blog/orgs-and-launchpad/\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://j.gifs.com/GRnqm7.gif\" width=\"500px\"/\u003e\n\u003c/p\u003e\n\n#### Via nf-core launch (good for local execution)\n\nUsers can trigger a graphical and interactive pipeline configuration and execution by using the [nf-core launch](https://nf-co.re/launch) utility. nf-core launch will start an interactive form in your web browser or command line so you can configure the pipeline step by step and start its execution at the end.\n\n```bash\n# Install nf-core\npip install nf-core\n\n# Launch the pipeline\nnf-core launch fmalmeida/ngs-preprocess\n```\n\nIt will result in the following:\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./images/nf-core-asking.png\" width=\"500px\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./images/nf-core-gui.png\" width=\"400px\"/\u003e\n\u003c/p\u003e\n\n## Citation\n\nTo cite this pipeline, please refer to:\n\n\u003e Almeida FMd, Campos TAd and Pappas Jr GJ. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. 
F1000Research 2023, 12:1205 (https://doi.org/10.12688/f1000research.139488.1)\n\nArchived versions of the pipeline are also available on [Zenodo](https://doi.org/10.5281/zenodo.3451405).\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [GPLv3](https://github.com/fmalmeida/ngs-preprocess/blob/master/LICENSE).\n\n\u003e The nf-core framework for community-curated bioinformatics pipelines.\n\u003e\n\u003e Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso \u0026 Sven Nahnsen.\n\u003e\n\u003e Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.\n\nIn addition, users are encouraged to cite the programs used in this pipeline whenever they are used. Links to the tools and data resources used in this pipeline are as follows:\n\n* [Entrez-direct](https://anaconda.org/bioconda/entrez-direct)\n* [sra-tools](https://github.com/ncbi/sra-tools)\n* [Fastp](https://github.com/OpenGene/fastp)\n* [Porechop](https://github.com/rrwick/Porechop)\n* [Porechop ABI](https://github.com/bonsai-team/Porechop_ABI)\n* [pycoQC](https://github.com/a-slide/pycoQC)\n* [bax2bam](https://anaconda.org/bioconda/bax2bam)\n* [bam2fastq](https://github.com/PacificBiosciences/pbtk#bam2fastx)\n* [lima](https://github.com/PacificBiosciences/barcoding)\n* [pacbio ccs](https://ccs.how/)\n* [NanoPack](https://github.com/wdecoster/nanopack)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmalmeida%2Fngs-preprocess","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffmalmeida%2Fngs-preprocess","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffmalmeida%2Fngs-preprocess/lists"}