{"id":43977607,"url":"https://github.com/bzhanglab/neoflow","last_synced_at":"2026-02-07T08:33:58.041Z","repository":{"id":79790266,"uuid":"150484566","full_name":"bzhanglab/neoflow","owner":"bzhanglab","description":"NeoFlow: a proteogenomics pipeline for neoantigen discovery","archived":false,"fork":false,"pushed_at":"2024-11-16T21:19:32.000Z","size":4207,"stargazers_count":26,"open_issues_count":9,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-09-09T20:57:42.559Z","etag":null,"topics":["neoantigen-discovery","neoantigen-prediction","nextflow-pipeline","novel-peptide-identifications","proteogenomics"],"latest_commit_sha":null,"homepage":"","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bzhanglab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-09-26T20:13:36.000Z","updated_at":"2025-08-13T14:37:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"95603384-1089-428a-9fcf-9ac55f8e96d4","html_url":"https://github.com/bzhanglab/neoflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bzhanglab/neoflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bzhanglab%2Fneoflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bzhanglab%2Fneoflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bzhanglab%2Fneoflow/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bzhanglab%2Fneoflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bzhanglab","download_url":"https://codeload.github.com/bzhanglab/neoflow/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bzhanglab%2Fneoflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29190256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T07:37:03.739Z","status":"ssl_error","status_checked_at":"2026-02-07T07:37:03.029Z","response_time":63,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["neoantigen-discovery","neoantigen-prediction","nextflow-pipeline","novel-peptide-identifications","proteogenomics"],"created_at":"2026-02-07T08:33:57.455Z","updated_at":"2026-02-07T08:33:58.035Z","avatar_url":"https://github.com/bzhanglab.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [NeoFlow](https://doi.org/10.1038/s41467-020-15456-w)\n\n## Overview\n\n**[NeoFlow](https://doi.org/10.1038/s41467-020-15456-w): a proteogenomics pipeline for neoantigen discovery**\n\nNeoFlow includes four modules:\n\n1. Variant annotation and customized database construction: `neoflow_db.nf`;\n2. 
Variant peptide identification: `neoflow_msms.nf`;\n\n   * MS/MS searching. Three search engines are available: [MS-GF+](https://github.com/MSGFPlus/msgfplus), [X!Tandem](https://www.thegpm.org/tandem/) and [Comet](http://comet-ms.sourceforge.net/);\n   * FDR estimation: global FDR estimation;\n   * Novel peptide validation by [PepQuery](http://pepquery.org/);\n   * RT-based validation for novel peptide identifications using [AutoRT](https://github.com/bzhanglab/AutoRT): optional (GPU required).\n\n3. HLA typing: `neoflow_hlatyping.nf`;\n4. Neoantigen prediction: `neoflow_neoantigen.nf`.\n\n\nNeoFlow supports both label-free and iTRAQ/TMT data.\n\n## Installation\n\n1. Download NeoFlow:\n\n```sh\ngit clone https://github.com/bzhanglab/neoflow\n```\n\n2. Install [Docker](https://docs.docker.com/install/) (\u003e=19.03).\n\n3. Install [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html). More information can be found on the Nextflow [get started](https://www.nextflow.io/docs/latest/getstarted.html) page.\n\n4. Install **ANNOVAR** by following the instructions at [http://annovar.openbioinformatics.org/en/latest/](http://annovar.openbioinformatics.org/en/latest/).\n\n5. Install **netMHCpan 4.0** by following the instructions at [http://www.cbs.dtu.dk/services/doc/netMHCpan-4.0.readme](http://www.cbs.dtu.dk/services/doc/netMHCpan-4.0.readme). Please set **`TMPDIR`** in file `netMHCpan-4.0/netMHCpan` to `/tmp` as shown below:\n\n```sh\n# determine where to store temporary files (must be writable to all users)\n\nif ( ${?TMPDIR} == 0 ) then\n        setenv  TMPDIR  /tmp\nendif\n```\n\n6. Install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) (\u003e=2.2.2) for [**AutoRT**](https://github.com/bzhanglab/AutoRT/) by following the instructions at [https://github.com/NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker). 
This step is optional; it is required only for RT-based validation of novel peptide identifications with AutoRT.\n\nAll other tools used by NeoFlow have been dockerized and will be installed automatically the first time NeoFlow is run on a computer.\n\n## Usage\n\n### 1. Variant annotation and customized database construction\n\n```sh\n $ nextflow run neoflow_db.nf --help\nN E X T F L O W  ~  version 19.10.0\nLaunching `neoflow_db.nf` [irreverent_faggin] - revision: 741bf1a931\n=========================================\nneoflow =\u003e variant annotation and customized database construction\n=========================================\nUsage:\nnextflow run neoflow_db.nf\nArguments:\n  --vcf_file              A txt file contains VCF file(s)\n  --annovar_dir           ANNOVAR folder\n  --protocol              The parameter of \"protocol\" for ANNOVAR, default is \"refGene\"\n  --ref_dir               ANNOVAR annotation data folder\n  --ref_ver               The genome version, hg19 or hg38, default is \"hg19\"\n  --out_dir               Output folder, default is \"./output\"\n  --cpu                   The number of CPUs\n  --help                  Print help message\n```\n\nThe input file for parameter `--vcf_file` is a **tab-delimited text file** that lists the paths of the variant file(s). Each variant file can be in [**VCF format**](https://samtools.github.io/hts-specs/VCFv4.2.pdf) or in the [**simple text-based format**](http://annovar.openbioinformatics.org/en/latest/user-guide/input/) ([ANNOVAR input format](http://annovar.openbioinformatics.org/en/latest/user-guide/input/)). 
The format of the tab-delimited input file for `--vcf_file` is shown below:\n\n| experiment | sample | file | file_type |\n|---|---|---|---|\n| TMT01 | T1 | T1_somatic.vcf;T1_rna.vcf | somatic;rna |\n| TMT01 | T2 | T2_somatic.vcf;T2_rna.vcf | somatic;rna |\n| TMT02 | T3 | T3_somatic.vcf;T3_rna.vcf | somatic;rna |\n| TMT02 | T4 | T4_somatic.vcf;T4_rna.vcf | somatic;rna |\n\nThe `experiment` column gives the name of the label-free, TMT or iTRAQ experiment, and the `sample` column gives the sample name. For iTRAQ or TMT data, samples from the same iTRAQ or TMT experiment should have the same `experiment` name. For label-free data, each sample should have a different `experiment` name. All variant files for the same sample (for example, a somatic variant VCF file and a VCF file of variants called from RNA-Seq data) should be in the same row (column `file`), separated by \";\". The `file_type` column indicates the corresponding variant type for each VCF file in column `file`. **Please note that all variant files should be under the folder where you run neoflow**. We recommend providing the absolute path of each variant file in the input file for `--vcf_file`.\n\nThe ANNOVAR annotation data (`--ref_dir`) can be downloaded by following the instructions at [http://annovar.openbioinformatics.org/en/latest/user-guide/download/](http://annovar.openbioinformatics.org/en/latest/user-guide/download/). 
\n\nThe output files of `neoflow_db.nf` include a customized protein database in FASTA format for each experiment and variant annotation result files for each sample.\n\n#### Example\n\n```sh\nnextflow run neoflow_db.nf --ref_dir /data/tools/annovar/humandb_hg19/ \\\n                           --vcf_file example_data/test_vcf_files.tsv \\\n                           --annovar_dir /data/tools/annovar/ \\\n                           --ref_ver hg19 \\\n                           --out_dir output\n```\n\nPlease update the inputs for parameters `--ref_dir` and `--annovar_dir` before running the above example. The input file for `--vcf_file` can be downloaded from the [example data](http://pdv.zhang-lab.org/data/download/neoflow_example_data/example_data.tar.gz) (Right click and Select **\"Save link as…\"** or **\"Download Linked File\"**) prepared for testing. After downloading the example data, extract the archive; all the test data are then available in the **example_data** folder.\n\nThe running time of the above example is less than 5 minutes on a Linux server with 40 cores.\n\n### 2. Variant peptide identification\nPlease note that the customized database generated in the first step will be used in this step. 
\n```sh\n $ ./nextflow run neoflow_msms.nf --help\nN E X T F L O W  ~  version 19.10.0\nLaunching `neoflow_msms.nf` [drunk_nobel] - revision: 6d58fb19bd\n=========================================\nneoflow =\u003e Variant peptide identification\n=========================================\nUsage:\nnextflow run neoflow-msms.nf\nMS/MS searching arguments:\n  --db                        The customized protein database (target + decoy sequences) in FASTA format which is generated by neoflow_db.nf\n  --ms                        MS/MS data in MGF format\n  --msms_para_file            Parameter file for MS/MS searching\n  --out_dir                   Output folder, default is \"./\"\n  --prefix                    The prefix of output files\n  --search_engine             The search engine used for MS/MS searching, comet=Comet, msgf=MS-GF+ or xtandem=X!Tandem\n\nPepQuery arguments:\n  --pv_enzyme                 Enzyme used for protein digestion. 0:Non enzyme, 1:Trypsin (default), 2:Trypsin (no P rule), 3:Arg-C, 4:Arg-C (no P rule), 5:Arg-N, 6:Glu-C, 7:Lys-C\n  --pv_c                      The max missed cleavages, default is 2\n  --pv_tol                    Precursor ion m/z tolerance, default is 10\n  --pv_tolu                   The unit of --tol, ppm or Da. Default is ppm\n  --pv_itol                   The error window for fragment ion, default is 0.5\n  --pv_fixmod                 Fixed modification. The format is like : 1,2,3. Different modification is represented by different number\n  --pv_varmod                 Variable modification. The format is the same with --fixMod;\n  --pv_refdb                  Reference protein database\n\nAutoRT parameters:\n  --rt_validation             Perform RT based validation\n  \n  --help                      Print help message\n```\n\nThe output files of `neoflow_msms.nf` include MS/MS searching raw identification files, FDR estimation result files at both PSM and peptide levels, PepQuery validation result files. 
\n\n#### Example\n\n```sh\nnextflow run neoflow_msms.nf --ms example_data/mgf/ \\\n               --msms_para_file example_data/comet_parameter.txt \\\n               --search_engine comet \\\n               --db output/customized_database/neoflow_crc_target_decoy.fasta \\\n               --out_dir output \\\n               --pv_refdb output/customized_database/ref.fasta \\\n               --pv_tol 20 \\\n               --pv_itol 0.05\n```\n\nThe input files for `--ms` and `--msms_para_file` can be downloaded from the [example data](http://pdv.zhang-lab.org/data/download/neoflow_example_data/example_data.tar.gz) (Right click and Select **\"Save link as…\"** or **\"Download Linked File\"**) prepared for testing. \n\nThe variant peptide identification result is in the file `output/novel_peptide_identification/novel_peptides_psm_pepquery.tsv`.\n\nThe running time of the above example is less than 15 minutes on a Linux server with 40 cores.\n\n### 3. HLA typing\n```sh\n $ ./nextflow run neoflow_hlatyping.nf --help\nN E X T F L O W  ~  version 19.10.0\nLaunching `neoflow_hlatyping.nf` [spontaneous_hawking] - revision: 5fd970e701\n=========================================\nneoflow =\u003e HLA typing\n=========================================\nUsage:\nnextflow run neoflow_hlatyping.nf\nArguments:\n  --reads                     Reads data in fastq.gz or fastq format. For example, \"*_{1,2}.fq.gz\"\n  --hla_ref_dir               HLA reference folder\n  --seqtype                   Read type, dna or rna. Default is dna.\n  --singleEnd                 Single end or not, default is false (pair end reads)\n  --cpu                       The number of CPUs, default is 6.\n  --out_dir                   Output folder, default is \"./\"\n  --help                      Print help message\n```\nThe output of `neoflow_hlatyping.nf` is a text file containing the HLA alleles for a sample. 
This file is generated by [OptiType](https://github.com/FRED-2/OptiType).\n\n#### Example\n```sh\nnextflow run neoflow_hlatyping.nf --hla_ref_dir example_data/hla_reference \\\n                  --reads \"example_data/dna/*_{1,2}.fastq.gz\" \\\n                  --out_dir output/ \\\n                  --cpu 40\n```\n\nThe input files for `--hla_ref_dir` and `--reads` can be downloaded from the [example data](http://pdv.zhang-lab.org/data/download/neoflow_example_data/example_data.tar.gz) (Right click and Select **\"Save link as…\"** or **\"Download Linked File\"**) prepared for testing. \n\nThe HLA typing result is in this file `output/hla_type/sample1/sample1_result.tsv`.\n\nThe running time of above example is less than 10 minutes on a Linux server with 40 cores.\n\n\n### 4. Neoantigen prediction\nPlease note that the results generated in step 1-3 will be used in this step. \n```sh\n $ ./nextflow run neoflow_neoantigen.nf --help\nN E X T F L O W  ~  version 19.10.0\nLaunching `neoflow_neoantigen.nf` [mighty_roentgen] - revision: e4261baca3\n=========================================\nneoflow =\u003e Neoantigen prediction\n=========================================\nUsage:\nnextflow run neoflow_neoantigen.nf\nArguments:\n  --var_db                  Variant (somatic) database in fasta format generated by neoflow_db.nf\n  --var_info_file           Variant (somatic) information in txt format generated by neoflow_db.nf\n  --ref_db                  Reference (known) protein database\n  --hla_type                HLA typing result in txt format generated by Optitype\n  --netmhcpan_dir           NetMHCpan 4.0 folder\n  --var_pep_file            Variant peptide identification result generated by neoflow_msms.nf, optional.\n  --var_pep_info            Variant information in txt format for customized database used for variant peptide identification\n  --prefix                  The prefix of output files\n  --out_dir                 Output directory\n  --cpu                     
The number of CPUs\n  --help                    Print help message\n```\n\nThe output of `neoflow_neoantigen.nf` is a tsv format file containing neoantigen prediction result as shown below:\n\nVariant\\_ID|Chr|Start|End|Ref|Alt|Variant\\_Type|Variant\\_Function|Gene|mRNA|Neoepitope|Variant\\_Start|Variant\\_End|AA\\_before|AA\\_after|HLA\\_type|netMHCpan\\_binding\\_affinity\\_nM|netMHCpan\\_precentail\\_rank|protein\\_var\\_evidence\\_pep\n:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:\nVAR\\|NM\\_002536\\|10054|chrX|48418659|48418659|G|A|nonsynonymous SNV|protein-altering|TBC1D25|NM\\_002536|TGFGGHRG|1|1|A|T|HLA-A*01:01|44216.6|88.5537|-\nVAR\\|NM\\_002536\\|10054|chrX|48418659|48418659|G|A|nonsynonymous SNV|protein-altering|TBC1D25|NM\\_002536|TGFGGHRG|1|1|A|T|HLA-C*07:01|43330|73.7774|-\nVAR\\|NM\\_002536\\|10054|chrX|48418659|48418659|G|A|nonsynonymous SNV|protein-altering|TBC1D25|NM\\_002536|TGFGGHRG|1|1|A|T|HLA-B*08:01|35925.8|70.8561|-\nVAR\\|NM\\_001348265\\|10055|chrX|48418659|48418659|G|A|nonsynonymous SNV|protein-altering|TBC1D25|NM\\_001348265|TGFGGHRG|1|1|A|T|HLA-A*01:01|44216.6|88.5537|-\nVAR\\|NM\\_001348265\\|10055|chrX|48418659|48418659|G|A|nonsynonymous SNV|protein-altering|TBC1D25|NM\\_001348265|TGFGGHRG|1|1|A|T|HLA-C*07:01|43330|73.7774|-\n\n\nColumn description for the above table:\n```\nVariant_ID:\tvariant ID defined by neoflow\nChr:\tvariant chromosome\nStart:\tstart position on genome\nEnd:\tend position on genome\nRef:\treference base\nAlt:\talterative base\nVariant_Type:\tvariant type annotated by ANNOVAR\nVariant_Function:\tvariant function annotated by ANNOVAR\nGene:\tgene ID\nmRNA:\tmRNA ID\nNeoepitope:\tneoepitope peptide\nVariant_Start:\tvariant start position on neoepitope peptide\nVariant_End:\tvariant end position on neoepitope peptide\nAA_before:\treference amino acid\nAA_after:\talterative amino acid\nHLA_type:\tHLA 
type\nnetMHCpan_binding_affinity_nM:\tMHC-peptide binding affinity from NetMHCpan 4.0. The lower the value, the higher the binding affinity between MHC and neoepitope peptide.\nnetMHCpan_precentail_rank:\tMHC-peptide binding affinity rank from NetMHCpan 4.0\nprotein_var_evidence_pep:\tvariant peptide. \"-\" means no variant peptide identified covers the mutation site.\n```\n\n#### Example\n```sh\nnextflow run neoflow_neoantigen.nf --prefix sample1 \\\n                   --hla_type output/hla_type/sample1/sample1_result.tsv \\\n                   --var_db output/customized_database/sample1-somatic-var.fasta \\\n                   --var_info_file output/customized_database/sample1-somatic-varInfo.txt \\\n                   --out_dir output/ \\\n                   --netmhcpan_dir /data/tools/netMHCpan-4.0/ \\\n                   --cpu 40 \\\n                   --ref_db output/customized_database/ref.fasta \\\n                   --var_pep_file output/novel_peptide_identification/novel_peptides_psm_pepquery.tsv \\\n                   --var_pep_info output/customized_database/neoflow_crc_anno-varInfo.txt\n```\n\nPlease update the input for parameter `--netmhcpan_dir` before running the above example. \n\nThe neoantigen prediction result is in the file `output/neoantigen_prediction/sample1_neoepitope_filtered_by_reference_add_variant_protein_evidence.tsv`.\n\nThe running time of the above example is less than 30 minutes on a Linux server with 40 cores.\n\n## Example data\n\nThe test data used for the above examples can be downloaded by clicking [test data](http://pdv.zhang-lab.org/data/download/neoflow_example_data/example_data.tar.gz) (Right click and Select **\"Save link as…\"** or **\"Download Linked File\"**). \n\n\n## How to cite:\n\nWen, B., Li, K., Zhang, Y. et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nature Communications 11, 1759 (2020). 
https://doi.org/10.1038/s41467-020-15456-w\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbzhanglab%2Fneoflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbzhanglab%2Fneoflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbzhanglab%2Fneoflow/lists"}