{"id":19857015,"url":"https://github.com/griffithlab/neoag_vaccine_scripts","last_synced_at":"2025-06-24T23:35:16.333Z","repository":{"id":187322436,"uuid":"676596465","full_name":"griffithlab/neoag_vaccine_scripts","owner":"griffithlab","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-20T16:33:27.000Z","size":10968,"stargazers_count":0,"open_issues_count":8,"forks_count":0,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-06-20T17:39:21.611Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/griffithlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-09T15:03:20.000Z","updated_at":"2025-06-20T16:33:31.000Z","dependencies_parsed_at":"2023-10-23T16:36:40.471Z","dependency_job_id":"1fb051cc-cc92-4da7-bc15-db72b017a891","html_url":"https://github.com/griffithlab/neoag_vaccine_scripts","commit_stats":null,"previous_names":["evelyn-schmidt/neoag_vaccine_scripts","griffithlab/neoag_vaccine_scripts"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/griffithlab/neoag_vaccine_scripts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fneoag_vaccine_scripts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fneoag_vaccine_scripts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fneoag_vaccine_scripts/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/griffithlab%2Fneoag_vaccine_scripts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/griffithlab","download_url":"https://codeload.github.com/griffithlab/neoag_vaccine_scripts/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fneoag_vaccine_scripts/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261776271,"owners_count":23208080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T14:17:13.720Z","updated_at":"2025-06-24T23:35:16.327Z","avatar_url":"https://github.com/griffithlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neoantigen Pipeline Helper Scripts\nThese scripts assist in setting up files for manually reviewing Neoantigen Vaccine Design results generated from the [Washington University Immuno Pipeline](https://github.com/wustl-oncology/analysis-wdls).\n  \n## Creating Case Final Report on compute 1\n\n### Before Immunogenomics Tumor Board Review\n\nA written case final report will be created which includes a Genomics Review Report document. This document includes a section of a basic data QC review and a table summarizing values that pass/fail the FDA quality thresholds.\n\n### Basic data QC\n\nPull the basic data QC from various files. 
This script will output a file final_results/qc_file.txt and also print the summary to the screen.\n\n```\nmkdir $WORKING_BASE/../manual_review\ncd $WORKING_BASE/../manual_review\n\nbsub -Is -q oncology-interactive -G $GROUP -a \"docker(griffithlab/neoang_scripts:version7)\" /bin/bash\npython3 /opt/scripts/get_neoantigen_qc.py -WB $WORKING_BASE -f final_results --yaml $WORKING_BASE/yamls/$CLOUD_YAML\n```\n\n### FDA Quality Thresholds\n\nThis script will output a file final_results/fda_quality_thresholds_report.tsv and also print the summary to the screen.\n\n```\npython3 /opt/scripts/get_FDA_thresholds.py -WB  $WORKING_BASE -f final_results\n```\n\n### HLA Comparison\nThis script will output a file manual_review/hla_comparison.tsv and also print the summary to the screen.\n\n```\npython3 /opt/scripts/hla_comparison.py -WB $WORKING_BASE\nexit\n```\n\n### After Immunogenomics Tumor Board Review\n\nAfter the Immunogenomics Tumor Board Review, both a .tsv and an .xlsx file are downloaded from pVACview, which contain the candidates marked as Accept, Review, Reject, and Pending. 
These files should be kept in a folder named itb-review-files.\n\n#### Generate Protein Fasta\n\n```bash\ncd $WORKING_BASE\nmkdir ../generate_protein_fasta\ncd ../generate_protein_fasta\nmkdir candidates\nmkdir all\n\n#generate a protein fasta file using the final annotated/evaluated neoantigen candidates TSV as input\n#this will filter down to only those candidates under consideration and use the top transcript\n\n# check the file to find Tumor sample ID in the #CHROM header of VCF\n\nzcat $WORKING_BASE/final_results/annotated.expression.vcf.gz | less\nexport TUMOR_ID=\"100-049-BG004667\"\n\nbsub -Is -q general-interactive -G $GROUP -a \"docker(griffithlab/pvactools:4.0.1)\" /bin/bash\n\npvacseq generate_protein_fasta \\\n  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \\\n  --pass-only --mutant-only -d 150 \\\n  -s $TUMOR_ID \\\n  --aggregate-report-evaluation {Accept,Review} \\\n  --input-tsv ../itb-review-files/*.tsv  \\\n  $WORKING_BASE/final_results/annotated.expression.vcf.gz \\\n  25 \\\n  $WORKING_BASE/../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa\n\npvacseq generate_protein_fasta \\\n  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \\\n  --pass-only --mutant-only -d 150 \\\n  -s $TUMOR_ID  \\\n  $WORKING_BASE/final_results/annotated.expression.vcf.gz \\\n  25  \\\n  $WORKING_BASE/../generate_protein_fasta/all/annotated_filtered.vcf-pass-51mer.fa\n\nexit \n```\n\nTo generate files needed for manual review, save the pVAC results from the Immunogenomics Tumor Board Review meeting as $SAMPLE.revd.Annotated.Neoantigen_Candidates.xlsx (Note: if the file is not saved under this exact name, the command below will need to be modified).\n\n```\nexport PATIENT_ID=TWJF-5120-28\n\nbsub -Is -q oncology-interactive -G $GROUP -a \"docker(griffithlab/neoang_scripts:version7)\" /bin/bash\n\ncd $WORKING_BASE\nmkdir ../manual_review\n\npython3 /opt/scripts/generate_reviews_files.py -a ../itb-review-files/*.tsv -c 
../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv -variants final_results/variants.final.annotated.tsv -classI final_results/pVACseq/mhc_i/*.all_epitopes.aggregated.tsv -classII final_results/pVACseq/mhc_ii/*.all_epitopes.aggregated.tsv -samp $PATIENT_ID -o ../manual_review/\n\n# Note: You can change the classI and classII IC50/percentile cutoffs for coloring\npython3 /opt/scripts/color_peptides51mer.py -p ../manual_review/*Peptides_51-mer.xlsx -probPos C -samp $PATIENT_ID -o ../manual_review/\n```\n\n## Creating Case Final Report locally\n\n### Before Immunogenomics Tumor Board Review\n\nA written case final report will be created which includes a Genomics Review Report document. This document includes a section of a basic data QC review and a table summarizing values that pass/fail the FDA quality thresholds.\n\n### Basic data QC\n\nPull the basic data QC from various files. This script will output a file final_results/qc_file.txt and also print the summary to the screen.\n\n```\ncd $WORKING_BASE\n\ndocker run -it --env HOME --env WORKING_BASE -v $HOME/:$HOME/ -v $HOME/.config/gcloud:/root/.config/gcloud griffithlab/neoang_scripts:version7 /bin/bash\n\ncd $WORKING_BASE\nmkdir ../manual_review\ncd ../manual_review\n\npython3 /opt/scripts/get_neoantigen_qc.py -WB $WORKING_BASE -f final_results --yaml $HOME/yamls/${GCS_CASE_NAME}_immuno_cloud-WDL.yaml\npython3 /opt/scripts/get_FDA_thresholds.py -WB  $WORKING_BASE -f final_results\npython3 /opt/scripts/hla_comparison.py -WB $WORKING_BASE\nexit\n\n```\n\n### After Immunogenomics Tumor Board Review\n\nAfter the Immunogenomics Tumor Board Review, both a .tsv and an .xlsx file are downloaded from pVACview, which contain the candidates marked as Accept, Review, Reject, and Pending. 
These files should be kept in a folder named itb-review-files.\n\n#### Generate Protein Fasta\n\n```bash\ncd $WORKING_BASE\nmkdir ../generate_protein_fasta\ncd ../generate_protein_fasta\nmkdir candidates\nmkdir all\n\n#generate a protein fasta file using the final annotated/evaluated neoantigen candidates TSV as input\n#this will filter down to only those candidates under consideration and use the top transcript\n\n# check the file to find Tumor sample ID in the #CHROM header of VCF\n\ngzcat $WORKING_BASE/final_results/annotated.expression.vcf.gz | less\nexport TUMOR_SAMPLE_ID=\"100-049-BG004667\"\n\ndocker pull griffithlab/pvactools:4.0.5\n# pass variables by name so the container inherits their values from the host shell\ndocker run -it -v $HOME/:$HOME/ --env WORKING_BASE --env TUMOR_SAMPLE_ID griffithlab/pvactools:4.0.5 /bin/bash\n\ncd $WORKING_BASE\n\npvacseq generate_protein_fasta \\\n  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \\\n  --pass-only --mutant-only -d 150 \\\n  -s $TUMOR_SAMPLE_ID \\\n  --aggregate-report-evaluation {Accept,Review} \\\n  --input-tsv ../itb-review-files/*.tsv  \\\n  $WORKING_BASE/final_results/annotated.expression.vcf.gz \\\n  25 \\\n  $WORKING_BASE/../generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa\n\npvacseq generate_protein_fasta \\\n  -p $WORKING_BASE/final_results/pVACseq/phase_vcf/phased.vcf.gz \\\n  --pass-only --mutant-only -d 150 \\\n  -s $TUMOR_SAMPLE_ID  \\\n  $WORKING_BASE/final_results/annotated.expression.vcf.gz \\\n  25  \\\n  $WORKING_BASE/../generate_protein_fasta/all/annotated_filtered.vcf-pass-51mer.fa\n\nexit \n```\n\nTo generate files needed for manual review, save the pVAC results from the Immunogenomics Tumor Board Review meeting as $SAMPLE.revd.Annotated.Neoantigen_Candidates.xlsx (Note: if the file is not saved under this exact name, the command below will need to be modified).\n\n```\ndocker pull griffithlab/neoang_scripts:version7\ndocker run -it --env WORKING_BASE --env PATIENT_ID -v $HOME/:$HOME/ -v $HOME/.config/gcloud:/root/.config/gcloud 
griffithlab/neoang_scripts:version7 /bin/bash\n\ncd $WORKING_BASE\nmkdir manual_review\n\npython3 /opt/scripts/generate_reviews_files.py -a itb-review-files/*.tsv -c generate_protein_fasta/candidates/annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv -classI final_results/pVACseq/mhc_i/*.all_epitopes.aggregated.tsv -classII final_results/pVACseq/mhc_ii/*.all_epitopes.aggregated.tsv -samp $PATIENT_ID -o manual_review/\n\npython3 /opt/scripts/color_peptides51mer.py -p manual_review/*Peptides_51-mer.xlsx -samp $PATIENT_ID -o manual_review/\n```\nOpen colored_peptides51mer.html and copy the table into an Excel spreadsheet; the formatting should be preserved. Use the Annotated.Neoantigen_Candidates and colored Peptides_51-mer files for manual review.\n\n# Description of Scripts\n\n## Get Basic QC\n\n```\npython3  /opt/scripts/get_neoantigen_qc.py --help\nusage: get_neoantigen_qc.py [-h] [-WB WB] [-f FIN_RESULTS] [--n_dna N_DNA] [--t_dna T_DNA] [--t_rna T_RNA]\n                            [--concordance CONCORDANCE] [--contam_n CONTAM_N] [--contam_t CONTAM_T]\n                            [--rna_metrics RNA_METRICS] [--strand_check STRAND_CHECK] --yaml YAML\n                            [--fin_variants FIN_VARIANTS]\n\nGet the stats for the basic data QC review in the neoantigen final report.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -WB WB                The path to the gcp_immuno folder of the trial you wish to run the script on, defined as\n                        WORKING_BASE in envs.txt\n  -f FIN_RESULTS, --fin_results FIN_RESULTS\n                        Name of the final results folder in gcp immuno\n  --n_dna N_DNA         File path for aligned normal DNA FDA report table\n  --t_dna T_DNA         File path for aligned tumor DNA FDA report table\n  --t_rna T_RNA         File path for aligned tumor RNA FDA report table\n  --concordance CONCORDANCE\n                        File path for Somalier results for sample 
tumor/normal sample relatedness\n  --contam_n CONTAM_N   File path for VerifyBamID results for contamination of the normal sample\n  --contam_t CONTAM_T   File path for VerifyBamID results for contamination of the tumor sample\n  --rna_metrics RNA_METRICS\n                        File path for RNA metrics\n  --strand_check STRAND_CHECK\n                        File path for strandness check\n  --yaml YAML           File path for the pipeline YAML file\n  --fin_variants FIN_VARIANTS\n                        File path for the final variants file\n```\n\n## GET FDA metrics\n\n```\npython3  /opt/scripts/get_FDA_thresholds.py --help\nusage: get_FDA_thresholds.py [-h] [-WB WB] [-f FIN_RESULTS] [--n_dna N_DNA] [--t_dna T_DNA] [--t_rna T_RNA]\n                             [--una_n_dna UNA_N_DNA] [--una_t_dna UNA_T_DNA] [--una_t_rna UNA_T_RNA]\n                             [--somalier SOMALIER] [--contam_n CONTAM_N] [--contam_t CONTAM_T]\n\nGet FDA qc stats from various files and determine if they pass or fail.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -WB WB                the path to the gcp_immuno folder of the trial you wish to tun script on, defined as\n                        WORKING_BASE in envs.txt\n  -f FIN_RESULTS, --fin_results FIN_RESULTS\n                        Name of the final results folder in gcp immuno\n  --n_dna N_DNA         file path for aligned normal dna FDA report table\n  --t_dna T_DNA         file path for aligned tumor dna FDA report table\n  --t_rna T_RNA         file path for aligned tumor rna FDA report table\n  --una_n_dna UNA_N_DNA\n                        file path for unaligned normal dna FDA report table\n  --una_t_dna UNA_T_DNA\n                        file path for unaligned tumor dna FDA report table\n  --una_t_rna UNA_T_RNA\n                        file path for unaligned tumor rna FDA report table\n  --somalier SOMALIER   file path for Somalier results for sample tumor/normal sample 
relatedness\n                        (concordance.somalier.pairs.tsv)\n  --contam_n CONTAM_N   file path for VerifyBamID results for contamination the normal sample\n  --contam_t CONTAM_T   file path for VerifyBamID results for contamination the tumor dna sample\n```\n\n## HLA Comparison\n```\npython3  /opt/scripts/hla_comparison.py --help\nusage: hla_comparison.py [-h] [-WB WB] [-f FIN_RESULTS] [--optitype_n OPTITYPE_N] [--optitype_t OPTITYPE_T]\n                         [--phlat_n PHLAT_N] [--phlat_t PHLAT_T] [--clinical CLINICAL] [--o O]\n\nCompare HLA alleles called by phlat, opitype, and clincal data if available.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -WB WB                The path to the gcp_immuno folder of the trial you wish to run the script on, defined as\n                        WORKING_BASE in envs.txt\n  -f FIN_RESULTS, --fin_results FIN_RESULTS\n                        Name of the final results folder in gcp immuno\n  --optitype_n OPTITYPE_N\n                        File path for optitype normal calls\n  --optitype_t OPTITYPE_T\n                        File path for optitype tumor calls\n  --phlat_n PHLAT_N     File path for phlat normal calls\n  --phlat_t PHLAT_T     File path for phlat tumor calls\n  --clinical CLINICAL   File path for the clinical_calls.txt\n  --o O                 Output folder\n```\n\n## Generate Review Files\n\n```\npython3  /opt/scripts/generate_reviews_files.py --help\nusage: generate_reviews_files.py [-h] -a A -c C [-variants VARIANTS] -classI CLASSI -classII CLASSII -samp SAMP\n                                 [-o O] [-f FIN_RESULTS]\n\nCreate the file needed for the neoantigen manuel review\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -a A                  The path to the ITB Reviewed Candidates tsv file\n  -c C                  The path to candidates annotated_filtered.vcf-pass-51mer.fa.manufacturability.tsv from\n                        the 
generate_protein_fasta script\n  -variants VARIANTS    The path to the variants.final.annotated.tsv file generated by the pipeline\n  -classI CLASSI        The path to the classI all_epitopes.aggregated.tsv used in pVACseq\n  -classII CLASSII      The path to the classII all_epitopes.aggregated.tsv used in pVACseq\n  -samp SAMP            The name of the sample\n  -o O                  the path to output folder\n  -f FIN_RESULTS, --fin_results FIN_RESULTS\n                        Name of the final results folder in gcp immuno\n```\n\n## Color Peptides 51mer\n\n```\npython3  /opt/scripts/color_peptides51mer.py --help\nusage: color_peptides51mer.py [-h] -p P -samp SAMP [-cIIC50 CIIC50] [-cIpercent CIPERCENT] [-cIIIC50 CIIIC50]\n                              [-cIIpercent CIIPERCENT] [-probPos [PROBPOS [PROBPOS ...]]] [-o O]\n\nColor the 51mer peptide\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -p P                  The path to the Peptides 51 mer\n  -samp SAMP            The name of the sample\n  -cIIC50 CIIC50        Maximum classI IC50 score to annotate\n  -cIpercent CIPERCENT  Maximum classI percentile to annotate\n  -cIIIC50 CIIIC50      Maximum classII IC50 score to annotate\n  -cIIpercent CIIPERCENT\n                        Maximum classII percentile to annotate\n  -probPos [PROBPOS [PROBPOS ...]]\n                        problematic position to make large\n  -o O                  the path to output folder\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriffithlab%2Fneoag_vaccine_scripts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgriffithlab%2Fneoag_vaccine_scripts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriffithlab%2Fneoag_vaccine_scripts/lists"}