{"id":19749506,"url":"https://github.com/mskcc/waltz","last_synced_at":"2025-02-28T00:41:04.806Z","repository":{"id":75108597,"uuid":"265999976","full_name":"mskcc/Waltz","owner":"mskcc","description":"Fast, efficient bam metrics, pileups and genotyping","archived":false,"fork":false,"pushed_at":"2020-05-22T02:34:07.000Z","size":6526,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-10T21:21:59.736Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mskcc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-22T02:29:50.000Z","updated_at":"2021-03-24T14:20:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"6fc0b091-f4b7-46e4-9c54-a75e546e31da","html_url":"https://github.com/mskcc/Waltz","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2FWaltz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2FWaltz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2FWaltz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mskcc%2FWaltz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mskcc","download_url":"https://codeload.github.com/mskcc/Waltz/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":241079749,"owners_count":19906114,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T02:27:01.219Z","updated_at":"2025-02-28T00:41:04.799Z","avatar_url":"https://github.com/mskcc.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Waltz\n\nA fast, efficient bam pileup and application modules based on it, like coverage metrics, genotyping, signature finding etc.\n\nThis software was developed at the Innovation Lab, Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center.\n\u003cbr/\u003e\n\n\nWaltz has 2 main modules:  \n1. **Bam metrics**: Generate various useful metrics for a given bam file\n2. **Genotyping**: Determine the fragment count and allele fraction of given mutations in given bam file\n\n\n## Java\nJava 1.8 or above is required.\n\n## Dependencies (bundled with the release jar)\n\n1. BioinfoUtils\n2. HTSJDK\n3. Google Guava\n4. Apache Commons IO\n\n\u003cbr/\u003e\n\n### 1. 
Bam Metrics\n\n#### Generate bam level metrics\n\njava -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.countreads.CountReads bam-file coverageThreshold canonical-transcripts-bed-file intervals-bed-file\n\nwhere  \ncoverageThreshold is the average coverage above which a contiguous region should be considered covered (suggested value: 5)  \ncanonical-transcripts-bed-file is the bed file with all exons across the genome (included above)  \nintervals-bed-file is the bed file of chosen genomic intervals  \n\n\nThis produces 3 files:  \n.covered-regions: regions of contiguous coverage, annotated with canonical transcripts. Useful for checking what regions are actually covered in the bam file. Columns: chr, start, end, length, average total coverage in the contiguous region.\n\n.read-counts: bam-level stats. Columns: bam file name, total reads, unmapped reads, total mapped reads, unique mapped reads, duplicate fraction, total on-target reads, unique on-target reads, total on-target rate, unique on-target rate\n\n.fragment-sizes: fragment size distribution. Columns: fragment-size, total frequency, unique frequency\n\n \n#### Generate metrics specific to given genomic regions\n\njava -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.Waltz PileupMetrics mappingQualityThreshold bam-file reference-fasta intervals-bed-file\n\nThis produces 4 different files:\n-pileup.txt: per-position fragment count for different alleles. Columns: chr, position, ref, depth (including N's), fragment counts for A, C, G, T, insertions, deletions, soft clip start, soft clip end, hard clip start, hard clip end\n\n-pileup-without-duplicates.txt: similar to above but only unique fragments are counted\n\n-intervals.txt: stats per genomic interval. 
Columns: chr, start, end, interval name, interval length, peak coverage, average coverage, GC fraction, number of fragments mapped\n\n-intervals-without-duplicates.txt: similar to above but only unique fragments are considered\n\n\n#### Collect metrics across samples\n\nRun the aggregate-bam-metrics.sh script in the folder where the above output files are present to collect metrics across samples.\n\nThis produces 3 main files with self-explanatory headers.\nread-counts.txt: collection of metrics from *.read-counts files\n\nwaltz-coverage.txt: per sample coverage calculated across chosen genomic intervals\n\nfragment-sizes.txt: fragment size distributions for all samples\n\n \n\n\n### 2. Genotyping\n\njava -server -Xms4g -Xmx4g -cp Waltz.jar org.mskcc.juber.waltz.Waltz Genotyping mappingQualityThreshold bam-file reference-fasta intervals-bed-file mutations-maf-file\n\nwhere\nmutations-maf-file is a file in maf format specifying the mutations to be profiled in the given bam. Required fields are Chromosome, Start_Position, Variant_Type, Reference_Allele and Tumor_Seq_Allele2\n\nThis will produce a -genotypes.maf file with 4 additional columns at the end: Waltz_total_t_depth, Waltz_total_t_alt_count, Waltz_MD_t_depth and Waltz_MD_t_alt_count. All sample-specific columns will be made empty while all the mutation-specific information will be retained. Tumor_Sample_Barcode will contain the name of the sample being genotyped.\n\n#### Collect genotypes across multiple samples\n\nRun the aggregate-genotypes.sh script in the folder where the -genotypes.maf files are present to collect genotyping information across multiple samples. The output is a genotypes.maf file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Fwaltz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmskcc%2Fwaltz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmskcc%2Fwaltz/lists"}