{"id":41312822,"url":"https://github.com/jts/smrest","last_synced_at":"2026-01-23T05:26:42.406Z","repository":{"id":224576606,"uuid":"756428691","full_name":"jts/smrest","owner":"jts","description":"Tumour-only somatic mutation calling using long reads","archived":false,"fork":false,"pushed_at":"2024-03-04T14:33:19.000Z","size":33320,"stargazers_count":20,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-04-16T03:49:46.900Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-12T16:40:30.000Z","updated_at":"2024-04-07T20:26:23.000Z","dependencies_parsed_at":"2024-03-01T18:39:22.639Z","dependency_job_id":null,"html_url":"https://github.com/jts/smrest","commit_stats":null,"previous_names":["jts/smrest"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jts/smrest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fsmrest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fsmrest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fsmrest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jts%2Fsmrest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jts","download_url":"https://codeload.github.com/jts/smrest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/jts%2Fsmrest/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28680692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T04:33:33.518Z","status":"ssl_error","status_checked_at":"2026-01-23T04:33:30.433Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-23T05:26:41.832Z","updated_at":"2026-01-23T05:26:42.397Z","avatar_url":"https://github.com/jts.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# smrest\n\nsmrest is a prototype somatic mutation caller for single-molecule long reads. It uses haplotype phasing patterns to identify somatic mutations in tumour samples that have a significant proportion of normal cells (purity \u003e 0.3, \u003c 0.8). For more details, see the preprint linked below.\n\n## Citation\n\n[Simpson, J.T., Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads, BioRxiv](https://www.biorxiv.org/content/10.1101/2024.02.26.582089v1)\n\n## Compiling\n\nThis program is written in Rust and uses the [Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) build system. 
After you have installed Cargo, you can compile this software from GitHub as follows:\n\n```\ngit clone https://github.com/jts/smrest.git\ncd smrest\ncargo build --release\n```\n\n## Usage\n\nsmrest has three steps: first it finds heterozygous SNPs using a panel of known population variants from gnomAD, then these are phased using `whatshap`, followed by somatic mutation calling. These steps can be run manually, or using a Snakemake pipeline we have provided for convenience. We describe both methods here, using a small demo dataset that is described in the following section.\n\n### Demo data preparation\n\nTo demonstrate the usage of this program, we have prepared a small dataset consisting of ONT reads for chromosome 20 of COLO829/COLO829BL. To get the demo data you can use the Snakemake pipeline. For simplicity, all commands shown below assume you are running in the `smrest/workflow` directory; if you are running from a different path you will need to adjust the commands.\n\n```\nsnakemake prepare_demo\n```\n\nThis command will place the reads in `data/COLO829.mixture.chr20.bam`. `smrest` needs a set of population variants to estimate the location of heterozygous SNPs and a BED file describing the callable regions of the genome. You can download these resources using Snakemake as well:\n\n```\nsnakemake prepare_resources\n```\n\n### Mutation calling (manual)\n\nThere are three steps to calling somatic mutations with smrest. 
First, we find heterozygous SNPs with `smrest genotype-hets`:\n\n```\nsmrest genotype-hets -c resources/genotype_sites.vcf -r chr20 -g resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna data/COLO829.mixture.chr20.bam \u003e COLO829.gnomad_genotype.vcf\n```\n\nNext, we phase these hets using whatshap:\n\n```\nwhatshap phase --ignore-read-groups -r resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna -o COLO829.gnomad_genotype_whatshap_phased.vcf COLO829.gnomad_genotype.vcf data/COLO829.mixture.chr20.bam\n```\n\nFinally, we call somatic mutations:\n\n```\nsmrest call -m haplotype-likelihood --purity 0.5 -r chr20 -g resources/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna -p COLO829.gnomad_genotype_whatshap_phased.vcf -o COLO829.smrest_called_regions.bed data/COLO829.mixture.chr20.bam \u003e COLO829.smrest_somatic_calls.vcf\n```\n\nThese mutation calls cover all regions of the genome that could be phased. To produce the final call set we intersect the phased BED file with the GIAB best practices BED:\n\n```\nbedtools intersect -b resources/GRCh38_notinalldifficultregions.bed -a COLO829.smrest_called_regions.bed \u003e COLO829.smrest_best_practice_called_regions.bed\n```\n\nThen use this BED to produce the final call set:\n\n```\nbcftools filter -T COLO829.smrest_best_practice_called_regions.bed COLO829.smrest_somatic_calls.vcf \u003e COLO829.smrest_somatic_calls_final.vcf\n```\n\n### Mutation calling (pipeline)\n\nA Snakemake pipeline is provided in `workflow/Snakemake` to automate these three steps. It also parallelizes the process across 10Mb segments of the genome. The pipeline assumes the BAM file is in `data/` (as in the demo data). To run it, build the `smrest_calls/\u003csample\u003e/\u003csample\u003e.whatshap.final_q20_pass_calls.vcf` target, where `\u003csample\u003e` is the prefix of the BAM file. 
For example:\n\n```\nsnakemake smrest_calls/COLO829.mixture.chr20/COLO829.mixture.chr20.whatshap.final_q20_pass_calls.vcf\n```\n\n## License\n\nMIT\n\n## Acknowledgements\n\nThis program reuses code originally developed by Edge et al. for the [Longshot](https://github.com/pjedge/longshot) variant caller.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjts%2Fsmrest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjts%2Fsmrest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjts%2Fsmrest/lists"}