{"id":13752212,"url":"https://github.com/OpenGene/MutScan","last_synced_at":"2025-05-09T18:33:25.438Z","repository":{"id":44589204,"uuid":"63995231","full_name":"OpenGene/MutScan","owner":"OpenGene","description":"Detect and visualize target mutations by scanning FastQ files directly","archived":false,"fork":false,"pushed_at":"2022-02-10T01:52:44.000Z","size":936,"stargazers_count":151,"open_issues_count":6,"forks_count":39,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-04-10T09:54:06.789Z","etag":null,"topics":["bioinformatics","cancer","detection","fastq","mutation","ngs","somatic","validation","variant","visualization"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGene.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-07-23T02:34:15.000Z","updated_at":"2025-03-12T09:16:21.000Z","dependencies_parsed_at":"2022-09-03T05:01:28.681Z","dependency_job_id":null,"html_url":"https://github.com/OpenGene/MutScan","commit_stats":null,"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FMutScan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FMutScan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FMutScan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FMutScan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGene","download_url":"https://codeload.github.com/OpenGene/MutScan/tar.gz/refs/heads/master",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253303265,"owners_count":21886917,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cancer","detection","fastq","mutation","ngs","somatic","validation","variant","visualization"],"created_at":"2024-08-03T09:01:01.590Z","updated_at":"2025-05-09T18:33:20.408Z","avatar_url":"https://github.com/OpenGene.png","language":"C","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"[![install with conda](\nhttps://anaconda.org/bioconda/mutscan/badges/version.svg)](https://anaconda.org/bioconda/mutscan)\n# MutScan\nDetect and visualize target mutations by scanning FastQ files directly\n* [Features](#features)\n* [Application scenarios](#application-scenarios)\n* [Take a quick glance](#take-a-quick-glance)\n* [Download, compile and install](#get-mutscan)\n* [HTML report](#html-report)\n* [JSON report](#json-report)\n* [All options](#all-options)\n* [Customize your mutation file](#mutation-file)\n* [Work with BAM/CRAM](#work-with-bamcram)\n* [Remarks](#remarks)\n* [Cite MutScan](#cite-mutscan)\n\n# Features\n* Ultra-sensitive: guarantees that all reads supporting the mutations will be detected.\n* Can be 50X+ faster than a conventional pipeline (e.g. BWA + Samtools + GATK/VarScan/Mutect).\n* Very easy to use and needs nothing else. 
No alignment, no reference genome, no variant calling, no...\n* Contains built-in mutation points for the most actionable cancer-related mutations, such as EGFR p.L858R and BRAF p.V600E.\n* Beautiful and informative HTML report with pileup visualization.\n* Multi-threading support.\n* Supports both single-end and paired-end data.\n* For paired-end data, MutScan tries to merge each pair and performs quality adjustment and error correction.\n* Can scan the mutations listed in a VCF file, which is useful for visualizing called variants.\n* Can be used to filter false-positive mutations; for example, MutScan handles highly repetitive sequences to avoid false INDEL calls.\n\n# Application scenarios\n* you are interested in certain mutations (like druggable cancer mutations) and want to check whether the given FastQ files contain them.\n* you are not confident in the mutations called by your pipeline, so you want to visualize and validate them to avoid false-positive calls.\n* you worry that your pipeline uses overly strict filtering that may cause false negatives, so you want to check for them quickly.\n* you want to visualize a called mutation and take a screenshot with its clear pileup information.\n* you called a lot of INDEL mutations, and you worry that most of them are false positives (especially in highly repetitive regions).\n* you want to validate and visualize every record in the VCF called by your pipeline.\n* ...\n\n# Take a quick glance\n* Sample HTML report: http://opengene.org/MutScan/report.html\n* Sample JSON report: http://opengene.org/MutScan/report.json\n* Dataset for testing: http://opengene.org/dataset.html\n* Command to test:\n```shell\nmutscan -1 R1.fq.gz -2 R2.fq.gz\n```\n\n# Get MutScan\n## install with Bioconda\n[![install with conda](\nhttps://anaconda.org/bioconda/mutscan/badges/version.svg)](https://anaconda.org/bioconda/mutscan)\n```shell\nconda install -c bioconda mutscan\n```\n## download binary\nThis binary is only for Linux 
systems: http://opengene.org/MutScan/mutscan\n```shell\n# this binary was compiled on CentOS and tested on CentOS/Ubuntu\nwget http://opengene.org/MutScan/mutscan\nchmod a+x ./mutscan\n```\n## or compile from source\n```shell\n# get source (you can also use a browser to download from master or releases)\ngit clone https://github.com/OpenGene/MutScan.git\n\n# build (git clone creates a directory named MutScan)\ncd MutScan\nmake\n\n# install\nsudo make install\n```\n\n# Windows version (may not be the latest version)\nTo compile MutScan on Windows, you should use `cygwin`. We have already built one with cygwin-2.6.0/g++ 5.4, which can be downloaded from:   \nhttp://opengene.org/MutScan/windows_mutscan.zip\n\n# HTML report\n* An HTML report is generated and written to the given filename. See http://opengene.org/MutScan/report.html for an example.\n* ***If you run the command on a Linux server and want to view the HTML report on your local system, DO remember to copy both `xxxx.html` and the `xxxx.html.files` folder, keep them in the same directory, and then open `xxxx.html` in a browser.***\n* The default file name is `mutscan.html`, and a folder named `mutscan.html.files` is also generated.\n* By default, an individual HTML file is generated for each detected mutation. You can specify `-s` or `--standalone` to put all mutations in a single HTML file. Be cautious with this mode if you are scanning many records (for example, scanning a VCF): it can produce a very large HTML file that a browser may be unable to load.\n* Here is a screenshot of the pileup of a mutation (EGFR p.T790M) generated by MutScan:   \n\n![image](http://www.opengene.org/MutScan/t790m.png)  \n* The pileup of the EGFR p.T790M mutation is displayed above. EGFR p.T790M is a very important druggable mutation in lung cancer. 
\n* The color of each base indicates its quality, and the quality value is shown on mouse-over.\n* In the first column, `d` is the edit distance of the match, `--\u003e` means forward, and `\u003c--` means reverse.\n\n# JSON report\nThe JSON report is disabled by default. You can enable it by specifying a JSON file name with `-j` or `--json`. A JSON report looks like this:\n\n```json\n{\n\t\"command\":\"./mutscan -1 /Users/shifu/data/fq/S010_20170320003-4_ffpedna_pan-cancer-v1_S10_R1_001.fastq -2 /Users/shifu/data/fq/S010_20170320003-4_ffpedna_pan-cancer-v1_S10_R2_001.fastq -h z.html -j z.json -v --simplified=off \",\n\t\"version\":\"1.14.0\",\n\t\"time\":\"2018-05-15  15:48:21\",\n\t\"mutations\":{\n\t\t\"NRAS-neg-1-115258747-2-c.35G\u003eC-p.G12A-COSM565\":{\n\t\t\t\"chr\":\"chr1\",\n\t\t\t\"ref\":[\"TGGATTGTCAGTGCGCTTTTCCCAACACCA\",\"G\",\"CTGCTCCAACCACCACCAGTTTGTACTCAG\"],\n\t\t\t\"reads\":[\n\t\t\t\t{\n\t\t\t\t\t\"breaks\":[31,61,62,76], \n\t\t\t\t\t\"seq\":\"ATATTCATCTACAAAGTGGTTCTGGATTAGCTGGATTGTCAGTGCGCTTTTCCCAACACCAGCTGCTCCAACCACC\",\n\t\t\t\t\t\"qual\":\"eeeeeiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieiiiiiiiiiiieieeeee\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"breaks\":[31,61,62,76], \n\t\t\t\t\t\"seq\":\"ATATTCATCTACAAAGTGGTTCTGGATTAGCTGGATTGTCAGTGCGCTTTTCCCAACACCAGCTGCTCCAACCACC\",\n\t\t\t\t\t\"qual\":\"eeeeeiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiieeeee\"\n\t\t\t\t}\n\t\t\t]\n\t\t},\n\t\t\"PIK3CA-pos-3-178936082-9-c.1624G\u003eA-E542K-COSM760\":{\n\t\t\t\"chr\":\"chr3\",\n\t\t\t\"ref\":[\"AAAGCAATTTCTACACGAGATCCTCTCTCT\",\"A\",\"AAATCACTGAGCAGGAGAAAGATTTTCTAT\"],\n\t\t\t\"reads\":[\n\t\t\t\t{\n\t\t\t\t\t\"breaks\":[22,52,53,83], 
\n\t\t\t\t\t\"seq\":\"GGAAAATGACAAAGAACAGCTCAAAGCAATTTCTACACGAGATCCTCTCTCTAAAATCACTGAGCAGGAGAAAGATTTTCCAAAGATGTTTCTCAGAACGCTGCAGTCTGCAATTTGTATGAATTCCC\",\n\t\t\t\t\t\"qual\":\"eeeeeiiiQiiiiiieiiiieiSeiiiiiie`iiii`i`iiiiiiiiiiiiii`iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiaiiiiiiiiiiiiiiiiiieiiiiiieeeee\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"breaks\":[0,27,28,58], \n\t\t\t\t\t\"seq\":\"GCAATTTCTACACGAGATCCTCTCTCTAAAATCACTGCGCAGGAGAAAGATTTTCTATGGACCACAGGTAAGTGCTAAAATGGAGATTCTCTGTTTCTTTTTCTTTATTACAGAAAAAATAACTGACTTTGGCTGATCTCAGCATGTTTTTACCATACC\",\n\t\t\t\t\t\"qual\":\"AAAAAEEEEiieiiieiiiiiiiiiieiiiiiiiie``iiiiiieiiiiiiiiiieiiiieiieieeiiiSiiiiiieiiiiiiiiiiiiiieiiiiiSiiiiiiiiiiiiieiiiiiiiiiiii`ieiiieiii`ieiiiii`eS``eieEEEAAAAA\"\n\t\t\t\t}\n\t\t\t]\n\t\t}\n\t}\n}\n```\n\n# All options\n```shell\nusage: mutscan -1 \u003cread1_file\u003e -2 \u003cread2_file\u003e [options]...\noptions:\n  -1, --read1                read1 file name, required\n  -2, --read2                read2 file name\n  -m, --mutation             mutation file name, can be a CSV format or a VCF format\n  -r, --ref                  reference fasta file name (only needed when mutation file is a VCF)\n  -h, --html                 filename of html report, default is mutscan.html in work directory\n  -j, --json                 filename of JSON report, default is no JSON report (string [=])\n  -t, --thread               worker thread number, default is 4\n  -S, --support              min read support required to report a mutation, default is 2.\n  -k, --mark                 when mutation file is a vcf file, --mark means only process the records with FILTER column is M\n  -l, --legacy               use legacy mode, usually much slower but may be able to find a little more reads in certain case\n  -s, --standalone           output standalone HTML report with single file. Don't use this option when scanning too many target mutations (i.e. 
\u003e1000 mutations)\n  -n, --no-original-reads    dont output original reads in HTML and text output. Will make HTML report files a bit smaller\n  -?, --help                 print this message\n```\nThe plain-text result, which contains the detected mutations and their supporting reads, is printed to standard output. You can use `\u003e` to redirect it to a file, for example:\n```shell\nmutscan -1 \u003cread1_file_name\u003e -2 \u003cread2_file_name\u003e \u003e result.txt\n```\nMutScan also generates an informative HTML report, by default `mutscan.html` in the working directory. You can change the file name with the `-h` argument, for example:\n```shell\nmutscan -1 \u003cread1_file_name\u003e -2 \u003cread2_file_name\u003e -h report.html\n```\n## single-end and paired-end\nFor single-end sequencing data, omit the `-2` argument:\n```shell\nmutscan -1 \u003cread1_file_name\u003e\n```\n## multi-threading\nThe `-t` argument specifies how many worker threads are launched. The default is `4`. We suggest using a number less than the number of CPU cores on your system.\n\n# Mutation file\n* The mutation file, specified by `-m`, can be a `CSV file` or a `VCF file`.\n* If `-m` is not specified, MutScan uses its built-in default mutation file, which contains about 60 cancer-related mutation points.\n* If a CSV file is provided, no reference genome assembly is needed.\n* If a VCF file is provided, the corresponding reference genome assembly should also be provided with `-r` (e.g. 
ucsc.hg19.fasta), and it should not be compressed.\n\n## CSV-format mutation file\nA CSV file with the columns `name`, `left_seq_of_mutation_point`, `mutation_seq`, `right_seq_of_mutation_point`, and `chromosome` (optional):\n```csv\n#name, left_seq_of_mutation_point, mutation_seq, right_seq_of_mutation_point, chromosome\nNRAS-neg-1-115258748-2-c.34G\u003eA-p.G12S-COSM563, GGATTGTCAGTGCGCTTTTCCCAACACCAC, T, TGCTCCAACCACCACCAGTTTGTACTCAGT, chr1\nNRAS-neg-1-115252203-2-c.437C\u003eT-p.A146V-COSM4170228, TGAAAGCTGTACCATACCTGTCTGGTCTTG, A, CTGAGGTTTCAATGAATGGAATCCCGTAAC, chr1\nBRAF-neg-7-140453136-15-c.1799T\u003eA -V600E-COSM476, AACTGATGGGACCCACTCCATCGAGATTTC, T, CTGTAGCTAGACCAAAATCACCTATTTTTA, chr7\nEGFR-pos-7-55241677-18-c.2125G\u003eA-p.E709K-COSM12988, CCCAACCAAGCTCTCTTGAGGATCTTGAAG, A, AAACTGAATTCAAAAAGATCAAAGTGCTGG, chr7\nEGFR-pos-7-55241707-18-c.2155G\u003eA-p.G719S-COSM6252, GAAACTGAATTCAAAAAGATCAAAGTGCTG, A, GCTCCGGTGCGTTCGGCACGGTGTATAAGG, chr7\nEGFR-pos-7-55241707-18-c.2155G\u003eT-p.G719C-COSM6253, GAAACTGAATTCAAAAAGATCAAAGTGCTG, T, GCTCCGGTGCGTTCGGCACGGTGTATAAGG, chr7\n```\n`testdata/mutations.csv` gives an example of a CSV-format mutation file.\n\n## VCF-format mutation file\nA standard VCF file with the extension `.vcf` or `.VCF` can be used as a mutation file. In this case, you should specify the reference assembly file with `-r \u003cref.fa\u003e`. 
For example, the command can be:\n```shell\nmutscan -1 R1.fq -2 R2.fq -m target.vcf -r hg19.fa\n```\n\n# Work with BAM/CRAM\nTo run MutScan on BAM/CRAM files, first convert them to FASTQ with the `samtools fastq` command. Both single-end and paired-end data are supported by the latest version of `samtools fastq`.\n\n# Remarks\n* `MutScan` requires reads at least 50 bp long; if your reads are shorter, do not use it.\n* To report mutations supported by only one read, add `-S 1` or `--support=1` to the command.\n* Feel free to raise an issue if you encounter any problems.\n\n# Cite MutScan\nShifu Chen, Tanxiao Huang, TieXiang Wen, Hong Li, Mingyan Xu and Jia Gu. MutScan: fast detection and visualization of target mutations by scanning FASTQ data. BMC Bioinformatics. https://doi.org/10.1186/s12859-018-2024-6\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGene%2FMutScan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenGene%2FMutScan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGene%2FMutScan/lists"}