{"id":13752453,"url":"https://github.com/OpenGene/GeneFuse","last_synced_at":"2025-05-09T19:32:11.892Z","repository":{"id":90140128,"uuid":"79531214","full_name":"OpenGene/GeneFuse","owner":"OpenGene","description":"Gene fusion detection and visualization","archived":false,"fork":false,"pushed_at":"2022-02-21T08:07:06.000Z","size":404,"stargazers_count":114,"open_issues_count":33,"forks_count":62,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-08-03T09:03:59.530Z","etag":null,"topics":["alk","bioinformatics","cancer","cosmic","eml4","fusion","gene","ret","ros1"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGene.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-20T06:12:55.000Z","updated_at":"2024-05-18T23:06:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"bcbec047-829f-48f6-b050-2a6b5e63fbd0","html_url":"https://github.com/OpenGene/GeneFuse","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FGeneFuse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FGeneFuse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FGeneFuse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FGeneFuse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGene","download_url":"https://codeload.github.com/OpenGene/GeneFuse/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224880776,"owners_count":17385367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alk","bioinformatics","cancer","cosmic","eml4","fusion","gene","ret","ros1"],"created_at":"2024-08-03T09:01:06.076Z","updated_at":"2024-11-16T05:30:33.563Z","avatar_url":"https://github.com/OpenGene.png","language":"C","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"[![install with conda](\nhttps://anaconda.org/bioconda/genefuse/badges/version.svg)](https://anaconda.org/bioconda/genefuse)\n# GeneFuse\nA tool to detect and visualize target gene fusions by scanning FASTQ files directly. 
This tool accepts FASTQ files and a reference genome as input, and outputs detected fusion results in TEXT, JSON and HTML formats.\n\n# Take a quick glance at the informative report\n* Sample HTML report: http://opengene.org/GeneFuse/report.html\n* Sample JSON report: http://opengene.org/GeneFuse/report.json\n* Dataset for testing: http://opengene.org/dataset.html  Please download the paired-end FASTQ files for GeneFuse testing (Illumina platform)\n\n# Get genefuse program\n## install with Bioconda\n[![install with conda](\nhttps://anaconda.org/bioconda/genefuse/badges/version.svg)](https://anaconda.org/bioconda/genefuse)\n```shell\nconda install -c bioconda genefuse\n```\n## download binary\nThis binary is only for Linux systems, http://opengene.org/GeneFuse/genefuse\n```shell\n# this binary was compiled on CentOS, and tested on CentOS/Ubuntu\nwget http://opengene.org/GeneFuse/genefuse\nchmod a+x ./genefuse\n```\n## or compile from source\n```shell\n# get source (you can also use browser to download from master or releases)\ngit clone https://github.com/OpenGene/genefuse.git\n\n# build\ncd genefuse\nmake\n\n# Install\nsudo make install\n```\n\n# Usage\nYou should provide the following arguments to run genefuse\n* the reference genome FASTA file, specified by `-r` or `--ref=`\n* the fusion setting file, specified by `-f` or `--fusion=`\n* the FASTQ file(s), specified by `-1` or `--read1=` for single-end data. If dealing with paired-end data, specify the read2 file by `-2` or `--read2=`\n* use `-h` or `--html=` to specify the file name of the HTML report\n* use `-j` or `--json=` to specify the file name of the JSON report\n* the plain text result is printed directly to STDOUT, you can redirect it to a file using `\u003e`\n\n## Example\n```shell\ngenefuse -r hg19.fasta -f genes/druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h report.html \u003e result\n```\n\n## Reference genome\nThe reference genome should be a single whole FASTA file containing all chromosome data. 
This file should not be compressed. For human data, typically the `hg19/GRCh37` or `hg38/GRCh38` assembly is used, which can be downloaded from the following sites:\n* `hg19/GRCh37`: https://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz\n* `hg38/GRCh38`: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz  \nRemember to decompress hg19.fa.gz/hg38.fa.gz, since gzipped reference files are not currently supported.\n\n## Fusion file\nThe fusion file is a list of target genes with their genomic coordinates, together with their exons. A sample is:\n```CSV\n\u003eEML4_ENST00000318522.5,chr2:42396490-42559688\n1,42396490,42396776\n2,42472645,42472827\n3,42483641,42483770\n4,42488261,42488434\n5,42490318,42490446\n...\n\n\u003eALK_ENST00000389048.3,chr2:29415640-30144432\n1,30142859,30144432\n2,29940444,29940563\n3,29917716,29917880\n4,29754781,29754982\n5,29606598,29606725\n...\n```\nThe coordinate system should be consistent with the reference genome.  \n### Fusion files provided in this package\nFour fusion files are provided with `genefuse`:\n1. `genes/druggable.hg19.csv`: all druggable fusion genes based on the `hg19/GRCh37` reference assembly.\n2. `genes/druggable.hg38.csv`: all druggable fusion genes based on the `hg38/GRCh38` reference assembly.\n3. `genes/cancer.hg19.csv`: all COSMIC curated fusion genes (http://cancer.sanger.ac.uk/cosmic/fusion) based on the `hg19/GRCh37` reference assembly.\n4. `genes/cancer.hg38.csv`: all COSMIC curated fusion genes (http://cancer.sanger.ac.uk/cosmic/fusion) based on the `hg38/GRCh38` reference assembly.\n\nNotes:\n* `genefuse` runs much faster with `druggable` genes than `cancer` genes, since `druggable` genes are only a small subset of `cancer` genes. 
Use the `druggable` list if you only care about fusion-related personalized medicine for cancers.\n* The `cancer` genes should be enough for most cancer-related studies, since all COSMIC curated fusion genes are included.\n* If you want to create a custom gene list, please follow the instructions given in the next section.\n### Create a fusion file based on hg19 or hg38\nIf you'd like to create a custom fusion file, you can use `scripts/make_fusion_genes.py`   \nAs the script uses a `refFlat.txt` file to determine the genomic coordinates of exons, you need to download a `refFlat.txt` file from the UCSC Genome Browser in advance. The choice between hg19 and hg38 is up to you.\n\n- For hg19: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz\n- For hg38: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refFlat.txt.gz\n\nPlease make sure to decompress the file to plain txt format before you continue.\n\nIn the input gene list file, each gene should be listed on a separate line.  By default, the longest transcript will be used. However, you can specify a different transcript by adding the transcript ID after the gene. The gene and its transcript should be separated by a tab or a space. Please note that each gene should be the official HGNC gene symbol, and each transcript should be an NCBI RefSeq transcript ID. 
\n\nAn example of a gene list file:\n\n```\nBRCA2\tNM_000059\nFAM155A\nIRS2\n```\n\nWhen both the input gene list file (`gene_list.txt`) and the `refFlat.txt` file are prepared, you can use the following command to generate a user-defined fusion file (`fusion.csv`):\n\n```shell\npython3 scripts/make_fusion_genes.py gene_list.txt -r /path/to/refflat -o fusion.csv\n```\n\n# HTML report\nGeneFuse can generate very informative and interactive HTML pages to visualize the fusions with the following information:\n* the fusion genes, along with their transcripts.\n* the inferred break point with reference genome coordinates.\n* the inferred fusion protein, with all exons and the transcription direction.\n* the supporting reads, with all bases colorized according to their quality scores.\n* the number of supporting reads, and how many of them are unique (the rest may be duplicates)\n## An HTML report example\n![image](http://www.opengene.org/GeneFuse/eml4alk.png)  \nSee the HTML page of this picture: http://opengene.org/GeneFuse/report.html\n\n# All options\n```\noptions:\n  -1, --read1       read1 file name (string)\n  -2, --read2       read2 file name (string [=])\n  -f, --fusion      fusion file name, in CSV format (string)\n  -r, --ref         reference fasta file name (string)\n  -u, --unique      specify the least supporting read number is required to report a fusion, default is 2 (int [=2])\n  -d, --deletion    specify the least deletion length of a intra-gene deletion to report, default is 50 (int [=50])\n  -h, --html        file name to store HTML report, default is genefuse.html (string [=genefuse.html])\n  -j, --json        file name to store JSON report, default is genefuse.json (string [=genefuse.json])\n  -t, --thread      worker thread number, default is 4 (int [=4])\n  -?, --help        print this message\n```\n\n# Cite GeneFuse\nIf you used GeneFuse in your work, you can cite it as: \n\nShifu Chen, Ming Liu, Tanxiao Huang, Wenting Liao, Mingyan Xu and Jia Gu. 
GeneFuse: detection and visualization of target gene fusions from DNA sequencing data. International Journal of Biological Sciences, 2018; 14(8): 843-848. doi: 10.7150/ijbs.24626\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGene%2FGeneFuse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenGene%2FGeneFuse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGene%2FGeneFuse/lists"}