{"id":19154601,"url":"https://github.com/bcgsc/chopstitch","last_synced_at":"2025-05-07T06:32:02.133Z","repository":{"id":77431356,"uuid":"66395240","full_name":"bcgsc/ChopStitch","owner":"bcgsc","description":"Finding putative exons and constructing splicegraphs using Trans-ABySS contigs","archived":false,"fork":false,"pushed_at":"2018-11-08T22:10:20.000Z","size":357,"stargazers_count":11,"open_issues_count":2,"forks_count":1,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-19T19:34:42.060Z","etag":null,"topics":["bloom-filter","exon","putative-exons","splice","splice-graphs","transcriptome-assembly"],"latest_commit_sha":null,"homepage":"http://www.bcgsc.ca/platform/bioinfo/software/chopstitch","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bcgsc.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-08-23T19:05:30.000Z","updated_at":"2025-01-13T09:59:28.000Z","dependencies_parsed_at":"2023-10-20T18:19:26.854Z","dependency_job_id":null,"html_url":"https://github.com/bcgsc/ChopStitch","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FChopStitch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FChopStitch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FChopStitch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2FChopStitch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bcgsc","down
load_url":"https://codeload.github.com/bcgsc/ChopStitch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252826783,"owners_count":21810184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","exon","putative-exons","splice","splice-graphs","transcriptome-assembly"],"created_at":"2024-11-09T08:27:31.534Z","updated_at":"2025-05-07T06:32:02.117Z","avatar_url":"https://github.com/bcgsc.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChopStitch 1.0.0.\n### Exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data\n                     \nChopStitch is a new method for finding putative exons and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in *de novo* assembled RNA-seq data with the help of a Bloom filter that represents the *k*-mer spectrum of WGSS reads. The algorithm also detects base substitutions in transcript sequences corresponding to sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. 
Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are reported as splice graphs in DOT format.\n\n### Requirements:\nInstall [pip](https://pip.pypa.io/en/latest/installing/) \n         \nInstall requirements by running:\n```\npip install -r requirements.txt\n```\nInstall [Graphviz version 2.4.0](http://www.graphviz.org/download/) command line tools\n              \n              \n### Install ChopStitch:\nWhen installing ChopStitch from GitHub source, the following tools are required:\n\n* [Autoconf](http://www.gnu.org/software/autoconf)\n* [Automake](http://www.gnu.org/software/automake)\n\nTo generate the configure script and makefiles:\n\n```\n./autogen.sh\n```\n\nTo compile and install ChopStitch in /usr/local:\n\n```\n$ ./configure\n$ make\n$ sudo make install\n```\nTo install ChopStitch in a specified directory:\n\n```\n$ ./configure --prefix=/opt/ChopStitch\n$ make \n$ make install \n```\n\nChopStitch uses OpenMP for parallelization, which requires a modern compiler such as GCC 4.2 or greater. If you have an older compiler, upgrade it if possible. If you have multiple versions of GCC installed, you can specify a different compiler:\n\n```\n$ ./configure CC=gcc-xx CXX=g++-xx \n```\n\nFor the best performance of ChopStitch, pass the `-O3` flag:\n\n```\n$ ./configure CFLAGS='-g -O3' CXXFLAGS='-g -O3' \n```\n\nTo run ChopStitch, its executables, `CreateBloom` and `FindExons`, must be on your PATH. If you installed ChopStitch in /opt/ChopStitch, add /opt/ChopStitch/bin to your PATH:\n\n```\n$ PATH=/opt/ChopStitch/bin:$PATH\n```\n\n\n### Run CreateBloom\n\n```\nUsage: CreateBloom [OPTION]... 
FILES...\nCreates a Bloom filter (BF) to be used for FindExons.\nAcceptable file formats: fastq, fasta, sam, bam, gz, bz, zip.\n\n Options:\n\n  -t, --threads=N  use N parallel threads [1]\n  -k, --kmer=N\tthe length of kmer [50]\n  -d, --fpr1=N\tprimary BF fpr [0.01]\n  -s, --fpr2=N\tsecondary BF fpr [0.01]\n  -r, --ref\tusing FASTA reference as input instead of FASTQ reads. Don't use fpr2 in this case\n      --help\tdisplay help and exit\n      --version\toutput version information and exit\n  `FILES`: input file or set of files separated by space, in fasta, fastq, sam, and bam formats. The files can also be in compressed (`.gz`, `.bz2`, `.xz`) formats. A list of files containing file names in each row can be passed with `@` prefix.\n\n```\nExample:\n```\n./CreateBloom -t 32 -k 50 --fpr1 0.01 --fpr2 0.01 \u003cFASTQ1\u003e \u003cFASTQ2\u003e\n```\nTo pass a list of files, `list.in`, as input:\n```\n./CreateBloom -t 32 -k 50 --fpr1 0.01 --fpr2 0.01 @list.in\n```\nTo pass a reference fasta file as input:\n```\n./CreateBloom --ref -t 32 -k 50 --fpr1 0.01  \u003cREFERENCE FASTA\u003e \n```\n               \nOutput:\n            \nBfilter.bf : Bloom filter file\n             \nBfilter.inf : Info file required for FindExons \n        \n             \n### Run FindExons\nFind putative exons in a Trans-ABySS transcriptome assembly file.\nAcceptable file formats: FASTA\n```\n  Options:\n   -i, --input-bloom=FILE     load bloom filter from FILE\n   -l, --leniency=N           leniency for exon-exon junction detection [10]\n   -f, --lfactor=N            leniency calculated as ceil(FPR*lfactor*k)\n   -s, --lsplicesignals=csv   Comma separated 3' splicesignals\n   -r, --rsplicesignals=csv   Comma separated 5' splicesignals\n       --allexons             Also output exons on either ends of contigs\n       --help\t                display this help and exit\n       --version\t            output version information and exit\n\n```\n           
\nExample:\n```\n./FindExons -i Bfilter.bf \u003cTranscriptome assembly file (Trans-ABySS FASTA file)\u003e\n   \n./FindExons -i Bfilter.bf -s AG,TG,AC,GC,GG -r GT,TT,AT  \u003cTranscriptome assembly file (Trans-ABySS FASTA file)\u003e\n   \n./FindExons -i Bfilter.bf --allexons \u003cTranscriptome assembly file (Trans-ABySS FASTA file)\u003e\n```\nOutput:\n\nA FASTA file of exons with headers in this format:\n```\n\u003eTranscriptName_startCoordinate_Endcoordinate\n```         \n              \n### Run MakeSplicegraph.py with the putative exons FASTA file output by FindExons (confident-exons.fa)\n    \nExample:\n```\npython MakeSplicegraph.py -i \u003cPutative exon in FASTA format\u003e -o \u003cSplicegraph-outputfile\u003e\n```\n       \n### Run Graphviz ccomps to obtain Splice sub-graphs\n    \nExample:   \n```\nccomps \u003cSplicegraph DOT file from MakeSplicegraph.py\u003e -o \u003csplicegraph_subgraph\u003e\n```\n\n### OR Run FindSubcomponents.py -w to generate a DOT file with Splicegraph sub-graphs\n\nThe script can also generate a file with mappings of transcripts to genes (denoted by random numbers) with the `-m` option. By default, it generates a file with mappings of putative exons to genes (also denoted by random numbers).\n\n```\nUsage: FindSubcomponents.py [-h] -g DOTFILE [-m] [-w]\n\nFind graph subcomponents and write output\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -g DOTFILE, --dotfile DOTFILE\n                        Graph DOT file from MakeSplicegraph.py\n  -m, --geneMap         Write a file with mappings of transcripts to genes\n  -w, --writesplicesubgraphs\n                        Write splice subgraphs to DOT file\n```\n         \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fchopstitch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbcgsc%2Fchopstitch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fchopstitch/lists"}