{"id":13710406,"url":"https://github.com/runsheng/trackcluster","last_synced_at":"2025-05-06T19:31:01.070Z","repository":{"id":37840289,"uuid":"144128406","full_name":"Runsheng/trackcluster","owner":"Runsheng","description":"An analysis pipeline for Nanopore direct-RNA sequencing ","archived":false,"fork":false,"pushed_at":"2024-11-26T07:18:18.000Z","size":3364,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-27T15:26:58.324Z","etag":null,"topics":["bioinformatics","longread","nanopore","rna-seq"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Runsheng.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-09T09:08:38.000Z","updated_at":"2025-03-01T08:15:43.000Z","dependencies_parsed_at":"2024-02-17T16:45:16.953Z","dependency_job_id":"0987ee60-b489-4ac9-bb66-d9ca5f659751","html_url":"https://github.com/Runsheng/trackcluster","commit_stats":{"total_commits":102,"total_committers":1,"mean_commits":102.0,"dds":0.0,"last_synced_commit":"514e634bfeb485779c6b967756ba3e6d5ddda77b"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Runsheng%2Ftrackcluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Runsheng%2Ftrackcluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Runsheng%2Ftrackcluster/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/Runsheng%2Ftrackcluster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Runsheng","download_url":"https://codeload.github.com/Runsheng/trackcluster/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252753218,"owners_count":21798934,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","longread","nanopore","rna-seq"],"created_at":"2024-08-02T23:00:55.678Z","updated_at":"2025-05-06T19:30:56.051Z","avatar_url":"https://github.com/Runsheng.png","language":"Python","funding_links":[],"categories":["Software packages"],"sub_categories":["Transcript discovery and quantification"],"readme":"# TrackCluster\n![PyPI](https://img.shields.io/pypi/v/trackcluster?color=green)\n\nTrackcluster is an isoform calling and quantification pipeline for long RNA/cDNA reads.\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Requirements](#requirements)\n- [Scripts](#scripts)\n- [Walkthrough](#walkthrough)\n\n####\nThe ongoing development can be found in the [\"dev\"](https://github.com/Runsheng/trackcluster/tree/dev) branch.\n#### TODO: \n1. fix \"fusion\" classification. 2. speed for clusterj. 3. splicing leader/5' indicator finding using pyssw. \n4.add function to get the CDS start (maybe ATG) and CDS end.   \n\n## \u003ca name=\"overview\"\u003e\u003c/a\u003eOverview\nA pipeline for reference-based isoform identification and quantification using long reads. 
This pipeline was designed to use **only** long and noisy reads to make a valid transcriptome. An indicator for the intact 5' end could be very helpful to the pipeline, e.g., the spliced leader in the mRNA of nematodes. \n\nThe major input/output for this pipeline is \"bigg\"--[\"bigGenePred\"](https://github.com/Runsheng/trackcluster/blob/master/test/bigGenePred.as) format. \n\n## \u003ca name=\"requirements\"\u003e\u003c/a\u003eRequirements\n\n1. developed on Python 3.9, tested on Python 3.6 and above (or 2.7.10+), should work with most Python 3 versions\n2. samtools V2.0+, bedtools V2.24+ and minimap2 V2.24+ in your $PATH\n```bash\n# install the external bins with conda\nconda install -c bioconda samtools\nconda install -c bioconda bedtools\nconda install -c bioconda minimap2\n```\n\n## Installation\n```bash\n# use pip from pypi\npip install trackcluster\n# or pip from source code for the latest version\ngit clone https://github.com/Runsheng/trackcluster.git\npip install ./trackcluster\n```\n\n## Recommendations\n1. 
UCSC Kent source tree (for generating binary track), used only in bigg2b.py\n\n## Scripts\nAll scripts can be run directly from shell after pip installation.\n- **trackrun.py**: the main script for running trackcluster\n- **bam2bigg.py**: convert the mapped reads in a bam file to bigg track format\n- **gff2bigg.py**: convert the isoform annotation in gff3 to bigg format \n- bigg2b.py: convert the bigg track into binary format for better loading in IGV/UCSC\n- biggmutant.py: change the value of one column in a bigglist\n\n## \u003ca name=\"walkthrough\"\u003e\u003c/a\u003eWalkthrough\n```bash\n# test if all dependencies are installed\ntrackrun.py test --install\n\n# prepare the reference annotation bed file from gff file\n# tested on Ensembl, WormBase and Araport gff\ngff2bigg.py -i ensemblxxxx.gff3 -o ref.bed \n# the full WormBase gff contains too much information; extract only the lines from WormBase\ncat c_elegans.PRJNA13758.WS266.annotations.gff3 |grep WormBase \u003e ws266.gff\ngff2bigg.py -i ws266.gff -o ref.bed\n# the ref.bed can be sorted to speed up the analysis\nbedtools sort -i ref.bed \u003e refs.bed # refs.bed contains the sorted, known transcripts from gff annotation\n\n# generate the read track from minimap2 bam file\nbam2bigg.py -b group1.bam -o group1.bed\nbam2bigg.py -b group2.bam -o group2.bed\n\n# merge the bed file and sort\ncat group1.bed group2.bed \u003e read.bed\nbedtools sort -i read.bed \u003e reads.bed\n\n# Examples for running commands:\ntrackrun.py clusterj -s reads.bed -r refs.bed -t 40 # run in junction mode, will generate the isoform.bed\ntrackrun.py count -s reads.bed -r refs.bed -i isoform.bed # generate the csv file for isoform expression\n# alternative for cluster\ntrackrun.py cluster -s reads.bed -r refs.bed -t 40 # run in exon/intron intersection mode, slower, will generate the isoform.bed\n\n# the post analysis could include the classification of novel isoforms\ntrackrun.py desc --isoform isoform.bed --reference ref.bed # 
generate the description for each novel isoform\n# this part can be run directly on reads, to count the frequency of splicing events in reads, like intron_retention\ntrackrun.py addgene -r ref.bed -s reads.bed # will generate reads_gene.bed\ntrackrun.py desc --isoform reads_gene.bed --reference ref.bed # will generate reads_desc.txt and reads_class12.txt \n\n```\n\n\n## Citation\nPlease cite our paper if you use trackcluster in your work.\n\nLi, R., Ren, X., Ding, Q., Bi, Y., Xie, D. and Zhao, Z., 2020. Direct full-length RNA sequencing reveals unexpected transcriptome complexity during *Caenorhabditis elegans* development. **Genome Research**, 30(2), pp.287-298.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunsheng%2Ftrackcluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frunsheng%2Ftrackcluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunsheng%2Ftrackcluster/lists"}