{"id":21662815,"url":"https://github.com/higlass/gene_annotations","last_synced_at":"2025-04-11T23:43:25.132Z","repository":{"id":71427161,"uuid":"295064821","full_name":"higlass/gene_annotations","owner":"higlass","description":null,"archived":false,"fork":false,"pushed_at":"2024-02-22T04:34:21.000Z","size":21,"stargazers_count":6,"open_issues_count":3,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-25T19:40:43.002Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/higlass.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-09-13T02:41:21.000Z","updated_at":"2025-02-19T14:39:40.000Z","dependencies_parsed_at":"2023-06-04T02:30:16.863Z","dependency_job_id":null,"html_url":"https://github.com/higlass/gene_annotations","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/higlass%2Fgene_annotations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/higlass%2Fgene_annotations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/higlass%2Fgene_annotations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/higlass%2Fgene_annotations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/higlass","download_url":"https://codeload.github.com/higlass/gene_annotations/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://gi
thub.com","kind":"github","repositories_count":248497903,"owners_count":21113982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T10:18:13.403Z","updated_at":"2025-04-11T23:43:25.113Z","avatar_url":"https://github.com/higlass.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Installation\n\nThis repository contains only standalone scripts. Install the requirements before running them:\n\n```\npip install -r requirements.txt\n```\n\n## Expected format\n\nHiGlass expects the gene annotations file to have the following format:\n\n```\n# 1: chr (chr1)\n# 2: txStart (52301201) [9]\n# 3: txEnd (52317145) [10]\n# 4: geneName (ACVRL1)   [2]\n# 5: citationCount (123) [16]\n# 6: strand (+)  [8]\n# 7: refseqId (NM_000020)\n# 8: geneId (94) [1]\n# 9: geneType (protein-coding)\n# 10: geneDesc (activin A receptor type II-like 1)\n# 11: cdsStart (52306258)\n# 12: cdsEnd (52314677)\n# 13: exonStarts (52301201,52306253,52306882,52307342,52307757,52308222,52309008,52309819,52312768,52314542,)\n# 14: exonEnds (52301479,523063\n```\n\nThis BED-like file then needs to be aggregated using `clodius aggregate bedfile` in order to limit the amount of data displayed at once and to enable searching by gene name.\n\n## Example 1: From UCSC GTF file\n\n1. Download the UCSC `gtfToGenePred` binary from http://hgdownload.soe.ucsc.edu/admin/exe/\n\n2. Get the GTF and chromsizes files for an assembly (the `-NP .` parameters ensure that a file isn't downloaded if it's already present) and convert the GTF to genepred format:\n\n```\nwget -NP . 
https://hgdownload.soe.ucsc.edu/goldenPath/danRer10/bigZips/genes/danRer10.refGene.gtf.gz\nwget -NP . https://hgdownload.soe.ucsc.edu/goldenPath/danRer10/bigZips/danRer10.chrom.sizes\ngtfToGenePred -genePredExt -geneNameAsName2 danRer10.refGene.gtf.gz danRer10.refGene.genepred\n```\n\n3. Convert to higlass-compatible format:\n\n```\ncat danRer10.refGene.genepred | python genepredext_to_hgbed.py | python exonU.py - \u003e danRer10.refGene.hgbed\nclodius aggregate bedfile --chromsizes-filename danRer10.chrom.sizes danRer10.refGene.hgbed\n```\n\n4. Load the aggregated file into either HiGlass or Resgen using `filetype:beddb`, `datatype:gene-annotations`.\n\n## Example 2: From NCBI GFF\n\nFind the genome information page for sacCer3 at https://www.ncbi.nlm.nih.gov/assembly/GCF_000146045.2/.\n\nDownload the GFF file by clicking on \"Download Assembly\" and selecting \"Genomic GFF\".\n\nConvert to higlass-compatible format using these commands:\n\n```\ngzcat GCF_000146045.2_R64_genomic.gff.gz \\\n\t| python scripts/gff_to_jsonl.py - \\\n\t| python scripts/gjsonl_to_chromsizes.py - \u003e sacCer3.chrom.sizes\n\ngzcat GCF_000146045.2_R64_genomic.gff.gz \\\n\t| python scripts/gff_to_jsonl.py - \\\n\t| python scripts/gjsonl_to_hgbed.py - \u003e sacCer3.hgbed\n\nclodius aggregate bedfile sacCer3.hgbed \\\n\t--delimiter $'\\t' \\\n\t--chromsizes-filename sacCer3.chrom.sizes\n```\n\nThe `sacCer3.chrom.sizes` file lists the name of each chromosome alongside its size.\n\nView in HiGlass:\n\n```\nhiglass-manage view sacCer3.hgbed.beddb --datatype gene-annotations\n```\n\nNote that this process omits all RNAs and represents each gene as a single transcript built from the union of all its exons.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhiglass%2Fgene_annotations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhiglass%2Fgene_annotations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhiglass%2Fgene_annotations/lists"}