{"id":38910499,"url":"https://github.com/deeptools/deeptools_intervals","last_synced_at":"2026-01-17T15:17:32.908Z","repository":{"id":62567477,"uuid":"53515465","full_name":"deeptools/deeptools_intervals","owner":"deeptools","description":"A python library for constructing interval trees with associated exon/annotation information","archived":false,"fork":false,"pushed_at":"2019-08-07T18:09:17.000Z","size":266,"stargazers_count":6,"open_issues_count":2,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-19T03:28:14.415Z","etag":null,"topics":["bed","bioinformatics","gtf","interval-tree"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deeptools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-03-09T17:01:03.000Z","updated_at":"2024-02-18T06:34:39.000Z","dependencies_parsed_at":"2022-11-03T16:30:41.664Z","dependency_job_id":null,"html_url":"https://github.com/deeptools/deeptools_intervals","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/deeptools/deeptools_intervals","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deeptools%2Fdeeptools_intervals","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deeptools%2Fdeeptools_intervals/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deeptools%2Fdeeptools_intervals/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deeptools%2Fdeeptools_intervals/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/deeptools","download_url":"https://codeload.github.com/deeptools/deeptools_intervals/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deeptools%2Fdeeptools_intervals/sbom","scorecard":{"id":332271,"data":{"date":"2025-08-11","repo":{"name":"github.com/deeptools/deeptools_intervals","commit":"20702772939bebdb477d20c4feceb680474a3330"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/9 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 
issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 26 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T03:54:31.211Z","repository_id":62567477,"created_at":"2025-08-18T03:54:31.212Z","updated_at":"2025-08-18T03:54:31.212Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28511149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T13:38:16.342Z","status":"ssl_error","status_checked_at":"2026-01-17T13:37:44.060Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bed","bioinformatics","gtf","interval-tree"],"created_at":"2026-01-17T15:17:32.368Z","updated_at":"2026-01-17T15:17:32.899Z","avatar_url":"https://github.com/deeptools.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"For those curious, deepTools needs a new interval tree backend that supports metadata associated with each interval. I previously made such a thing, called libGTF. Consequently, I'm just working on (A) a Python front-end for that and (B) some modifications specific to deepTools (namely, every interval needs an associated `deepTools_group` tag and exon bounds will be a new attribute associated with transcripts).\n\nNote that murmur3.c and murmur3.h are C implementations of MurmurHash. 
The C implementation is from [Peter Scott](https://github.com/PeterScott/murmur3) and MurmurHash itself is by [Austin Appleby](https://code.google.com/p/smhasher/wiki/MurmurHash3). Both of these are in the public domain.\n\nkstring.h and kseq.h are from [Heng Li](http://lh3lh3.users.sourceforge.net/) and are available under an MIT license.\n\nUsage\n=====\n\nThe only class contained here is `GTF` and it has only one function that should ever be used, `findOverlaps`.\n\nNote that as is the case in deepTools, this package attempts to convert between chromosome naming systems. Because the conversion may not always be obvious, this can fail.\n\nThe GTF class\n-------------\n\nTo read one or more files into an interval tree, one initializes a new `GTF` class:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF(\"some_file.gtf\")\n\nMultiple files can also be used:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF([\"some_file.gtf\", \"some_other_file.bed.gz\"])\n\nFiles may optionally be compressed; the compression magic number is used to detect this.\n\nFor GTF and BED12 files, exons are not stored by default; this can be changed with the `keepExons` option:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF([\"some_file.gtf\", \"some_other_file.bed.gz\"], keepExons=True)\n\nThe utility of this will be seen later. GTF and BED files may contain comments or browser lines at the beginning; these are ignored.\n\n### Labels\n\nIt's often useful to have multiple groups of intervals. This can be accomplished by assigning a label to each interval. If multiple files are used, then this package will default to assigning the file name as a label to intervals in each input file. Alternatively, labels can be included inside of files. 
For BED files, this is accomplished as follows:\n\n    chr1\t1\t100\n    chr1\t150\t200\n    #My group\n    chr1\t300\t400\n    #My other group\n\nThese labels **MUST** be unique in BED files. If they are not, then each subsequent instance will have a suffix appended to ensure that it is unique.\n\nFor GTF files, labels are included in the attribute column, by the addition of a `deepTools_group` key:value pair:\n\n    chr1       havana  transcript      11869   14409   .       +       .       gene_id \"ENSG00000223972\"; transcript_id \"ENST00000456328\"; deepTools_group \"group 1\";\n\nThese labels do **NOT** need to be unique across files.\n\nLabels can be overridden with the `labels` option:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF([\"some_file.gtf\", \"some_other_file.bed.gz\"], keepExons=True, labels=[\"foo\", \"bar\", \"quux\", \"sniggly\"])\n\nThe number of provided labels **MUST** match the number encountered. These labels are applied in the order that groups are encountered in the input files. So if in the above example both files contain two groups, then the following would produce the same results but with the labels swapped across files:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF([\"some_other_file.bed.gz\", \"some_file.gtf\"], keepExons=True, labels=[\"foo\", \"bar\", \"quux\", \"sniggly\"])\n\nLabels can also be replaced after the fact:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF([\"some_file.gtf\", \"some_other_file.bed.gz\", \"some_file.gtf\"], keepExons=True)\n    \u003e\u003e\u003e gtf.labels = [\"foo\", \"bar\", \"quux\", \"sniggly\"]\n\n### GTF-specific options\n\nGTF files come with three options specific to them: `exonID`, `transcriptID`, and `transcript_id_designator`. The \"feature\" column (column 3) in a GTF file denotes the type of feature an entry describes. 
By default, this package only looks at entries with `transcript` or `exon` (with `keepExons=True`) in the feature column. For some use cases, one might instead want to store CDS as exonic intervals or replace transcripts with genes.\n\nTranscripts are the primary entry used by this package and, consequently, each needs to have an associated transcript ID. Duplicate IDs are always ignored, since such a thing would be biologically nonsensical. In GTF files, the transcript ID is stored as `transcript_id \"some ID\";`. If, however, one changes the `transcriptID` value to something else, such as `gene`, this key:value pair may no longer be present or may not be unique. In such cases, it's beneficial to change the key portion, for example to `gene_id`.\n\n### Finding overlaps\n\nFinding overlaps requires a chromosome, a start position, and an end position. As with BED files, these coordinates are 0-based half-open. By default, strand and overlap type are completely ignored. This can be overridden:\n\n    \u003e\u003e\u003e o = gtf.findOverlaps(\"chr1\", 0, 100, strand=\"+\", matchType=1, strandType=3)\n\nThis would search for intervals on the `+` strand (ignoring those on `.`, which would have additionally been returned had `strandType=1` been used) that are exactly [0, 100) on chromosome 1. Anyone interested in these more advanced overlap searching methods should look at the gtf.h file and the \"libGTF\" repository for examples.\n\nIt's often the case that a function looking for intervals does so by first dividing the genome into chunks and then sending each chunk to a processor for subsequent analysis. In such cases, it's convenient to NOT have processors duplicate the processing of intervals that may overlap multiple genomic bins. 
In these circumstances, the `trimOverlap` option can be set to `True`.\n\nThe output of `findOverlaps()` is a list of tuples:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF(\"foo.gtf\", keepExons=True)\n    \u003e\u003e\u003e gtf.findOverlaps(\"chr1\", 1, 20000)\n    [(11868, 14409, 'ENST00000456328', 'group 1', [(11868, 12227), (12612, 12721), (13220, 14409)], '.'), (12009, 13670, 'ENST00000450305', 'group 1', [(12009, 12057), (12178, 12227), (12612, 12697), (12974, 13052), (13220, 13374), (13452, 13670)], '.'), (14403, 29570, 'ENST00000488147', 'group 1', [(14403, 14501), (15004, 15038), (15795, 15947), (16606, 16765), (16857, 17055), (17232, 17368), (17605, 17742), (17914, 18061), (18267, 18366), (24737, 24891), (29533, 29570)], '.'), (17368, 17436, 'ENST00000619216', 'group 2', [(17368, 17436)], '.')]\n\nEach tuple contains the following members (in order): 0-based starting position, 1-based end position, ID (the transcript ID for GTF files, column 4 for BED6/12 files and a string composed of the intervals for BED3 files), a group label, a sorted list of exonic bounds, and the score (column 6 in GTF files and 5 in BED files). If either the input file type does not provide exonic bounds or `keepExons=True` was not used, the list holds a single pair identical to the start and end positions of the tuple:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF(\"foo.gtf\")\n    \u003e\u003e\u003e gtf.findOverlaps(\"chr1\", 1, 20000)\n    [(11868, 14409, 'ENST00000456328', 'group 1', [(11868, 14409)], '.'), (12009, 13670, 'ENST00000450305', 'group 1', [(12009, 13670)], '.'), (14403, 29570, 'ENST00000488147', 'group 1', [(14403, 29570)], '.'), (17368, 17436, 'ENST00000619216', 'group 2', [(17368, 17436)], '.')]\n\nIn some cases, it's desirable to have the group labels be numeric, since the regions may be used for further processing and the results sorted or grouped accordingly. 
The `numericGroups` argument can be used to facilitate this:\n\n    \u003e\u003e\u003e from deeptoolsintervals import GTF\n    \u003e\u003e\u003e gtf = GTF(\"foo.gtf\")\n    \u003e\u003e\u003e gtf.findOverlaps(\"chr1\", 1, 20000, numericGroups=True)\n    [(11868, 14409, 'ENST00000456328', 0, [(11868, 14409)], '.'), (12009, 13670, 'ENST00000450305', 0, [(12009, 13670)], '.'), (14403, 29570, 'ENST00000488147', 0, [(14403, 29570)], '.'), (17368, 17436, 'ENST00000619216', 1, [(17368, 17436)], '.')]\n\nThe Enrichment class\n--------------------\n\nThe `Enrichment` class is a modification of the base `GTF` class, aimed at querying feature types in a region. Creation of the class from one or more BED/GTF files is also similar to the `GTF` class:\n\n    \u003e\u003e\u003e from deeptoolsintervals import Enrichment\n    \u003e\u003e\u003e gtf = Enrichment([\"foo.gtf\", \"bar.bed\"])\n\nFor GTF files, the feature type is the 3rd column. For BED files, the feature type is the file name, though this can be changed with the `labels` option:\n\n    \u003e\u003e\u003e from deeptoolsintervals import Enrichment\n    \u003e\u003e\u003e gtf = Enrichment([\"foo.gtf\", \"bar.bed\"], labels=[\"this will be ignored\", \"peaks\"])\n\nFor GTF files, the label is ignored, but for the sake of simplicity if labels are specified there must be at least as many as there are files. Note that, unlike with `GTF` objects, you can not change labels after creation of an `Enrichment` object.\n\nAll entries in BED and GTF files are stored. For BED12 files, only columns 2/3 are used as region bounds by default. 
This can be modified with the `keepExons` option:\n\n    \u003e\u003e\u003e from deeptoolsintervals import Enrichment\n    \u003e\u003e\u003e gtf = Enrichment(\"bar.bed12\", keepExons=True)\n\nAll other file types will ignore the `keepExons` option, so this can still be specified with a mix of BED12 and other file types.\n\n### Finding overlaps\n\nFinding overlaps of an `Enrichment` object is similar to that with a `GTF` object. Once again, the `findOverlaps()` method is used, though the `trimOverlap`, `numericGroups`, and `includeStrand` options are not present. Further, instead of a single `start` and `end` value, a list of tuples is used. This last difference facilitates finding overlaps of spliced genes, since the pysam `get_blocks()` method returns this type of data:\n\n    \u003e\u003e\u003e from deeptoolsintervals import Enrichment\n    \u003e\u003e\u003e gtf = Enrichment(\"GRCh38.84.gtf.gz\")\n    \u003e\u003e\u003e gtf.findOverlaps(\"1\", [(65500, 65600), (69900, 70000)])\n    frozenset(['start_codon', 'transcript', 'gene', 'exon', 'CDS'])\n\nThe output is a set containing all overlapped feature types. This is convenient for quick summarization.\n\n## Enrichment of custom attributes\n\nAs of deeptoolsintervals 0.1.8, the `Enrichment` class is able to use a custom attribute key instead of the feature type. 
This allows you to find overlaps of things like the gene biotype:\n\n    \u003e\u003e\u003e from deeptoolsintervals import Enrichment\n    \u003e\u003e\u003e gtf = Enrichment(\"GRCh38.84.gtf.gz\", keepExons=True, attributeKey=\"gene_biotype\")\n    \u003e\u003e\u003e gtf.findOverlaps(\"1\", [(0, 2000000)])\n    frozenset(['miRNA', 'group 1', 'group 2', 'transcribed_unprocessed_pseudogene', 'processed_pseudogene', 'lincRNA', 'unprocessed_pseudogene', 'protein_coding'])\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeeptools%2Fdeeptools_intervals","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeeptools%2Fdeeptools_intervals","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeeptools%2Fdeeptools_intervals/lists"}