{"id":19434519,"url":"https://github.com/mlin/iitj","last_synced_at":"2025-10-26T17:41:51.670Z","repository":{"id":46044426,"uuid":"514511086","full_name":"mlin/iitj","owner":"mlin","description":"Implicit Interval Trees (for Java)","archived":false,"fork":false,"pushed_at":"2022-08-08T02:12:18.000Z","size":450,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-25T06:29:15.737Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-16T07:34:57.000Z","updated_at":"2022-07-18T03:51:29.000Z","dependencies_parsed_at":"2022-07-18T22:59:34.671Z","dependency_job_id":null,"html_url":"https://github.com/mlin/iitj","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/mlin/iitj","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlin%2Fiitj","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlin%2Fiitj/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlin%2Fiitj/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlin%2Fiitj/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlin","download_url":"https://codeload.github.com/mlin/iitj/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlin%2Fiitj/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":278886406,"owners_count":26062974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T14:46:41.575Z","updated_at":"2025-10-08T03:43:20.608Z","avatar_url":"https://github.com/mlin.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# iitj\n**Implicit Interval Trees (for Java)**\n\n[![build](https://github.com/mlin/iitj/actions/workflows/build.yml/badge.svg?branch=main)](https://github.com/mlin/iitj/actions/workflows/build.yml) [![javadoc](https://img.shields.io/badge/javadoc-latest-brightgreen)](https://mlin.github.io/iitj/javadoc/latest) ![GitHub release (latest by date)](https://img.shields.io/github/v/release/mlin/iitj)\n\niitj provides an in-memory data structure for indexing [begin,end) position intervals, such as genome feature annotations or time windows, and answering requests for all items overlapping a query interval. It stores all the intervals in a few primitive arrays instead of individual Objects, achieving space and serialization efficiency; but it's currently read-only once built. 
The design is based on [Heng Li's cgranges](https://github.com/lh3/cgranges), differing in some implementation details.\n\nOur original motivation was to have a data structure suitable to [broadcast across a Spark cluster](https://spark.apache.org/docs/3.2.1/rdd-programming-guide.html#broadcast-variables) for distributed joining/filtering of big genomic datasets with smaller reference annotations.\n\n## Installation\n\nWith the current [![release version](https://img.shields.io/github/v/release/mlin/iitj)](https://github.com/mlin/iitj/releases) = X.Y.Z,\n\n**Gradle:** add to your `build.gradle`,\n\n```groovy\nrepositories {\n    maven {\n        url \"https://raw.githubusercontent.com/wiki/mlin/iitj/mvn-repo/\"\n    }\n}\ndependencies {\n    implementation 'net.mlin:iitj:X.Y.Z'\n}\n```\n\n**Maven:** add to your `pom.xml`,\n\n```xml\n    \u003crepositories\u003e\n        \u003crepository\u003e\n            \u003cid\u003eiitj\u003c/id\u003e\n            \u003curl\u003ehttps://raw.githubusercontent.com/wiki/mlin/iitj/mvn-repo/\u003c/url\u003e\n        \u003c/repository\u003e\n    \u003c/repositories\u003e\n    \u003cdependencies\u003e\n        \u003cdependency\u003e\n            \u003cgroupId\u003enet.mlin\u003c/groupId\u003e\n            \u003cartifactId\u003eiitj\u003c/artifactId\u003e\n            \u003cversion\u003eX.Y.Z\u003c/version\u003e\n        \u003c/dependency\u003e\n    \u003c/dependencies\u003e\n```\n\n## Quick start\n\nImport any of `net.mlin.iitj.{Double,Float,Integer,Long,Short}IntervalTree` according to the desired interval position type. 
The following example will use `IntegerIntervalTree`.\n\n```java\nimport net.mlin.iitj.IntegerIntervalTree;\n\nIntegerIntervalTree.Builder builder = new IntegerIntervalTree.Builder();\nint id0 = builder.add(0, 23);   // id0 == 0\nint id1 = builder.add(12, 34);  // id1 == 1\nint id2 = builder.add(34, 56);  // id2 == 2\nIntegerIntervalTree it = builder.build();\n\nList\u003cIntegerIntervalTree.QueryResult\u003e hits = it.queryOverlap(22, 25);\nfor (IntegerIntervalTree.QueryResult hit : hits) {\n    System.out.println(\n        String.join(\"\\t\", String.valueOf(hit.beg), String.valueOf(hit.end), String.valueOf(hit.id))\n    );\n}\n\n/*\noutput:\n0       23      0\n12      34      1\n*/\n```\n\nAll [beg, end) interval positions are *half-open*, with inclusive begin position and exclusive end position. Given a query interval [x,y), intervals [w,x) and [y,z) are abutting but *not* overlapping, so would not be returned by the overlap query. (See [Dijkstra's note](https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) on this convention.)\n\nUse the interval IDs, reflecting the order in which they're added to the builder, to associate results with other data/objects if needed.\n\nSee [![javadoc](https://img.shields.io/badge/javadoc-latest-brightgreen)](https://mlin.github.io/iitj/javadoc/latest) for other available query methods.\n\n## Design notes\n\n**Data structure layout.** First please review the original design of [cgranges](https://github.com/lh3/cgranges); we have some [extra notes](https://github.com/mlin/iitii/blob/master/notes_on_cgranges.md) to help.\n\ncgranges handles a few complications in the typical case that its implicit binary tree isn't [\"perfect\"](https://xlinux.nist.gov/dads/HTML/perfectBinaryTree.html) (that is, the sorted interval array length *N* isn't exactly a power of two minus one). 
Instead of treating the whole array as one partial tree, we view it as a series of perfect trees, as suggested by [Brodal, Fagerberg \u0026 Jacob (2001) §3.3](https://tidsskrift.dk/brics/article/download/21696/19132). Write *N* as a sum of powers of two, e.g. *N* = 12345 = 8192 + 4096 + 32 + 16 + 8 + 1, then interpret each corresponding slice of the array as an implicit perfect tree (plus one extra \"index node\").\n\nAlthough the code for this solution isn't really simpler than cgranges, it seems easier to explain conceptually.\n\n**Java/JVM specifics.** The implicit tree's compactness would be somewhat defeated if we kept each interval boxed in its own JVM `Object`. Instead, we store essential coordinates for all the intervals in a few primitive arrays. We don't store any `Object` references, but we assign each interval an integer ID corresponding to its original insertion order. If the caller takes care to insert the intervals in sorted order (by begin then end), then we don't use any separate storage for the IDs. (Otherwise we store the permutation from the sorted order onto the insertion/ID order.)\n\nLastly, due to [limitations of Java generics](https://www.infoworld.com/article/3639525/openjdk-proposals-would-bring-universal-generics-to-java.html), we provide separate classes for double/float/int/long/short interval position types. `DoubleIntervalTree.java` serves as the source template, from which we [generate](https://github.com/mlin/iitj/blob/main/generate.sh) the others by sed find/replace.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlin%2Fiitj","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlin%2Fiitj","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlin%2Fiitj/lists"}