{"id":17882331,"url":"https://github.com/jwhitbeck/dendrite","last_synced_at":"2025-04-04T19:06:30.304Z","repository":{"id":20109230,"uuid":"23378936","full_name":"jwhitbeck/dendrite","owner":"jwhitbeck","description":"Dendrite is a library for querying large datasets on a single host at near-interactive speeds.","archived":false,"fork":false,"pushed_at":"2024-10-29T09:06:56.000Z","size":2022,"stargazers_count":72,"open_issues_count":3,"forks_count":0,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-28T18:11:25.505Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://dendrite.tech/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jwhitbeck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-08-27T06:12:11.000Z","updated_at":"2025-03-11T16:06:23.000Z","dependencies_parsed_at":"2024-12-21T03:08:10.341Z","dependency_job_id":"01ebcbc8-ee05-4817-bdd2-ed6bf2766947","html_url":"https://github.com/jwhitbeck/dendrite","commit_stats":{"total_commits":785,"total_committers":2,"mean_commits":392.5,"dds":0.06114649681528661,"last_synced_commit":"bda7fd7c5a1b0bf6c644e1a60810ae78a692d6a2"},"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwhitbeck%2Fdendrite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwhitbeck%2Fdendrite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwhitbeck%2Fdendrite/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwhitbeck%2Fdendrite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jwhitbeck","download_url":"https://codeload.github.com/jwhitbeck/dendrite/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247234921,"owners_count":20905854,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-28T12:51:01.744Z","updated_at":"2025-04-04T19:06:30.261Z","avatar_url":"https://github.com/jwhitbeck.png","language":"Java","funding_links":[],"categories":["大数据"],"sub_categories":["微服务框架"],"readme":"# Dendrite\n\nDendrite is a library for querying large datasets on a single host at near-interactive speeds.\n\nIt attempts to be:\n\n- __simple__: there is no configuration, no services to run, and reads are as simple as opening a file;\n- __fast__: there are few bottlenecks and reads will usually make good use of all available CPU cores;\n- __compact__: the file size is typically 30-40% lower than the equivalent compressed JSON;\n- __flexible__: it supports the same rich set of nested data structures as [EDN][];\n- __write once, read often__: optimizations are run at write-time to ensure fast read-time performance.\n\n[EDN]: https://github.com/edn-format/edn\n\n\nThe current implementation is in Java but only exposes a Clojure API. In the future, I would like to expose a\nclean Java interface and build a C implementation for non-JVM code.\n\nThis code has been used in production for over a year. 
It has been successfully used both as a building\nblock in large ETL systems and for ad-hoc data-science studies. However, prior to the 1.0 release, no effort\nwill be made to preserve backwards compatibility of APIs or binary compatibility of files.\n\nDendrite implements the record shredding and assembly ideas from Google's [Dremel paper][Dremel] [1]. Querying\nfor only small parts of the stored records can be up to several orders of magnitude faster than fully\ndeserializing each record and pulling out the desired information. This library also borrows many ideas from the [Parquet][] project, an implementation of the\nDremel file format for Hadoop. Unlike Parquet, Dendrite is not tied to any particular ecosystem and is\ndesigned to be a small library with no external dependencies.\n\n\n[Dremel]: http://research.google.com/pubs/pub36632.html\n[Parquet]: http://parquet.io/\n\n__Status update__ (March 23, 2018): For personal reasons, I haven't been able to work on this project in the\npast two years. However, I have been accumulating ideas for the next iteration.\n\n[![Build Status](https://travis-ci.org/jwhitbeck/dendrite.png)](https://travis-ci.org/jwhitbeck/dendrite.png)\n\n## Documentation\n\nWork-in-progress documentation and benchmarks are available at [dendrite.tech](http://dendrite.tech).\n\n## Roadmap to 1.0\n\n- Improve writer performance.\n- Cleanly separate Clojure and Java code.\n- Expose a good Java API.\n- Preserve presence/absence of record keys.\n- Add indexing.\n\n## References\n\n1. Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis.\n[Dremel: Interactive Analysis of Web-Scale Datasets][Dremel].\nIn _Proc. 
VLDB_, 2010\n\n## License\n\nCopyright \u0026copy; 2013-2017 John Whitbeck\n\nDistributed under the Eclipse Public License, the same as Clojure.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwhitbeck%2Fdendrite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjwhitbeck%2Fdendrite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwhitbeck%2Fdendrite/lists"}