{"id":13529390,"url":"https://github.com/rayokota/hdocdb","last_synced_at":"2025-04-19T18:34:15.640Z","repository":{"id":8638367,"uuid":"51400586","full_name":"rayokota/hdocdb","owner":"rayokota","description":"HBase as a JSON Document Database","archived":false,"fork":false,"pushed_at":"2023-06-14T22:51:31.000Z","size":418,"stargazers_count":24,"open_issues_count":2,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-29T11:41:38.783Z","etag":null,"topics":["document-database","hbase"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rayokota.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-02-09T21:18:03.000Z","updated_at":"2024-08-03T10:45:15.000Z","dependencies_parsed_at":"2024-01-03T01:20:30.663Z","dependency_job_id":"5a824de5-4866-436e-968e-e8a3fd2bd406","html_url":"https://github.com/rayokota/hdocdb","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rayokota%2Fhdocdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rayokota%2Fhdocdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rayokota%2Fhdocdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rayokota%2Fhdocdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rayokota","download_url":"https://codeload.github.com/rayokota/hdocdb/tar.gz/refs/heads/master","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":249239251,"owners_count":21235824,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document-database","hbase"],"created_at":"2024-08-01T07:00:35.896Z","updated_at":"2025-04-16T12:30:45.904Z","avatar_url":"https://github.com/rayokota.png","language":"Java","funding_links":[],"categories":["Projects"],"sub_categories":["Frameworks"],"readme":"# HDocDB - HBase as a JSON Document Database\n\n[![Build Status][github-actions-shield]][github-actions-link]\n[![Maven][maven-shield]][maven-link]\n[![Javadoc][javadoc-shield]][javadoc-link]\n\n[github-actions-shield]: https://github.com/rayokota/hdocdb/workflows/build/badge.svg?branch=master\n[github-actions-link]: https://github.com/rayokota/hdocdb/actions\n[maven-shield]: https://img.shields.io/maven-central/v/io.hdocdb/hdocdb.svg\n[maven-link]: https://search.maven.org/#search%7Cga%7C1%7Cio.hdocdb\n[javadoc-shield]: https://javadoc.io/badge/io.hdocdb/hdocdb.svg?color=blue\n[javadoc-link]: https://javadoc.io/doc/io.hdocdb/hdocdb\n\nHDocDB is a client layer for using HBase as a store for JSON documents.  It implements many of the interfaces in the [OJAI](http://ojai.github.io) framework.\n\n## Installing\n\nReleases of HDocDB are deployed to Maven Central.\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.hdocdb\u003c/groupId\u003e\n    \u003cartifactId\u003ehdocdb\u003c/artifactId\u003e\n    \u003cversion\u003e1.0.1\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n## Building\n\nYou can also choose to build HDocDB manually.  
Prerequisites for building:\n\n* git\n* Maven\n* Java 8\n\n```\ngit clone https://github.com/rayokota/hdocdb.git\ncd hdocdb\nmvn clean package -DskipTests\n```\n\n## Deployment\n\nCurrently HDocDB does not make use of coprocessors.  However, HDocDB does make use of server-side filters.  To deploy HDocDB:\n\n* Add target/hdocdb-1.0.0.jar to the classpath of all HBase region servers.\n* Restart the HBase region servers.\n    \n\n## Setup\n\nTo initialize HDocDB, an HBase connection is required.  For example,\n\n```java\n...\nConfiguration config = HBaseConfiguration.create();\nConnection conn = ConnectionFactory.createConnection(config);\nHDocumentDB hdocdb = new HDocumentDB(conn);\n...\n```\n\nNext, obtain a document collection.\n\n```java\n...\nHDocumentCollection coll = hdocdb.getCollection(\"mycollection\");\n...\n```\n\t\t\nEach document collection is backed by an HBase table.\n\n## Creating Documents\n\nOnce a document collection is in hand, creating documents is straightforward.\n\n```java\nDocument doc = new HDocument()\n    .setId(\"jdoe\")\n    .set(\"firstName\", \"John\")\n    .set(\"lastName\", \"Doe\")\n    .set(\"dateOfBirth\", ODate.parse(\"1970-10-10\"));\ncoll.insert(doc);\n```\n\nYou can also use the `insertOrReplace()` method, which will replace the document with the same ID if it already exists.\n\n```java\ncoll.insertOrReplace(doc);\n```\n\n## Retrieving Documents\n\nTo retrieve all documents in a collection, use the `find()` method.\n\n```java\nDocumentStream docs = coll.find();\n```\n\t\t\nTo retrieve a single document by ID, use the `findById()` method.\n\n```java\nDocument doc = coll.findById(\"jdoe\");\n```\n\t\t\nYou can also pass a condition to the `find()` method.\n\n```java\nQueryCondition condition = new HQueryCondition()\n    .and()\n    .is(\"lastName\", QueryCondition.Op.EQUAL, \"Doe\")\n    .is(\"dateOfBirth\", QueryCondition.Op.LESS, ODate.parse(\"1981-01-01\"))\n    .close()\n    .build();\nDocumentStream docs = 
coll.find(condition);\n```\n\n## Updating Documents\n\nTo update a document, first create a document mutation.\n\n```java\nDocumentMutation mutation = new HDocumentMutation()\n    .setOrReplace(\"firstName\", \"Jim\")\n    .setOrReplace(\"dateOfBirth\", ODate.parse(\"1970-10-09\"));\ncoll.update(\"jdoe\", mutation);\n```\n\t\t\nHere are the different types of methods supported with `HDocumentMutation`.\n\n* `setOrReplace` - update or replace a field with the given value\n* `set` - perform an update if a field either doesn't exist or has the same type as the given value\n* `delete` - delete a field\n* `increment` - increment a numeric field with the given value\n* `append` - append the given array (or string) to an existing array (or string)\n* `merge` - merge the given subdocument with an existing subdocument\n\nAll of the methods other than the `setOrReplace()` method perform a read-modify-write at the client side.\n\n## Deleting Documents\n\nTo delete a document:\n\n```java\ncoll.delete(\"jdoe\");\n```\n\t\t\n## Saving and Retrieving Objects\n\nSince OJAI has [Jackson](http://wiki.fasterxml.com/JacksonHome) integration, HDocDB can treat HBase as an object store.  
Assuming your Java class is annotated as follows:\n\n```java\npublic class User {\n\n    private String id;\n    private String firstName;\n    private String lastName;\n\n    @JsonCreator\n    public User(@JsonProperty(\"_id\")       String id,\n                @JsonProperty(\"firstName\") String firstName,\n                @JsonProperty(\"lastName\")  String lastName) {\n        this.id = id;\n        this.firstName = firstName;\n        this.lastName = lastName;\n    }\n\n    @JsonProperty(\"_id\")\n    public String getId() { return id; }\n\n    public String getFirstName() { return firstName; }\n\n    public String getLastName() { return lastName; }\n}\n```\n\nThen instances of your class can be saved and retrieved using HDocDB.\n\t\t\n```java\nUser user = new User(\"jsmith\", \"John\", \"Smith\");\nDocument doc = Json.newDocument(user);\ncoll.insert(doc);\n...\nuser = coll.findById(\"jsmith\").toJavaBean(User.class);\n```\n\t\t\n## Global Secondary Indexes\n\nHDocDB also has basic support for global secondary indexes.  For more sophisticated indexing support, an engine that can perform full text searches, such as [ElasticSearch](https://www.elastic.co/products/elasticsearch) or [Solr](http://lucene.apache.org/solr/), is recommended.\n\nIndex management is performed mostly on the client side, so it is not as performant as a coprocessor-based solution such as that provided by [Apache Phoenix](https://phoenix.apache.org).  Also, covered indexes are not supported, so each index lookup requires a join.  However, the current index implementation should still help speed up some reads (at the cost of slightly slower writes).\n\nTo create a secondary index on the `lastName` field:\n\n```java\ncoll.createIndex(\"myindex\", \"lastName\", Value.Type.STRING);\n```\n\t\t\nIf the index is created after documents have already been added to the database, then the index will be populated in the background asynchronously.  
Since the indexing is performed on the client, this may take some time for a large collection.\n\nNow, when performing a query such as the following, the index above will be used.\n\n```java\nQueryCondition condition = new HQueryCondition()\n    .and()\n    .is(\"lastName\", QueryCondition.Op.EQUAL, \"Doe\")\n    .is(\"dateOfBirth\", QueryCondition.Op.LESS, ODate.parse(\"1981-01-01\"))\n    .close()\n    .build();\nDocumentStream docs = coll.find(condition);\n```\n\nA query will use at most one index.  We can verify which index was used as follows.\n\n```java\nSystem.out.println(((HDocumentStream)docs).explain().asDocument());\n```\n\nwhich should print the following.\n\n```json\n{\n    \"plan\": \"index scan\",\n    \"indexName\": \"myindex\",\n    \"indexBounds\": {\"lastName\": \"[Doe‥Doe]\"},\n    \"staleIndexesRunningCount\": 0\n}\n```\n\nWe can also specify which index to use.\n\n```java\nDocumentStream docs = coll.findWithIndex(\"myindex\", condition);\n```\n\t\t\nOr that no index should be used.\n\n```java\nDocumentStream docs = coll.findWithIndex(Index.NONE, condition);\n```\n\t\t\nYou can also create compound indexes.\n\n```java\nIndexBuilder builder = coll.newIndexBuilder(\"myindex2\")\n    .add(\"lastName\", Value.Type.STRING)\n    .add(\"firstName\", Value.Type.STRING)\n    .build();\n```\n\n\n## HDocDB Shell with Nashorn Integration\n\nThe HDocDB shell is a command-line shell with [Nashorn](http://openjdk.java.net/projects/nashorn/) integration, so that MongoDB-like queries can be specified interactively or in a Nashorn script.\n\nTo start the HDocDB shell you need to use `jrunscript` that comes with Java (typically found in $JAVA_HOME/bin).\n\n```\n$ jrunscript -cp \u003chbase-conf-dir\u003e:target/hdocdb-1.0.0.jar -f target/classes/shell/hdocdb.js -f - \n```\n\nHere is a sample run.\n\n```\nnashorn\u003e db.mycoll.insert( { _id: \"jdoe\", first_name: \"John\", last_name: \"Doe\" } )\n\t\nnashorn\u003e var doc = db.mycoll.find( { last_name: \"Doe\" } 
)[0]\n\t\nnashorn\u003e print(doc)\n{\"_id\":\"jdoe\",\"first_name\":\"John\",\"last_name\":\"Doe\"}\n\t\nnashorn\u003e db.mycoll.update( { last_name: \"Doe\" }, { $set: { first_name: \"Jim\" } } )\n\t\nnashorn\u003e var doc = db.mycoll.find( { last_name: \"Doe\" } )[0]\n\t\nnashorn\u003e print(doc)\n{\"_id\":\"jdoe\",\"first_name\":\"Jim\",\"last_name\":\"Doe\"}\n\t\nnashorn\u003e db.mycoll.delete( \"jdoe\" )\n```\n\nTo run a script:\n\n```\n$ jrunscript -cp \u003chbase-conf-dir\u003e:target/hdocdb-1.0.0.jar -f target/classes/shell/hdocdb.js -f \u003cscript\u003e\n```\n\t\n## Implementation Notes\n\nEach document is stored as a separate row in HBase.  This allows multiple operations on a document to be performed together atomically.  The document is essentially \"shredded\" using a technique called key-flattening, as described in the [Argo](http://pages.cs.wisc.edu/~chasseur/pubs/argo-long.pdf) paper.  That technique was developed for use with a relational database, but in HDocDB it has been [adapted](https://rayokota.wordpress.com/2016/03/17/hbase-as-a-multi-model-data-store/) for HBase.\n\nThe implementation of global secondary indexes is based on blogs by [Hofhansl](http://hadoop-hbase.blogspot.de/2012/10/musings-on-secondary-indexes.html) and [Yates](http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html).\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frayokota%2Fhdocdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frayokota%2Fhdocdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frayokota%2Fhdocdb/lists"}