{"id":19604565,"url":"https://github.com/prateek/hbase-avro-hive","last_synced_at":"2025-02-26T16:19:00.230Z","repository":{"id":36769529,"uuid":"41076237","full_name":"prateek/hbase-avro-hive","owner":"prateek","description":null,"archived":false,"fork":false,"pushed_at":"2015-08-20T05:00:22.000Z","size":120,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-09T08:38:22.652Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prateek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-20T04:55:46.000Z","updated_at":"2018-10-03T13:28:38.000Z","dependencies_parsed_at":"2022-09-10T02:41:46.513Z","dependency_job_id":null,"html_url":"https://github.com/prateek/hbase-avro-hive","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prateek%2Fhbase-avro-hive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prateek%2Fhbase-avro-hive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prateek%2Fhbase-avro-hive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prateek%2Fhbase-avro-hive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prateek","download_url":"https://codeload.github.com/prateek/hbase-avro-hive/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240887775,"owners_coun
t":19873539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T09:37:19.609Z","updated_at":"2025-02-26T16:19:00.203Z","avatar_url":"https://github.com/prateek.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"HBase-Hive-Avro\n===============\n\nThis project demonstrates how to query HBase columns containing Avro records through Hive. Hive added support for this use case in https://issues.apache.org/jira/browse/HIVE-6147, but I couldn't find an example of how to use it, which brings us here.\n\n### First, create and load an HBase table\nThe folder `hbase-put` contains a Maven project that creates an HBase table named 'user' with a single column family, 'cf'. It then loads 4 Avro records into this column family under the qualifier 'rec'. 
Follow the directions below to build and run it:\n\n```sh\n$ cd hbase-put\n$ mvn package\n$ export CP=/etc/hbase/conf:$(hbase classpath):./target/hbase-1.0-SNAPSHOT.jar\n$ java -cp $CP com.cloudera.sa.examples.HBasePut\n```\n\n### Next, create a Hive table and view for the HBase table\nThe provided file, `setup-hive.hql`, creates a Hive table named `test_hbase_avro` using the functionality from the JIRA linked above.\n\nTo begin with, run the `setup-hive.hql` file:\n\n```sh\n$ hive -f setup-hive.hql\n```\n\nNext, explore the created table:\n\n```sh\n$ hive -e \"desc test_hbase_avro; select * from test_hbase_avro;\"\n\nOK\nkey                     string                  from deserializer\ncf_rec                  struct\u003cname:string,favourite_color:string\u003e      from deserializer\nTime taken: 2.252 seconds, Fetched: 2 row(s)\n\nOK\n1       {\"name\":\"John\",\"favourite_color\":\"Red\"}\n2       {\"name\":\"Jane\",\"favourite_color\":\"Grey\"}\n3       {\"name\":\"Mike\",\"favourite_color\":\"Blue\"}\n4       {\"name\":\"Roger\",\"favourite_color\":\"Black\"}\nTime taken: 0.635 seconds, Fetched: 4 row(s)\n```\n\nThe script also creates a view, `test_hbase_avro_view`, which exposes the same Hive table with columns containing only primitive types:\n\n```sh\n$ hive -e \"desc test_hbase_avro_view; select * from test_hbase_avro_view;\"\n\nOK\nrowkey                  string\nuser_name               string\nfav_colour              string\nTime taken: 2.23 seconds, Fetched: 3 row(s)\n\nOK\n1       John    Red\n2       Jane    Grey\n3       Mike    Blue\n4       Roger   Black\nTime taken: 1.377 seconds, Fetched: 4 row(s)\n```\n\n# TODO\n- insert followup note about UDTF 
extensions\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprateek%2Fhbase-avro-hive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprateek%2Fhbase-avro-hive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprateek%2Fhbase-avro-hive/lists"}