{"id":13487275,"url":"https://github.com/Huawei-Hadoop/hindex","last_synced_at":"2025-03-27T21:32:27.563Z","repository":{"id":190359528,"uuid":"11974846","full_name":"Huawei-Hadoop/hindex","owner":"Huawei-Hadoop","description":"Secondary Index for HBase","archived":false,"fork":false,"pushed_at":"2017-05-18T15:25:10.000Z","size":10123,"stargazers_count":591,"open_issues_count":28,"forks_count":286,"subscribers_count":134,"default_branch":"master","last_synced_at":"2024-10-30T22:40:01.009Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Huawei-Hadoop.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.txt","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"security/src/main/java/org/apache/hadoop/hbase/ipc/SecureClient.java","support":null,"governance":null}},"created_at":"2013-08-08T11:33:29.000Z","updated_at":"2024-08-19T08:09:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"e5a613be-c1b9-456d-8553-3b4abb653c01","html_url":"https://github.com/Huawei-Hadoop/hindex","commit_stats":null,"previous_names":["huawei-hadoop/hindex"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Huawei-Hadoop%2Fhindex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Huawei-Hadoop%2Fhindex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Huawei-Hadoop%2Fhindex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Huawei-Hadoop%2Fhindex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Huawei-Hadoop","download_url":"https://codeload.git
hub.com/Huawei-Hadoop/hindex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245927436,"owners_count":20695233,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:00:57.289Z","updated_at":"2025-03-27T21:32:23.379Z","avatar_url":"https://github.com/Huawei-Hadoop.png","language":"Java","funding_links":[],"categories":["Java","NoSQL","大数据","Projects"],"sub_categories":["Spring Cloud框架","Infrastructure"],"readme":"hindex - Secondary Index for HBase\n======\n\nThe solution is 100% Java, compatible with Apache HBase 0.94.8, and open sourced under the Apache License 2.0 (ASL).\n\nThe following capabilities are currently supported:\n- multiple indexes on a table,\n- multi-column indexes,\n- indexes based on part of a column value,\n- equals and range condition scans using an index, and\n- bulk loading data into an indexed table (indexing is done as part of the bulk load).\n\n\n## How it works\nHBase Secondary Index is a 100% server-side implementation that uses coprocessors to persist index data in a separate table. Indexing is done per region, and a custom load balancer co-locates the index table regions with the corresponding user table regions. \n\n![si1](https://f.cloud.github.com/assets/5187670/945574/6a0f3bc0-0316-11e3-845c-de653152923a.jpg)\n\nThe server reads the index specification passed during table creation and creates the index table. 
Each user table has exactly one index table, and all index information for that user table goes into it.\n\n## Put Operation\nWhen a row is put into the HBase (user) table, coprocessors prepare the index information and put it in the corresponding index table.\nIndex table rowkey = region startkey + index name + indexed column value + user table rowkey\n\nE.g.:  \n\nTable –\u003e tab1, column family –\u003e cf\n\nIndex –\u003e idx1, cf1:c1 and idx2, cf1:c2\n\nIndex table –\u003e tab1_idx (user table name with suffix “_idx”)\n\n![si2](https://f.cloud.github.com/assets/5187670/945582/a6140d1c-0316-11e3-9af2-12c9fa636441.jpg)\n\n## Scan Operation \nFor a user table scan, the coprocessor creates a scanner on the index table, scans the index data, and seeks to the exact rows in the user table. These seeks on HFiles are based on the rowkeys obtained from the index data. This skips blocks that hold no matching data, and sometimes entire HFiles can be skipped. \n\n![si5](https://f.cloud.github.com/assets/5187670/945631/32f0fc9e-0318-11e3-9a44-d2c7496f1d64.jpg)\n\n![si4](https://f.cloud.github.com/assets/5187670/945610/803d9aee-0317-11e3-8827-5cbc60e6efbb.jpg)\n\n\n## Usage\nClients need to pass an IndexedHTableDescriptor with the index name and columns when creating the table:\n\n    IndexedHTableDescriptor htd = new IndexedHTableDescriptor(usertableName);\n\n    IndexSpecification iSpec = new IndexSpecification(indexName);\n    \n    HColumnDescriptor hcd = new HColumnDescriptor(columnFamily);\n    \n    iSpec.addIndexColumn(hcd, indexColumnQualifier, ValueType.String, 10);\n    \n    htd.addFamily(hcd);\n    \n    htd.addIndex(iSpec);\n    \n    admin.createTable(htd);\n    \nNo changes are required for Puts and Deletes on the client side, because the corresponding index operations are handled internally by coprocessors.\n\nNo change is needed in the client application's scan code. \n\nThere is no need to specify the index(es) to be used. 
The Secondary Index implementation finds the best index for a Scan by analyzing the filters used in the query.\n\n## Source \nThis repository contains the source for Secondary Index support on Apache HBase 0.94.8.\n\n## Building from source and testing\nThe procedure for building from source is the same as for building HBase itself, so it requires:\n- Java 1.6 or later\n- Maven 3.X\n\nSeparate test source (secondaryindex\\src\\test\\java\\) is available for running the tests on secondary indexes.\n\n## Note\nSet the following properties in hbase-site.xml to use secondary indexes.\n\n__Property__\n- __name__ - *hbase.use.secondary.index*\n- __value__ - *true*\n- __description__ - *Enable this property when you are using secondary index*\n\n__Property__\n- __name__ - *hbase.coprocessor.master.classes*\n- __value__ - *org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver*\n- __description__ - *A comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that are loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes will be called in order. After implementing your own MasterObserver, just put it in HBase's classpath and add the fully qualified class name here. \norg.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver - defines coprocessor hooks to support secondary index operations on the master process.*\n\n__Property__\n- __name__ - *hbase.coprocessor.region.classes*\n- __value__ - *org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver*\n- __description__ - *A comma-separated list of Coprocessors that are loaded by default on all tables. For any override coprocessor method, these classes will be called in order. After implementing your own Coprocessor, just put it in HBase's classpath and add the fully qualified class name here. 
A coprocessor can also be loaded on demand by setting HTableDescriptor.\norg.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver – class that defines coprocessor hooks to support secondary index operations on a region.*\n\n__Property__\n- __name__ - *hbase.coprocessor.wal.classes*\n- __value__ - *org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver*\n- __description__ - *Classes that define coprocessor hooks to support WAL operations.\norg.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver – class that defines coprocessor hooks to support secondary index WAL operations.*\n\n## Future Work\n- Dynamically add/drop indexes\n- Integrate secondary index management into the HBase shell \n- Optimize range scan scenarios\n- HBCK tool support for secondary index tables\n- WAL optimizations for secondary index table entries\n- Make scan evaluation intelligence pluggable\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHuawei-Hadoop%2Fhindex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHuawei-Hadoop%2Fhindex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHuawei-Hadoop%2Fhindex/lists"}