https://github.com/innovationb1ue/hbase-driver
Hbase client in pure native Python. (No thrift)
https://github.com/innovationb1ue/hbase-driver
hbase hbase-client hbase-driver protobuf python pythonhbaseclient
Last synced: 4 months ago
JSON representation
Hbase client in pure native Python. (No thrift)
- Host: GitHub
- URL: https://github.com/innovationb1ue/hbase-driver
- Owner: innovationb1ue
- License: apache-2.0
- Created: 2024-02-09T09:05:48.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2026-02-21T16:25:23.000Z (4 months ago)
- Last Synced: 2026-02-21T22:57:40.334Z (4 months ago)
- Topics: hbase, hbase-client, hbase-driver, protobuf, python, pythonhbaseclient
- Language: Python
- Homepage:
- Size: 2.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# hbase-driver
[](https://github.com/innovationb1ue/hbase-driver/actions)
[](LICENSE)
Pure-Python native HBase client (no Thrift). This project implements core HBase regionserver and master RPCs so Python programs can perform table and metadata operations against an HBase cluster.
## Status
- Integration test status (local): **77 / 77 tests passing** (2026-02-21) using the custom 3-node Docker cluster.
## Quick Start
Get started with hbase-driver in just a few lines:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.scan import Scan
# Initialize client
config = {HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}
client = Client(config)
# Put and Get data
with client.get_table("default", "mytable") as table:
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
row = table.get(Get(b"row1"))
print(row.get(b"cf", b"col")) # b"value"
# Scan data
with client.get_table("default", "mytable") as table:
with table.scan(Scan()) as scanner:
for row in scanner:
print(row.rowkey)
```
## Complete Example
For a comprehensive example covering all major features, see [complete_example.py](complete_example.py) which demonstrates:
- Basic CRUD operations (Put, Get, Delete, Scan)
- Advanced features (Batch, CheckAndPut, Increment)
- Filter usage for server-side filtering
- DDL operations (Create, Disable, Enable, Delete, Truncate)
- Connection management and resource cleanup
- Cache invalidation after table modifications
Run the example:
```bash
python3 complete_example.py
```
## Installation
```bash
pip install hbase-driver
```
Or for development:
```bash
git clone https://github.com/innovationb1ue/hbase-driver.git
cd hbase-driver
pip install -e .
```
## Connection Management
The hbase-driver provides context managers for automatic resource cleanup, similar to Java's try-with-resources:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
# Using context manager - automatic cleanup
with Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}) as client:
with client.get_table("default", "mytable") as table:
# Do operations
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
# Table automatically closed here
# Client automatically closed here
# Manual cleanup
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
try:
table = client.get_table("default", "mytable")
try:
# Do operations
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
finally:
table.close()
finally:
client.close()
```
### Resource Classes Supporting Context Managers
- `Client` - Main client connection
- `Table` - Table operations
- `Admin` - DDL operations
- `ResultScanner` - Scan results iteration
## Configuration
### Configuration Options
| Key | Default | Description |
|-----|---------|-------------|
| `hbase.zookeeper.quorum` | Required | ZooKeeper quorum addresses (comma-separated) |
| `zookeeper.znode.parent` | `/hbase` | ZooKeeper parent znode |
| `hbase.connection.pool.size` | 10 | Maximum connections per pool |
| `hbase.connection.idle.timeout` | 300 | Idle timeout in seconds |
### Using Configuration Constants
The driver provides named constants for configuration keys, compatible with HBase Java driver's HConstants:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
config = {
HConstants.ZOOKEEPER_QUORUM: "localhost:2181",
HConstants.CONNECTION_POOL_SIZE: 20,
HConstants.CONNECTION_IDLE_TIMEOUT: 600
}
client = Client(config)
```
### String Literals (Backward Compatible)
You can also use string literals directly:
```python
config = {
"hbase.zookeeper.quorum": "localhost:2181",
"hbase.connection.pool.size": 20
}
client = Client(config)
```
## API Reference
### Client
Main entry point for HBase operations.
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
# Get admin for DDL operations
admin = client.get_admin()
# Get table for data operations
table = client.get_table("default", "mytable")
# Close when done
client.close()
```
### Table
Perform data operations on a specific table.
```python
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.delete import Delete
from hbasedriver.operations.scan import Scan
# Put data
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
# Get data
row = table.get(Get(b"row1"))
if row:
print(row.get(b"cf", b"col"))
# Delete data
table.delete(Delete(b"row1"))
# Scan data
with table.scan(Scan()) as scanner:
for row in scanner:
print(row.rowkey)
```
### Admin
Perform DDL operations.
```python
from hbasedriver.model import ColumnFamilyDescriptor
from hbasedriver.table_name import TableName
# Create table
cfd = ColumnFamilyDescriptor(b"cf")
admin.create_table(TableName.value_of(b"mytable"), [cfd])
# Disable table
admin.disable_table(TableName.value_of(b"mytable"))
# Enable table
admin.enable_table(TableName.value_of(b"mytable"))
# Delete table (must be disabled first)
admin.disable_table(TableName.value_of(b"mytable"))
admin.delete_table(TableName.value_of(b"mytable"))
# List tables
tables = admin.list_tables()
```
### Operations
#### Put
```python
from hbasedriver.operations.put import Put
# Simple put
put = Put(b"row1").add_column(b"cf", b"col", b"value")
table.put(put)
# Multiple columns
put = Put(b"row2") \
.add_column(b"cf", b"name", b"Alice") \
.add_column(b"cf", b"age", b"30") \
.add_column(b"info", b"email", b"alice@example.com")
table.put(put)
# With timestamp
import time
ts = int(time.time() * 1000)
put = Put(b"row3").add_column(b"cf", b"col", b"value", ts=ts)
table.put(put)
```
#### Get
```python
from hbasedriver.operations.get import Get
# Get entire row
row = table.get(Get(b"row1"))
# Get specific column
row = table.get(Get(b"row1").add_column(b"cf", b"col"))
# Get multiple columns
row = table.get(Get(b"row1")
.add_column(b"cf", b"name")
.add_column(b"cf", b"age"))
# With time range
start_ts = int((time.time() - 86400) * 1000) # 24 hours ago
end_ts = int(time.time() * 1000)
row = table.get(Get(b"row1").set_time_range(start_ts, end_ts))
# Check existence only
exists = table.get(Get(b"row1").set_check_existence_only(True)) is not None
```
#### Scan
```python
from hbasedriver.operations.scan import Scan
# Scan entire table
with table.scan(Scan()) as scanner:
for row in scanner:
print(row.rowkey)
# Scan with row key range
scan = Scan(start_row=b"row1", end_row=b"row9")
with table.scan(scan) as scanner:
for row in scanner:
print(row.rowkey)
# Scan with specific columns
scan = Scan().add_column(b"cf", b"col")
with table.scan(scan) as scanner:
for row in scanner:
print(row.get(b"cf", b"col"))
# Scan with limit
scan = Scan().set_limit(100)
with table.scan(scan) as scanner:
for row in scanner:
print(row.rowkey)
# Pagination
scan = Scan()
rows, resume_key = table.scan_page(scan, page_size=100)
while resume_key:
scan = Scan(start_row=resume_key, include_start_row=False)
rows, resume_key = table.scan_page(scan, page_size=100)
```
#### Batch Operations
```python
from hbasedriver.operations.batch import BatchGet, BatchPut
# Batch get
bg = BatchGet([b"row1", b"row2", b"row3"])
bg.add_column(b"cf", b"col1")
results = table.batch_get(bg)
# Batch put
bp = BatchPut()
bp.add_put(Put(b"row1").add_column(b"cf", b"col1", b"value1"))
bp.add_put(Put(b"row2").add_column(b"cf", b"col1", b"value2"))
results = table.batch_put(bp)
# Context manager for batch
with table.batch(batch_size=1000) as batch:
batch.put(b"row1", {b"cf:col1": b"value1"})
batch.put(b"row2", {b"cf:col1": b"value2"})
batch.delete(b"row3")
# All operations executed when exiting context
```
#### Check and Put
```python
from hbasedriver.operations.increment import CheckAndPut
cap = CheckAndPut(b"row1")
cap.set_check(b"cf", b"lock", b"") # Check if lock is empty
cap.set_put(Put(b"row1").add_column(b"cf", b"data", b"value"))
success = table.check_and_put(cap)
```
#### Increment
```python
from hbasedriver.operations.increment import Increment
inc = Increment(b"row1")
inc.add_column(b"cf", b"counter", 1)
new_value = table.increment(inc)
```
### Filters
The driver supports server-side filters:
```python
from hbasedriver.filter import PrefixFilter, RowFilter
from hbasedriver.filter.compare_filter import CompareOperator
from hbasedriver.filter.binary_comparator import BinaryComparator
# Prefix filter
scan = Scan().set_filter(PrefixFilter(b"abc"))
# Row filter with comparison
scan = Scan().set_filter(
RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1"))
)
```
## Java Compatibility
This driver is designed to be familiar to HBase Java developers. Here's a quick comparison:
| Java Driver | Python Driver |
|-------------|---------------|
| `Connection connection = ConnectionFactory.createConnection(config)` | `client = Client(config)` |
| `Table table = connection.getTable(TableName.valueOf("mytable"))` | `table = client.get_table("default", "mytable")` |
| `table.put(new Put(Bytes.toBytes("row1"))...` | `table.put(Put(b"row1")...` |
| `Result result = table.get(new Get(Bytes.toBytes("row1"))...` | `row = table.get(Get(b"row1")...` |
| `try (ResultScanner scanner = table.scan(...))` | `with table.scan(...) as scanner:` |
| `try (Connection conn = ...)` | `with Client(config) as client:` |
### Configuration Constants
Java's `HConstants` are available as `hbase_constants.HConstants`:
| Java | Python |
|------|--------|
| `HConstants.ZOOKEEPER_QUORUM` | `HConstants.ZOOKEEPER_QUORUM` |
| `HConstants.ZOOKEEPER_ZNODE_PARENT` | `HConstants.ZOOKEEPER_ZNODE_PARENT` |
| (Custom connection pool size) | `HConstants.CONNECTION_POOL_SIZE` |
| (Custom idle timeout) | `HConstants.CONNECTION_IDLE_TIMEOUT` |
## Development
### Quickstart (3-node Docker dev environment)
Prerequisites: Docker and docker-compose installed.
1. Build, start the custom 3-node cluster and run the full test suite:
```bash
./scripts/run_tests_3node.sh
```
2. To run tests against an already-running cluster (fast):
```bash
./scripts/run_tests_3node.sh --no-start
```
3. Run a single test file or case:
```bash
./scripts/run_tests_3node.sh test/test_scan.py
./scripts/run_tests_3node.sh test/test_scan.py::test_scan
```
Legacy single-node dev workflow (still available):
```bash
./scripts/run_tests_docker.sh
```
See [TEST_GUIDE.md](docs/TEST_GUIDE.md) and [DEV_ENV.md](docs/DEV_ENV.md) for full documentation and troubleshooting steps.
## Documentation
- [API Reference](docs/api_reference.md) - Comprehensive API documentation
- [Advanced Usage](docs/advanced_usage.md) - Advanced features and patterns
- [Performance Guide](docs/performance_guide.md) - Performance tuning and optimization
- [Troubleshooting](docs/troubleshooting.md) - Common issues and solutions
- [Migration Guide](docs/migration_guide.md) - Migration from Java HBase, Happybase, and Thrift clients
- [Migration from happybase](docs/migration_from_happybase.md) - Guide for migrating from happybase/happybase-thrift
- [中文介绍](docs/中文介绍.md) - 中文文档介绍 | Chinese Introduction
## License
Apache License 2.0