https://github.com/charlesfrye/7dbs-hbase

Updated code for HBase wikidump example from Seven Databases in Seven Weeks

# Setup
See [this Medium post](https://sanjay-vishwakarma.medium.com/hbase-db-installation-and-data-read-write-via-hbase-shell-bulk-loading-and-non-bulk-loading-437026218d00)
for instructions on setting up the Docker container.

The `run-hbase.sh` script comes from that post.

### Get an `hbase` shell:
```bash
docker exec -it hbase-docker hbase shell
```

### Get a `bash` shell:
```bash
docker exec -it hbase-docker bash
```

# Wikipedia Dump

The `.rb` files have been modified to reflect updates to HBase
since `1.0`.

In particular, they use proper `Connection`s to the tables
and switch to a `BufferedMutator` for batched writes,
instead of manually managing commit flushing.

To make the files available inside the Docker container,
copy them into `./data`,
which `run-hbase.sh` creates.

Hacky, I know.
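
The copy step can be sketched as a small shell helper. This is a sketch under assumptions, not part of the repo: `stage_scripts` is a hypothetical name, and it assumes the `.rb` files sit in the directory you pass as the first argument. Where `./data` appears inside the container is not covered here; check `run-hbase.sh` for the mount point.

```bash
# Hypothetical helper: copy every .rb script from a source directory
# into the data directory that is shared with the container.
stage_scripts() {
  src_dir="${1:-.}"        # where the .rb files live (default: current directory)
  data_dir="${2:-./data}"  # shared directory (default: ./data)

  # Harmless if run-hbase.sh has already created the directory.
  mkdir -p "$data_dir"

  # Copy only the Ruby scripts; cp reports an error if none match.
  cp "$src_dir"/*.rb "$data_dir"/
}
```

From the repository root, `stage_scripts . ./data` would stage the scripts. If Docker created `./data` as root, the copy may need elevated permissions.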