https://github.com/feedly/hbase-utils

some hbase performance diagnostic code
https://github.com/feedly/hbase-utils

Last synced: about 1 year ago
JSON representation

some hbase performance diagnostic code

Host: GitHub
URL: https://github.com/feedly/hbase-utils
Owner: feedly
License: apache-2.0
Created: 2016-05-10T20:04:50.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2016-05-10T21:15:08.000Z (about 10 years ago)
Last Synced: 2025-01-23T05:15:58.817Z (over 1 year ago)
Language: Java
Size: 23.4 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # hbase-utils

At feedly we ran into a situation where hbase's log sync latency kept incresasing. We used the code in this repo to help diagnose the problem.

It runs on hbase 0.94/hadoop 1.1.

As a brief refresher, the hbase write path consists of immediately writing a log entry of the data to hdfs and then buffering the write in 

memory in the memstore. At some later point, the memstore is flushed to disk as immutable files. The purpose of the log sync is to be able to

gracefully recover from region server crashes.

There will be many, many more log sync operations than memstore flushes. Our particular issue was a faulty network connection to a couple of

data nodes, so this showed up most clearly in the log sync latency. It was hard to pin down the problematic nodes as in hbase 0.94 the hdfs

write pipeline is fairly opaque. You can get this information by running the regionserver and data node at the DEBUG log level. This does

produce a huge amount of log data, so having a separate test made debugging much easier. 

There are 2 tests we used:

## WALPerformanceEvaluation

This is a rough backport of the 1.x code used to benchmark wal writes. It configures the HLog class so as to run outside of hbase. 

Interesting things to do are to run this code on various data nodes (remember that one of the hdfs replica writes will be local) and with

varying replication levels. Writes can be done to the local file system (file:/) or hdfs (hdfs://).We first found our problematic data nodes

by running this test with hbase and hdfs logging set to DEBUG. The hdfs client code will log the write pipeline when opening a new stream. 

Thus by cross referencing the test output for slow performance with the current pipeline noted in the logs, we were able to find the

machines that were always involved in the write pipeline when things slowed down. This is easiest done by setting the -longSync option

appropriately when running the test and then tailing the test output and a grep for pipeline in the log file.

## DataReceiver/DataSender

These classes were used to do very basic network/disk tests. They basically mimic the HLog write pattern, but use java sockets and file APIs

directly to avoid the randomness of hdfs block placement. To run these tests, simply start the DataReceiver class on one machine and then 

DataSender class on another machine. The receiver can be optionally configured to write to the local filesystem (file:/) or hdfs (hdfs://).

This test allowed us to exactly pinpoint where the slowdown was occurring. In our case it was network -- we ran the test without writing to 

disk and noticed that transfers involving healthy nodes were consistently much faster than those involving the problematic nodes. If the 

problem had been disk I/O, the network only test would have been fine but the disk write would have been relatively much slower. 

To run the tests, simply package the project up (mvn assembly:assembly). Then copy the zip to the data nodes in question, unzip and run. 

See the source code for various options (number of writes, replication level, etc.). A sample logging config file is in 

conf/log4j.properties (include the conf dir in the classpath to use the configuration).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/feedly/hbase-utils

Awesome Lists containing this project

README