https://github.com/prateek/hadoop-fileformat-benchmark-kit


Host: GitHub
URL: https://github.com/prateek/hadoop-fileformat-benchmark-kit
Owner: prateek
Created: 2014-05-29T02:28:00.000Z (about 12 years ago)
Default Branch: master
Last Pushed: 2014-08-20T16:26:24.000Z (almost 12 years ago)
Last Synced: 2025-01-09T08:38:24.861Z (over 1 year ago)
Language: Shell
Size: 156 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# hadoop-fileformat-benchmark-kit
---------------------------------
Aims to be a set of utilities for benchmarking the performance of different file formats for a given Hive/Impala workload. The attributes it cares about are:

1. File size (in blocks)
2. Compression Ratio
3. Query Performance - *pending item*
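
As an illustration of the compression ratio attribute, here is a minimal sketch of how it could be computed from two byte counts. The helper name `compression_ratio` and the sample sizes are hypothetical and are not part of this kit; the ratio is assumed to mean original size divided by converted size.

```sh
# Hypothetical helper (not part of this kit): ratio of original to converted size.
# $1 = size in bytes of the original table data
# $2 = size in bytes of the same table after conversion to another file format
compression_ratio() {
  awk -v raw="$1" -v packed="$2" 'BEGIN {
    # Guard against division by zero for an empty or missing converted table
    if (packed <= 0) { print "n/a"; exit 1 }
    printf "%.2f\n", raw / packed
  }'
}

# Made-up sizes: 1,000,000 bytes before conversion, 250,000 bytes after
compression_ratio 1000000 250000   # prints 4.00
```

On a real cluster the byte counts could be taken, for example, from the first column of `hdfs dfs -du -s <table-dir>` for each table directory.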

**Warning:** This is a work in progress. At the moment, it only converts single tables, using scripts.

# Usage
-------
```sh
# Write the generated conversion HQL to a file
$ ./generate-conversion-hql.sh . > hive-benchmark.hql
# Run the generated script with Hive
$ hive -f hive-benchmark.hql
```

# Known Issues
--------------
- Avro conversion is not working at the moment

## References ##
- Presentation on file formats: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
