https://github.com/prateek/hadoop-fileformat-benchmark-kit
- Host: GitHub
- URL: https://github.com/prateek/hadoop-fileformat-benchmark-kit
- Owner: prateek
- Created: 2014-05-29T02:28:00.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2014-08-20T16:26:24.000Z (almost 12 years ago)
- Last Synced: 2025-01-09T08:38:24.861Z (over 1 year ago)
- Language: Shell
- Size: 156 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# hadoop-fileformat-benchmark-kit
---------------------------------
A set of utilities for benchmarking how different file formats perform for a given workload (Hive/Impala). It looks at the following attributes:
1. Size of the block files
2. Compression Ratio
3. Query Performance - *pending item*
**Warning:** This is a work in progress. At the moment, it only converts single tables, using scripts.
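The compression ratio listed above is usually taken as the uncompressed size divided by the compressed size. The helper below is a minimal sketch of that arithmetic; `compression_ratio` is a hypothetical name and not part of this repository, and the byte counts are assumed to come from whatever tool you use to measure table size (for example `hdfs dfs -du -s`).

```sh
# Hypothetical helper, not part of this repository.
# Usage: compression_ratio <uncompressed_bytes> <compressed_bytes>
# Prints uncompressed/compressed with two decimal places.
compression_ratio() {
  awk -v raw="$1" -v packed="$2" 'BEGIN {
    # Refuse to divide by zero or by a non-numeric value.
    if (packed + 0 <= 0) {
      print "compressed size must be a positive number" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", raw / packed
  }'
}
```

For instance, `compression_ratio 1000 250` prints `4.00`, meaning the converted table takes a quarter of the original space.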
# Usage
-------
```sh
# Generate the conversion HQL, then run it with Hive
$ ./generate-conversion-hql.sh . \
> hive-benchmark.hql
$ hive -f hive-benchmark.hql
```
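To give a rough idea of what a generated conversion file could contain, here is a sketch that prints one `CREATE TABLE ... AS SELECT` statement per target format. It is an illustration only: `emit_conversion_hql` and the `<table>_<format>` naming scheme are assumptions, the real `generate-conversion-hql.sh` may work quite differently, and it presumes a Hive version that accepts the `STORED AS` shorthand for each format.

```sh
# Hypothetical sketch; generate-conversion-hql.sh may work differently.
# Usage: emit_conversion_hql <table> <format>...
# Prints one CTAS statement per format to stdout.
emit_conversion_hql() {
  table="$1"
  shift
  for format in "$@"; do
    # Lowercase the format name for use as a table-name suffix.
    suffix=$(printf '%s' "$format" | tr '[:upper:]' '[:lower:]')
    printf 'CREATE TABLE %s_%s STORED AS %s AS SELECT * FROM %s;\n' \
      "$table" "$suffix" "$format" "$table"
  done
}
```

Calling `emit_conversion_hql web_logs PARQUET ORC > convert.hql` (with `web_logs` as a placeholder table name) would write two statements that could then be run with `hive -f convert.hql`.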
# Known Issues
--------------
- Avro conversion is not working at the moment
## References ##
- Presentation on file formats: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2