https://github.com/nielsbasjes/splittablegzip

Splittable Gzip codec for Hadoop

# Using the SplittableGZipCodec in Apache Hadoop MapReduce (Java)
To use this codec in a Hadoop MapReduce job written in Java, you must make sure this library has been added as a dependency.

In Maven you would simply add this dependency:

    <dependency>
      <groupId>nl.basjes.hadoop</groupId>
      <artifactId>splittablegzip</artifactId>
      <version>1.3</version>
    </dependency>

Then in Java you would create an instance of the Job that you are going to run:

    Job job = ...

and then before actually running the job you set the configuration using something like this:

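    // Register the splittable codec as the handler for gzipped input.
    job.getConfiguration().set("io.compression.codecs", "nl.basjes.hadoop.io.compress.SplittableGzipCodec");
    // 5000000000 does not fit in an int, so the literal needs the L suffix to compile.
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 5000000000L);
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 5000000000L);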

NOTE: The ORIGINAL GzipCodec must NOT remain in the list of compression codecs!
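
The settings and the note above can be double-checked before a job is submitted. What follows is only a minimal sketch, not part of this library or of Hadoop: the class `SplittableGzipSettings` and its methods are hypothetical names made up for illustration, and a plain `Map` stands in for Hadoop's `Configuration` so that the example runs without Hadoop on the classpath. It assumes that the original codec is `org.apache.hadoop.io.compress.GzipCodec` and that the codec list is comma-separated.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of splittablegzip or Hadoop. */
final class SplittableGzipSettings {
    static final String CODECS_KEY = "io.compression.codecs";
    static final String MIN_SPLIT_KEY = "mapreduce.input.fileinputformat.split.minsize";
    static final String MAX_SPLIT_KEY = "mapreduce.input.fileinputformat.split.maxsize";
    static final String SPLITTABLE_CODEC = "nl.basjes.hadoop.io.compress.SplittableGzipCodec";
    // Assumption: this is the "original" codec that the note refers to.
    static final String ORIGINAL_CODEC = "org.apache.hadoop.io.compress.GzipCodec";

    private SplittableGzipSettings() {}

    /** Collects the three settings from the snippet above as key/value strings. */
    static Map<String, String> build(long splitSizeBytes) {
        if (splitSizeBytes <= 0) {
            throw new IllegalArgumentException("split size must be positive: " + splitSizeBytes);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(CODECS_KEY, SPLITTABLE_CODEC);
        // A long is required here: 5000000000 is larger than Integer.MAX_VALUE.
        settings.put(MIN_SPLIT_KEY, Long.toString(splitSizeBytes));
        settings.put(MAX_SPLIT_KEY, Long.toString(splitSizeBytes));
        return settings;
    }

    /** Returns true if the comma-separated codec list still names the original GzipCodec. */
    static boolean listsOriginalGzipCodec(String codecList) {
        return Arrays.stream(codecList.split(","))
                .map(String::trim)
                .anyMatch(ORIGINAL_CODEC::equals);
    }

    public static void main(String[] args) {
        Map<String, String> settings = build(5_000_000_000L);
        settings.forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("original GzipCodec listed: "
                + listsOriginalGzipCodec(settings.get(CODECS_KEY)));
    }
}
```

In a real job the same entries could be copied into the job configuration with `settings.forEach(job.getConfiguration()::set)`, and the check could be run against whatever codec list the cluster configuration already provides.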