https://github.com/nielsbasjes/splittablegzip

Splittable Gzip codec for Hadoop

Host: GitHub
URL: https://github.com/nielsbasjes/splittablegzip
Owner: nielsbasjes
License: apache-2.0
Created: 2012-03-28T21:59:15.000Z (over 13 years ago)
Default Branch: main
Last Pushed: 2025-08-17T10:08:37.000Z (about 2 months ago)
Last Synced: 2025-08-19T03:56:28.772Z (about 2 months ago)
Topics: codec, gzip, gzip-codec, gzipped-files, hadoop, mapreduce-java, pig, spark, splittable
Language: Java
Homepage:
Size: 1.38 MB
Stars: 72
Watchers: 6
Forks: 9
Open Issues: 2
Metadata Files:
- Readme: README-JavaMapReduce.md
- Funding: .github/FUNDING.yml
- License: LICENSE

README

# Using the SplittableGZipCodec in Apache Hadoop MapReduce (Java)
To use this in a Hadoop MapReduce job written in Java you must make sure this library has been added as a dependency.

In Maven you would simply add this dependency:

    <dependency>
      <groupId>nl.basjes.hadoop</groupId>
      <artifactId>splittablegzip</artifactId>
      <version>1.3</version>
    </dependency>

Then in Java you would create an instance of the Job that you are going to run:

    Job job = ...

and then before actually running the job you set the configuration using something like this:

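    // Register the splittable codec as the compression codec used for input files.
    job.getConfiguration().set("io.compression.codecs", "nl.basjes.hadoop.io.compress.SplittableGzipCodec");
    // The L suffix is required: 5000000000 does not fit in an int literal.
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 5000000000L);
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 5000000000L);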

NOTE: The ORIGINAL GzipCodec must NOT be in the list of compression codecs anymore, since both codecs claim the `.gz` extension!
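
If a cluster already has a value for `io.compression.codecs`, one way to honour the note above is to filter that value rather than overwrite it. The following is a minimal sketch in plain Java. The `CodecList` class and its `withSplittableGzip` method are hypothetical helpers written for illustration and are not part of the splittablegzip library. Keeping the other configured codecs is an assumption about what a job may want; the README's own example simply sets the single codec.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the splittablegzip library. */
final class CodecList {
    static final String ORIGINAL_GZIP = "org.apache.hadoop.io.compress.GzipCodec";
    static final String SPLITTABLE_GZIP = "nl.basjes.hadoop.io.compress.SplittableGzipCodec";

    private CodecList() {}

    /**
     * Builds a comma-separated codec list from an existing one:
     * drops the original GzipCodec, drops blanks and duplicates,
     * and makes sure SplittableGzipCodec appears exactly once.
     */
    static String withSplittableGzip(String existing) {
        List<String> codecs = new ArrayList<>();
        if (existing != null) {
            for (String name : existing.split(",")) {
                String trimmed = name.trim();
                if (trimmed.isEmpty() || trimmed.equals(ORIGINAL_GZIP) || codecs.contains(trimmed)) {
                    continue; // skip blanks, the original gzip codec, and repeats
                }
                codecs.add(trimmed);
            }
        }
        if (!codecs.contains(SPLITTABLE_GZIP)) {
            codecs.add(SPLITTABLE_GZIP);
        }
        return String.join(",", codecs);
    }
}
```

In a job driver this could be used as `conf.set("io.compression.codecs", CodecList.withSplittableGzip(conf.get("io.compression.codecs")))`, where `conf` is the result of `job.getConfiguration()`.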
