Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Splittable Gzip codec for Hadoop
https://github.com/nielsbasjes/splittablegzip
codec gzip gzip-codec gzipped-files hadoop mapreduce-java pig spark splittable
- Host: GitHub
- URL: https://github.com/nielsbasjes/splittablegzip
- Owner: nielsbasjes
- License: apache-2.0
- Created: 2012-03-28T21:59:15.000Z (over 12 years ago)
- Default Branch: main
- Last Pushed: 2024-11-14T23:08:27.000Z (6 days ago)
- Last Synced: 2024-11-15T00:20:05.641Z (6 days ago)
- Topics: codec, gzip, gzip-codec, gzipped-files, hadoop, mapreduce-java, pig, spark, splittable
- Language: Java
- Homepage:
- Size: 1.37 MB
- Stars: 69
- Watchers: 8
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README-JavaMapReduce.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Using the SplittableGZipCodec in Apache Hadoop MapReduce (Java)
To use this codec in a Hadoop MapReduce job written in Java, you must make sure this library has been added as a dependency. In Maven you would simply add this dependency:

```xml
<dependency>
  <groupId>nl.basjes.hadoop</groupId>
  <artifactId>splittablegzip</artifactId>
  <version>1.3</version>
</dependency>
```
Then in Java you would create an instance of the Job that you are going to run:

```java
Job job = ...
```

and then, before actually running the job, you set the configuration using something like this:
```java
job.getConfiguration().set("io.compression.codecs", "nl.basjes.hadoop.io.compress.SplittableGzipCodec");
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 5000000000L);
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 5000000000L);
```

NOTE: The ORIGINAL GzipCodec may NOT be in the list of compression codecs anymore!
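The snippets above can be combined into one driver class. The following is a minimal sketch under assumptions not stated in the README: the class name `SplittableGzipDriver`, the job name, the `MyMapper`/`MyReducer` classes, and the key/value types are all hypothetical placeholders for your own job; only the codec registration and split-size settings come from this project's documentation. It is mostly Hadoop job configuration and is not runnable without a Hadoop installation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplittableGzipDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the splittable codec. Note: this SETS the codec list,
        // so the original GzipCodec may no longer be in it.
        conf.set("io.compression.codecs",
                 "nl.basjes.hadoop.io.compress.SplittableGzipCodec");
        // Pin the split size (5 GB here, matching the snippet above);
        // this controls how many pieces each gzipped input file is split into.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 5000000000L);
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 5000000000L);

        Job job = Job.getInstance(conf, "splittable-gzip-example");
        job.setJarByClass(SplittableGzipDriver.class);
        // MyMapper and MyReducer are hypothetical stand-ins for your own job logic.
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The split-size settings must be applied before the job is submitted; they have no effect once the input splits have been computed.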