Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Splittable Gzip codec for Hadoop
https://github.com/nielsbasjes/splittablegzip
codec gzip gzip-codec gzipped-files hadoop mapreduce-java pig spark splittable
Splittable Gzip codec for Hadoop
- Host: GitHub
- URL: https://github.com/nielsbasjes/splittablegzip
- Owner: nielsbasjes
- License: apache-2.0
- Created: 2012-03-28T21:59:15.000Z (almost 13 years ago)
- Default Branch: main
- Last Pushed: 2024-12-19T07:23:22.000Z (about 1 month ago)
- Last Synced: 2025-01-08T09:09:55.738Z (14 days ago)
- Topics: codec, gzip, gzip-codec, gzipped-files, hadoop, mapreduce-java, pig, spark, splittable
- Language: Java
- Homepage:
- Size: 1.37 MB
- Stars: 69
- Watchers: 8
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README-JavaMapReduce.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Using the SplittableGZipCodec in Apache Hadoop MapReduce (Java)
To use this in a Hadoop MapReduce job written in Java, you must make sure this library has been added as a dependency. In Maven you would simply add this dependency:

```xml
<dependency>
  <groupId>nl.basjes.hadoop</groupId>
  <artifactId>splittablegzip</artifactId>
  <version>1.3</version>
</dependency>
```
Then in Java you would create an instance of the Job that you are going to run:

```java
Job job = ...
```

Before actually running the job, you set the configuration using something like this:

```java
job.getConfiguration().set("io.compression.codecs", "nl.basjes.hadoop.io.compress.SplittableGzipCodec");
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 5000000000L);
job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 5000000000L);
```

NOTE: The ORIGINAL GzipCodec may NOT be in the list of compression codecs anymore!
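The min/max split settings above interact with the HDFS block size: Hadoop's `FileInputFormat` chooses each split as `max(minSize, min(maxSize, blockSize))`. A minimal sketch of that arithmetic in plain Java (no Hadoop dependency; the 128 MB block size is an assumed typical value, not taken from this README):

```java
public class SplitSizeDemo {
    // Mirrors the formula used by Hadoop's FileInputFormat.computeSplitSize:
    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // assumed 128 MB HDFS block
        long minSize   = 5_000_000_000L;       // values from the configuration above
        long maxSize   = 5_000_000_000L;

        // The 5 GB minimum wins over the block size, so each gzip file
        // effectively becomes a single (large) split target.
        System.out.println(computeSplitSize(blockSize, minSize, maxSize)); // 5000000000
    }
}
```

Setting both min and max to a large value like this forces few, large splits, which is a common way to tune how many tasks the splittable codec produces.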