https://github.com/miku/unzippa
Unzip selected members from a zipfile 150x faster than unzip.
- Host: GitHub
- URL: https://github.com/miku/unzippa
- Owner: miku
- License: gpl-3.0
- Created: 2018-04-09T08:39:41.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2025-03-13T13:50:53.000Z (10 months ago)
- Last Synced: 2025-06-20T05:50:50.572Z (7 months ago)
- Topics: performance, unzip
- Language: Go
- Homepage:
- Size: 4.15 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# unzippa
A faster version of a special [unzip](https://linux.die.net/man/1/unzip) use case.

Usage
-----
Where with vanilla [unzip](https://linux.die.net/man/1/unzip) you would write:
```shell
$ unzip -p ...
```
You might hit an error saying *Argument list too long*, and maybe you do
not want to mess with
[ARG_MAX](https://www.in-ulm.de/~mascheck/various/argmax/).
Unfortunately, `unzip` does not accept a file that lists the archive members;
members can only be passed as command-line arguments:
> An **optional list of archive members to be processed, separated by spaces**.
(VMS versions compiled with VMSCLI defined must delimit files with commas
instead. See -v in OPTIONS below.) **Regular expressions (wildcards) may be
used** to match multiple members; see above. Again, be sure to quote expressions
that would otherwise be expanded or modified by the operating system.
This is the gap that `unzippa` fills:
```shell
$ unzippa -m
```
This attempts to extract all members listed in *members-file*, one per line.
It works with hundreds or thousands of members. By default, the output goes to
stdout; optionally, an output file can be set via the `-o` flag.
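
To make the idea concrete, here is a minimal Go sketch of selective extraction driven by a members file. It is not taken from the `unzippa` source; the function names `readMembers`, `extractMembers` and `demo` are hypothetical, and the archive is built in memory so the example is self-contained. The sketch assumes one straightforward approach: load the member names into a set, walk the archive once, and copy every member found in the set.

```go
package main

import (
	"archive/zip"
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// readMembers collects the non-empty lines of r into a set of member names.
func readMembers(r io.Reader) (map[string]struct{}, error) {
	wanted := make(map[string]struct{})
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if name := strings.TrimSpace(sc.Text()); name != "" {
			wanted[name] = struct{}{}
		}
	}
	return wanted, sc.Err()
}

// extractMembers walks the archive once and copies every wanted member to w.
func extractMembers(zr *zip.Reader, wanted map[string]struct{}, w io.Writer) error {
	for _, f := range zr.File {
		if _, ok := wanted[f.Name]; !ok {
			continue // constant-time set lookup per archive entry
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		_, err = io.Copy(w, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

// demo builds a three-member archive in memory and extracts two members.
func demo() (string, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range []string{"a.txt", "b.txt", "c.txt"} {
		fw, err := zw.Create(name)
		if err != nil {
			return "", err
		}
		if _, err := fmt.Fprintf(fw, "content of %s\n", name); err != nil {
			return "", err
		}
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", err
	}
	// Stand-in for the members file: one member name per line.
	wanted, err := readMembers(strings.NewReader("c.txt\na.txt\n"))
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := extractMembers(zr, wanted, &out); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Note that this sketch emits members in archive order, not in the order of the members file; whether `unzippa` itself behaves the same way is not stated here.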

Performance
-----------
Test fixture: a zip file with 100000 members, of which 10000 are extracted. In
this very special case, `unzippa` is about 150x faster than plain `unzip`
(roughly 20.6s versus 0.14s wall-clock time).
```shell
$ unzip -l fixtures/fake.zip | sed '1,3d;$d' | sed '$d' | wc -l
100000
$ time unzip -p fixtures/fake.zip $(cat fixtures/fake.txt | tr '\n' ' ')
real 0m20.564s
user 0m19.978s
sys 0m0.146s
$ time unzippa -m fixtures/fake.txt fixtures/fake.zip
real 0m0.138s
user 0m0.136s
sys 0m0.038s
```
The unzippall
-------------
An executable `unzippall` is included in the package since version 0.1.4.
The `unzippall` tool takes a list of zip filenames (e.g. from stdin) and
extracts them to stdout in parallel. It is fast, but the output order is not
preserved.

Usage:
```shell
$ find /tmp/updates -type f -name "*zip" | unzippall -i '.*xml' > data.file
```
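
The parallel pattern described above can be sketched in Go as a small worker pool. This is an illustration under stated assumptions, not the `unzippall` implementation: the names `archive`, `extractMatching`, `extractAll`, `makeArchive` and `demo` are hypothetical, archives are held in memory instead of being read from disk, and the pattern is assumed to select member names, as the `-i '.*xml'` usage suggests. Each worker buffers the output of one archive and writes it under a mutex, so the output of a single archive stays contiguous while the order across archives is not preserved.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"regexp"
	"sort"
	"strings"
	"sync"
)

// archive is an in-memory stand-in for a zip file on disk.
type archive struct{ data []byte }

// extractMatching writes every member whose name matches re to w.
func extractMatching(a archive, re *regexp.Regexp, w io.Writer) error {
	zr, err := zip.NewReader(bytes.NewReader(a.data), int64(len(a.data)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		if !re.MatchString(f.Name) {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		_, err = io.Copy(w, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

// extractAll fans the archives out to a fixed number of workers and
// returns the first error encountered, if any.
func extractAll(archives []archive, re *regexp.Regexp, workers int, w io.Writer) error {
	jobs := make(chan archive)
	var wg sync.WaitGroup
	var mu sync.Mutex // guards w and firstErr
	var firstErr error
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for a := range jobs {
				var buf bytes.Buffer // per-archive buffer, flushed in one write
				err := extractMatching(a, re, &buf)
				mu.Lock()
				if err == nil {
					_, err = w.Write(buf.Bytes())
				}
				if err != nil && firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}
	for _, a := range archives {
		jobs <- a
	}
	close(jobs)
	wg.Wait()
	return firstErr
}

// makeArchive builds a zip archive from (name, content) pairs.
func makeArchive(members [][2]string) (archive, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, m := range members {
		fw, err := zw.Create(m[0])
		if err != nil {
			return archive{}, err
		}
		if _, err := io.WriteString(fw, m[1]); err != nil {
			return archive{}, err
		}
	}
	if err := zw.Close(); err != nil {
		return archive{}, err
	}
	return archive{data: buf.Bytes()}, nil
}

// demo extracts the XML members of three archives with two workers and
// sorts the lines, because the arrival order is nondeterministic.
func demo() (string, error) {
	var archives []archive
	for i := 0; i < 3; i++ {
		a, err := makeArchive([][2]string{
			{fmt.Sprintf("doc%d.xml", i), fmt.Sprintf("<doc id=\"%d\"/>\n", i)},
			{fmt.Sprintf("note%d.txt", i), "skipped\n"},
		})
		if err != nil {
			return "", err
		}
		archives = append(archives, a)
	}
	var out bytes.Buffer
	if err := extractAll(archives, regexp.MustCompile(`.*xml`), 2, &out); err != nil {
		return "", err
	}
	lines := strings.Split(strings.TrimSpace(out.String()), "\n")
	sort.Strings(lines)
	return strings.Join(lines, "\n"), nil
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Buffering per archive trades memory for simpler synchronization; for very large members, streaming to per-worker temporary files would be an alternative.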
Performance: Finding 45000 files with `find` takes about 2s. Finding the files
and running `unzip -p` on each of them takes 13min. Using `unzippall` on the
same fileset takes about 2min.