https://github.com/chfoo/rdai
Recursive Deep Archive Iterator: A Python module to print text files from deeply nested compressed archives recursively.
- Host: GitHub
- URL: https://github.com/chfoo/rdai
- Owner: chfoo
- Created: 2013-10-15T23:24:25.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2013-10-22T22:11:09.000Z (over 12 years ago)
- Last Synced: 2023-03-23T04:57:55.162Z (almost 3 years ago)
- Language: Python
- Size: 113 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
README
Recursive Deep Archive Iterator
===============================
A Python module to print text files from deeply nested compressed archives
recursively. Useful for searching archive contents with grep.

Requires Python 3.3 or greater.
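
The core idea can be sketched in Python. This is a minimal sketch under stated
assumptions, not rdai's actual implementation: the function name
``iter_text_lines``, the ``max_depth`` guard, and the suffix tables are
hypothetical. Like rdai, it picks a decoder by filename only, and it buffers
each nested member in memory so that inner zip files, which need random
access, can be opened::

    import bz2
    import gzip
    import io
    import lzma
    import tarfile
    import zipfile

    # Hypothetical suffix tables; rdai's real list may differ.
    TAR_SUFFIXES = ('.tar', '.tar.gz', '.tgz', '.tar.bz2', '.tar.xz')
    STREAM_DECOMPRESSORS = {
        '.gz': gzip.decompress,
        '.bz2': bz2.decompress,
        '.xz': lzma.decompress,
    }


    def iter_text_lines(name, fileobj, depth=0, max_depth=16):
        """Yield (path, line) pairs, descending into nested archives by filename."""
        if depth > max_depth:
            return  # crude guard against runaway nesting
        lower = name.lower()
        if lower.endswith('.zip'):
            with zipfile.ZipFile(fileobj) as archive:
                for info in archive.infolist():
                    if info.filename.endswith('/'):
                        continue  # skip directory entries
                    # Buffer the member in memory so a nested zip is seekable.
                    member = io.BytesIO(archive.read(info))
                    yield from iter_text_lines(name + '/' + info.filename,
                                               member, depth + 1, max_depth)
            return
        if lower.endswith(TAR_SUFFIXES):
            # Mode 'r' lets tarfile detect gzip/bz2/xz compression itself.
            with tarfile.open(fileobj=fileobj, mode='r') as archive:
                for info in archive:
                    if not info.isfile():
                        continue  # skip directories, links, devices
                    with archive.extractfile(info) as f:
                        member = io.BytesIO(f.read())
                    yield from iter_text_lines(name + '/' + info.name,
                                               member, depth + 1, max_depth)
            return
        for suffix, decompress in STREAM_DECOMPRESSORS.items():
            if lower.endswith(suffix):
                inner = io.BytesIO(decompress(fileobj.read()))
                # Strip the suffix so 'log.txt.gz' is treated as 'log.txt'.
                yield from iter_text_lines(name[:-len(suffix)], inner,
                                           depth + 1, max_depth)
                return
        # Anything else is treated as text; undecodable bytes are replaced.
        for raw in fileobj:
            yield name, raw.decode('utf-8', errors='replace').rstrip('\r\n')

To mimic ``python3 rdai.py myfile.zip``, open the file in binary mode, pass it
together with its name, and print the second element of each ``(path, line)``
pair. Buffering whole members keeps the sketch simple but can use a lot of
memory on large archives.
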

Example::

    python3 rdai.py myfile.zip


Example with GNU parallel and grep::

    find -iname "*.zip" | parallel python3 rdai.py "{}" | grep hello


Use the ``--json`` option to parse JSON dumps such as the Archive Team Twitter
Stream scrapes at https://archive.org/details/twitterstream. For example, to
collect bit.ly links from every zip file under the current directory into
``urls.txt``::

    find -iname "*.zip" | parallel --ungroup --eta python3 rdai.py --json "{}" | grep -o -P "bit\.ly/[a-zA-Z0-9]+" > urls.txt

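
This README does not spell out what ``--json`` does internally. One plausible
benefit, shown here as a sketch under that assumption, is that decoding each
record undoes JSON string escapes: JSON allows ``/`` to be written as ``\/``,
and a line containing ``http:\/\/bit.ly\/abc123`` would not match the raw-text
pattern ``bit\.ly/[a-zA-Z0-9]+``. The helper ``normalize_json_lines`` is
hypothetical, and it assumes one JSON record per line::

    import json
    import re

    # Same pattern as the grep -o -P example above.
    BITLY = re.compile(r'bit\.ly/[a-zA-Z0-9]+')


    def normalize_json_lines(lines):
        """Decode one JSON record per line and re-encode it without escaped slashes."""
        for line in lines:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except ValueError:
                continue  # skip truncated or corrupt records
            # json.dumps never escapes '/', and ensure_ascii=False keeps
            # non-ASCII text readable for grep.
            yield json.dumps(record, ensure_ascii=False)


    raw = [r'{"text": "see http:\/\/bit.ly\/abc123 now"}', '{"trunc']
    print(BITLY.findall(raw[0]))  # []
    print([m for line in normalize_json_lines(raw) for m in BITLY.findall(line)])  # ['bit.ly/abc123']
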

JSON performance
++++++++++++++++

For better JSON performance, install either ujson or simplejson::

    pip3 install ujson
    pip3 install simplejson

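
A common way for a script to use whichever faster library is installed is an
import fallback chain. The sketch below assumes rdai does something similar;
the order shown (ujson, then simplejson, then the standard library ``json``)
is an assumption. Only ``loads`` is used, because the three libraries do not
accept exactly the same keyword arguments::

    # Prefer a faster JSON decoder when one is installed.
    try:
        import ujson as json
    except ImportError:
        try:
            import simplejson as json
        except ImportError:
            import json  # standard library fallback, always available

    record = json.loads('{"id": 1, "text": "hello"}')
    print(json.__name__, record['text'])
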

Bugs
++++

* Does not guard against infinite recursion.
* No ``setup.py``.
* No PyPI package.
* No unit tests.
* Detects compressed files by filename only.
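
The last limitation could be addressed by sniffing a file's leading bytes
instead of trusting its name. This is a minimal sketch, not part of rdai:
``sniff_format`` is a hypothetical helper, and only a few common signatures
are checked::

    import bz2
    import gzip
    import io
    import lzma
    import tarfile
    import zipfile

    # Leading-byte signatures for a few common formats (not exhaustive).
    SIGNATURES = (
        (b'PK\x03\x04', 'zip'),  # local file header; empty zips start differently
        (b'\x1f\x8b', 'gzip'),
        (b'BZh', 'bzip2'),
        (b'\xfd7zXZ\x00', 'xz'),
    )


    def sniff_format(header):
        """Guess a container format from the first 512 bytes of a file."""
        for magic, fmt in SIGNATURES:
            if header.startswith(magic):
                return fmt
        # Uncompressed ustar/GNU/pax tar headers carry 'ustar' at offset 257.
        if header[257:262] == b'ustar':
            return 'tar'
        return None  # treat as plain text


    def _make_zip():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, 'w') as archive:
            archive.writestr('a.txt', 'hi')
        return buf.getvalue()


    def _make_tar():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode='w') as archive:
            info = tarfile.TarInfo('a.txt')
            info.size = 2
            archive.addfile(info, io.BytesIO(b'hi'))
        return buf.getvalue()


    # In-memory samples that exercise each branch.
    samples = {
        'zip': _make_zip(),
        'gzip': gzip.compress(b'hi'),
        'bzip2': bz2.compress(b'hi'),
        'xz': lzma.compress(b'hi'),
        'tar': _make_tar(),
        None: b'plain text\n',
    }
    for expected, data in samples.items():
        print(expected, sniff_format(data[:512]))

In practice one would read the first 512 bytes, call ``sniff_format``, then
``seek(0)`` before handing the file to a decoder. Signatures can misfire on
plain text that happens to start with ``BZh``, so combining sniffing with the
filename is a reasonable compromise.
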