https://github.com/internetarchive/warc
Python library for reading and writing warc files
- Host: GitHub
- URL: https://github.com/internetarchive/warc
- Owner: internetarchive
- License: gpl-2.0
- Created: 2012-02-23T12:30:16.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2022-03-07T01:16:42.000Z (over 2 years ago)
- Last Synced: 2024-07-18T22:24:34.457Z (2 months ago)
- Language: Python
- Homepage:
- Size: 202 KB
- Stars: 234
- Watchers: 23
- Forks: 114
- Open Issues: 29
Metadata Files:
- Readme: Readme.rst
- License: LICENSE
README
warc: Python library to work with WARC files
============================================

.. image:: https://secure.travis-ci.org/anandology/warc.png?branch=master
   :alt: build status
   :target: http://travis-ci.org/anandology/warc

WARC (Web ARChive) is a file format for storing web crawls.
http://bibnum.bnf.fr/WARC/
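
To make the format concrete, here is a minimal sketch of a single WARC record, built and parsed with the Python standard library only. It assumes the WARC/1.0 layout (a version line, named header fields, a blank line, a content block of ``Content-Length`` bytes, then two CRLFs). The helpers ``build_record`` and ``parse_record`` are hypothetical names made up for this illustration and are not part of the `warc` library; the record ID, date, and URL are placeholder values.

```python
# Illustrative only: a hand-built WARC/1.0 record parsed with the stdlib.
# build_record and parse_record are hypothetical helpers, not the warc API.

CRLF = b"\r\n"


def build_record(headers, payload):
    """Serialize one record: version line, headers, blank line, payload."""
    lines = [b"WARC/1.0"]
    for name, value in headers:
        lines.append(("%s: %s" % (name, value)).encode("utf-8"))
    # Content-Length counts only the bytes of the content block.
    lines.append(("Content-Length: %d" % len(payload)).encode("utf-8"))
    head = CRLF.join(lines) + CRLF + CRLF
    # A record is terminated by two CRLFs after the content block.
    return head + payload + CRLF + CRLF


def parse_record(data):
    """Split one serialized record into (version, header dict, payload)."""
    head, _, rest = data.partition(CRLF + CRLF)
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


raw = build_record(
    [
        ("WARC-Type", "resource"),
        # Placeholder values, chosen only for the example.
        ("WARC-Record-ID", "<urn:uuid:00000000-0000-0000-0000-000000000000>"),
        ("WARC-Date", "2012-02-23T12:30:16Z"),
        ("WARC-Target-URI", "http://example.com/"),
    ],
    b"hello",
)
version, headers, payload = parse_record(raw)
print(headers["WARC-Target-URI"], headers["Content-Length"])
```

The sketch handles one uncompressed record held in memory. Real WARC files usually contain many records and are often gzip-compressed, which is the work a library takes care of.
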

This `warc` library makes it very easy to work with WARC files::

    import warc
    f = warc.open("test.warc")
    for record in f:
        print record['WARC-Target-URI'], record['Content-Length']

Documentation
-------------

The documentation of the warc library is available at http://warc.readthedocs.org/.


License
-------

This software is licensed under GPL v2. See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE