https://github.com/machawk1/warcreate
Chrome extension to "Create WARC files from any webpage"
- Host: GitHub
- URL: https://github.com/machawk1/warcreate
- Owner: machawk1
- License: mit
- Created: 2013-03-20T14:42:04.000Z (over 11 years ago)
- Default Branch: main
- Last Pushed: 2023-12-06T16:50:29.000Z (11 months ago)
- Last Synced: 2024-10-23T00:37:52.661Z (12 days ago)
- Topics: chrome-extension, warc, web-archiving
- Language: JavaScript
- Homepage: https://warcreate.com
- Size: 2.23 MB
- Stars: 210
- Watchers: 17
- Forks: 13
- Open Issues: 58
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-starred - machawk1/warcreate - Chrome extension to "Create WARC files from any webpage" (chrome-extension)
README
WARCreate
"Create WARC files from any webpage"
WARCreate is a Google Chrome extension that aims to "Create [WARC](http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717) files from any webpage".
WARC generation is normally limited to the Internet Archive's [Heritrix](https://github.com/internetarchive/heritrix3) archival crawler. Providing another means of generating these files from webpages opens the door to:
+ Preserving content not accessible to crawlers (e.g., deep web content)
+ Avoiding the complication and overhead an end user faces in setting up a Heritrix instance
+ Allowing a webpage to be interacted with (e.g., Facebook comments unrolled) prior to preservation, ensuring that content not initially present in a page is available to be captured

...among many other use cases.
WARCreate is currently in active development, though it has gone through various release and retraction periods due to changes in the Google Chrome extension API and the rules controlling extension distribution.
The original idea and prototype were [published](http://dl.acm.org/citation.cfm?id=2232930) in the Joint Conference on Digital Libraries 2012 (JCDL '12) Proceedings.
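For readers unfamiliar with the target format, the sketch below shows roughly what a single WARC/1.0 `response` record looks like when assembled by hand. It is an illustration of the file format only, not WARCreate's own code (the extension itself is written in JavaScript). The function name `build_warc_response_record` and the example URL are hypothetical, chosen for this sketch.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response (status line, headers, body) in a WARC/1.0 record.

    Hypothetical helper for illustration; not part of WARCreate.
    """
    # Each record needs a unique ID and a UTC timestamp.
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: {record_id}",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts the bytes of the record block only.
        f"Content-Length: {len(http_response)}",
    ]

    # Header lines are CRLF-separated; a blank line ends the header,
    # and two CRLFs follow the block to terminate the record.
    header = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return header + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    example_response = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"<html><body>Hello, archive!</body></html>"
    )
    record = build_warc_response_record("http://example.com/", example_response)
    print(record.decode("utf-8"))
```

A complete WARC file usually holds many such records, typically beginning with a `warcinfo` record and often pairing each response with its `request` record. This sketch omits those.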
## Install ##
The latest stable binary can be [downloaded from the Chrome Web Store](https://chrome.google.com/webstore/detail/warcreate/kenncghfghgolcbmckhiljgaabnpcaaa?hl=en&gl=US).
### Citing Project
A publication related to this project appeared in the proceedings of JCDL 2012 ([Read the PDF](https://matkelly.com/papers/2012_jcdl_warcreate.pdf)). Please cite it as below:
> Mat Kelly and Michele C. Weigle. __WARCreate - Create Wayback-Consumable WARC Files from Any Webpage__. In _Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)_, pages 437–438, Washington, DC, June 2012.
```bib
@INPROCEEDINGS{warcreate-jcdl2012,
AUTHOR = {Mat Kelly and
Michele C. Weigle},
TITLE = {{WARCreate} - Create Wayback-Consumable WARC Files from Any Webpage},
BOOKTITLE = {Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
PAGES = {437--438},
MONTH = {June},
YEAR = {2012},
ADDRESS = {Washington, DC},
DOI = {10.1145/2232817.2232930}
}
```

## Contact ##
WARCreate is a project of the Web Science and Digital Libraries (WS-DL) research group at Old Dominion University (ODU), created by Mat Kelly. For support, e-mail [email protected] or tweet to us at @machawk1 and/or @WebSciDL.
## License ##
MIT