awesome-web-archiving
An Awesome List for getting started with web archiving
https://github.com/ibnesayeed/awesome-web-archiving
Training/Documentation
- Awesome Memento
- Archives Unleashed Toolkit documentation
- Heritrix Walkthrough
- warc-specifications
- warcbase workshop
- What is a web archive? - A video from [the UK Web Archive YouTube Channel](https://www.youtube.com/channel/UCJukhTSw8VRj-VNTpBcqWkw)
- Wikipedia's List of Web Archiving Initiatives
- Glossary of Archive-It and Web Archiving Terms
- Official ISO 28500 WARC specification homepage
Tools & Software
Utilities
- The Archive Browser - A program for browsing and extracting the contents of archives. It lets you open files from inside archives and preview them using Quick Look. WARC is supported (macOS only, proprietary app).
- The Unarchiver - A program that extracts the contents of many archive formats, including WARC, to the file system. Free variant of The Archive Browser (macOS only, proprietary app).
Acquisition
- Crawl - A simple web crawler in Golang. (Stable)
- Heritrix - An open source, extensible, web-scale, archival quality web crawler. (Stable)
- HTTrack - An open source website copying utility. (Stable)
- SiteStory - A transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server. (Stable)
- WebMemex - Browser extension for Firefox and Chrome which lets you archive web pages you visit. (In Development)
- Wget - An open source file retrieval utility that, as of [version 1.14, supports writing WARC files](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output). (Stable)
Search & Discovery
- SecurityTrails - Web based archive for WHOIS and DNS records. REST API available free of charge.
- Tempas v1 - Temporal web archive search based on [Delicious](https://en.wikipedia.org/wiki/Delicious_(website)) tags. (Stable)
- Tempas v2 - Temporal web archive search based on links and anchor texts extracted from the German web from 1996 to 2013 (results are not limited to German pages, e.g., [Obama@2005-2009 in Tempas](http://tempas.l3s.de/v2/query?q=obama&from=2005&to=2009)). (Stable)
- here
WARC I/O Libraries
- Jwat - Libraries and tools for reading/writing/validating WARC/ARC/GZIP files (Java). (Stable)
Analysis
- Archives Unleashed Cloud - Archives Unleashed Cloud (AUK) is a web interface for analyzing web archives. Currently, it can sync with Archive-It collections and extract hyperlink networks, full text, and other information from your collections. (Stable)
Quality Assurance
- Chrome Check My Links - Browser extension: a link checker with more options.
- Chrome link checker - Browser extension: basic link checker.
- Chrome Open Multiple URLs - Browser extension: opens multiple URLs and also extracts URLs from text.
- Chrome Revolver - Browser extension: switches between browser tabs.
- Xenu - Desktop link checker for Windows.
Resources for Web Publishers
Community Resources
Blogs and Scholarship
- IIPC Blog
- Web Archiving Roundtable - Unofficial blog of the Web Archiving Roundtable of the [Society of American Archivists](https://www2.archivists.org/) maintained by the members of the Web Archiving Roundtable.
- The Web as History - An open-source book that provides a conceptual overview of web archiving research, as well as several case studies.
- WS-DL Blog - The Web Science and Digital Libraries Research Group blogs about web archiving-related topics, scholarly work, and academic trip reports.
- DSHR's Blog - David Rosenthal regularly reviews and summarizes work done in the digital preservation field.
Slack
- IIPC Slack - Ask [@netpreserve](https://twitter.com/NetPreserve) for access.
- Archives Unleashed Slack - [Fill out this request form](https://docs.google.com/forms/d/e/1FAIpQLScXPIH0Ssw63yWqyMkUqHVYmz2-ItBMzHiJQ-sOlJwTA8u5AQ/viewform?usp=sf_link) for access to a researcher group of people working with web archives.
Twitter
- @NetPreserve - Official IIPC handle.
- #WebArchiving