Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-web-archiving
An Awesome List for getting started with web archiving
https://github.com/ibnesayeed/awesome-web-archiving
-
Training/Documentation
- Awesome Memento
- Archives Unleashed Toolkit documentation
- Heritrix Walkthrough
- warc-specifications
- warcbase workshop
- What is a web archive? - A video from [the UK Web Archive YouTube Channel](https://www.youtube.com/channel/UCJukhTSw8VRj-VNTpBcqWkw)
- Wikipedia's List of Web Archiving Initiatives
- Glossary of Archive-It and Web Archiving Terms
- Official ISO 28500 WARC specification homepage
- The Web Archiving Lifecycle Model - The Web Archiving Lifecycle Model is an attempt to incorporate the technological and programmatic arms of web archiving into a framework that will be relevant to any organization seeking to archive content from the web. Archive-It, the web archiving service from the Internet Archive, developed the model based on its work with memory institutions around the world.
-
Resources for Web Publishers
-
Tools & Software
-
Acquisition
- Crawl - A simple web crawler in Golang. (Stable)
- Heritrix - An open source, extensible, web-scale, archival quality web crawler. (Stable)
- HTTrack - An open source website copying utility. (Stable)
- SiteStory - A transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server. (Stable)
- WebMemex - Browser extension for Firefox and Chrome which lets you archive web pages you visit. (In Development)
- Wget - An open source file retrieval utility that, as of [version 1.14, supports writing WARCs](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output). (Stable)
-
Search & Discovery
- SecurityTrails - Web based archive for WHOIS and DNS records. REST API available free of charge.
- Tempas v1 - Temporal web archive search based on [Delicious](https://en.wikipedia.org/wiki/Delicious_(website)) tags. (Stable)
- Tempas v2 - Temporal web archive search based on links and anchor texts extracted from the German web from 1996 to 2013 (results are not limited to German pages, e.g., [Obama@2005-2009 in Tempas](http://tempas.l3s.de/v2/query?q=obama&from=2005&to=2009)). (Stable)
-
WARC I/O Libraries
- Jwat - Libraries and tools for reading/writing/validating WARC/ARC/GZIP files (Java). (Stable)
-
Analysis
- Archives Unleashed Cloud - Archives Unleashed Cloud (AUK) is a web interface for analysing web archives. Currently, it can sync with Archive-It collections and extract hyperlink networks, full text, and other information from your collections. (Stable)
-
Quality Assurance
- Chrome Check My Links - Browser extension: a link checker with more options.
- Chrome link checker - Browser extension: basic link checker.
- Chrome Open Multiple URLs - Browser extension: opens multiple URLs and also extracts URLs from text.
- Chrome Revolver - Browser extension: switches between browser tabs.
- Xenu - Desktop link checker for Windows.
-
-
Community Resources
-
Blogs and Scholarship
- IIPC Blog
- Web Archiving Roundtable - Unofficial blog of the Web Archiving Roundtable of the [Society of American Archivists](https://www2.archivists.org/), maintained by its members.
- The Web as History - An open-source book that provides a conceptual overview of web archiving research, as well as several case studies.
- WS-DL Blog - The Web Science and Digital Libraries Research Group blogs about various web archiving topics, scholarly work, and academic trip reports.
- DSHR's Blog - David Rosenthal regularly reviews and summarizes work done in the digital preservation field.
-
Slack
- IIPC Slack - Ask [@netpreserve](https://twitter.com/NetPreserve) for access.
- Archives Unleashed Slack - [Fill out this request form](https://docs.google.com/forms/d/e/1FAIpQLScXPIH0Ssw63yWqyMkUqHVYmz2-ItBMzHiJQ-sOlJwTA8u5AQ/viewform?usp=sf_link) for access to a group of researchers working with web archives.
-
Twitter
- @NetPreserve - Official IIPC handle.
- #WebArchiving
-