Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-web-archiving

An Awesome List for getting started with web archiving
https://github.com/ibnesayeed/awesome-web-archiving

Last synced: 3 days ago

  • Training/Documentation

  • Resources for Web Publishers

  • Tools & Software

    • Acquisition

      • Crawl - A simple web crawler in Golang. (Stable)
      • Heritrix - An open source, extensible, web-scale, archival quality web crawler. (Stable)
      • HTTrack - An open source website copying utility. (Stable)
      • SiteStory - A transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server. (Stable)
      • WebMemex - Browser extension for Firefox and Chrome which lets you archive web pages you visit. (In Development)
      • Wget - An open source file retrieval utility that, [as of version 1.14, supports writing WARC files](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output). (Stable)
      • SecurityTrails - Web based archive for WHOIS and DNS records. REST API available free of charge.
      • Tempas v1 - Temporal web archive search based on [Delicious](https://en.wikipedia.org/wiki/Delicious_(website)) tags. (Stable)
      • Tempas v2 - Temporal web archive search based on links and anchor texts extracted from the German web from 1996 to 2013 (results are not limited to German pages, e.g., [Obama@2005-2009 in Tempas](http://tempas.l3s.de/v2/query?q=obama&from=2005&to=2009)). (Stable)
    • WARC I/O Libraries

      • Jwat - Libraries and tools for reading/writing/validating WARC/ARC/GZIP files (Java). (Stable)
    • Analysis

      • Archives Unleashed Cloud - Archives Unleashed Cloud (AUK) is a web interface for analysing web archives. Currently, it can sync with Archive-It collections and extract hyperlink networks, full text, and other information from your collections. (Stable)
    • Quality Assurance

  • Community Resources

    • Blogs and Scholarship

      • IIPC Blog
      • Web Archiving Roundtable - Unofficial blog of the Web Archiving Roundtable of the [Society of American Archivists](https://www2.archivists.org/), maintained by its members.
      • The Web as History - An open-source book that provides a conceptual overview of web archiving research, as well as several case studies.
      • WS-DL Blog - The Web Science and Digital Libraries Research Group blogs about various web archiving topics, scholarly work, and academic trip reports.
      • DSHR's Blog - David Rosenthal regularly reviews and summarizes work done in the digital preservation field.
    • Slack

      • IIPC Slack - Ask [@netpreserve](https://twitter.com/NetPreserve) for access.
      • Archives Unleashed Slack - [Fill out this request form](https://docs.google.com/forms/d/e/1FAIpQLScXPIH0Ssw63yWqyMkUqHVYmz2-ItBMzHiJQ-sOlJwTA8u5AQ/viewform?usp=sf_link) for access to a group of researchers working with web archives.
    • Twitter