https://github.com/capjamesg/markdown-html-link-rot
A Ruby script to substitute invalid links in markdown and HTML with a link to an Internet Archive backup.
archiving linkrot ruby
Last synced: 19 days ago
- Host: GitHub
- URL: https://github.com/capjamesg/markdown-html-link-rot
- Owner: capjamesg
- License: mit
- Created: 2021-12-28T15:37:09.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-30T15:00:55.000Z (about 3 years ago)
- Last Synced: 2024-12-15T07:48:52.587Z (19 days ago)
- Topics: archiving, linkrot, ruby
- Language: Ruby
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Markdown / HTML Link Rot Detector
A Ruby script to detect link rot in HTML and markdown documents and replace broken links with an Internet Archive backup.
## Getting Started
First, install the required dependencies with Bundler:

    bundle install

Next, run the link rot detector:

    ruby watch.rb

The link rot detector checks each link and flags those that return a 404 or another invalid response. For each such link, it queries the Internet Archive's Wayback Machine API to see whether a snapshot of the page is available. If a snapshot is found, the URL of the most recent snapshot replaces the broken link in the HTML or Markdown document being read.
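
The check-then-replace flow described above can be sketched roughly as follows. This is an illustrative outline using only Ruby's standard library, not the code in `watch.rb` (which lists HTTParty and nokogiri as dependencies). The helper names `link_alive?`, `snapshot_from_response`, `find_snapshot`, and `replace_link` are hypothetical. The sketch assumes the Wayback Machine availability endpoint at `https://archive.org/wayback/available`, which returns the closest snapshot under `archived_snapshots.closest`.

```ruby
require "json"
require "net/http"
require "uri"

# Wayback Machine availability endpoint (assumed to be the API meant above).
WAYBACK_ENDPOINT = "https://archive.org/wayback/available"

# Hypothetical helper: true when the URL answers with a 2xx or 3xx status.
# Network errors and malformed URLs are treated as a dead link.
def link_alive?(url)
  response = Net::HTTP.get_response(URI.parse(url))
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  false
end

# Hypothetical helper: extract the snapshot URL from an availability
# response body, or nil when no snapshot is listed as available.
def snapshot_from_response(body)
  closest = JSON.parse(body).dig("archived_snapshots", "closest")
  return nil unless closest && closest["available"]

  closest["url"]
end

# Hypothetical helper: ask the Wayback Machine for a snapshot of `url`.
def find_snapshot(url)
  uri = URI.parse(WAYBACK_ENDPOINT)
  uri.query = URI.encode_www_form(url: url)
  snapshot_from_response(Net::HTTP.get(uri))
rescue StandardError
  nil
end

# Hypothetical helper: swap every occurrence of the broken URL in the
# document text for the snapshot URL. The block form avoids treating
# backslashes in the replacement as back-references.
def replace_link(text, broken_url, snapshot_url)
  text.gsub(broken_url) { snapshot_url }
end
```

A caller would run `link_alive?` on each link found in a document and, for every dead one, pass the result of `find_snapshot` to `replace_link` when it is not `nil`.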
All changes are logged to a log file whose name is printed to the console when the program runs.
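
As a rough illustration of this logging behaviour, the sketch below uses Ruby's standard `Logger` class. The file naming scheme, the message format, and the helper names `open_change_log` and `log_replacement` are assumptions made for the example. The project's actual log file name and format may differ.

```ruby
require "logger"

# Hypothetical helper: open a timestamped log file in `dir` and print its
# name to the console. The naming scheme here is an assumption.
def open_change_log(dir, now = Time.now)
  path = File.join(dir, "link-rot-#{now.strftime('%Y%m%d-%H%M%S')}.log")
  puts "Logging changes to #{path}"
  [Logger.new(path), path]
end

# Hypothetical helper: record one replacement made in `file`.
def log_replacement(logger, file, broken_url, snapshot_url)
  logger.info("#{file}: replaced #{broken_url} with #{snapshot_url}")
end
```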
You can optionally use the --webhook flag to send a notification with a JSON payload to a server when the program has finished running. The payload looks like this:

    {
    "message": "
    Cali has identified 0 broken links. These links have been replaced with archived versions.See below for the changes made.
    [List of broken links, if applicable]
    "
    }

## Technologies
The following libraries and technologies are used in this project:
- Ruby
- nokogiri
- logger
- dotenv
- HTTParty
## Contributors
- capjamesg