Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jancurn/actor-find-broken-links
A source code of an Apify actor that finds and reports broken links on a website. Unlike other SEO analysis tools, it also reports broken URL #fragments.
https://github.com/jancurn/actor-find-broken-links
broken-links seo
Last synced: 3 months ago
JSON representation
A source code of an Apify actor that finds and reports broken links on a website. Unlike other SEO analysis tools, it also reports broken URL #fragments.
- Host: GitHub
- URL: https://github.com/jancurn/actor-find-broken-links
- Owner: jancurn
- License: apache-2.0
- Created: 2018-10-30T11:44:49.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-20T17:11:25.000Z (over 1 year ago)
- Last Synced: 2024-10-12T05:12:57.039Z (3 months ago)
- Topics: broken-links, seo
- Language: JavaScript
- Homepage: https://apify.com/jancurn/find-broken-links
- Size: 279 KB
- Stars: 6
- Watchers: 3
- Forks: 5
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## What does Broken Links Checker do?
Our Broken Links Checker is an easy-to-use SEO tool to help you keep your UX and SEO score healthy, improve your website ranking, and prevent link rot.## How does it check for broken links?
Broken Link Checker can crawl any website and do all of the following:
- generate a report containing an inspection of **all links on the website** or **only broken links**
- scan **one or multiple websites** simultaneously
- easily check **domain and subdomains**
- identify **broken link fragments**
- give you a neat **report in your email inbox** once the SEO inspection is complete
- initiate a **SEO check manually or automatically** thanks to our powerful scheduling system## How much does it cost to run Broken Links Checker?
Using our basic plan, the scraper's run will cost you around **USD 0.25 in Apify platform credits per 1,000 scraped results**. For more details about the plans we offer, platform credits and usage, see the [platform pricing page](https://apify.com/pricing/actors).## How to start Broken Links Checker
Broken Links Checker is highly adaptable to your SEO requests and can scan your web pages quickly and regularly. To check a website for bad links, repeat the following steps:
1. Click on *Try for free*.
2. Add one or more website URLs to start the audit from.
4. Enable the *Save only broken links* button.
5. *Add your email address* to receive the full SEO report in your inbox.
6. Click *Run* and wait for the data to be collected.
7. Optional step: *Schedule* the tool to check the links automatically every month, week or specific time during the day.For a more detailed instruction with visual aids of how to set up a broken links checker and why, see our [step-by-step tutorial](https://blog.apify.com/step-by-step-guide-to-using-broken-links-checker/) on checking for broken links.
## What's happening under the hood?
Broken Links Checker will start the link check at a given URL and will crawl all linked pages under that website. So for example, if the crawler starts at ```https://www.example.com/something```, then it will also crawl linked pages such as:```
https://www.example.com/something/index.html
https://www.example.com/something/else
https://www.example.com/something/even/more/deeper/file.html
```
On every checked page, the crawler will also analyze whether links to other pages are working or not. For example, if the page contains a link to `https://www.example.com/another/page#anchor`, the actor will open the inspected page ```https://www.example.com/another/page```, check whether it loads correctly and then it also check if it contains the `#anchor`.## Input options
If this actor is run on the [Apify platform](https://console.apify.com/), our user-friendly UI will help you configure all the necessary and optional parameters of this scraper before running it. Our Broken Links Checker recognizes the following input fields:**Website URL**
The initial URL to start the broken links inspection from.**Max pages**
Use this field to set the maximum number of pages to be checked. If left empty, the number will be unlimited.
**Notification emails**
Add the email address to receive a notification after the crawler discovers all broken links.**Save only broken links**
If set to `true`, you'll get only the broken links in the report. If set to `false`, the crawler will include into the report both broken and healthy links (not a CSV friendly option).
**Crawl subdomains**
If set to `true`, the crawler will search broken links not only on the main page but also in deeper subdomains.For more technical details on the input, head over to the [Input tab](https://apify.com/jancurn/find-broken-links/input-schema).
### Input example
Here's an input example for checking the Apify Blog for bad links. We've enabled the crawler to check subdomains as well but limited the inspection to 1,000 pages.
```json
{
"baseUrl": "https://blog.apify.com",
"maxPages": 1000,
"notificationEmails": [
"[email protected]"
],
"saveOnlyBrokenLinks": true,
"crawlSubdomains": true
}
```
## Output
Once the links checker finishes the crawl, it will save a report of the broken links into your **key-value store**. You will find reports in two formats there:- `OUTPUT` contains a machine-readable JSON report
- `OUTPUT.html` contains an easy-to-read HTML report### Output example as JSON
Here's an example of dataset of a successful Broken Links Checker run. The error message is included in the report and can be found at the bottom of the example.
```json
[
{
"url": "https://blog.apify.com",
"title": "Apify Blog: Web scraping and automation stories",
"links": [
{
"url": "https://apify.com/",
"normalizedUrl": "https://apify.com",
"httpStatus": 200,
"fragment": "",
"fragmentValid": true,
"crawled": true
},
{
"url": "https://apify.com/about",
"normalizedUrl": "https://apify.com/about",
"httpStatus": 200,
"fragment": "",
"fragmentValid": true,
"crawled": true
},
{
"url": "https://apify.com/jobs",
"normalizedUrl": "https://apify.com/jobs",
"httpStatus": 200,
"fragment": "",
"fragmentValid": true,
"crawled": true
},
{
"url": "https://apify.com/web-scraping",
"normalizedUrl": "https://apify.com/web-scraping",
"httpStatus": null,
"errorMessage": "Error: Navigation timed out after 120 seconds.\n at handleRequestTimeout (/home/myuser/node_modules/apify/build/crawlers/crawler_utils.js:19:11)\n at PuppeteerCrawler._handleNavigationTimeout (/home/myuser/node_modules/apify/build/crawlers/browser_crawler.js:418:54)\n at PuppeteerCrawler._handleNavigation (/home/myuser/node_modules/apify/build/crawlers/browser_crawler.js:401:18)\n at async PuppeteerCrawler._handleRequestFunction (/home/myuser/node_modules/apify/build/crawlers/browser_crawler.js:343:13)\n at async wrap (/home/myuser/node_modules/@apify/timeout/index.js:73:27)",
"fragment": "",
"fragmentValid": true,
"crawled": true
},
...
```
## Other SEO audit tools
You can find other free SEO tools in the [Related actors](https://apify.com/jancurn/find-broken-links/related-actors) tab: Web Page Analyzer, SEO Audit Tool, Google Search Results Scraper. You should also check out [5 powerful scrapers to add to your SEO tool kit](https://blog.apify.com/5-powerful-scrapers-to-add-to-your-seo-toolkit/).