https://github.com/endlesstrax/pyanchor
Find your broken links, so users don't.
anchor-tags cli dead-links hacktoberfest link-checker python python3
- Host: GitHub
- URL: https://github.com/endlesstrax/pyanchor
- Owner: EndlessTrax
- License: mit
- Created: 2020-07-28T19:57:05.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2025-02-02T00:04:53.000Z (12 months ago)
- Last Synced: 2025-03-29T04:07:00.835Z (10 months ago)
- Topics: anchor-tags, cli, dead-links, hacktoberfest, link-checker, python, python3
- Language: Python
- Homepage: https://pypi.org/project/pyanchor/
- Size: 567 KB
- Stars: 67
- Watchers: 4
- Forks: 8
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# PyAnchor

Dead links are a common problem for websites with an extensive amount of content. Aside from the
negative impact on SEO, they are an annoyance for any user who clicks on one.
PyAnchor is primarily for checking the HTTP response on all links on a page. You can integrate it
into your development workflow so that users never see a 404 in the first place.
## Install
> Requires Python 3.8 or above.

It is recommended that you install this package in a virtual or isolated environment. The easiest way to do this
is with pipx.
```shell
pipx install pyanchor
```
Alternatively, you can install it with pip into your virtual environment:
MacOS / Linux:
```shell
python3 -m pip install pyanchor
```
Windows:
```pwsh
python -m pip install pyanchor
```
## Using the CLI
The CLI can be invoked with the `pyanchor` command. A URL **must** be provided unless you are only requesting the help page.
To get the help page:
```shell
pyanchor --help
```

Basic example for a single page:
> Note: all provided URLs must include a valid HTTP scheme.
```shell
pyanchor https://mysite.com/
```
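If you generate the URLs you pass to `pyanchor` from a script, a quick pre-check can catch a missing scheme early. The following is a minimal sketch using only the standard library; `has_http_scheme` is a hypothetical helper for illustration and is not part of PyAnchor.

```python
from urllib.parse import urlparse


def has_http_scheme(url: str) -> bool:
    """Return True if `url` uses the http or https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(has_http_scheme("https://mysite.com/"))  # True
print(has_http_scheme("mysite.com"))           # False: no scheme
```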

If you want to check all links on a website, and not just a single page, provide the URL of a `sitemap.xml` file and
add the `--sitemap` flag.
Example:
```shell
pyanchor https://mysite.com/sitemap.xml --sitemap
```
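For context, a sitemap is an XML file that lists the pages of a site, one `<loc>` element per page. The sketch below shows one way to extract those page URLs with the standard library. It illustrates the file format only and is not taken from PyAnchor's source; `urls_from_sitemap` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return the page URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.com/</loc></url>
  <url><loc>https://mysite.com/about/</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
# ['https://mysite.com/', 'https://mysite.com/about/']
```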

By default, successful requests are not printed to the terminal. To see all URLs with a `200`
response, add the `--verbose` flag.
```shell
pyanchor https://mysite.com --verbose
```

```shell
pyanchor https://mysite.com/sitemap.xml --sitemap --verbose
```

If you wish to output the results to a CSV file instead of the terminal (the default), use
the `--output-csv` flag:
```shell
pyanchor https://mysite.com --output-csv output/path/to/file
```
## But wait, there's more...
To integrate PyAnchor into your application, you can import the `LinkResults` class. `LinkResults`
requires a URL.
Example:
```python
>>> from pyanchor.link_checker import LinkResults
>>> r = LinkResults("https://mysite.com/")
>>> r.results
{200: ["https://mysite.com/about/", "https://mysite.com/contact/"], 500: ["https://mysite.com/doh!/"]}
```
As you can see, the `results` attribute is a dictionary. Each key is a response code that was returned, and
each value is a list of the URLs that returned that code.
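
Because `results` is a plain dictionary, it is straightforward to filter. The sketch below keeps only the entries with an error status. It assumes the keys are integer status codes, as in the example output above; `broken_links` is a hypothetical helper and is not part of PyAnchor. The sample data stands in for a real `LinkResults(...).results` value so that the snippet runs without network access.

```python
from typing import Dict, List


def broken_links(results: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Keep only the entries whose status code is a client or server error."""
    return {code: urls for code, urls in results.items() if code >= 400}


# Sample data shaped like the `results` attribute shown above.
sample = {
    200: ["https://mysite.com/about/", "https://mysite.com/contact/"],
    500: ["https://mysite.com/doh!/"],
}

print(broken_links(sample))
# {500: ['https://mysite.com/doh!/']}
```

In a build script you could, for example, exit with a non-zero status whenever the filtered dictionary is not empty.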
### Analyzing Links
PyAnchor gives you the ability to use the `LinkAnalysis` class to check the links on a given page for unsafe and obsolete attributes.
To check for obsolete attributes use the `obsolete_attrs` property:
```python
>>> from pyanchor.link_checker import LinkAnalysis
>>> r = LinkAnalysis("https://mysite.com/")
>>> r.obsolete_attrs
{'/about/link-1': ['charset', 'rev'], '/about/link-2': ['name']}
```
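If you would rather see which links use each obsolete attribute, the mapping can be inverted. This is a small sketch that assumes the dictionary shape shown in the example above (link to list of attribute names); `links_by_attribute` is a hypothetical helper, not a PyAnchor function.

```python
from collections import defaultdict
from typing import Dict, List


def links_by_attribute(obsolete: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group links by the obsolete attribute they use."""
    grouped = defaultdict(list)
    for link, attrs in obsolete.items():
        for attr in attrs:
            grouped[attr].append(link)
    return dict(grouped)


# Sample data shaped like the `obsolete_attrs` output shown above.
sample = {"/about/link-1": ["charset", "rev"], "/about/link-2": ["name"]}

print(links_by_attribute(sample))
# {'charset': ['/about/link-1'], 'rev': ['/about/link-1'], 'name': ['/about/link-2']}
```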
Likewise, you can check for unsafe links with `unsafe_attrs`:
```python
>>> from pyanchor.link_checker import LinkAnalysis
>>> r = LinkAnalysis("https://mysite.com/")
>>> r.unsafe_attrs
{'Link 4': True, 'Link 5': False}
```
Any link that uses the `target` attribute but **does not** include `rel="noopener"` will return `True`. As in, **it is True that this link is unsafe**. Therefore, links with appropriate attributes will return `False`.
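
To make the rule concrete, here is a standalone sketch of the same check written with the standard library's `html.parser`. It is not PyAnchor's implementation. `find_unsafe_links` is a hypothetical helper, and keying the result by `href` is an assumption made for this example.

```python
from html.parser import HTMLParser
from typing import Dict


class _AnchorScanner(HTMLParser):
    """Collect, per link, whether it uses `target` without rel="noopener"."""

    def __init__(self) -> None:
        super().__init__()
        self.unsafe: Dict[str, bool] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        has_target = "target" in attributes
        # `rel` holds space-separated tokens, e.g. "noopener noreferrer".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.unsafe[href] = has_target and "noopener" not in rel_tokens


def find_unsafe_links(html: str) -> Dict[str, bool]:
    """Map each link's href to True if the link is unsafe, else False."""
    scanner = _AnchorScanner()
    scanner.feed(html)
    return scanner.unsafe


page = (
    '<a href="/a" target="_blank">A</a>'
    '<a href="/b" target="_blank" rel="noopener noreferrer">B</a>'
    '<a href="/c">C</a>'
)

print(find_unsafe_links(page))
# {'/a': True, '/b': False, '/c': False}
```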
## Feedback
If you find a bug, please [file an issue](https://github.com/EndlessTrax/pyanchor/issues).
If you have feature requests, please [file an issue](https://github.com/EndlessTrax/pyanchor/issues)
and use the appropriate label.
## Support
If you would like to show your support for the project,
[you can sponsor me on GitHub](https://github.com/sponsors/EndlessTrax). 🤓
## How to Contribute
Please **raise an issue before making a PR**, so that the issue and implementation can be discussed before you write any code. This will save you time, and increase the chances of your PR being merged without significant changes.
Please **make PRs on a new branch**, and _not_ on main/master.
Please **include tests** for any PRs that include code (unless current tests cover your code contribution).
Please **add documentation** for any new features or flags.
## Contributors
Thank you to:
- [Zingzy](https://github.com/Zingzy) for PR [#55](https://github.com/EndlessTrax/pyanchor/pull/55)
- [mrbazzan](https://github.com/mrbazzan) for PR [#25](https://github.com/EndlessTrax/pyanchor/pull/25)
- [Icelain](https://github.com/Icelain) for PR [#11](https://github.com/EndlessTrax/pyanchor/pull/11)
- [wevnasc](https://github.com/wevnasc) for PR [#8](https://github.com/EndlessTrax/pyanchor/pull/8)
- [muditshamz](https://github.com/muditshamz) for PR [#6](https://github.com/EndlessTrax/pyanchor/pull/6)