{"id":18451518,"url":"https://github.com/geoffreybauduin/website-checker","last_synced_at":"2025-04-19T14:02:31.887Z","repository":{"id":57708318,"uuid":"228075272","full_name":"geoffreybauduin/website-checker","owner":"geoffreybauduin","description":"Performs useful checks against a website, such as 404 errors reporting, structured data validation...","archived":false,"fork":false,"pushed_at":"2019-12-15T15:54:46.000Z","size":21,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-16T14:07:28.385Z","etag":null,"topics":["crawler","seo","structured-data","web-spider","website"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/geoffreybauduin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-14T19:11:20.000Z","updated_at":"2022-11-23T03:09:05.000Z","dependencies_parsed_at":"2022-09-14T13:12:27.137Z","dependency_job_id":null,"html_url":"https://github.com/geoffreybauduin/website-checker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geoffreybauduin%2Fwebsite-checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geoffreybauduin%2Fwebsite-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geoffreybauduin%2Fwebsite-checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geoffreybauduin%2Fwebsite-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/geoffreybauduin","download_url":"https://codeload.github.com/geoffreybauduin/website-checker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249494566,"owners_count":21281662,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","seo","structured-data","web-spider","website"],"created_at":"2024-11-06T07:28:55.126Z","updated_at":"2025-04-18T12:40:39.842Z","avatar_url":"https://github.com/geoffreybauduin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# website-checker\n\n[![Go Report Card](https://goreportcard.com/badge/github.com/geoffreybauduin/website-checker)](https://goreportcard.com/report/github.com/geoffreybauduin/website-checker)\n\nPerforms multiple checks against your website, mostly:\n\n- Goes through every `img`, `script`, `a`, `link[rel=\"stylesheet\"]` tags, and stores the availability of the resource\n- Stores a map of the dependencies between each resource\n- Can perform validation of structured data using [Yandex Structured Data Validator](https://tech.yandex.com/validator/doc/dg/concepts/about-docpage/)\n\n## Installation\n\n```\ngo get -u github.com/geoffreybauduin/website-checker\ngo install github.com/geoffreybauduin/website-checker/cmd/website-checker\n```\n\n## Usage\n\n```\nusage: website-checker --urls=URLS [\u003cflags\u003e]\n\nChecks 404 and other stuff by crawling your website\n\nFlags:\n  --help                         Show context-sensitive help (also try --help-long and --help-man).\n  --workers=10                   Number of workers to perform the work\n  --urls=URLS ...                URLs to check\n  --ignore-urls=IGNORE-URLS ...  Ignore those URLs and do not attempt to fetch them. Expecting a regexp\n  --no-external-inspection       Do not inspect external urls\n  --check-structured-data=CHECK-STRUCTURED-DATA  \n                                 Check structured data validity\n  --yandex-api-key=YANDEX-API-KEY  \n                                 Yandex API Key\n```\n\n### Explained examples\n\n```\nwebsite-checker --urls http://localhost:1313 --no-external-inspection --ignore-urls \"^https://docs\\.google\\.com\" --workers=5 --check-structured-data=yandex --yandex-api-key=1234\n```\n\n- Will crawl the website located at http://localhost:1313 and all its dependencies.\n- Will not fetch the dependencies from the pages that are outside of the host `localhost:1313`.\n- Any url starting with `https://docs.google.com` will be ignored.\n- Will perform 5 tasks in parallel\n- Will fetch any `application/ld+json` script tags and perform validation of those against Yandex API, using the api key `1234`\n\n## Contributing\n\nSee [CONTRIBUTING.md](https://github.com/geoffreybauduin/website-checker/blob/master/CONTRIBUTING.md)\n\n## License\n\nMIT License\n\nCopyright (c) 2019 Geoffrey Bauduin\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeoffreybauduin%2Fwebsite-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeoffreybauduin%2Fwebsite-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeoffreybauduin%2Fwebsite-checker/lists"}