{"id":39469277,"url":"https://github.com/3mdeb/seo-spy","last_synced_at":"2026-01-18T04:53:08.670Z","repository":{"id":183824794,"uuid":"670815296","full_name":"3mdeb/seo-spy","owner":"3mdeb","description":"SEO Spy is a Python-based web scraping tool that functions as an SEO error checking tool, leveraging the capabilities of the renowned web scraper Scrapy.","archived":false,"fork":false,"pushed_at":"2023-08-03T01:23:09.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2023-08-11T08:01:10.469Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/3mdeb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-25T22:52:01.000Z","updated_at":"2023-07-25T22:52:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"22d285c7-cd1e-462c-808e-3332635d8c78","html_url":"https://github.com/3mdeb/seo-spy","commit_stats":null,"previous_names":["3mdeb/seo-spy"],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/3mdeb/seo-spy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3mdeb%2Fseo-spy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3mdeb%2Fseo-spy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3mdeb%2Fseo-spy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3mdeb%2Fseo-spy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/3mdeb","download_url":"https://codeload.github.com/
3mdeb/seo-spy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/3mdeb%2Fseo-spy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28530404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-18T04:53:08.559Z","updated_at":"2026-01-18T04:53:08.654Z","avatar_url":"https://github.com/3mdeb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SEO Spy\n\nSEO Spy is a Python-based web scraping tool that functions as an SEO error\nchecking tool, leveraging the capabilities of the renowned web scraper\n[Scrapy](https://scrapy.org/).\n\n## Installation\n\nCreate a new Python virtual environment.\n\n```bash\n$ virtualenv venv\n```\n\nActivate the virtual environment.\n\n```bash\n$ source venv/bin/activate\n```\n\nInstall the requirements.\n\n```bash\npip install -r requirements.txt\n```\n\n## Usage\n\n```bash\nusage: main.py [-h] -d DOMAIN (-o | -c)\n\nSEO Spy is a Python-based web scraping tool that functions as an SEO error\nchecking tool, leveraging the capabilities of the renowned web scraper\nScrapy.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d DOMAIN, --domain DOMAIN\n                        URL of 
the tested domain. Examples:\n                        http://127.0.0.1:8000 https://docs.dasharo.com\n  -o, --orphan          Run orphan pages check\n  -c, --canonical       Run canonical links check\n```\n\n## Current features\n\n### Orphaned Pages\n\nOrphan pages are web pages within a website that lack incoming internal links\nfrom other pages, rendering them isolated and not accessible through navigation\nor internal linking structures. Due to the absence of internal links,\nsearch engines may struggle to discover and index these pages, leading\nto reduced visibility in search results.\n\nSEO Spy identifies pages that are listed in the sitemap but have no internal links\nleading to them.\n\n#### Example output\n\n```bash\n2023-07-25 23:56:57 [orphan_pages_spider] ERROR: Orphan pages found:\n2023-07-25 23:56:57 [orphan_pages_spider] ERROR: http://127.0.0.1:8000/variants/protectli_ptx01/hardware-matrix/\n2023-07-25 23:56:57 [orphan_pages_spider] ERROR: http://127.0.0.1:8000/variants/protectli_ptx01/test-matrix/\n2023-07-25 23:56:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats:\n{'custom/orphan_pages': ['http://127.0.0.1:8000/variants/protectli_ptx01/hardware-matrix/',\n                         'http://127.0.0.1:8000/variants/protectli_ptx01/test-matrix/'],\n...\n2023-07-25 23:56:57 [scrapy.core.engine] INFO: Spider closed (finished)\n================================================\nOrphan pages found:\n================================================\nhttp://127.0.0.1:8000/variants/protectli_ptx01/hardware-matrix/\nhttp://127.0.0.1:8000/variants/protectli_ptx01/test-matrix/\n```\n\n### Canonical links\n\nCanonical links, also known as canonical tags or rel=\"canonical\" links,\nare HTML elements used to address duplicate content issues on the internet.\nWhen multiple versions of the same content exist on different URLs, website\nowners and developers can add a canonical link tag to the HTML head of the\nduplicate pages. 
This tag specifies the URL of the preferred version\n(the canonical page) that should be considered the main or authoritative\nsource. Search engines then understand that the canonical URL is the primary\none to index and display in search results, consolidating the ranking signals\nfor all duplicate versions onto the preferred URL. By using canonical links,\nwebsite owners can improve search engine optimization (SEO) efforts and ensure\nthat search engines attribute the content's relevance and authority to a single,\npreferred page, avoiding dilution of search rankings and confusion\nin search results.\n\nSEO Spy identifies pages that have no canonical link.\n\n#### Example output\n\n```bash\n2023-07-26 19:13:26 [canonical_link_spider] ERROR: Pages with no canonical link found:\n2023-07-26 19:13:26 [canonical_link_spider] ERROR: https://3mdeb.com/tags/\n2023-07-26 19:13:26 [canonical_link_spider] ERROR: https://3mdeb.com/categories/\n2023-07-26 19:13:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:\n{'custom/canonical': ['https://3mdeb.com/tags/',\n                      'https://3mdeb.com/categories/'],\n 'downloader/request_bytes': 7340,\n 'downloader/request_count': 28,\n...\n================================================\nPages with no canonical link found:\n================================================\nhttps://3mdeb.com/tags/\nhttps://3mdeb.com/categories/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3mdeb%2Fseo-spy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F3mdeb%2Fseo-spy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F3mdeb%2Fseo-spy/lists"}