{"id":19687153,"url":"https://github.com/trinitronx/spyder","last_synced_at":"2026-05-10T12:49:35.404Z","repository":{"id":136206417,"uuid":"1557333","full_name":"trinitronx/Spyder","owner":"trinitronx","description":"A simple web spider written in python","archived":false,"fork":false,"pushed_at":"2012-05-03T00:21:48.000Z","size":122,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-10T07:46:59.329Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://lyraphase.com/wp/projects/spyder/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trinitronx.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-01T20:47:03.000Z","updated_at":"2021-03-12T07:18:24.000Z","dependencies_parsed_at":"2023-03-12T12:15:48.174Z","dependency_job_id":null,"html_url":"https://github.com/trinitronx/Spyder","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinitronx%2FSpyder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinitronx%2FSpyder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinitronx%2FSpyder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinitronx%2FSpyder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trinitronx","download_url":"https://codeload.github.com/trinitronx/Spyder/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositor
ies_count":240993952,"owners_count":19890419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T18:33:15.747Z","updated_at":"2026-05-10T12:49:35.361Z","avatar_url":"https://github.com/trinitronx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Spyder - A simple spider written in Python\n\nWhen given a starting URL, it spiders that page and any links found on it, up to the specified depth.\nWhen it finishes, it prints a list of the resources it found.\nCurrently, it looks for the following resources:\n\nimages   -  any images found on the page (e.g. \u003cimg src=\"THIS\"/\u003e)\nstyles   -  any external stylesheets found on the page.  CSS included via '@import' is currently only supported inside a style tag!\n            (e.g. \u003clink rel=\"stylesheet\" href=\"THIS\"/\u003e  OR \u003cstyle\u003e@import url('THIS');\u003c/style\u003e )\nscripts  -  any external scripts found on the page (e.g. \u003cscript src=\"THIS\"\u003e )\nlinks    -  any URLs found on the page.  'Fragments' are discarded. 
(e.g. \u003ca href=\"THIS#this-is-a-fragment\"\u003e )\nemails   -  any email addresses found on the page (e.g. \u003ca href=\"mailto:THIS\"\u003e )\n\nInternally, it uses html.parser.HTMLParser to parse pages, urllib.request to make requests, and urllib.parse to parse URLs.\n\nUsage: Spyder.py -u http://www.example.com\n\nOptions:\n  -h, --help            show this help message and exit\n  -u URL, --url=URL     The url to start spidering from.\n  -d, --debug           Print debugging information (very verbose).\n  -l LEVEL, --level=LEVEL\n                        Specify the maximum recursion depth.  The\n                        default maximum depth is 5.\n  -H SPAN_HOSTS, --span-hosts=SPAN_HOSTS\n                        Enable spanning across hosts when spidering. The\n                        default is to limit spidering to one domain.\n  -F FILTER_HOSTS, --filter-hosts=FILTER_HOSTS\n                        When finished, filter the list of resources printed\n                        to the target domain. The default is to print ALL\n                        resources found.\n\n\nI originally wrote this to do URL discovery for website benchmarking.\nAn example script that does this, 'www-benchmark.py', is included.  It uses ApacheBench (ab) as the benchmarking tool.\nEventually I plan to experiment with 'siege' for benchmarking \u0026 server stress-testing.\n\n\nNOTE: Currently the spider can throw exceptions in certain cases (mainly character encoding issues, but there are probably other bugs too).\n      Getting *working* character encoding detection is a goal, and it is partially working.  
Help in this area would be appreciated!\n      Filtering the results by domain is also only partially working.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinitronx%2Fspyder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrinitronx%2Fspyder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinitronx%2Fspyder/lists"}