{"id":29151139,"url":"https://github.com/azshurith/depth-crawler","last_synced_at":"2026-04-20T14:06:03.862Z","repository":{"id":300734806,"uuid":"913136332","full_name":"Azshurith/Depth-Crawler","owner":"Azshurith","description":"A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.","archived":false,"fork":false,"pushed_at":"2025-06-23T09:31:18.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-01T00:08:02.062Z","etag":null,"topics":["crawler","puppeteer","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Azshurith.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-07T05:23:58.000Z","updated_at":"2025-06-23T10:59:09.000Z","dependencies_parsed_at":"2025-06-23T10:43:01.345Z","dependency_job_id":null,"html_url":"https://github.com/Azshurith/Depth-Crawler","commit_stats":null,"previous_names":["azshurith/depth-crawler-python"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Azshurith/Depth-Crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azshurith%2FDepth-Crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azshurith%2FDepth-Crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azshurith%2FDepth-Crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azshurith%2FDepth-Crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Azshurith","download_url":"https://codeload.github.com/Azshurith/Depth-Crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azshurith%2FDepth-Crawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265262555,"owners_count":23736413,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","puppeteer","python"],"created_at":"2025-07-01T00:08:02.082Z","updated_at":"2025-09-22T17:38:13.759Z","avatar_url":"https://github.com/Azshurith.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Depth Crawler (Python)\n\nA simple yet powerful **Python** web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.\n\n## 🚀 Features\n\n- Crawls recursively within a domain up to your chosen depth  \n- Records each page’s URL and its HTML `\u003ctitle\u003e`  \n- Outputs results as a JSON file  \n- Displays crawl stats: pages visited, links found, total \u0026 average time per link\n\n## 🔧 Requirements\n\n- Python 3.x  \n- `requests` – for HTTP requests  \n- `beautifulsoup4` – for parsing HTML\n\n## ⚙️ Installation\n\nClone the repository and install dependencies:\n\n```bash\ngit clone https://github.com/Azshurith/Depth-Crawler-Python.git\ncd Depth-Crawler-Python\nmake install\n```\n\n## 🛠 Makefile Commands\n\nThe project includes a simple `Makefile` with the following commands:\n\n- **Install dependencies**  \n  ```bash\n  make install\n  ```\n\n- **Run the crawler**  \n  ```bash\n  make build\n  ```\n\nThis will execute `python ./src/Main.py`.\n\n## 🎯 Usage\n\nRun the crawler:\n```bash\nmake build\n```\n\n- You’ll be prompted to enter the target URL (e.g., `https://example.com`) and crawl depth.\n- A file called `sitemap.json` will be generated with the results.\n\n## 📊 Output Format\n\n```json\n[\n  {\n    \"url\": \"https://example.com\",\n    \"title\": \"Example Domain\",\n    \"links\": [\"https://example.com/page1\", ...]\n  },\n  ...\n]\n```\n\n## 📈 Crawl Report\n\nAfter the run, you'll see a summary showing:\n\n- Total pages visited  \n- Total links gathered  \n- Max depth reached  \n- Total time taken  \n- Average time per link\n\n## ✅ How to Contribute\n\n- Fork the repo and submit PRs for improvements  \n- Open issues for bugs or feature suggestions\n\n## 📝 License\n\nAdd your license info here (e.g., MIT)\n\n## 👤 Author\n\nYour name or GitHub profile link here\n\n---\n\n**Happy crawling!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazshurith%2Fdepth-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fazshurith%2Fdepth-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazshurith%2Fdepth-crawler/lists"}