{"id":18456541,"url":"https://github.com/akashrajpurohit/node-crawler","last_synced_at":"2026-04-27T18:32:22.041Z","repository":{"id":42070604,"uuid":"188354121","full_name":"AkashRajpurohit/node-crawler","owner":"AkashRajpurohit","description":"Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain","archived":false,"fork":false,"pushed_at":"2023-12-05T02:15:24.000Z","size":49,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-03-11T13:34:00.309Z","etag":null,"topics":["crawler","node-crawler","nodejs","url"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AkashRajpurohit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"ko_fi":"akashrajpurohit","github":"AkashRajpurohit","custom":["https://paypal.me/RajpurohitAkash"]}},"created_at":"2019-05-24T04:50:14.000Z","updated_at":"2024-09-03T12:51:20.000Z","dependencies_parsed_at":"2024-11-06T08:12:04.765Z","dependency_job_id":"9a4f1f0c-ae3c-4df0-9b3f-566cadcbcd97","html_url":"https://github.com/AkashRajpurohit/node-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AkashRajpurohit/node-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashRajpurohit%2Fnode-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashRajpurohit%2Fnode-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashRajpurohit%2Fnode-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashRajpurohit%2Fnode-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AkashRajpurohit","download_url":"https://codeload.github.com/AkashRajpurohit/node-crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashRajpurohit%2Fnode-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32349471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T17:12:42.749Z","status":"ssl_error","status_checked_at":"2026-04-27T17:12:41.658Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","node-crawler","nodejs","url"],"created_at":"2024-11-06T08:11:56.033Z","updated_at":"2026-04-27T18:32:22.025Z","avatar_url":"https://github.com/AkashRajpurohit.png","language":"JavaScript","funding_links":["https://ko-fi.com/akashrajpurohit","https://github.com/sponsors/AkashRajpurohit","https://paypal.me/RajpurohitAkash"],"categories":[],"sub_categories":[],"readme":"# Nodejs Crawler\n\n### It is a basic nodejs crawler to crawl any domain and get all the urls from that domain\n\nSample Input HTML page server at ```localhost:4000```\n\n```html\n\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chead\u003e\n\t\u003cmeta charset=\"UTF-8\"\u003e\n\t\u003ctitle\u003eHello World\u003c/title\u003e\n\u003c/head\u003e\n\u003cbody class=\"body-color\"\u003e\n\t\u003cnav\u003e\n\t\t\u003cul\u003e\n\t\t\t\u003cli\u003e\u003ca href=\"/index.html\"\u003eHome\u003c/a\u003e\u003c/li\u003e\n\t\t\t\u003cli\u003e\u003ca href=\"/about.html\"\u003eAbout\u003c/a\u003e\u003c/li\u003e\n\t\t\t\u003cli\u003e\u003ca href=\"/contact.html\"\u003eContact\u003c/a\u003e\u003c/li\u003e\n\t\t\t\u003cli\u003e\u003ca href=\"/blog.html\"\u003eBlogs\u003c/a\u003e\u003c/li\u003e\n\t\t\u003c/ul\u003e\n\t\u003c/nav\u003e\n\t\u003csection class=\"main\"\u003e\n\t\t\u003ch1 class=\"red full-width\"\u003eHello\u003c/h1\u003e\n\t\t\u003ch3 class=\"blue full-width\"\u003eWorld\u003c/h3\u003e\n\t\t\u003cp\u003eLorem ipsum dolor sit amet, consectetur adipisicing elit. Nulla, laudantium, omnis. Ea quaerat minima, nostrum doloremque repellendus! Ratione quasi, non eligendi quidem at culpa animi vitae id eius corrupti deleniti.\n\t\t\t\u003cimg src=\"https://fakeimg.pl/300/\" alt=\"Some image\" /\u003e\n\t\t\u003c/p\u003e\n\t\u003c/section\u003e\n\t\u003cmain class=\"container\"\u003e\n\t\t\u003cp\u003eThis is some more dummy text\u003c/p\u003e\n\t\t\u003ch4\u003eLorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!\u003c/h4\u003e\n\t\u003c/main\u003e\n\t\u003cdiv class=\"div_content\"\u003e\n\t\tHi there, this is empty div with no children :(\n\t\u003c/div\u003e\n\t\u003csection class=\"different\"\u003e\n\t\t\u003cp id=\"p-id\" data-attr=\"custom-attribute\"\u003eDifferent section\u003c/p\u003e\n\t\u003c/section\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\nOutput:\n```\n💻💻💻 Scraping...\n\n{ links:\n   [ { linkText: 'Home', linkUrl: '/index.html' },\n     { linkText: 'About', linkUrl: '/about.html' },\n     { linkText: 'Contact', linkUrl: '/contact.html' },\n     { linkText: 'Blogs', linkUrl: '/blog.html' } ],\n  requestTime: 64,\n  title: 'Hello World',\n  url: 'http://localhost:4000' }\n\n🥳🥳🥳 Done...\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakashrajpurohit%2Fnode-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakashrajpurohit%2Fnode-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakashrajpurohit%2Fnode-crawler/lists"}