{"id":20109847,"url":"https://github.com/hyper63/hyper-adapter-spider","last_synced_at":"2025-11-27T08:02:01.658Z","repository":{"id":48393505,"uuid":"384938382","full_name":"hyper63/hyper-adapter-spider","owner":"hyper63","description":"Spider is an adapter for the hyper crawler port/service","archived":false,"fork":false,"pushed_at":"2021-07-28T12:01:38.000Z","size":144,"stargazers_count":0,"open_issues_count":5,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-02T18:32:34.412Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyper63.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-11T12:07:02.000Z","updated_at":"2021-07-28T11:37:15.000Z","dependencies_parsed_at":"2022-09-14T00:31:52.818Z","dependency_job_id":null,"html_url":"https://github.com/hyper63/hyper-adapter-spider","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":"hyper63/adapter-template","purl":"pkg:github/hyper63/hyper-adapter-spider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fhyper-adapter-spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fhyper-adapter-spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fhyper-adapter-spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fhyper-adapter-spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyper63","download_url":"https://codeload.github.
com/hyper63/hyper-adapter-spider/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fhyper-adapter-spider/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286079811,"owners_count":27282121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-27T02:00:05.795Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T18:09:36.879Z","updated_at":"2025-11-27T08:02:01.643Z","avatar_url":"https://github.com/hyper63.png","language":"JavaScript","readme":"\u003ch1 align=\"center\"\u003ehyper-adapter-spider\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\nThis spider adapter allows clients to create web crawling jobs that can be invoked using the\n`start` method. The spider then uses the source URL to build a list of links that it should\ncrawl for a given site. For each link, the spider pulls down the content in a headless browser\nand runs a script command within the browser DOM. This script command needs to return an object with\na `title` property and a `content` property that will be used to generate a search document. 
The search document\ncan then be posted to a target endpoint for consumption by a search engine or AI algorithm.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\nThis spider requires a serverless implementation of a headless browser to build the list of links to\ncrawl and to create the content document for each link. Embedded in this project is an Architect app\nthat creates two serverless endpoints for the headless browser. See [Setup](#setup) for more information on\nhow to initialize the headless browser and how to pass it to the adapter.\n\u003c/p\u003e\n\n## Table of Contents\n\n- [Getting Started](#getting-started)\n- [Configuration](#configuration)\n- [Setup](#setup)\n- [Testing](#testing)\n\n## Getting Started\n\nTo use this adapter, you will need an AWS account and an IAM user with access to S3. The\nIAM credentials should have full access to S3\nbuckets matching `hyper-crawler-*`. You will also need an IAM user that can deploy Lambda\nfunctions using Architect. 
NOTE: this does not have to be the same IAM user\naccount.\n\nYou can pass credentials either via environment\nvariables or explicitly through the hyper adapter configuration.\n\n## Configuration\n\n```js\nconst links = \"https://aws.xxxx.com/links\"\nconst content = \"https://aws.xxxx.com/content\"\n...\nadapters: [\n  { port: 'crawler', plugins: [spider({links, content})]}\n]\n...\n```\n\n## Setup\n\nDeploy the Architect app (requires Node.js):\n\n- Install the AWS CLI\n- Install Architect and the AWS SDK: `npm i -g @architect/architect aws-sdk`\n- Set up your AWS credentials\n\n```\ncd arc\narc deploy production\n```\n\n\u003e NOTE: If you would like to set up a staging environment, you can run `arc deploy`\n\n## Testing\n\nRun `./scripts/test.sh` to lint, check format, and run tests.\n\nRun `./scripts/harness.sh` to spin up a local instance of `hyper` using your\nadapter for the data port.\n\n## TODO\n\n- Add automation to set adapter name\n- Add automation to set `port`\n- Add automation to scaffold adapter methods, based on selected `port`\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fhyper-adapter-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyper63%2Fhyper-adapter-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fhyper-adapter-spider/lists"}