# Scraper API and CLI

This is a step-by-step guide to using this simple data extraction package, which supports both CLI and API requests.

## Built with 🛠️

- [Axios](https://axios-http.com/) - Promise-based HTTP client
- [Cheerio](https://cheerio.js.org/) - Library for parsing and manipulating HTML
- [Express](https://expressjs.com/) - Web framework for Node.js
- [Jest](https://jestjs.io/) - JavaScript testing framework
- [Node.js](https://nodejs.org/) - JavaScript runtime environment
- [NPM](https://www.npmjs.com/) - Package manager for Node.js

## Installation ⚙️

1. Run `npm i` to install the package dependencies.

## CLI Usage ✅

Run:

```
node cli-scraper.js <htmlSource> <selectorSource>
```

- _htmlSource_ can be either an HTML file or a web URL.
- _selectorSource_ is a JSON file whose keys map to CSS selectors.
- For repetitive data, the `__root` property is required.

E.g. `node cli-scraper.js examples/input1.html examples/selector1.json`

The results are logged to the console and written to the _scrapedData.json_ file inside the examples folder.

## API Usage ✅

1. Run `npm run dev` to start the server.
2. Use curl, Postman, or another API testing tool to make your API requests.
3. Send a POST request whose body is a JSON object with `html` and `selectors` properties:
   - _html_ can be either a stringified HTML file or a web URL.
   - _selectors_ is an object whose keys map to CSS selectors.
   - For repetitive data, the `__root` property is required.

E.g.

```
curl -X POST http://localhost:3000/scrape -H "Content-Type: application/json" -d '{"html": "https://github.com/", "selectors": {"title": "h1:first-child"}}'
```

## Requirements ⚙️

- [Node.js](https://nodejs.org/)
- [NPM](https://www.npmjs.com/)
- A text editor like [Visual Studio Code](https://code.visualstudio.com/)
- An API testing platform like [Postman](https://www.postman.com/)

## Notes 📋

- I chose the libraries based on their popularity and download counts on npm.
- The first example in the challenge description is incorrect, since there is no `p` element that is a child of `h1`.
- The second example in the challenge description was modified to include `tbody` because of how Cheerio's `load` function [behaves](https://cheerio.js.org/docs/basics/loading#load). It follows the HTML parsing rules, which wrap table rows in an implicit `tbody`.

---
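The API usage described above can also be scripted from Node.js instead of curl or Postman. The following is a minimal sketch under a few assumptions:

- Node.js 18 or later, so that the global `fetch` is available.
- The server started by `npm run dev` listens on `http://localhost:3000/scrape`, the URL used in the curl example.
- The server replies with JSON. This README does not document the response shape.

The function names `buildScrapeRequest` and `scrape` are hypothetical and are not part of this package.

```javascript
// Hypothetical client for the /scrape endpoint (not part of this package).
// Assumes Node.js 18+ (global fetch) and the server from `npm run dev`
// listening on http://localhost:3000, as in the curl example.

/**
 * Build fetch() options for a POST /scrape request.
 * @param {string} html - A web URL or a stringified HTML document.
 * @param {Record<string, string>} selectors - Keys mapped to CSS selectors.
 */
function buildScrapeRequest(html, selectors) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Same body shape as the curl example: { html, selectors }
    body: JSON.stringify({ html, selectors }),
  };
}

/**
 * Send the request and parse the reply.
 * Assumption: the server responds with JSON (the response shape is not documented).
 */
async function scrape(html, selectors, endpoint = "http://localhost:3000/scrape") {
  const res = await fetch(endpoint, buildScrapeRequest(html, selectors));
  if (!res.ok) {
    throw new Error(`Scrape request failed with HTTP ${res.status}`);
  }
  return res.json();
}

module.exports = { buildScrapeRequest, scrape };
```

With the server running, the curl example above corresponds to `scrape("https://github.com/", { title: "h1:first-child" }).then(console.log)`.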
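The CLI accepts a local HTML file path, but the API's `html` field expects either a web URL or the HTML itself as a string. A client therefore has to read local files before sending them. Below is a minimal sketch of that bridging step.

It assumes that any source starting with `http://` or `https://` should be passed through as a URL, and that everything else is a readable UTF-8 file path. The helper name `toHtmlField` is hypothetical.

```javascript
const fs = require("node:fs");

// Hypothetical helper (not part of this package): convert a CLI-style
// <htmlSource> into a value for the API's `html` field.
// Assumption: http(s) URLs are passed through unchanged for the server to fetch;
// anything else is treated as a local file path and sent as its UTF-8 contents.
function toHtmlField(source) {
  if (/^https?:\/\//i.test(source)) {
    return source;
  }
  return fs.readFileSync(source, "utf8");
}

module.exports = { toHtmlField };
```

For example, `toHtmlField("examples/input1.html")` returns the file's markup, ready to be placed in the request body next to `selectors`.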
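Both the CLI selector file and the API's `selectors` property are described as objects whose keys map to CSS selectors. A client-side check of that contract can catch malformed input before a request is made.

This sketch enforces only what this README states: the value is a plain object, and every value, including `__root`, is a non-empty string. It does not try to model how `__root` is applied to repetitive data, since those semantics are not documented here. The name `validateSelectors` is hypothetical.

```javascript
// Hypothetical pre-flight check (not part of this package) for a selectors
// object, whether read from a CLI selector file or sent as the API's `selectors`.
// Only the documented contract is enforced: keys map to CSS selector strings.
// `__root` is validated like any other key; its exact semantics are not modeled.
function validateSelectors(selectors) {
  if (selectors === null || typeof selectors !== "object" || Array.isArray(selectors)) {
    throw new TypeError("selectors must be a plain object");
  }
  for (const [key, value] of Object.entries(selectors)) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new TypeError(`selector for "${key}" must be a non-empty CSS selector string`);
    }
  }
  // Return the input unchanged so the call can be chained.
  return selectors;
}

module.exports = { validateSelectors };
```

A selector file could be checked with `validateSelectors(JSON.parse(fs.readFileSync("examples/selector1.json", "utf8")))` before invoking the CLI or the API. If the real selector format allows nested values, this check would need to be relaxed.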