{"id":22189970,"url":"https://github.com/phatpham9/scraper","last_synced_at":"2026-05-14T20:32:37.451Z","repository":{"id":47999934,"uuid":"113737863","full_name":"phatpham9/scraper","owner":"phatpham9","description":"An html scraper microservice based on x-ray \u0026 micro","archived":false,"fork":false,"pushed_at":"2023-01-24T07:28:06.000Z","size":811,"stargazers_count":2,"open_issues_count":15,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-09-11T18:50:45.475Z","etag":null,"topics":["es6","html-scraper","joi","micro","microservice","nodejs","scraper","x-ray"],"latest_commit_sha":null,"homepage":"https://scraper.fun","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phatpham9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-10T09:27:50.000Z","updated_at":"2023-01-24T07:13:02.000Z","dependencies_parsed_at":"2023-02-13T19:02:34.870Z","dependency_job_id":null,"html_url":"https://github.com/phatpham9/scraper","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/phatpham9/scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phatpham9%2Fscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phatpham9%2Fscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phatpham9%2Fscraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phatpham9%2Fscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phatpham9","download_url":"https://codeload.github.c
om/phatpham9/scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phatpham9%2Fscraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33042159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["es6","html-scraper","joi","micro","microservice","nodejs","scraper","x-ray"],"created_at":"2024-12-02T11:40:54.563Z","updated_at":"2026-05-14T20:32:37.429Z","avatar_url":"https://github.com/phatpham9.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# scraper\n\nAn html scraper microservice based on x-ray \u0026 micro\n\n[![Package Version](https://img.shields.io/github/package-json/v/phatpham9/scraper.svg)]()\n[![Travis](https://img.shields.io/travis/phatpham9/scraper.svg)](https://travis-ci.org/phatpham9/scraper)\n[![David](https://img.shields.io/david/phatpham9/scraper.svg)](https://github.com/phatpham9/scraper)\n[![David Dev](https://img.shields.io/david/dev/phatpham9/scraper.svg)](https://github.com/phatpham9/scraper)\n\n## Features\n\n- [x-ray](https://github.com/matthewmueller/x-ray): An html scraper\n- [micro](https://github.com/zeit/micro): Asynchronous HTTP microservices\n- [joi](https://github.com/hapijs/joi): Object schema 
validation\n\n## Usage\n\n**Request**\n\nSend a `GET` request to the `/scrape` endpoint with one of the following sets of query string parameters:\n\n1. Scraping a single text value\n\n| Params     | Required | Description                           |\n|------------|----------|---------------------------------------|\n| s-url      | yes      | destination website url to be scraped |\n| s-selector | yes      | css selector of data to be extracted  |\n\n2. Scraping multiple data objects\n\n| Params     | Required | Description                               |\n|------------|----------|-------------------------------------------|\n| s-url      | yes      | destination website url to be scraped     |\n| s-scope    | yes      | css selector of data's scope              |\n| s-limit    | no       | limit number of objects returned          |\n| [selector] | yes      | css selector of each data to be extracted |\n\n**Response**\n\nA text value, or a JSON array of objects keyed by the selector parameters given in the request's query string.\n\n## Examples\n\n### Scraping Bitcoin price in USD from [CoinMarketCap](https://coinmarketcap.com)\n\n- Request (URI-encoded): `https://scraper.fun/scrape?s-url=https://coinmarketcap.com\u0026s-selector=%23id-bitcoin%20.price`\n- Response: as shown below\n\n\u003cimg style=\"text-align: center;\" src=\"./example-images/btc-price.png\" /\u003e\n\n### Scraping the top 3 coins' prices\n\n- Request (URI-encoded): `https://scraper.fun/scrape?s-url=https://coinmarketcap.com\u0026s-scope=table%23currencies%20tbody%20tr\u0026name=.currency-name%20.currency-name-container\u0026price=.price\u0026s-limit=3`\n- Response: as shown below\n\n\u003cimg style=\"text-align: center;\" src=\"./example-images/top-3-price.png\" /\u003e\n\n## Development \u0026 deployment guide\n\n### Getting started\n\nMake sure [NodeJS](https://nodejs.org) (9.0.0 or newer) and either [Yarn](https://yarnpkg.com) or [NPM](https://npmjs.com) are installed on your local machine. 
Then install project dependencies by running:\n\n```bash\nyarn\n```\n\n### Start developing\n\n```bash\nyarn start\n```\n\nThe service will be up at `127.0.0.1:9000` by default.\n\n### Testing\n\nWe use ESLint to lint the source code. Simply run:\n\n```bash\nyarn test\n```\n\n### Running in production mode\n\nRun the command:\n\n```bash\nPORT=80 yarn serve\n```\n\nThe app will be up at `127.0.0.1`.\n\n### Deploy using Docker\n\nYou can use the existing Docker image from https://hub.docker.com/r/phatpham9/scraper by running:\n\n```bash\ndocker pull phatpham9/scraper\ndocker run -d -p 80:80 phatpham9/scraper\n```\n\nThe app will be up at `127.0.0.1`.\n\n### Deploy to CaptainDuckDuck\n\n[CaptainDuckDuck](https://github.com/githubsaturn/captainduckduck) is a nice Heroku-like tool for deploying your apps easily. Install the CaptainDuckDuck client on your local machine by following [the instructions here](https://github.com/githubsaturn/captainduckduck/wiki/Getting-Started), then run:\n\n```bash\ncaptainduckduck deploy\n```\n\nThat's it!\n\n### Deploy to Heroku\n\nClick the button below to deploy to a Heroku dyno.\n\n[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy)\n\n## Contributing\n\n1. Fork this repository to your own GitHub account, then clone it to your local device\n2. Follow the development guide above, or simply run: `yarn start`\n3. Lint the code by running: `yarn test`\n4. Create a pull request for us\n\n## Contributors\n\n* Phat Pham ([@phatpham9](https://github.com/phatpham9))\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphatpham9%2Fscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphatpham9%2Fscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphatpham9%2Fscraper/lists"}