{"id":24461661,"url":"https://github.com/ndmen/scrapingbakery","last_synced_at":"2026-05-18T02:32:55.750Z","repository":{"id":219337590,"uuid":"748141858","full_name":"ndmen/scrapingbakery","owner":"ndmen","description":"This repository contains the implementation of a web scraping API designed to retrieve product information from a specified URL. The API is built using NestJS and employs asynchronous processing to handle requests efficiently.","archived":false,"fork":false,"pushed_at":"2024-01-26T19:29:16.000Z","size":182,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-02T01:24:15.521Z","etag":null,"topics":["cache","cheerio","nestjs","scraper"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ndmen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-25T11:10:43.000Z","updated_at":"2024-01-26T19:30:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"405b0e90-de65-4c90-ba34-acdbeee36acb","html_url":"https://github.com/ndmen/scrapingbakery","commit_stats":null,"previous_names":["ndmen/scrapingbakery"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ndmen/scrapingbakery","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndmen%2Fscrapingbakery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndmen%2Fscrapingbakery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndmen%2Fscrapin
gbakery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndmen%2Fscrapingbakery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ndmen","download_url":"https://codeload.github.com/ndmen/scrapingbakery/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndmen%2Fscrapingbakery/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269982055,"owners_count":24507301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cache","cheerio","nestjs","scraper"],"created_at":"2025-01-21T04:29:16.699Z","updated_at":"2026-05-18T02:32:50.731Z","avatar_url":"https://github.com/ndmen.png","language":"TypeScript","funding_links":["https://opencollective.com/nest","https://paypal.me/kamilmysliwiec"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"http://nestjs.com/\" target=\"blank\"\u003e\u003cimg src=\"https://nestjs.com/img/logo-small.svg\" width=\"200\" alt=\"Nest Logo\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[circleci-image]: https://img.shields.io/circleci/build/github/nestjs/nest/master?token=abc123def456\n[circleci-url]: https://circleci.com/gh/nestjs/nest\n\n  \u003cp align=\"center\"\u003eA progressive \u003ca 
href=\"http://nodejs.org\" target=\"_blank\"\u003eNode.js\u003c/a\u003e framework for building efficient and scalable server-side applications.\u003c/p\u003e\n    \u003cp align=\"center\"\u003e\n\u003ca href=\"https://www.npmjs.com/~nestjscore\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/@nestjs/core.svg\" alt=\"NPM Version\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://www.npmjs.com/~nestjscore\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/npm/l/@nestjs/core.svg\" alt=\"Package License\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://www.npmjs.com/~nestjscore\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/npm/dm/@nestjs/common.svg\" alt=\"NPM Downloads\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://circleci.com/gh/nestjs/nest\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/circleci/build/github/nestjs/nest/master\" alt=\"CircleCI\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://coveralls.io/github/nestjs/nest?branch=master\" target=\"_blank\"\u003e\u003cimg src=\"https://coveralls.io/repos/github/nestjs/nest/badge.svg?branch=master#9\" alt=\"Coverage\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://discord.gg/G7Qnnhy\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/badge/discord-online-brightgreen.svg\" alt=\"Discord\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/nest#backer\" target=\"_blank\"\u003e\u003cimg src=\"https://opencollective.com/nest/backers/badge.svg\" alt=\"Backers on Open Collective\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/nest#sponsor\" target=\"_blank\"\u003e\u003cimg src=\"https://opencollective.com/nest/sponsors/badge.svg\" alt=\"Sponsors on Open Collective\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://paypal.me/kamilmysliwiec\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/badge/Donate-PayPal-ff3f59.svg\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://opencollective.com/nest#sponsor\"  
target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/badge/Support%20us-Open%20Collective-41B883.svg\" alt=\"Support us\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://twitter.com/nestframework\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/twitter/follow/nestframework.svg?style=social\u0026label=Follow\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n  \u003c!--[![Backers on Open Collective](https://opencollective.com/nest/backers/badge.svg)](https://opencollective.com/nest#backer)\n  [![Sponsors on Open Collective](https://opencollective.com/nest/sponsors/badge.svg)](https://opencollective.com/nest#sponsor)--\u003e\n\n## Description\n\nThis repository contains the implementation of a web scraping API designed to retrieve product information from a specified URL. The API is built using NestJS and employs asynchronous processing to handle requests efficiently.\n\nFeatures\n- Receives requests containing a product ID and initiates asynchronous processing.\n- Responds with an HTTP 200 status code and a unique process identifier upon request reception.\n- Initiates the scraping process of the target URL and transforms the website data into a unified JSON format.\n- Includes a 10-second timeout to simulate data processing.\n- Responds with a \"not ready\" status if queried with the process identifier during the timeout period.\n- Provides the final result via the same endpoint after the processing is complete.\n\nNote\n\n- This project uses NestJS and cache for processing purposes. In a real-world scenario, Redis would be used for processing, and PostgreSQL for storing results.\n\nData Retrieval Methods\n\nTo retrieve product information as per the requirements outlined in the task, the following methods were considered:\n\n1. Open Graph in Meta Tags: Parsing meta tags with Open Graph protocol to extract product information.\n\n2. Schema Parsing: Extracting product details from structured data using schema markup.\n\n3. 
HTML Markup Parsing: Parsing HTML markup to identify and extract product information.\n\n4. Script Tag Parsing: Extracting data from JavaScript scripts embedded within the HTML.\n\nFor this task, Script Tag Parsing was the preferred method because it exposed the product identifiers and specifications required for further processing.\n\n## Installation\n\n```bash\n$ npm install\n```\n\n## Running the app\n\n```bash\n# watch mode\n$ npm run start:dev\n```\n\n## Using documentation\n\nOpen the Swagger UI at http://localhost:3000/swagger/#/scraper/ScraperController_scrapeProduct and send a POST request with the following body:\n\n```json\n{\n  \"productId\": \"air-presto-mens-shoes-JlLlWz\"\n}\n```\n\n## Support\n\nNest is an MIT-licensed open source project. It can grow thanks to the sponsors and support by the amazing backers. If you'd like to join them, please [read more here](https://docs.nestjs.com/support).\n\n## Stay in touch\n\n- Author - [Kamil Myśliwiec](https://kamilmysliwiec.com)\n- Website - [https://nestjs.com](https://nestjs.com/)\n- Twitter - [@nestframework](https://twitter.com/nestframework)\n\n## License\n\nNest is [MIT licensed](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fndmen%2Fscrapingbakery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fndmen%2Fscrapingbakery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fndmen%2Fscrapingbakery/lists"}