{"id":29673717,"url":"https://github.com/hanivan/nestjs-browser-parser","last_synced_at":"2025-07-22T22:07:48.996Z","repository":{"id":299582262,"uuid":"1003495890","full_name":"Hanivan/nestjs-browser-parser","owner":"Hanivan","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-26T08:22:39.000Z","size":178,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-26T09:30:40.626Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hanivan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-17T08:32:02.000Z","updated_at":"2025-06-26T08:22:43.000Z","dependencies_parsed_at":"2025-06-17T09:41:44.629Z","dependency_job_id":"1403694a-d2cb-4c1f-8361-88f4a9d74d36","html_url":"https://github.com/Hanivan/nestjs-browser-parser","commit_stats":null,"previous_names":["hanivan/nestjs-browser-parser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Hanivan/nestjs-browser-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hanivan%2Fnestjs-browser-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hanivan%2Fnestjs-browser-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hanivan%2Fnestjs-browser-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hanivan%2Fnestjs-browser-parser/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Hanivan","download_url":"https://codeload.github.com/Hanivan/nestjs-browser-parser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hanivan%2Fnestjs-browser-parser/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266580734,"owners_count":23951277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-22T22:07:46.369Z","updated_at":"2025-07-22T22:07:48.980Z","avatar_url":"https://github.com/Hanivan.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NestJS Browser Parser\n\nA powerful NestJS module for parsing HTML content with JavaScript support using Playwright Core. 
This module provides comprehensive features for web scraping, data extraction, and automation with both CDP (Chrome DevTools Protocol) and built-in browser support.\n\n## 🚀 Features\n\n- **🎭 Playwright Integration**: Uses playwright-core for reliable JavaScript-enabled HTML parsing\n- **🔗 Dual Browser Mode**: Supports both CDP connections and a built-in browser\n- **📱 Responsive**: Full viewport and device emulation support\n- **🔍 CSS \u0026 Limited XPath**: Extract data using CSS selectors (XPath support is currently limited, with fuller support planned)\n- **📸 Screenshots \u0026 PDFs**: Generate screenshots and PDF documents\n- **⚡ JavaScript Execution**: Execute custom JavaScript on pages\n- **🛡️ Proxy Support**: HTTP, HTTPS, SOCKS proxies with authentication\n- **🎨 Rich Configuration**: Extensive customization options\n- **📊 Response Metadata**: Headers, cookies, timing, and metrics\n- **🔧 TypeScript**: Full type safety and IntelliSense support\n- **🧹 Auto Cleanup**: Automatic resource management and cleanup\n\n## 📦 Installation\n\n```bash\nnpm install playwright-core cheerio\n# or\nyarn add playwright-core cheerio\n```\n\n## 🛠️ Quick Start\n\n### Basic Setup\n\n```typescript\nimport { Module } from '@nestjs/common';\nimport { BrowserParserModule } from './browser-parser.module';\n\n@Module({\n  imports: [BrowserParserModule.forRoot()],\n})\nexport class AppModule {}\n```\n\n### Using the Service\n\n```typescript\nimport { Injectable } from '@nestjs/common';\nimport { BrowserParserService } from './browser-parser.service';\n\n@Injectable()\nexport class ScrapingService {\n  constructor(private readonly browserParser: BrowserParserService) {}\n\n  async scrapeWebsite(url: string) {\n    const response = await this.browserParser.fetchHtml(url, {\n      verbose: true,\n      timeout: 30000,\n    });\n\n    const title = this.browserParser.extractSingle(response.html, 'title');\n    return { title, status: response.status };\n  }\n}\n```\n\n## 🎛️ Configuration\n\n### Built-in Browser 
(Default)\n\n```typescript\nBrowserParserModule.forRoot({\n  loggerLevel: 'debug',\n  headless: true,\n  browserConnection: {\n    type: 'builtin',\n    args: ['--no-sandbox', '--disable-dev-shm-usage'],\n  },\n})\n```\n\n### CDP (Chrome DevTools Protocol)\n\n```typescript\nBrowserParserModule.forRoot({\n  loggerLevel: 'debug',\n  browserConnection: {\n    type: 'cdp',\n    cdpUrl: 'ws://localhost:9222/devtools/browser',\n  },\n})\n```\n\n### Async Configuration\n\n```typescript\nBrowserParserModule.forRootAsync({\n  useFactory: (configService: ConfigService) =\u003e ({\n    loggerLevel: configService.get('LOG_LEVEL', 'error'),\n    headless: configService.get('HEADLESS', 'true') === 'true',\n    browserConnection: {\n      type: configService.get('BROWSER_TYPE', 'builtin'),\n      cdpUrl: configService.get('CDP_URL'),\n    },\n  }),\n  inject: [ConfigService],\n})\n```\n\n## 📖 API Reference\n\n### BrowserParserService Methods\n\n#### `fetchHtml(url, options?)`\n\nFetch HTML content from a URL with JavaScript execution.\n\n```typescript\nconst response = await browserParser.fetchHtml('https://example.com', {\n  timeout: 30000,\n  waitForSelector: '.dynamic-content',\n  userAgent: 'Custom Bot 1.0',\n  viewport: { width: 1024, height: 768 },\n  proxy: {\n    server: 'http://proxy.example.com:8080',\n    username: 'user',\n    password: 'pass',\n  },\n});\n```\n\n#### `extractSingle(html, selector, type?, attribute?, options?)`\n\nExtract a single value from HTML.\n\n```typescript\nconst title = browserParser.extractSingle(html, 'title');\nconst description = browserParser.extractSingle(html, 'meta[name=\"description\"]', 'css', 'content');\n```\n\n#### `extractMultiple(html, selector, type?, attribute?, options?)`\n\nExtract multiple values from HTML.\n\n```typescript\nconst links = browserParser.extractMultiple(html, 'a', 'css', 'href');\nconst headings = browserParser.extractMultiple(html, 'h1, h2, h3');\n```\n\n#### `extractStructuredFromHtml(html, schema)`\n\nExtract structured data 
using a schema.\n\n```typescript\nconst data = browserParser.extractStructuredFromHtml(html, {\n  title: { selector: 'title', type: 'css' },\n  links: { selector: 'a', type: 'css', attribute: 'href', multiple: true },\n  price: {\n    selector: '.price',\n    type: 'css',\n    transform: (text) =\u003e parseFloat(text.replace('$', ''))\n  },\n});\n```\n\n#### `takeScreenshot(url, options?)`\n\nCapture a screenshot of a webpage.\n\n```typescript\nconst screenshot = await browserParser.takeScreenshot('https://example.com', {\n  type: 'png',\n  fullPage: true,\n  clip: { x: 0, y: 0, width: 800, height: 600 },\n});\n```\n\n#### `generatePDF(url, options?)`\n\nGenerate a PDF of a webpage.\n\n```typescript\nconst pdf = await browserParser.generatePDF('https://example.com', {\n  format: 'A4',\n  printBackground: true,\n  margin: { top: '1cm', bottom: '1cm' },\n});\n```\n\n#### `evaluateOnPage(url, evaluationFunction, options?)`\n\nExecute JavaScript on a page.\n\n```typescript\nconst result = await browserParser.evaluateOnPage(\n  'https://example.com',\n  '() =\u003e ({ title: document.title, elementCount: document.querySelectorAll(\"*\").length })'\n);\n```\n\n## 🌐 Browser Configuration\n\n### Built-in Browser\n\nUses Playwright's bundled Chromium:\n\n```typescript\n{\n  browserConnection: {\n    type: 'builtin',\n    executablePath: '/path/to/chrome', // Optional custom Chrome\n    args: ['--no-sandbox', '--disable-dev-shm-usage'],\n    ignoreDefaultArgs: false,\n  }\n}\n```\n\n### CDP Connection\n\nConnect to an existing Chrome instance:\n\n```bash\n# Start Chrome with remote debugging\ngoogle-chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check\n```\n\n```typescript\n{\n  browserConnection: {\n    type: 'cdp',\n    cdpUrl: 'ws://localhost:9222/devtools/browser',\n  }\n}\n```\n\n## 🚀 Run the Demo\n\n```bash\nnpm run start:dev\n```\n\n## 📄 
License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhanivan%2Fnestjs-browser-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhanivan%2Fnestjs-browser-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhanivan%2Fnestjs-browser-parser/lists"}