{"id":32267766,"url":"https://github.com/sukhcha-in/dart_web_scraper","last_synced_at":"2025-10-22T22:03:35.125Z","repository":{"id":244802845,"uuid":"816346210","full_name":"sukhcha-in/dart_web_scraper","owner":"sukhcha-in","description":"Powerful, easy-to-use scraper for web pages and APIs. Chain parsers and transforms to extract exactly the data you need.","archived":false,"fork":false,"pushed_at":"2025-09-11T13:43:19.000Z","size":653,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-22T22:03:25.219Z","etag":null,"topics":["htmlparser","jsonparser","parser","parsing","scraper","scraping","webscraper","webscraping"],"latest_commit_sha":null,"homepage":"https://pub.dev/packages/dart_web_scraper","language":"Dart","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sukhcha-in.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-17T14:56:09.000Z","updated_at":"2025-10-18T18:06:45.000Z","dependencies_parsed_at":"2024-06-28T16:03:58.693Z","dependency_job_id":"3be085d9-3231-4614-b1a5-a2261f7ea495","html_url":"https://github.com/sukhcha-in/dart_web_scraper","commit_stats":null,"previous_names":["sukhcha-in/dart_web_scraper"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sukhcha-in/dart_web_scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sukhcha-in%2Fdart_web_scraper","tags_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sukhcha-in%2Fdart_web_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sukhcha-in%2Fdart_web_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sukhcha-in%2Fdart_web_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sukhcha-in","download_url":"https://codeload.github.com/sukhcha-in/dart_web_scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sukhcha-in%2Fdart_web_scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280520829,"owners_count":26344439,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["htmlparser","jsonparser","parser","parsing","scraper","scraping","webscraper","webscraping"],"created_at":"2025-10-22T22:03:01.533Z","updated_at":"2025-10-22T22:03:35.119Z","avatar_url":"https://github.com/sukhcha-in.png","language":"Dart","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Pub version](https://img.shields.io/pub/v/dart_web_scraper?logo=dart\u0026style=plastic)](https://pub.dev/packages/dart_web_scraper)\n[![Pub 
Likes](https://img.shields.io/pub/likes/dart_web_scraper)](https://pub.dev/packages/dart_web_scraper)\n[![Pub Points](https://img.shields.io/pub/points/dart_web_scraper)](https://pub.dev/packages/dart_web_scraper)\n[![Pub Monthly Downloads](https://img.shields.io/pub/dm/dart_web_scraper)](https://pub.dev/packages/dart_web_scraper)\n\nConfig-based, reusable web scraper for web and API scraping. Scrape, parse web pages or APIs without writing parsers or scraping logic, using simple key/value based configs.\n\n## Features\n\n- Config-based, reusable web scraper for web and API scraping.\n- Scrape, parse web pages or APIs without writing parsers or scraping logic, using simple key/value based configs.\n- 15+ Built-in parsers.\n- Data cleaning and transformations.\n\n\n\n![Comparison](https://raw.githubusercontent.com/sukhcha-in/dart_web_scraper/main/dart_web_scraper.jpg)\n\n\n## Getting Started\n\nInstall it with a `flutter` command:\n\n```bash\n$ flutter pub add dart_web_scraper\n```\n\nInstall it with a `dart` command:\n\n```bash\n$ dart pub add dart_web_scraper\n```\n\n## Usage\n\nThis is the most basic example of scraping quotes from quotes.toscrape.com\n\n```dart\nimport 'package:dart_web_scraper/dart_web_scraper.dart';\n\nvoid main() async {\n  WebScraper webScraper = WebScraper();\n\n  Map\u003cString, Object\u003e result = await webScraper.scrape(\n    url: Uri.parse(\"https://quotes.toscrape.com\"),\n    scraperConfig: ScraperConfig(\n      parsers: [\n        Parser(\n          id: \"quotes\",\n          parents: [\"_root\"],\n\n          /// _root is default parent\n          type: ParserType.element,\n          selectors: [\n            \".quote\",\n          ],\n          multiple: true,\n        ),\n        Parser(\n          id: \"quote\",\n          parents: [\"quotes\"],\n          type: ParserType.text,\n          selectors: [\n            \"span.text\",\n          ],\n        ),\n      ],\n    ),\n  );\n\n  print(result);\n}\n\n```\n### WebScraper 
`class`\n\nHigh-level web scraper that combines HTML fetching and data parsing.\n\n```dart\nWebScraper WebScraper()\n\n// Main scraping method\nFuture\u003cMap\u003cString, Object\u003e\u003e scrape({\n  // The URL to scrape\n  required Uri url,\n  // Scraper configuration for the URL\n  ScraperConfig? scraperConfig,\n  // Map of domain names to lists of scraper configurations\n  ScraperConfigMap? scraperConfigMap,\n  // Enable debug logging; also dumps the scraped file into the /dump folder in the current path\n  bool debug = false,\n  // Pre-fetched HTML document (optional, avoids HTTP request if provided)\n  String? html,\n  // Custom cookies to include in HTTP requests (overrides ScraperConfig cookies)\n  Map\u003cString, String\u003e? overrideCookies,\n  // Custom HTTP headers to include in requests (overrides ScraperConfig headers)\n  Map\u003cString, String\u003e? overrideHeaders,\n  // Overrides proxy config in ScraperConfig and HTTP Parser options\n  ProxyAPIConfig? overrideProxyAPIConfig,\n  // Custom user agent string (overrides ScraperConfig userAgent)\n  String? overrideUserAgent,\n})\n```\n\n### ScraperConfig `class`\n\nConfiguration for targeting and scraping specific types of URLs.\n\n```dart\nScraperConfig ScraperConfig({\n  // List of URL path patterns that this configuration should handle\n  List\u003cString\u003e pathPatterns = const [],\n  // List of parsers that define how to extract data from the page\n  required List\u003cParser\u003e parsers,\n  // Whether HTML content needs to be fetched from the URL\n  bool requiresHtml = true,\n  // URL preprocessing and cleaning configuration\n  UrlCleaner? urlCleaner,\n  // Proxy API config\n  ProxyAPIConfig? proxyAPIConfig,\n  // Cookies for base request\n  Map\u003cString, String\u003e? cookies,\n  // Headers for base request\n  Map\u003cString, String\u003e? 
headers,\n  // Whether to force a fresh HTTP request even if HTML is provided\n  bool forceRefresh = false,\n  // User agent device type for base request\n  UserAgentDevice userAgent = UserAgentDevice.mobile,\n})\n```\n\n#### UrlCleaner `class`\n\nCleans the URL before it's passed to a scraper.\n\n```dart\nUrlCleaner UrlCleaner({\n  // Set whitelisted or blacklisted URL parameters.\n  List\u003cString\u003e? whitelistParams,\n  List\u003cString\u003e? blacklistParams,\n  // Append custom static parameters to the URL.\n  Map\u003cString, String\u003e? appendParams,\n})\n```\n\n---\n\n### Parser `class`\n\nEasy to use and reusable parser class :)\n\n```dart\nParser Parser({\n  // `id` is the key used in the final result.\n  // Child parsers reference a parent parser by its `id`.\n  // Multiple parsers may share the same id and parent; they execute one by one and stop once one of them parses data successfully.\n  required String id,\n  // A child can have multiple parents; it executes once a parent parser has executed successfully.\n  required List\u003cString\u003e parents,\n  // Set the parser type.\n  required ParserType type,\n  // Selectors are tried one by one; execution stops once one selector parses data successfully.\n  List\u003cString\u003e selectors = const [],\n  // Mark the parser as private; its data is not added to the final result.\n  bool isPrivate = false,\n  // Set multiple to `true` if data is a List.\n  bool multiple = false,\n  // Some parsers require additional options to work properly.\n  // You can pass these options here.\n  // For example, `ParserType.table` requires `ParserOptions.table`.\n  ParserOptions? parserOptions,\n  // Data transformation options to apply after extraction.\n  TransformationOptions? transformationOptions,\n  // Custom cleaner function that cleans the data and returns it.\n  Object? Function(Data, bool)? 
cleaner,\n  // If you plan to create configs from JSON, you can pass cleaner name here.\n  // This cleaner should be registered in `CleanerRegistry` to be used.\n  // If you pass cleaner along with cleanerName, cleanerName function will be ignored.\n  String? cleanerName,\n})\n```\n\n---\n\n### Parser Types\n\n| Type                         | Description                                                              | Selector                                                                                                                                       | ParserOptions                 |\n| ---------------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------- |\n| `ParserType.element`         | Extracts element nodes from HTML using CSS selectors.                    | CSS selector required.                                                                                                                         | `-`                           |\n| `ParserType.attribute`       | Extracts attribute value from HTML element using CSS selectors.          | Use CSS selector to select an element and **append attribute name with `::`**. Ex: `div#myid::name` where `name` refers to the attribute name. | `-`                           |\n| `ParserType.text`            | Extracts text from HTML element using CSS selectors.                     | CSS selector required.                                                                                                                         | `-`                           |\n| `ParserType.image`           | Extracts image URL from HTML element.                                    | CSS selector required. After selecting an element it tries to find `src` attribute.                                                            
| `-`                           |\n| `ParserType.url`             | Extracts a URL from an HTML element.                                     | CSS selector required. After selecting an element it tries to find `href` attribute.                                                           | `-`                           |\n| `ParserType.urlParam`        | Extracts a query parameter from a URL.                                   | Add parameter name in selector.                                                                                                                | `ParserOptions.urlParam`      |\n| `ParserType.table`           | Extracts data from an HTML table.                                        | CSS selector required. Select table using this selector.                                                                                       | `ParserOptions.table`         |\n| `ParserType.sibling`         | Used when target element doesn't have a valid selector but sibling does. | CSS selector is required.                                                                                                                      | `ParserOptions.sibling`       |\n| `ParserType.strBetween`      | Extracts the string between two strings.                                 | Not required                                                                                                                                   | `ParserOptions.stringBetween` |\n| `ParserType.http`            | Fetches data using an HTTP request.                                      | Not required                                                                                                                                   | `ParserOptions.http`          |\n| `ParserType.json`            | Decodes a JSON string or extracts data.                                  
| [json_path](https://pub.dev/packages/json_path) syntax should be used as a selector                                                            | `-`                           |\n| `ParserType.jsonld`          | Extracts all JSON-LD objects and places them into a list.                | Not required                                                                                                                                   | `-`                           |\n| `ParserType.jsonTable`       | Extracts data from JSON as a table.                                      | [json_path](https://pub.dev/packages/json_path) syntax should be used as a selector                                                            | `ParserOptions.table`         |\n| `ParserType.json5decode`     | Decodes JSON5 syntax.                                                    | Not required                                                                                                                                   | `-`                           |\n| `ParserType.staticVal`       | Sets static values in the final result.                                  | Not required                                                                                                                                   | `ParserOptions.staticValue`   |\n| `ParserType.returnUrlParser` | Returns the URL that was passed to WebScraper.                           | Not required                                                                                                                                   | `-`                           |\n\n#### Data injection to `selector`\n\nYou can inject previously parsed data using `\u003cslot\u003e`. For example:\n\n```dart\nselectors: [\n  // for css selector\n  \"div#\u003cslot\u003eid\u003c/slot\u003e\",\n  // or for json path:\n  r\"$.data.\u003cslot\u003eid\u003c/slot\u003e.value\",\n]\n```\n\nYou can also inject data using `\u003cslot\u003e` into `ParserOptions.http`'s `url` field. 
For example:\n\n```dart\nParser(\n  id: \"json\",\n  parents: [\"product_id\"],\n  type: ParserType.http,\n  isPrivate: true,\n  parserOptions: ParserOptions.http(\n    HttpParserOptions(\n      url: \"https://example.com/productdetails/\u003cslot\u003eproduct_id\u003c/slot\u003e\",\n      method: HttpMethod.get,\n      responseType: HttpResponseType.json,\n    ),\n  ),\n),\n```\n\n---\n\n### ParserOptions `class`\n\nParser-specific configuration options that control how individual parsers behave during data extraction.\n\nUse the appropriate named constructor for the parser type you're configuring:\n\n```dart\n// For HTTP parsers\nParserOptions.http(options: HttpParserOptions(...))\n\n// For table and JSON table parsers\nParserOptions.table(options: TableParserOptions(...))\n\n// For sibling parsers\nParserOptions.sibling(options: SiblingParserOptions(...))\n\n// For static value parsers\nParserOptions.staticValue(options: StaticValueParserOptions(...))\n\n// For string between parsers\nParserOptions.stringBetween(options: StringBetweenParserOptions(...))\n\n// For URL parameter parsers\nParserOptions.urlParam(options: UrlParamParserOptions(...))\n```\n\n---\n\n### TransformationOptions `class`\n\nComprehensive data transformation system that can be applied to extracted data.\n\n```dart\nTransformationOptions TransformationOptions({\n  // Text to add to the beginning\n  String? prepend,\n  // Text to add to the end\n  String? append,\n  // List of values to check for matches, returns boolean\n  List\u003cObject\u003e? match,\n  // Index to extract from a list (negative for reverse indexing)\n  int? nth,\n  // Delimiter to split data by\n  String? splitBy,\n  // Whether to decode URL-encoded strings\n  bool? urldecode,\n  // Whether to convert map values to a list\n  bool? mapToList,\n  // Regular expression extraction options\n  RegexTransformationOptions? regexMatch,\n  // Regular expression replacement options\n  RegexReplaceTransformationOptions? 
regexReplace,\n  // Text replacement options\n  ReplaceTransformationOptions? replace,\n  // Text cropping options (start/end)\n  CropTransformationOptions? crop,\n  // Extract text between two strings\n  StringBetweenTransformationOptions? stringBetween,\n  // Extract sibling elements\n  SiblingTransformationOptions? sibling,\n  // Table processing options\n  TableTransformationOptions? table,\n  // Return static value options\n  StaticValueTransformationOptions? staticValue,\n  // Custom order for applying transformations\n  List\u003cTransformationType\u003e? transformationOrder,\n})\n```\n\n## Creating configs from JSON\n\nYou can now create configs from JSON string using `ScraperConfig.fromJson` method.\n\n## Cleaner Registry for parsers created using JSON\n\nYou can register cleaners using `CleanerRegistry.register` method. This is useful when you want to create configs from JSON and want to use custom cleaner for `Parser`.\n\nFor example:\n\n```dart\nCleanerRegistry.register('formatPrice', (data, extractedData, debug) {\n  return '\\$${data.obj}';\n});\n```\n\nthen pass the cleaner name in `Parser`:\n\n```dart\nParser(\n  //...\n  cleanerName: 'formatPrice',\n  //...\n)\n```\n\n## Credits\n\n[json_path](https://pub.dev/packages/json_path) - JSON path selector\\\n[json5](https://pub.dev/packages/json5) - JSON5 syntax decoder\n\n\u003cimg src=\"https://profile-counter.deno.dev/dart_web_scraper/count.svg\" alt=\"Visitor's Count\" /\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsukhcha-in%2Fdart_web_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsukhcha-in%2Fdart_web_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsukhcha-in%2Fdart_web_scraper/lists"}