{"id":14982005,"url":"https://github.com/webmiddle/webmiddle","last_synced_at":"2025-10-29T11:31:07.642Z","repository":{"id":8869035,"uuid":"60006796","full_name":"webmiddle/webmiddle","owner":"webmiddle","description":"Node.js framework for modular web scraping and data extraction","archived":false,"fork":false,"pushed_at":"2022-12-09T17:10:15.000Z","size":2650,"stargazers_count":14,"open_issues_count":43,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-11T11:43:24.538Z","etag":null,"topics":["data-extraction","framework","jsx","jsx-components","modular","nodejs","web-scraping"],"latest_commit_sha":null,"homepage":"https://webmiddle.github.io/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/webmiddle.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-30T11:57:37.000Z","updated_at":"2023-11-09T14:08:11.000Z","dependencies_parsed_at":"2023-01-13T15:15:21.629Z","dependency_job_id":null,"html_url":"https://github.com/webmiddle/webmiddle","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webmiddle%2Fwebmiddle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webmiddle%2Fwebmiddle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webmiddle%2Fwebmiddle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webmiddle%2Fwebmiddle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/webmiddle","download_url":"https://codelo
ad.github.com/webmiddle/webmiddle/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238817356,"owners_count":19535515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","framework","jsx","jsx-components","modular","nodejs","web-scraping"],"created_at":"2024-09-24T14:04:38.293Z","updated_at":"2025-10-29T11:31:07.298Z","avatar_url":"https://github.com/webmiddle.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://travis-ci.org/webmiddle/webmiddle\"\u003e\u003cimg alt=\"Build Status\" src=\"https://travis-ci.org/webmiddle/webmiddle.svg?branch=master\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/webmiddle/webmiddle\"\u003e\u003cimg alt=\"Coverage Status\" src=\"https://img.shields.io/codecov/c/github/webmiddle/webmiddle/master.svg?maxAge=43200\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n# webmiddle\n\n\u003e Node.js framework for modular web scraping and data extraction\n\nThe building block of any webmiddle application is the [JSX](http://facebook.github.io/jsx/) component.  
\nEach component executes one task or controls the execution of other tasks by composing other components.\n\n```jsx\nconst FetchPageLinks = ({ url, query, name }) =\u003e\n  \u003cPipe\u003e\n    \u003cHttpRequest contentType=\"text/html\" url={url} /\u003e\n\n    {rawHtml =\u003e\n      \u003cHtmlToJson name={name} from={rawHtml} content={\n        {\n          anchors: $$.within(\"a\", $$.pipe(\n            $$.filter(el =\u003e el.text().toUpperCase().indexOf(query.toUpperCase()) !== -1),\n            $$.map({\n              url: $$.attr(\"href\"),\n              text: $$.getFirst()\n            })\n          ))\n        }\n      }/\u003e\n    }\n  \u003c/Pipe\u003e\n```\n\nThe framework provides a set of core components for the most common operations, but there is no difference between a core component and a component that you may want to develop yourself.\n\nWebmiddle applications can be quickly turned into REST APIs, allowing remote access via HTTP or WebSocket.\nUse [webmiddle-devtools](https://github.com/webmiddle/webmiddle-devtools) to run and debug your components and to test them remotely.\n\n## Links\n\n- [Getting Started](https://webmiddle.github.io/docs/introduction/getting-started)\n- [Try it live](https://repl.it/@Maluen/webmiddle-try-it-out)\n- [Starter App repository](https://github.com/webmiddle/webmiddle-starter-app)\n- [Devtools repository](https://github.com/webmiddle/webmiddle-devtools)\n\n## Features\n\nBuilt-in features provided by the core components:\n\n- **[Concurrency](https://webmiddle.github.io/docs/control-flow/parallel)**, for executing multiple asynchronous components at the same time.\n- **[HTTP](https://webmiddle.github.io/docs/fetching/httprequest)** requests.\n- **[Puppeteer](https://webmiddle.github.io/docs/fetching/browser)** requests, for SPAs and pages using client-side generated content.\n- **[Cookie JAR](https://webmiddle.github.io/docs/fetching/managercookie)**, for sharing cookies among different components and 
webmiddle objects.\n- **[Caching](https://webmiddle.github.io/docs/storing/resume)**, for resuming work in case of crash.\n- **[Error handling](https://webmiddle.github.io/docs/webmiddle/errorboundary)**, via customizable retries and catch options.\n- **Resource transformations**\n  - **[HTML/XML to JSON](https://webmiddle.github.io/docs/transforming/cheeriotojson)**\n  - **[JSON to JSON](https://webmiddle.github.io/docs/transforming/jsonselecttojson)**\n\n## Core packages\n\n\u003ctable align=\"center\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e\u003cb\u003eName\u003c/b\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cb\u003eDescription\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-manager-cookie\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-manager-cookie\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-manager-cookie.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-pipe\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-pipe\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-pipe.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-parallel\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-parallel\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-parallel.svg\" alt=\"npm 
version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-resume\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-resume\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-resume.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-http-request\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-http-request\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-http-request.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-browser\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-browser\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-browser.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-cheerio-to-json\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-cheerio-to-json\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-cheerio-to-json.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-component-jsonselect-to-json\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-component-jsonselect-to-json\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-component-jsonselect-to-json.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-server\u003c/td\u003e\n      \u003ctd\u003e\u003ca 
href=\"https://badge.fury.io/js/webmiddle-server\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-server.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003ewebmiddle-client\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://badge.fury.io/js/webmiddle-client\"\u003e\u003cimg src=\"https://badge.fury.io/js/webmiddle-client.svg\" alt=\"npm version\" height=\"18\"\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n## Open source ecosystem\n\nCreate your own components and publish them to npm!\n\nOne of the main philosophies of the framework is **reuse**, by creating an ecosystem where components can be published as separate npm modules to be usable in other projects.\n\n**NOTE**: If you think that a component / feature is so common and general that it should be in the core, [open an issue](https://github.com/webmiddle/webmiddle/issues/new) or just do a pull request!\n\n## Contributing\n\nThis is a monorepo, i.e. 
all the core components and the main webmiddle package live in this single repository.\n\nIt uses [Yarn](https://yarnpkg.com) and [Lerna](https://github.com/lerna/lerna) for managing the monorepo, as you might have guessed from the lerna.json file.\n\nStart by installing the root dependencies with:\n\n```bash\nyarn\n```\n\nThen install all the packages' dependencies and link the packages together by running:\n\n```bash\nyarn run lerna bootstrap\n```\n\nBuild all the packages by running:\n\n```bash\nyarn run build\n```\n\nTo run the tests for all the packages at once and get coverage info, execute:\n\n```bash\nyarn run test\n```\n\n\u003e **NOTE**: Make sure to build before running the tests.\n\n\u003e **NOTE**: If you are on Windows, you might need to run the install and bootstrap commands as administrator.\n\nEach [package](https://github.com/webmiddle/webmiddle/tree/master/packages) uses the same build / test system.\n\nOnce you are inside a package folder, you can build it by running `yarn run build` or `yarn run build:watch` (for rebuilding on every change).\n\nTests use [AVA](https://github.com/avajs/ava), so they can be written in modern JavaScript and run concurrently. You can run the tests with `yarn run test`. To run the tests on every change you can use `yarn run test:watch`. 
The latter option is highly recommended while developing, as it also produces much more detailed output.\n\nTo run the same npm command in all the packages, use `lerna run`, for example:\n\n```bash\nyarn run lerna run build\n```\n\nTo run arbitrary commands, use `lerna exec`, for example:\n\n```bash\nyarn run lerna -- exec -- rm -rf ./node_modules\n```\n\nSee [Lerna commands](https://github.com/lerna/lerna#commands) for more info.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebmiddle%2Fwebmiddle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebmiddle%2Fwebmiddle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebmiddle%2Fwebmiddle/lists"}