{"id":13571757,"url":"https://github.com/ulixee/hero","last_synced_at":"2025-05-14T13:06:16.939Z","repository":{"id":37551195,"uuid":"390541506","full_name":"ulixee/hero","owner":"ulixee","description":"The web browser built for scraping","archived":false,"fork":false,"pushed_at":"2025-05-11T10:17:32.000Z","size":84252,"stargazers_count":1170,"open_issues_count":57,"forks_count":55,"subscribers_count":18,"default_branch":"main","last_synced_at":"2025-05-11T11:24:40.867Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ulixee.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP-DoubleAgent.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"ulixee"}},"created_at":"2021-07-28T23:50:43.000Z","updated_at":"2025-05-11T10:15:05.000Z","dependencies_parsed_at":"2024-03-01T16:46:47.647Z","dependency_job_id":"6c203157-6191-47b6-b8f1-a80ab5331e26","html_url":"https://github.com/ulixee/hero","commit_stats":{"total_commits":2627,"total_committers":19,"mean_commits":"138.26315789473685","dds":0.09821088694328128,"last_synced_commit":"80f690803b1bb8b0896237b5fda06f19d8814340"},"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulixee%2Fhero","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulixee%2Fhero/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulixee%2Fhero/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulixee%2Fhero/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ulixee","download_url":"https://codeload.github.com/ulixee/hero/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253559843,"owners_count":21927686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:01:05.771Z","updated_at":"2025-05-14T13:06:16.927Z","avatar_url":"https://github.com/ulixee.png","language":"TypeScript","funding_links":["https://github.com/sponsors/ulixee"],"categories":["TypeScript","Web Frontend"],"sub_categories":["JS Libraries \u0026 Utilities"],"readme":"# Ulixee Hero\n\nA few cool highlights about Hero:\n\n- [x] **Built for scraping** - it's the first modern headless browser designed specifically for scraping instead of just automated testing.\n- [x] **Designed for web developers** - We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools.\n- [x] **Powered by Chrome** - The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering.\n- [x] **Emulates any modern browser** - Emulators make it easy to disguise your script as practically any browser.\n- [x] **Avoids detection along the entire stack** - Don't be blocked because of TLS fingerprints in your networking stack.\n\nCheck out our [website for more details](https://ulixee.org).\n\n## Installation\n\n\nYou can get a playground started with Hero very quickly. 
A playground is a single-use Hero instance that shuts down once you've run a single script. This is great for quick scripts or testing.\n\n```shell script\nnpm i --save @ulixee/hero-playground\n```\n\nOnce you're ready to graduate to deploying, check out the docs here: [Deploying Hero](./docs/advanced-concepts/deployment).\n\n## Usage\n\nHero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching:\n\n```js\nconst Hero = require('@ulixee/hero-playground');\n\n(async () =\u003e {\n  const hero = new Hero();\n  await hero.goto('https://example.org');\n  const title = await hero.document.title;\n  const intro = await hero.document.querySelector('p').textContent;\n  await hero.close();\n})();\n```\n\nBrowse the [full API docs](https://ulixee.org/docs/hero).\n\n## Using this Repository\n\nThis is a monorepo for working on the Browser Detect + Evade workflow of building an automated engine. It requires Yarn workspaces.\n\nTo work with the project:\n\n1. Clone the repository and install the git submodules (you can add `--recursive` to your initial clone command).\n2. Run `yarn build`. NOTE: you must run this command to build the TypeScript files.\n\n### Using devenv for an isolated development sandbox (using nix)\n\nWith this setup, everything is installed automatically with the exact same versions for everyone. This avoids a lot of installation issues and can help automate a lot of boring setup jobs.\n\n1. [Install nix](https://determinate.systems/posts/determinate-nix-installer/)\n2. [Install devenv](https://devenv.sh/getting-started/#1-install-nix)\n3. [Install direnv](https://direnv.net/docs/installation.html). Only needed if you want everything to load automatically.\n4. 
[Make sure direnv works with zsh or your other shell](https://direnv.net/docs/hook.html)\n\n### Browser Profiles\n\nIf you want to work with profiles (i.e., update Emulator Data, generate Double Agent probes, etc.), you'll need to download the BrowserProfiles data: `$ yarn workspace @ulixee/unblocked-browser-profiler downloadData`. This will clone data into a folder called `browser-profile-data` adjacent to the `unblocked` folder.\n\n## Unblocked\n\nThis project maintains a suite of tools for protecting the web's open knowledge. Its primary function is to create a web-scraping engine that mimics a human interacting with a website, from both a user-behavior and a \"browser\" perspective.\n\n### Unblocked Projects\n\nThis repository is home to several of the projects needed to create an \"unblocked\" automated browser engine. We imagine a world where there are many participants sharing evasions and emulations for all the web features into a [single repository](./plugins). They will live right next to an advanced bot-blocking [detection engine][double-agent] that can analyze every facet of a web scraping session (TCP, TLS, HTTP, DOM, User Interactions, etc.), a [profiler](./browser-profiler) that can run all [detections][double-agent] using real browsers and operating systems to generate [profiles][profiles] of true browser signatures, and an implementation of an [agent][agent] that can run all the evasions and run unblocked.\n\n- [Specifications][spec]. This contains generic specifications for what an automated browser needs to expose so that it can be hooked into to emulate a normal, headed browser engine. To properly mask the differences between headless Chrome on a Linux machine and a headed Chrome running on a home operating system, a series of \"hooks\" needs to be exposed. These include things like before browsers start, web pages launch, and web workers have a JavaScript environment. 
This specification will be the minimum spec needed to open up the browser to plugin authors.\n- [JsPath][jspath]. A specification is provided for a method to serialize DOM nodes, properties and visibility information so it can be remotely queried.\n- [Agent][agent]. A basic automated engine that implements the full reference [Specifications][spec].\n- [Plugins](./plugins). Unblocked community plugins that enhance a browser to mask Browser, Network, User Interaction and Operating System \"markers\" that can be used to block web scrapers.\n- [DoubleAgent][double-agent]. A series of tests that can be run to analyze real Browsers on real machines, and then compare all the detected markers to an automated setup.\n- [DoubleAgent Stacks](./double-agent-stacks). Runners for common scraper stacks. This can also serve as a workflow example for your own stack.\n- [Real User Agents][real-user-agents]. A library that collects real Chromium releases and UserAgent strings from real browsers. This is used to generate UserAgent strings for various combinations of Browsers and Operating Systems.\n- [Browser Profile Data][profiles]. A data repository containing profiles of real browsers using BrowserStack, Dockers and Local Doms. Includes deep diffing various environments of Chrome (headed, headless, with devtools, browserstack, between runs, etc.).\n- [Browser Profiler](./browser-profiler). Profiler to automatically collect [Browser Profile Data][profiles]. Automation to recreate files is driven from the Profile Data project.\n- [Emulator Builder](./browser-emulator-builder). A library to use the collected data from Browser Profile Data to \"patch\" runtime headless Chrome to match headed Chrome on a home Operating System.\n- [Mission Impossible]. Real-world measurement of what DOM APIs are being analyzed on the top websites, and how many are detecting and blocking the Unblocked Agent + Community Plugins. 
_To be imported_\n\n## Questions\n\nJoin us on the [Ulixee Discord](https://discord.gg/tMAycnemHU) for any questions or comments (it's a sister project).\n\n## Contributing\n\nSee [How to Contribute](https://ulixee.org/how-to-contribute) for ways to get started.\n\nThis project has a [Code of Conduct](https://ulixee.org/code-of-conduct). By interacting with this repository, organization, or community you agree to abide by its terms.\n\nWe'd love your help in making Hero a better tool. Please don't hesitate to send a pull request.\n\n## License\n\n[MIT](LICENSE.md)\n\n[agent]: ./agent\n[double-agent]: ./double-agent\n[spec]: ./specification\n[jspath]: ./js-path\n[profiles]: https://github.com/ulixee/browser-profile-data\n[real-user-agents]: ./real-user-agents\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulixee%2Fhero","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fulixee%2Fhero","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulixee%2Fhero/lists"}