{"id":30687654,"url":"https://github.com/substratelabs/selectron","last_synced_at":"2025-09-02T00:04:53.269Z","repository":{"id":291020421,"uuid":"976322802","full_name":"SubstrateLabs/selectron","owner":"SubstrateLabs","description":"AI web parser library + CLI","archived":false,"fork":false,"pushed_at":"2025-05-05T16:18:47.000Z","size":112418,"stargazers_count":50,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-27T17:55:24.326Z","etag":null,"topics":["ai","beautifulsoup","beautifulsoup4","duckdb","parser-generator","pydantic-ai","python","scraper","textualize","webscraping"],"latest_commit_sha":null,"homepage":"https://0thernet.substack.com/p/memo-2-selectron","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SubstrateLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-01T22:46:32.000Z","updated_at":"2025-08-26T14:10:42.000Z","dependencies_parsed_at":"2025-05-01T23:38:26.920Z","dependency_job_id":null,"html_url":"https://github.com/SubstrateLabs/selectron","commit_stats":null,"previous_names":["substratelabs/selectron"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SubstrateLabs/selectron","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SubstrateLabs%2Fselectron","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SubstrateLabs%2Fselectron/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SubstrateLabs%2Fselectron/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SubstrateLabs%2Fselectron/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SubstrateLabs","download_url":"https://codeload.github.com/SubstrateLabs/selectron/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SubstrateLabs%2Fselectron/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273208777,"owners_count":25064204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","beautifulsoup","beautifulsoup4","duckdb","parser-generator","pydantic-ai","python","scraper","textualize","webscraping"],"created_at":"2025-09-02T00:03:55.483Z","updated_at":"2025-09-02T00:04:53.252Z","avatar_url":"https://github.com/SubstrateLabs.png","language":"Python","readme":"# ⣏ Selectron ⣹\n\n[![PyPI - Version](https://img.shields.io/pypi/v/selectron.svg)](https://pypi.org/project/selectron)\n\nSelectron is an AI web parsing library \u0026 CLI designed around two goals:\n1. **Fully automated parser generation** – AI-\"compiles\" (generates) parsers on-demand\n2. 
**Efficient parser execution** – Parsers are cached, no LLM calls at runtime\n\n![screenshot](/app.png)\n\n\u003cdetails\u003e \n\u003csummary\u003e\u003ch3\u003eDemo videos\u003c/h3\u003e\u003c/summary\u003e\n\n\u003ch4\u003eSave your Twitter feed to DuckDB\u003c/h4\u003e\n\nhttps://github.com/user-attachments/assets/d8743c32-087f-4137-8451-e4ec3e5716ed\n\n\u003ch4\u003eGenerate a new scraper with AI\u003c/h4\u003e\n\nhttps://github.com/user-attachments/assets/8f523f33-a786-4871-b081-4fe9b7422a44\n\n\u003c/details\u003e\n\n## How it works\n\n- **Chrome integration:** Connects to Chrome over CDP and receives live DOM and screenshot data from your active tab. Selectron uses minimal [dependencies](https://github.com/SubstrateLabs/selectron/blob/main/pyproject.toml) – no [browser-use](https://github.com/browser-use/browser-use) or [stagehand](https://github.com/browserbase/stagehand), not even Playwright (we prefer [direct CDP](https://github.com/SubstrateLabs/selectron/blob/main/src/selectron/chrome/chrome_cdp.py)).\n- **Fully automated parser generation:** An AI agent generates selectors for content described with natural language. Another agent generates code to extract data from selected containers. The final result is a [parser](https://github.com/SubstrateLabs/selectron/blob/main/src/selectron/parsers/news.ycombinator.com.json). \n- **CLI application:** When you run the [Textual](https://www.textualize.io) CLI, parsed data is saved to a [DuckDB](https://duckdb.org) database, making it easy to analyze your browsing history or extract structured data from websites. 
Built-in parsers include:\n   - **Twitter**\n   - **LinkedIn**\n   - **HackerNews**\n   - (Please [contribute](https://github.com/SubstrateLabs/selectron?tab=readme-ov-file#contributing) more!)\n \n## Use the CLI\n\n```sh\n# Install in a venv\nuv add selectron\nuv run selectron\n\n# Or install globally\npipx install selectron\nselectron\n```\n\nWhen you run `selectron`, it creates a DuckDB database in your app directory, and saves parsed data from the given URL to a table named by the URL slug:\n\n- `x.com/home` -\u003e `x.com~~2fhome` (Selectron uses a reversible slug system)\n\nWhen you run `selectron` inside this repo, parsers are saved to the `src` directory (if a parser for the URL doesn't already exist).\n\nWhen you run `selectron` outside this repo, parsers are saved to the app directory (and will overwrite existing parsers).\n\n## Use the library\n\n### Parse HTML\n\n```python\nimport json\n\nfrom selectron.lib import parse\n# ... get html from browser ...\nres = parse(url, html)\nprint(json.dumps(res, indent=2))\n```\n\nIf a parser is registered for the URL, you'll receive something like this:\n\n```json\n[\n  {\n    \"primary_url\": \"/_its_not_real_/status/1918760851957321857\",\n    \"datetime\": \"2025-05-03T20:13:30.000Z\",\n    \"id\": \"1918760851957321857\",\n    \"author\": \"@_its_not_real_\",\n    \"description\": \"\\\"They're made out of meat.\\\"\\n\\\"Meat?\\\"\\n\\\"Meat. Humans. They're made entirely out of meat.\\\"\\n\\\"But that's impossible. What about all the tokens they generate? The text? The code?\\\"\\n\\\"They do produce tokens, but the tokens aren't their essence. They're merely outputs. 
The humans themselves\",\n    \"images\": [{ \"src\": \"https://pbs.twimg.com/profile_images/1307877522726682625/t5r3D_-n_x96.jpg\" }, { \"src\": \"https://pbs.twimg.com/profile_images/1800173618652979201/2cDLkS53_bigger.jpg\" }]\n  }\n]\n```\n\n### Other functionality\n\nThe [selectron.chrome](https://github.com/SubstrateLabs/selectron/tree/main/src/selectron/chrome) and [selectron.ai](https://github.com/SubstrateLabs/selectron/tree/main/src/selectron/ai) modules are useful, but still baking, and subject to breaking changes – please pin your minor version. \n\n## Contributing\n\nGenerating parsers is easy, because it's mostly automated:\n\n1. Clone the repo\n2. Run the CLI (`make dev`). Connect to Chrome.\n3. In Chrome, open the page you want to parse. In the CLI, describe your selection (or use the AI-generated proposal).\n4. Start AI selection (you can stop at any time to use the current highlighted selector).\n5. Start AI parser generation. The parser will be saved to the appropriate location in `/src`. \n6. Review the parser's results and open a PR (please show what the parser produces).\n\n### Setup\n\n```sh\nmake install\nmake dev\n# see Makefile for other commands\n# see .env.EXAMPLE for config options\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubstratelabs%2Fselectron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsubstratelabs%2Fselectron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubstratelabs%2Fselectron/lists"}