{"id":14484078,"url":"https://github.com/raznem/parsera","last_synced_at":"2025-04-11T06:28:45.166Z","repository":{"id":252911976,"uuid":"841461904","full_name":"raznem/parsera","owner":"raznem","description":"Lightweight library for scraping web-sites with LLMs","archived":false,"fork":false,"pushed_at":"2025-03-19T15:35:27.000Z","size":1917,"stargazers_count":1059,"open_issues_count":3,"forks_count":63,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-03T22:05:42.497Z","etag":null,"topics":["ai","ai-scraping","data-extraction","llm","opensource","playwright","python","scraping","webscraping"],"latest_commit_sha":null,"homepage":"https://docs.parsera.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raznem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-12T13:04:33.000Z","updated_at":"2025-04-02T21:53:33.000Z","dependencies_parsed_at":"2024-08-27T17:20:46.261Z","dependency_job_id":"c69b0567-870e-47c3-acb4-288c687e2ffa","html_url":"https://github.com/raznem/parsera","commit_stats":null,"previous_names":["raznem/parsera"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raznem%2Fparsera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raznem%2Fparsera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raznem%2Fparsera/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raznem%2Fparsera/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raznem","download_url":"https://codeload.github.com/raznem/parsera/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248354184,"owners_count":21089770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-scraping","data-extraction","llm","opensource","playwright","python","scraping","webscraping"],"created_at":"2024-09-03T01:00:55.118Z","updated_at":"2025-04-11T06:28:45.148Z","avatar_url":"https://github.com/raznem.png","language":"Python","funding_links":[],"categories":["Python","数据 Data","A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# 📦 Parsera\n\n[![Discord](https://img.shields.io/badge/Discord-7289da?style=for-the-badge)](https://discord.gg/gYXwgQaT7p)\n[![Downloads](https://img.shields.io/pepy/dt/parsera?style=for-the-badge)](https://pepy.tech/project/parsera)\n\u003ca href=\"https://apify.com/parsera-labs/parsera?fpr=czveg\"\u003e\u003cimg src=\"https://apify.com/ext/run-on-apify.png\" alt=\"Run Parsera Actor on Apify\" width=\"126\" height=\"28\" /\u003e\u003c/a\u003e\n\nLightweight Python library for scraping websites with LLMs. \nYou can test it on [Parsera website](https://parsera.org).\n\n## Why Parsera?\nBecause it's simple and lightweight. 
With an interface as simple as:\n```python\nscraper = Parsera()\nresult = scraper.run(url=url, elements=elements)\n```\n\n## Table of Contents\n- [Installation](#Installation)\n- [Documentation](#Documentation)\n- [Basic usage](#Basic-usage)\n- [Running with Jupyter Notebook](#Running-with-Jupyter-Notebook)\n- [Running with CLI](#Running-with-CLI)\n- [Running in Docker](#Running-in-Docker)\n\n## Installation\n\n```shell\npip install parsera\nplaywright install\n```\n\n## Documentation\n\nCheck out the [documentation](https://docs.parsera.org) to learn more about other features, like running custom models and Playwright scripts.\n\n## Basic usage\n\nFirst, set the `PARSERA_API_KEY` environment variable (to run a custom LLM, see [Custom Models](https://docs.parsera.org/features/custom-models/)).\nYou can do this from Python with:\n```python\nimport os\n\nos.environ[\"PARSERA_API_KEY\"] = \"YOUR_PARSERA_API_KEY_HERE\"\n```\n\nNext, you can run a basic version:\n```python\nfrom parsera import Parsera\n\nurl = \"https://news.ycombinator.com/\"\nelements = {\n    \"Title\": \"News title\",\n    \"Points\": \"Number of points\",\n    \"Comments\": \"Number of comments\",\n}\n\nscraper = Parsera()\nresult = scraper.run(url=url, elements=elements)\n```\n\nThe `result` variable will contain JSON with a list of records:\n```json\n[\n   {\n      \"Title\":\"Hacking the largest airline and hotel rewards platform (2023)\",\n      \"Points\":\"104\",\n      \"Comments\":\"24\"\n   },\n    ...\n]\n```\n\nThere is also an async `arun` method available:\n```python\nresult = await scraper.arun(url=url, elements=elements)\n```\n\n## Running with Jupyter Notebook\nEither place this code at the beginning of your notebook:\n```python\nimport nest_asyncio\nnest_asyncio.apply()\n```\n\nOr, instead of calling the `run` method, use the async `arun` method.\n\n## Running with CLI\n\nBefore you run `Parsera` as a command-line tool, don't forget to put your `OPENAI_API_KEY` in your environment variables or a `.env` file.\n\n### 
Usage\n\nYou can configure the elements to parse using a `JSON string` or a `FILE`.\nOptionally, you can provide a `FILE` to write the output to and the number of `SCROLLS` to perform on the page.\n\n```sh\npython -m parsera.main URL {--scheme '{\"title\":\"h1\"}' | --file FILENAME} [--scrolls SCROLLS] [--output FILENAME]\n```\n\n## Running in Docker\n\nIf you have issues with your local environment, you can run Parsera with Docker, [see documentation](https://docs.parsera.org/features/docker/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraznem%2Fparsera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraznem%2Fparsera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraznem%2Fparsera/lists"}