{"id":14064776,"url":"https://github.com/mendableai/firecrawl-py","last_synced_at":"2025-04-14T14:31:32.233Z","repository":{"id":232902032,"uuid":"785487209","full_name":"mendableai/firecrawl-py","owner":"mendableai","description":"Crawl and convert any website into clean markdown","archived":false,"fork":false,"pushed_at":"2024-05-27T17:21:02.000Z","size":5,"stargazers_count":47,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-28T03:32:30.351Z","etag":null,"topics":["ai","crawler","llm","python","scraper"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/firecrawl-py/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mendableai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-12T01:24:42.000Z","updated_at":"2025-03-09T13:45:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"db6a89b4-9626-4e20-818b-e2f23efb6f7a","html_url":"https://github.com/mendableai/firecrawl-py","commit_stats":null,"previous_names":["mendableai/firecrawl-py"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mendableai%2Ffirecrawl-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mendableai%2Ffirecrawl-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mendableai%2Ffirecrawl-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mendableai%2Ffirecrawl-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mendableai","download_url":"https://codeload.github.com/mendableai/firecrawl-py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248897166,"owners_count":21179549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","crawler","llm","python","scraper"],"created_at":"2024-08-13T07:04:04.375Z","updated_at":"2025-04-14T14:31:31.937Z","avatar_url":"https://github.com/mendableai.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Firecrawl Python SDK has moved to [main repo](https://github.com/mendableai/firecrawl)\n\n\n## [OLD] Firecrawl Python SDK\n\nThe Firecrawl Python SDK is a library that allows you to easily scrape and crawl websites, and output the data in a format ready for use with language models (LLMs). It provides a simple and intuitive interface for interacting with the Firecrawl API.\n\n## Installation\n\nTo install the Firecrawl Python SDK, you can use pip:\n\n```bash\npip install firecrawl-py\n```\n\n## Usage\n\n1. Get an API key from [firecrawl.dev](https://firecrawl.dev)\n2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `FirecrawlApp` class.\n\n\nHere's an example of how to use the SDK:\n\n```python\nfrom firecrawl import FirecrawlApp\n\n# Initialize the FirecrawlApp with your API key\napp = FirecrawlApp(api_key='your_api_key')\n\n# Scrape a single URL\nurl = 'https://mendable.ai'\nscraped_data = app.scrape_url(url)\n\n# Crawl a website\ncrawl_url = 'https://mendable.ai'\ncrawl_params = {\n    'crawlerOptions': {\n        'excludes': ['blog/*'],\n        'includes': [], # leave empty for all pages\n        'limit': 1000,\n    }\n}\ncrawl_result = app.crawl_url(crawl_url, params=crawl_params)\n```\n\n### Scraping a URL\n\nTo scrape a single URL, use the `scrape_url` method. It takes the URL as a parameter and returns the scraped data as a dictionary.\n\n```python\nurl = 'https://example.com'\nscraped_data = app.scrape_url(url)\n```\n\n### Crawling a Website\n\nTo crawl a website, use the `crawl_url` method. It takes the starting URL and optional parameters as arguments. The `params` argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format.\n\nThe `wait_until_done` parameter determines whether the method should wait for the crawl job to complete before returning the result. If set to `True`, the method will periodically check the status of the crawl job until it is completed or the specified `timeout` (in seconds) is reached. If set to `False`, the method will return immediately with the job ID, and you can manually check the status of the crawl job using the `check_crawl_status` method.\n\n```python\ncrawl_url = 'https://example.com'\ncrawl_params = {\n    'crawlerOptions': {\n        'excludes': ['blog/*'],\n        'includes': [], # leave empty for all pages\n        'limit': 1000,\n    }\n}\ncrawl_result = app.crawl_url(crawl_url, params=crawl_params, wait_until_done=True, timeout=5)\n```\n\nIf `wait_until_done` is set to `True`, the `crawl_url` method will return the crawl result once the job is completed. If the job fails or is stopped, an exception will be raised.\n\n### Checking Crawl Status\n\nTo check the status of a crawl job, use the `check_crawl_status` method. It takes the job ID as a parameter and returns the current status of the crawl job.\n\n```python\njob_id = crawl_result['jobId']\nstatus = app.check_crawl_status(job_id)\n```\n\n## Error Handling\n\nThe SDK handles errors returned by the Firecrawl API and raises appropriate exceptions. If an error occurs during a request, an exception will be raised with a descriptive error message.\n\n## Contributing\n\nContributions to the Firecrawl Python SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.\n\n## License\n\nThe Firecrawl Python SDK is open-source and released under the [MIT License](https://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmendableai%2Ffirecrawl-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmendableai%2Ffirecrawl-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmendableai%2Ffirecrawl-py/lists"}