{"id":30713920,"url":"https://github.com/catbraaain/search-crawl","last_synced_at":"2026-05-09T06:12:51.841Z","repository":{"id":301918899,"uuid":"1010614082","full_name":"CatBraaain/search-crawl","owner":"CatBraaain","description":"Search the web and crawl content stealthily, with optional extraction using LLMs.","archived":false,"fork":false,"pushed_at":"2025-09-16T10:30:34.000Z","size":504,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-16T11:44:31.057Z","etag":null,"topics":["crawl","crawler","fastapi","playwright","scrape","scraping","searxng"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CatBraaain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-29T12:58:47.000Z","updated_at":"2025-09-16T10:30:38.000Z","dependencies_parsed_at":"2025-08-19T06:05:55.011Z","dependency_job_id":"aa61b05a-26f5-4251-bed4-cf467b8d2318","html_url":"https://github.com/CatBraaain/search-crawl","commit_stats":null,"previous_names":["catbraaain/searxng"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CatBraaain/search-crawl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CatBraaain%2Fsearch-crawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CatBraaain%2Fsearch-crawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/CatBraaain%2Fsearch-crawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CatBraaain%2Fsearch-crawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CatBraaain","download_url":"https://codeload.github.com/CatBraaain/search-crawl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CatBraaain%2Fsearch-crawl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29198178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T14:35:27.868Z","status":"ssl_error","status_checked_at":"2026-02-07T14:25:51.081Z","response_time":63,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawl","crawler","fastapi","playwright","scrape","scraping","searxng"],"created_at":"2025-09-03T04:09:16.892Z","updated_at":"2026-02-07T15:33:03.461Z","avatar_url":"https://github.com/CatBraaain.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SearchCrawl\n\nA FastAPI project providing a search and crawl API, with optional content extraction using LLMs.\nSimply provide a search query, and it automatically searches and crawls websites.\nIf desired, you can also extract structured content from the crawled pages using your custom instructions with 
LLMs.\n\n\n## Features\n\n- **Search, Crawl, and Extract in a Single Step**\n  Perform search queries, crawl resulting websites, and extract content using custom instructions with LLMs—all in one request.\n  You can specify the format for passing crawled results to the LLM.\n  By default, the entire page content is provided in Markdown format.\n\n- **Undetected search**\n  Powered by [SearXNG](https://github.com/searxng/searxng) for stealthy, meta search.\n\n- **Undetected crawl**\n  Powered by [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) for stealthy web crawling, with support for JavaScript-rendered content.\n\n- **Flexible crawl scope**\n  Follow pagination links, internal links, or all links based on configuration.\n  Supports multi-page crawling with configurable depth, page limits, and concurrency.\n\n- **Cache system**\n  Stores search and crawl results persistently with a 24-hour default TTL, preventing frequent requests from triggering IP bans. Cache settings are configurable.\n\n- **OpenAPI support**\n  Provides an OpenAPI specification.\n  This means you can automatically generate API clients in many languages (e.g., Python, TypeScript, Java) using tools like `openapi-generator-cli`.\n\n- **Prebuilt Python client**\n  A ready-to-use Python API client is included.\n\n\n## API Endpoints\n\n### Search API\n- `/search`: Search for websites by query.\n- `/search-crawl`: Combine search and crawl functionality.\n- `/search-crawl-extract`: Search, crawl, and extract structured data in one step.\n\n### Crawl API\n- `/crawl`: Crawl a website using a crawl request.\n- `/crawl-many`: Crawl multiple websites concurrently using a crawl many request.\n- `/crawl-extract`: Crawl and immediately extract structured data.\n\n## Getting Started\n\n### 1: Prepare compose.yaml\nCreate a `compose.yaml` in your project. 
Remote include requires Docker Compose \u003e= v2.21.0:\n```yaml\n# compose.yaml\ninclude:\n  - https://github.com/CatBraaain/search-crawl.git\n```\n\n\u003cdetails\u003e\u003csummary\u003eAlternative: Traditional way or older Docker Compose\u003c/summary\u003e\n\n```bash\ngit clone https://github.com/CatBraaain/search-crawl\ncd search-crawl\n```\n\u003c/details\u003e\n\n### 2: Prepare .env\nSet environment variables for the extract function:\n```dotenv\n# .env\nLLM_MODEL=\"xxxxxxxxxx\"\nLLM_API_KEY=\"xxxxxxxxxx\"\n```\nThe model name should follow the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).\nExamples: \"openai/gpt-5\", \"gemini/gemini-2.5-pro\", \"anthropic/claude-4\", \"deepseek/deepseek-chat\"\n\n### 3: Run Server\nRun the service:\n```bash\ndocker compose up --wait\n```\n\n\u003cdetails\u003e\u003csummary\u003eIf Docker Compose version \u003c v2.34.0\u003c/summary\u003e\n\n```bash\n# Linux / macOS\nexport COMPOSE_EXPERIMENTAL_GIT_REMOTE=True\n# Windows (cmd)\nset COMPOSE_EXPERIMENTAL_GIT_REMOTE=True\ndocker compose up --wait\n```\n\u003c/details\u003e\n\n## Test the API\n\n### Request via curl\n```bash\ndocker compose up --wait\n# Linux / macOS\ncurl http://localhost:8000/search --json '{\"q\":\"hello world\"}'\n# Windows (PowerShell)\ncurl http://localhost:8000/search --json \"{\\\"q\\\":\\\"hello world\\\"}\"\n```\n\n### Request via Python SDK\nInstall the Python client:\n```bash\nuv init\nuv add git+https://github.com/CatBraaain/search-crawl.git#subdirectory=search_crawl_client\n```\n\nRun examples from the [examples](examples) directory.\n\n#### Search + Crawl:\n```bash\nuv run examples/search_crawl.py\n```\n\nExpected output:\n```bash\nURL: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program\nTITLE: \"Hello, World!\" program - Wikipedia\nMARKDOWN:\nTraditional first example of a computer programming language\nA **\"Hello, World!\" program** is usually a simple [computer program](/wiki/Computer_program \"Computer program\") that emits (or displays) t...\n```\n\n#### Search + Crawl + Extract:\n```bash\nuv run 
examples/search_crawl_extract.py\n```\n\nExpected output:\n```bash\npopulation=8005176000 source_url='https://worldpopulationreview.com'\n```\n\n## OpenAPI Document\nAfter starting the service, visit:\n- Swagger UI: http://localhost:8000/docs\n- ReDoc: http://localhost:8000/redoc\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatbraaain%2Fsearch-crawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatbraaain%2Fsearch-crawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatbraaain%2Fsearch-crawl/lists"}