{"id":26512919,"url":"https://github.com/sammcj/firecrawler","last_synced_at":"2026-02-26T12:03:11.770Z","repository":{"id":282815996,"uuid":"949692188","full_name":"sammcj/firecrawler","owner":"sammcj","description":"A lightweight frontend for self-hosted Firecrawl instances","archived":false,"fork":false,"pushed_at":"2025-03-17T04:29:16.000Z","size":256,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T05:25:21.301Z","etag":null,"topics":["crawling","fetching","firecrawl","llm","markdown","md","scraping"],"latest_commit_sha":null,"homepage":"https://smcleod.net","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sammcj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["sammcj"],"buy_me_a_coffee":"sam.mcleod"}},"created_at":"2025-03-17T01:30:33.000Z","updated_at":"2025-03-17T04:29:19.000Z","dependencies_parsed_at":"2025-03-17T05:25:34.210Z","dependency_job_id":"b7431c4b-21a2-49b7-a03a-bf78f79a45b8","html_url":"https://github.com/sammcj/firecrawler","commit_stats":null,"previous_names":["sammcj/firecrawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sammcj%2Ffirecrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sammcj%2Ffirecrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sammcj%2Ffirecrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/sammcj%2Ffirecrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sammcj","download_url":"https://codeload.github.com/sammcj/firecrawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244734196,"owners_count":20501018,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawling","fetching","firecrawl","llm","markdown","md","scraping"],"created_at":"2025-03-21T04:19:04.375Z","updated_at":"2026-02-26T12:03:11.541Z","avatar_url":"https://github.com/sammcj.png","language":"JavaScript","funding_links":["https://github.com/sponsors/sammcj","https://buymeacoffee.com/sam.mcleod"],"categories":[],"sub_categories":[],"readme":"# Firecrawler\n\nA lightweight frontend for self-hosted Firecrawl API instances. This playground provides a user-friendly interface for using Firecrawl's web scraping and crawling capabilities.\n\n## Features\n\n- **Scrape Mode**: Convert a single URL to markdown, HTML, or take screenshots\n- **Crawl Mode**: Discover and scrape multiple pages from a starting URL\n- **Extract Mode**: Extract structured data from web pages using LLM\n- **CORS-free**: Uses a proxy server to avoid CORS issues when connecting to your Firecrawl API instance\n\n![screenshot](screenshot.png)\n\n## Getting Started\n\n### Running Locally\n\n1. Configure environment variables:\n   ```shell\n   cp .example.env .env\n   ```\n   Edit the `.env` file to set your desired configuration.\n2. Install dependencies and run:\n   ```shell\n   npm i\n   npm start\n   ```\n3. 
Open your browser and navigate to `http://localhost:3000`\n4. Enter your Firecrawl API endpoint (default: `http://firecrawl:3002`)\n5. Enter your API key if required\n6. Choose a mode (Scrape, Crawl, or Extract), enter a URL, and click \"Run\"\n\n### Using Docker\n\n1. Configure environment variables:\n   ```shell\n   cp .example.env .env\n   ```\n   Then edit the `.env` file to set your desired configuration.\n\n2. Build and run using Docker Compose:\n   ```shell\n   docker-compose up -d\n   ```\n\n3. Open your browser and navigate to `http://localhost:3000`\n\n## Modes\n\n### Scrape Mode\n\nScrape mode allows you to convert a single URL to various formats:\n\n- **Markdown**: Clean, readable markdown format\n- **HTML**: Raw HTML content\n- **Screenshot**: Visual capture of the page\n- **Links**: Extract all links from the page\n\nAdvanced options include:\n\n- Only Main Content: Filter out navigation, footers, etc.\n- Remove Base64 Images: Exclude embedded images\n- Wait For: Time to wait for dynamic content to load\n- Timeout: Maximum time to wait for the page to load\n\n### Crawl Mode\n\nCrawl mode allows you to discover and scrape multiple pages from a starting URL:\n\n- **Max Depth**: How many links deep to crawl\n- **Page Limit**: Maximum number of pages to crawl\n- **Ignore Sitemap**: Skip sitemap.xml discovery\n- **Allow External Links**: Crawl links to external domains\n- **Include/Exclude Paths**: Filter which paths to crawl\n\n### Extract Mode\n\nExtract mode allows you to extract structured data from web pages using an LLM:\n\n- **Extraction Prompt**: Instructions for what data to extract\n- **JSON Schema**: Optional schema for structured data extraction\n\n## API Compatibility\n\nThis playground is designed to work with self-hosted Firecrawl API instances. It's compatible with the Firecrawl API v1 endpoints.\n\n## Development\n\nThis is a lightweight application built with vanilla JavaScript, HTML, and CSS. 
Dependencies are loaded from CDNs:\n\n- Milligram CSS for minimal styling\n- Marked.js for markdown rendering\n- Highlight.js for syntax highlighting\n\nNo build process is required - simply edit the files and refresh the browser to see changes.\n\n## Technical Details\n\n- **Server**: Node.js with Express\n- **Proxy**: Custom HTTP proxy middleware\n- **Configuration**: Environment variables via dotenv (.env file)\n\n## Firecrawler Examples\n\nHere are some examples of how to use Firecrawler in each mode.\n\n### Scrape Mode Examples\n\n#### Basic Markdown Conversion\n\n1. Enter URL: `https://smcleod.net`\n2. Select Format: `markdown`\n3. Enable \"Only Main Content\"\n4. Click \"Run\"\n\n#### Screenshot Capture\n\n1. Enter URL: `https://news.ycombinator.com`\n2. Select Formats: `markdown`, `screenshot`\n3. Set Wait For: `3000` (3 seconds)\n4. Click \"Run\"\n\n#### HTML Extraction\n\n1. Enter URL: `https://github.com`\n2. Select Formats: `html`, `markdown`\n3. Disable \"Only Main Content\" to get the full page\n4. Click \"Run\"\n\n### Crawl Mode Examples\n\n#### Basic Website Crawl\n\n1. Switch to \"Crawl\" mode\n2. Enter URL: `https://smcleod.net`\n3. Set Max Depth: `2`\n4. Set Page Limit: `10`\n5. Select Format: `markdown`\n6. Click \"Run\"\n\n#### Blog Crawl with Path Filtering\n\n1. Switch to \"Crawl\" mode\n2. Enter URL: `https://smcleod.net/about`\n3. Set Max Depth: `3`\n4. Set Page Limit: `20`\n5. Include Paths: `blog,posts`\n6. Exclude Paths: `admin,login,register`\n7. Click \"Run\"\n\n### Extract Mode Examples\n\n#### Basic Content Extraction\n\n1. Switch to \"Extract\" mode\n2. Enter URL: `https://smcleod.net`\n3. Extraction Prompt: `Extract the main heading, summary, and author from this page.`\n4. Click \"Run\"\n\n#### Structured Data Extraction\n\n1. Switch to \"Extract\" mode\n2. Enter URL: `https://news.ycombinator.com`\n3. Extraction Prompt: `Extract the top 5 stories with their titles, points, and authors.`\n4. 
JSON Schema:\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"stories\": {\n      \"type\": \"array\",\n      \"items\": {\n        \"type\": \"object\",\n        \"properties\": {\n          \"title\": { \"type\": \"string\" },\n          \"points\": { \"type\": \"number\" },\n          \"author\": { \"type\": \"string\" }\n        }\n      }\n    }\n  }\n}\n```\n5. Click \"Run\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsammcj%2Ffirecrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsammcj%2Ffirecrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsammcj%2Ffirecrawler/lists"}