{"id":30738918,"url":"https://github.com/firecrawl/grok-4-fire-enrich","last_synced_at":"2025-09-03T22:39:45.554Z","repository":{"id":303991068,"uuid":"1017425233","full_name":"firecrawl/grok-4-fire-enrich","owner":"firecrawl","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-10T15:53:11.000Z","size":383,"stargazers_count":42,"open_issues_count":0,"forks_count":7,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-18T19:02:30.547Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/firecrawl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-10T14:08:45.000Z","updated_at":"2025-08-06T13:54:16.000Z","dependencies_parsed_at":"2025-07-10T21:03:01.264Z","dependency_job_id":"d5438d54-1054-4226-82da-656118e90c27","html_url":"https://github.com/firecrawl/grok-4-fire-enrich","commit_stats":null,"previous_names":["mendableai/grok-4-fire-enrich","firecrawl/grok-4-fire-enrich"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/firecrawl/grok-4-fire-enrich","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firecrawl%2Fgrok-4-fire-enrich","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firecrawl%2Fgrok-4-fire-enrich/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firecrawl%2Fgrok-4-fire-enrich/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firecrawl%2Fgrok-4-fire-enrich/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/firecrawl","download_url":"https://codeload.github.com/firecrawl/grok-4-fire-enrich/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firecrawl%2Fgrok-4-fire-enrich/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273523587,"owners_count":25120863,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-03T22:39:40.611Z","updated_at":"2025-09-03T22:39:45.535Z","avatar_url":"https://github.com/firecrawl.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grok 4 Fire Enrich - AI-Powered Data Enrichment Tool\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExNjJwMnF2cW5zbXBhbGV6NXBpb3lkZmVhMWEwY3hmdmt3d3ZtbWc5YSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/QhpbWI09KyFZ0rwD72/giphy.gif\" alt=\"Grok 4 Fire Enrich Demo\" width=\"100%\" /\u003e\n\u003c/div\u003e\n\nTurn a simple list of emails into a rich dataset with company profiles, funding data, tech stacks, and more. Powered by [Firecrawl](https://www.firecrawl.dev/), a multi-agent AI system, and Grok 4 for intelligent agent execution.\n\n## Technologies\n\n- **Firecrawl**: Web scraping and content aggregation\n- **Grok 4**: AI model powering the agent base execution\n- **OpenAI**: Intelligent data extraction and synthesis\n- **Next.js 15**: Modern React framework with App Router\n\n[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffire-enrich\u0026env=FIRECRAWL_API_KEY,OPENAI_API_KEY,GROK_API_KEY\u0026envDescription=API%20keys%20required%20for%20Grok%204%20Fire%20Enrich\u0026envLink=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffire-enrich%23required-api-keys)\n\n## Setup\n\n### Required API Keys\n\n| Service   | Purpose                              | Get Key                                                              |\n| --------- | ------------------------------------ | -------------------------------------------------------------------- |\n| Firecrawl | Web scraping and content aggregation | [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) |\n| Grok 4    | AI model for agent execution         | [x.ai/api](https://x.ai/api)                                         |\n| OpenAI    | Intelligent data extraction          | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |\n\n### Quick Start\n\n1. Clone this repository\n2. Create a `.env.local` file with your API keys:\n   ```\n   FIRECRAWL_API_KEY=your_firecrawl_key\n   GROK_API_KEY=your_grok_4_key\n   OPENAI_API_KEY=your_openai_key\n   ```\n3. Install dependencies: `npm install` or `yarn install`\n4. Run the development server: `npm run dev` or `yarn dev`\n5. Open [http://localhost:3000](http://localhost:3000)\n\n## Example Enrichment\n\n**Before:**\n\n```json\n{\n  \"email\": \"erez@wiz.io\"\n}\n```\n\n**After:**\n\n```json\n{\n  \"email\": \"erez@wiz.io\",\n  \"companyName\": \"Wiz\",\n  \"industry\": \"Cybersecurity\",\n  \"employeeCount\": \"1001-5000\",\n  \"yearFounded\": 2020,\n  \"headquarters\": \"New York, NY\",\n  \"fundingStage\": \"Series D\",\n  \"totalRaised\": \"$900M\",\n  \"website\": \"https://www.wiz.io\",\n  \"sources\": [\n    \"https://www.wiz.io/about\",\n    \"https://techcrunch.com/2023/02/27/wiz-confirms-300m-at-a-10b-valuation-to-build-out-its-cloud-security-platform/\"\n  ]\n}\n```\n\n## How It Works\n\n### Architecture Overview: Following \"ericciarla@firecrawl.dev\" Through the System\n\nLet's see exactly how Grok 4 Fire Enrich processes a real example - enriching data for the email ericciarla@firecrawl.dev.\n\n```mermaid\ngraph TD\n    Start[\"Input: ericciarla@firecrawl.dev - Industry, CEO, Funding Stage, Tech Stack\"]:::primary\n\n    Start --\u003e|1. Extract Domain| Domain[\"Domain: firecrawl.dev - Corporate email detected\"]:::primary\n\n    Domain --\u003e|2. Start Orchestration| Orchestrator[\"Agent Orchestrator - Executes agents in optimized sequence - Each phase builds on previous data\"]:::synthesis\n\n    %% Phase 1: Discovery\n    Orchestrator --\u003e|Phase 1| Discovery[\"Discovery Agent - Finds basic company info first\"]:::agent\n\n    Discovery --\u003e|Parallel searches| DiscSearch[\"Parallel Searches: Firecrawl company, firecrawl.dev, What is Firecrawl\"]:::search\n\n    DiscSearch --\u003e|Firecrawl API| DiscFC[\"3 concurrent API calls - Returns company website and basic information\"]:::firecrawl\n\n    DiscFC --\u003e|Extracts| DiscData[\"Company: Firecrawl - Website: firecrawl.dev - Type: B2B SaaS\"]:::source\n\n    %% Phase 2: Company Profile\n    DiscData --\u003e|Phase 2| Profile[\"Company Profile Agent - Uses company name from Phase 1 to find industry details\"]:::agent\n\n    Profile --\u003e|Parallel searches| ProfSearch[\"Parallel Searches: Firecrawl industry classification, Firecrawl web scraping API, Developer tools Firecrawl\"]:::search\n\n    ProfSearch --\u003e|Firecrawl API| ProfFC[\"3 concurrent API calls - Searches industry-specific sources\"]:::firecrawl\n\n    ProfFC --\u003e|Extracts| ProfData[\"Industry: Developer Tools - Sub-category: Web Scraping APIs - Market: B2B SaaS\"]:::source\n\n    %% Phase 3: Financial\n    ProfData --\u003e|Phase 3| Funding[\"Financial Intel Agent - Searches for funding using company and industry context\"]:::agent\n\n    Funding --\u003e|Parallel searches| FundSearch[\"Parallel Searches: Firecrawl funding rounds, Mendable AI acquisition Firecrawl, Firecrawl investors crunchbase\"]:::search\n\n    FundSearch --\u003e|Firecrawl API| FundFC[\"3 concurrent API calls - Checks TechCrunch, Crunchbase, venture news sites\"]:::firecrawl\n\n    FundFC --\u003e|Extracts| FundData[\"Funding: Seed Stage - Part of Mendable AI - YC-backed company\"]:::source\n\n    %% Phase 4: Tech Stack\n    FundData --\u003e|Phase 4| Tech[\"Tech Stack Agent - Analyzes GitHub and tech docs - HTML source analysis\"]:::agent\n\n    Tech --\u003e|Parallel searches| TechSearch[\"Parallel Searches: github.com/mendableai/firecrawl, Firecrawl API documentation, Direct HTML analysis\"]:::search\n\n    TechSearch --\u003e|Firecrawl API| TechFC[\"3 concurrent API calls - HTML meta tag analysis - GitHub repo scan\"]:::firecrawl\n\n    TechFC --\u003e|Extracts| TechData[\"Tech Stack: Node.js, Python, Redis, Playwright, Kubernetes\"]:::source\n\n    %% Phase 5: General\n    TechData --\u003e|Phase 5| General[\"General Purpose Agent - Handles custom field CEO - Uses all previous context\"]:::agent\n\n    General --\u003e|Targeted search| GenSearch[\"Focused Search: Firecrawl CEO founder Eric, Eric Ciarla Firecrawl, LinkedIn company search\"]:::search\n\n    GenSearch --\u003e|Firecrawl API| GenFC[\"3 concurrent API calls - Cross-references multiple sources\"]:::firecrawl\n\n    GenFC --\u003e|Extracts| GenData[\"CEO: Eric Ciarla - Co-founder and CEO of Firecrawl - Previously at Mendable AI\"]:::source\n\n    %% Final Synthesis\n    DiscData --\u003e Synthesis\n    ProfData --\u003e Synthesis\n    FundData --\u003e Synthesis\n    TechData --\u003e Synthesis\n    GenData --\u003e Synthesis\n\n    Synthesis[\"GPT-4o Final Synthesis - Combines all agent findings - Resolves conflicts, validates data\"]:::synthesis\n\n    Synthesis --\u003e|Outputs| Results\n\n    subgraph Results[Enriched Data]\n        R1[\"Industry: Developer Tools / Web Scraping - Source: firecrawl.dev/about\"]:::good\n        R2[\"CEO: Eric Ciarla Co-founder and CEO - Source: linkedin.com/company/firecrawl\"]:::good\n        R3[\"Funding: Seed Part of Mendable AI - Source: crunchbase.com\"]:::good\n        R4[\"Tech Stack: Node.js, Python, Redis, K8s - Source: github.com/mendableai/firecrawl\"]:::good\n    end\n\n    Results --\u003e|Final Output| Output[\"Updated CSV Row: ericciarla@firecrawl.dev - Complete profile with 4 new data points and sources\"]:::answer\n\n    classDef primary fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff\n    classDef agent fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff\n    classDef search fill:#e8e8e8,stroke:#999,stroke-width:2px,color:#333\n    classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff\n    classDef source fill:#ffa726,stroke:#ff8c42,stroke-width:2px,color:#000\n    classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff\n    classDef good fill:#f5f5f5,stroke:#666,stroke-width:1px,color:#000\n    classDef answer fill:#333,stroke:#000,stroke-width:3px,color:#fff\n```\n\n### How Each Agent Works\n\nBehind the scenes, each agent is a specialized module with its own expertise, search strategies, and type-safe output schema:\n\n1. **Discovery Agent** (Phase 1)\n\n   - Establishes company basics: official name, website, type of business\n   - Essential first step that provides the foundation for all other agents\n   - **Returns**: Company name, website URL, business type\n   - **Schema**: `DiscoveryResult` with fields like `companyName`, `website`, `domain`\n\n2. **Company Profile Agent** (Phase 2)\n\n   - Uses verified company name to search for industry and market positioning\n   - Builds on Discovery data to ensure accurate industry classification\n   - **Returns**: Industry, sub-category, business model, market segment\n   - **Schema**: `ProfileResult` with `industry`, `headquarters`, `yearFounded`, `companyType`\n\n3. **Financial Intel Agent** (Phase 3)\n\n   - Leverages company name + industry context for targeted funding searches\n   - Knowing the industry helps identify relevant investor databases\n   - **Returns**: Funding stage, total raised, key investors, valuation\n   - **Schema**: `FundingResult` with `fundingStage`, `totalRaised`, `lastRoundAmount`, `investors`\n\n4. **Tech Stack Agent** (Phase 4)\n\n   - Analyzes technology with context of company type and funding stage\n   - HTML analysis, GitHub repos, and technical documentation\n   - **Returns**: Programming languages, frameworks, infrastructure, tools\n   - **Uses**: Direct `EnrichmentResult` schema for flexible tech stack extraction\n\n5. **General Purpose Agent** (Phase 5)\n   - Handles custom fields (like CEO, competitors, etc.) with full context\n   - Benefits from all previous data to make targeted searches\n   - **Returns**: Any custom field requested by the user\n   - **Uses**: Dynamic `EnrichmentResult` schema based on user-defined fields\n\n### Why Sequential Execution?\n\nThe agents execute in a carefully designed sequence where each phase builds upon the previous one:\n\n- **Context Building**: Each agent adds context that makes subsequent searches more accurate. For example, knowing a company's industry helps the funding agent search in the right venture databases.\n- **Data Validation**: Later agents can validate and refine data from earlier phases.\n- **Efficiency**: Prevents redundant searches by sharing discovered information across phases.\n- **Parallel Searches Within Phases**: While agents run sequentially, each agent performs multiple searches in parallel, maximizing speed.\n\nThis architecture balances accuracy with performance - we could run all agents in parallel, but the sequential approach with shared context produces significantly better results.\n\n### Extensibility Through Type-Safe Schemas\n\nEach agent uses [Zod](https://zod.dev/) schemas to ensure type safety and make the system easily extensible:\n\n```typescript\n// Example: Adding a new field to the FundingAgent\nconst FundingResult = z.object({\n  fundingStage: z.string().optional(),\n  totalRaised: z.string().optional(),\n  lastRoundAmount: z.string().optional(),\n  investors: z.array(z.string()).optional(),\n  // Add your new field here:\n  debtFinancing: z.string().optional(),\n});\n```\n\n**To extend Grok 4 Fire Enrich with new data extraction capabilities:**\n\n1. **Add to existing agent**: Modify the Zod schema in `/lib/agent-architecture/agents/[agent-name].ts`\n2. **Create a new agent**: Define a new schema and implement the `AgentBase` interface\n3. **Update the orchestrator**: Add routing logic to direct fields to your new agent\n4. **Use custom fields**: The General Agent handles any field not covered by specialized agents\n\nThe field routing system automatically categorizes user requests:\n\n- Fields with \"industry\" or \"headquarter\" → Company Profile Agent\n- Fields with \"fund\" or \"invest\" → Financial Intel Agent\n- Fields with \"employee\" or \"revenue\" → Metrics Agent\n- Fields with \"tech\" and \"stack\" → Tech Stack Agent\n- Everything else → General Purpose Agent\n\nThis design allows Grok 4 Fire Enrich to grow with your needs while maintaining type safety and predictable behavior.\n\n### Process Flow\n\n1.  **Upload \u0026 Parse**: Upload a CSV with emails. The system extracts the company domain from each email.\n2.  **Field Selection**: Choose the data points you need, from company descriptions to funding stages.\n3.  **Sequential Agent Execution**: Agents activate in phases, each building on previous discoveries for maximum accuracy.\n4.  **Parallel Searches Per Phase**: Within each phase, multiple searches run concurrently using the Firecrawl API.\n5.  **AI Synthesis**: GPT-4o analyzes all findings, resolves conflicts, and extracts structured data.\n6.  **Real-time Results**: Your table populates in real-time, complete with enriched data and source citations.\n\n### The Multi-Agent System\n\nGrok 4 Fire Enrich employs a sophisticated orchestration system that coordinates specialized extraction modules. These aren't autonomous AI agents, but rather purpose-built components that work together intelligently:\n\n- **Discovery Phase**: Establishes the foundation by identifying the company and its digital presence\n- **Profile Extraction**: Specialized logic for industry classification and business model analysis\n- **Financial Intelligence**: Targeted searches across venture databases and news sources\n- **Technical Analysis**: Deep inspection including HTML parsing and repository analysis\n- **Custom Field Handler**: Flexible extraction for any user-defined data points\n\nEach module uses GPT-4o for intelligent data extraction, but follows deterministic search patterns optimized through extensive testing. The agent base execution is powered by Grok 4, providing efficient orchestration of the extraction modules. This hybrid approach combines the reliability of structured programming with the flexibility of AI-powered comprehension.\n\n### Key Features\n\n- **Phased Extraction System**: Sequential modules that build context for increasingly accurate results.\n- **Drag \u0026 Drop CSV**: Simple, intuitive interface to get started in seconds.\n- **Customizable Fields**: Choose from a list of common data points or generate your own with natural language.\n- **Real-time Streaming**: Watch your data get enriched row-by-row via Server-Sent Events.\n- **Full Source Citations**: Every piece of data is linked back to the URL it was found on, ensuring complete transparency.\n- **Skip Common Providers**: Automatically skips personal emails (Gmail, Yahoo, etc.) to save on API calls and focus on company data.\n\n### Configuration \u0026 Unlimited Mode\n\nWhen you clone and run this repository locally, Grok 4 Fire Enrich automatically enables **Unlimited Mode**, removing the restrictions of the public demo. You can configure these limits in [`app/fire-enrich/config.ts`](app/fire-enrich/config.ts):\n\n```typescript\nconst isUnlimitedMode =\n  process.env.FIRE_ENRICH_UNLIMITED === \"true\" ||\n  process.env.NODE_ENV === \"development\";\n\nexport const FIRE_ENRICH_CONFIG = {\n  CSV_LIMITS: {\n    MAX_ROWS: isUnlimitedMode ? Infinity : 15,\n    MAX_COLUMNS: isUnlimitedMode ? Infinity : 5,\n  },\n  REQUEST_LIMITS: {\n    MAX_FIELDS_PER_ENRICHMENT: isUnlimitedMode ? 50 : 10,\n  },\n} as const;\n```\n\n## Our Open Source Philosophy\n\nLet's be blunt: professional data enrichment services are expensive for a reason. Our goal with Grok 4 Fire Enrich isn't to replicate every feature of mature platforms overnight. Instead, we want to build a powerful, open-source foundation that anyone can use, understand, and contribute to.\n\nThis is just the start. By open-sourcing it, we're inviting you to join us on this journey.\n\n- **Add a new agent?** Fork the repo and show us what you've got.\n- **Improve a data extraction prompt?** Open a pull request.\n- **Have a new feature idea?** Start a discussion in the issues.\n\nWe believe that by building in public, we can create a tool that is more accessible, affordable, and adaptable, thanks to the collective intelligence of the open-source community.\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details.\n\n## Contributing\n\nWe welcome contributions! Please feel free to submit a Pull Request.\n\n## Support\n\nFor questions and issues, please open an issue in this repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirecrawl%2Fgrok-4-fire-enrich","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffirecrawl%2Fgrok-4-fire-enrich","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirecrawl%2Fgrok-4-fire-enrich/lists"}