https://github.com/mendableai/firesearch
https://github.com/mendableai/firesearch
firecrawl langchain langgraph llm research
Last synced: 4 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/mendableai/firesearch
- Owner: mendableai
- Created: 2025-05-30T17:58:19.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-02T14:21:33.000Z (4 months ago)
- Last Synced: 2025-06-09T09:11:47.138Z (4 months ago)
- Topics: firecrawl, langchain, langgraph, llm, research
- Language: TypeScript
- Homepage: https://tools.firecrawl.dev/firesearch
- Size: 257 KB
- Stars: 256
- Watchers: 1
- Forks: 44
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Firesearch - AI-Powered Deep Research Tool
![]()
Comprehensive web research powered by [Firecrawl](https://www.firecrawl.dev/) and [LangGraph](https://www.langchain.com/langgraph)
## Technologies
- **Firecrawl**: Multi-source web content extraction
- **OpenAI GPT-4o**: Search planning and follow-up generation
- **Next.js 15**: Modern React framework with App Router[](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffiresearch&env=FIRECRAWL_API_KEY,OPENAI_API_KEY&envDescription=API%20keys%20required%20for%20Firesearch&envLink=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffiresearch%23required-api-keys)
## Setup
### Required API Keys
| Service | Purpose | Get Key |
|---------|---------|---------|
| Firecrawl | Web scraping and content extraction | [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) |
| OpenAI | Search planning and summarization | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |### Quick Start
1. Clone this repository
2. Create a `.env.local` file with your API keys:
```
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
3. Install dependencies: `npm install` or `yarn install`
4. Run the development server: `npm run dev` or `yarn dev`## How It Works
### Architecture Overview
```mermaid
flowchart TB
Query["'Compare Samsung Galaxy S25
and iPhone 16'"]:::query
Query --> Break
Break["🔍 Break into Sub-Questions"]:::primary
subgraph SubQ["🌐 Search Queries"]
S1["iPhone 16 Pro specs features"]:::search
S2["Samsung Galaxy S25 Ultra specs"]:::search
S3["iPhone 16 vs Galaxy S25 comparison"]:::search
end
Break --> SubQ
subgraph FC["🔥 Firecrawl API Calls"]
FC1["Firecrawl /search API
Query 1"]:::firecrawl
FC2["Firecrawl /search API
Query 2"]:::firecrawl
FC3["Firecrawl /search API
Query 3"]:::firecrawl
end
S1 --> FC1
S2 --> FC2
S3 --> FC3
subgraph Sources["📄 Sources Found"]
R1["Apple.com ✓
The Verge ✓
CNET ✓"]:::source
R2["GSMArena ✓
TechRadar ✓
Samsung.com ✓"]:::source
R3["AndroidAuth ✓
TomsGuide ✓"]:::source
end
FC1 --> R1
FC2 --> R2
FC3 --> R3
subgraph Valid["✅ Answer Validation"]
V1["iPhone 16 specs ✓ (0.95)"]:::good
V2["S25 specs ✓ (0.9)"]:::good
V3["S25 price ❌ (0.3)"]:::bad
end
Sources --> Valid
Valid --> Retry
Retry{"Need info:
S25 pricing?"}:::check
subgraph Strat["🧠 Alternative Strategy"]
Original["Original: 'Galaxy S25 price'
❌ No specific pricing found"]:::bad
NewTerms["Try: 'Galaxy S25 MSRP cost'
'Samsung S25 pricing leak'
'S25 vs S24 price comparison'"]:::strategy
end
Retry -->|Yes| Strat
subgraph Retry2["🔄 Retry Searches"]
Alt1["Galaxy S25 MSRP retail"]:::search
Alt2["Samsung S25 pricing leak"]:::search
Alt3["S25 vs S24 price comparison"]:::search
end
Strat --> Retry2
subgraph FC2G["🔥 Retry API Calls"]
FC4["Firecrawl /search API
Alt Query 1"]:::firecrawl
FC5["Firecrawl /search API
Alt Query 2"]:::firecrawl
FC6["Firecrawl /search API
Alt Query 3"]:::firecrawl
end
Alt1 --> FC4
Alt2 --> FC5
Alt3 --> FC6
Results2["SamMobile ✓ ($899 leak)
9to5Google ✓ ($100 more)
PhoneArena ✓ ($899)"]:::source
FC4 --> Results2
FC5 --> Results2
FC6 --> Results2
Final["All answers found ✓
S25 price: $899"]:::good
Results2 --> Final
Synthesis["LLM synthesizes response"]:::synthesis
Final --> Synthesis
FollowUp["Generate follow-up questions"]:::primary
Synthesis --> FollowUp
Citations["List citations [1-10]"]:::primary
FollowUp --> Citations
Answer["Complete response delivered"]:::answer
Citations --> Answer
%% No path - skip retry and go straight to synthesis
Retry -->|No| Synthesis
classDef query fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
classDef subq fill:#ffd4b3,stroke:#ff6b1a,stroke-width:1px,color:#333
classDef search fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff
classDef source fill:#3a4a5c,stroke:#2c3a47,stroke-width:2px,color:#fff
classDef check fill:#ffeb3b,stroke:#fbc02d,stroke-width:2px,color:#333
classDef good fill:#4caf50,stroke:#388e3c,stroke-width:2px,color:#fff
classDef bad fill:#f44336,stroke:#d32f2f,stroke-width:2px,color:#fff
classDef strategy fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
classDef answer fill:#3a4a5c,stroke:#2c3a47,stroke-width:3px,color:#fff
classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff
classDef label fill:none,stroke:none,color:#666,font-weight:bold
```### Process Flow
1. **Break Down** - Complex queries split into focused sub-questions
2. **Search** - Multiple searches via Firecrawl API for comprehensive coverage
3. **Extract** - Markdown content extracted from web sources
4. **Validate** - Check if sources actually answer the questions (0.7+ confidence)
5. **Retry** - Alternative search terms for unanswered questions (max 2 attempts)
6. **Synthesize** - GPT-4o combines findings into cited answer### Key Features
- **Smart Search** - Breaks complex queries into multiple focused searches
- **Answer Validation** - Verifies sources contain actual answers (0.7+ confidence)
- **Auto-Retry** - Alternative search terms for unanswered questions
- **Real-time Progress** - Live updates as searches complete
- **Full Citations** - Every fact linked to its source
- **Context Memory** - Follow-up questions maintain conversation context### Configuration
Customize search behavior by modifying [`lib/config.ts`](lib/config.ts):
```typescript
export const SEARCH_CONFIG = {
// Search Settings
MAX_SEARCH_QUERIES: 12, // Maximum number of search queries to generate
MAX_SOURCES_PER_SEARCH: 4, // Maximum sources to return per search query
MAX_SOURCES_TO_SCRAPE: 3, // Maximum sources to scrape for additional content
// Content Processing
MIN_CONTENT_LENGTH: 100, // Minimum content length to consider valid
SUMMARY_CHAR_LIMIT: 100, // Character limit for source summaries
// Retry Logic
MAX_RETRIES: 2, // Maximum retry attempts for failed operations
MAX_SEARCH_ATTEMPTS: 2, // Maximum attempts to find answers via search
MIN_ANSWER_CONFIDENCE: 0.7, // Minimum confidence (0-1) that a question was answered
// Timeouts
SCRAPE_TIMEOUT: 15000, // Timeout for scraping operations (ms)
} as const;
```### Firecrawl API Integration
Firesearch leverages Firecrawl's powerful `/search` endpoint:
#### `/search` - Web Search with Content
- **Purpose**: Finds relevant URLs AND extracts markdown content in one call
- **Usage**: Each decomposed query is sent to find 6-8 relevant sources with content
- **Response**: Returns URLs with titles, snippets, AND full markdown content
- **Key Feature**: The `scrapeOptions` parameter enables content extraction during search
- **Example**:
```
POST /search
{
"query": "iPhone 16 specs pricing",
"limit": 8,
"scrapeOptions": {
"formats": ["markdown"]
}
}
```### Search Strategies
When initial results are insufficient, the system automatically tries:
- **Broaden Keywords**: Removes specific terms for wider results
- **Narrow Focus**: Adds specific terms to target missing aspects
- **Synonyms**: Uses alternative terms and phrases
- **Rephrase**: Completely reformulates the query
- **Decompose**: Breaks complex queries into sub-questions
- **Academic**: Adds scholarly terms for research-oriented results
- **Practical**: Focuses on tutorials and how-to guides## Example Queries
- "Who are the founders of Firecrawl?"
- "When did NVIDIA release the RTX 4080 Super?"
- "Compare the latest iPhone, Samsung Galaxy, and Google Pixel flagship features"## License
MIT License