https://github.com/cameronking4/nextjs-firecrawl-starter
Nextjs 15 Firecrawl app to scrape doc links for an LLM. Use it as a starter kit to build your Firecrawl app. Turn any developer documentation into a GPT knowledge base. Pre-made Github Action to crawl and commit markdown responses to repo.
https://github.com/cameronking4/nextjs-firecrawl-starter
docs firecrawl github-actions llm nextjs rombo shadcn-ui web-scraping
Last synced: 9 months ago
JSON representation
Nextjs 15 Firecrawl app to scrape doc links for an LLM. Use it as a starter kit to build your Firecrawl app. Turn any developer documentation into a GPT knowledge base. Pre-made Github Action to crawl and commit markdown responses to repo.
- Host: GitHub
- URL: https://github.com/cameronking4/nextjs-firecrawl-starter
- Owner: cameronking4
- License: mit
- Created: 2025-01-13T18:54:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-14T00:12:23.000Z (about 1 year ago)
- Last Synced: 2025-05-14T02:16:45.231Z (about 1 year ago)
- Topics: docs, firecrawl, github-actions, llm, nextjs, rombo, shadcn-ui, web-scraping
- Language: TypeScript
- Homepage: https://nextjs-firecrawl-starter.vercel.app
- Size: 564 KB
- Stars: 13
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Next.js Firecrawl Starter

This Nextjs app aims to provide a modern web interface for crawling documentation and processing it for LLM use. Use the output `markdown`, `xml`, or `zip` files to build knowledge files to copy over to a vector database, a ChatGPT GPT, an OpenAI Assistant, Claude Artifacts, Vapi.ai, Aimdoc, or any other LLM tool.

The Next app generates a .md file, .xml file, or .zip of markdown files ready for LLM consumption, inspired by the [devdocs-to-llm](https://github.com/alexfazio/devdocs-to-llm) Jupyter notebook by Alex Fazio.
## Features
- 🌐 Serverless architecture using `Firecrawl API` v1
- ⚡ Real-time crawl status updates
- 🎨 Modern UI with dark/light mode support
- 📂 Crawl History using `Local Storage`
- 💥 Github Action defined to manually run crawl function and commit to /knowledge_bases folder
## Github Action
Use the Github Action template to define automations. Leverage Github Actions cron to schedule crawls for a given site and commit markdown file directly to repo.
### Manual Trigger
https://github.com/user-attachments/assets/fdc0f1a3-1632-44fc-b382-332377d73ed6
### Scheduled Crawl ([available on Github Marketplace](https://github.com/marketplace/actions/firecrawl-scheduled-action))
[](https://github.com/cameronking4/nextjs-firecrawl-starter/actions/workflows/crawl-docs.yml)
Add this to any Github Repo to start crawling on a schedule. It will commit the output results automatically after crawling to a specified folder. Default is to crawl `Hacker News` everyday at midnight and store results in the `/knowledge_bases` folder.
```yaml
name: Scheduled Crawl Action
# This workflow will automatically crawl the specified URL on a schedule and commit the results to your repository.
on:
schedule:
- cron: '0 0 * * *' # Replace with the cron expression for the schedule you want to use (e.g., '0 0 * * *' for daily at midnight UTC)
workflow_dispatch: # Allow manual triggering
jobs:
crawl:
runs-on: ubuntu-latest
permissions:
contents: write
id-token: write
actions: read
steps:
- uses: actions/checkout@v4
- name: Firecrawl Scheduled Action
uses: cameronking4/nextjs-firecrawl-starter@v1.0.0
with:
url: 'https://news.ycombinator.com' # Replace with the URL you want to crawl regularly
output_folder: 'knowledge_bases' # Replace with the folder name where the output commits will be saved
api_url: 'https://nextjs-firecrawl-starter.vercel.app' # Replace with the API URL of your Firecrawl API endpoint, this is the default URL for the starter app.
```
## OpenAPI Spec & Custom GPT Actions
You can use this project to serve endpoints for your LLM tools. In ChatGPT, you can click `Create a GPT` and then `Create Action` to allow your GPT to call the Firecrawl API endpoints and return results in chat.

### Quickstart
Add the Firecrawl actions to your GPT by copying and pasting this import URL in the Configure Tab:
```
https://nextjs-firecrawl-starter.vercel.app/api/openapi
```
This URL is defined and can be edited in the `/api/openapi/route.ts` file.
## Tech Stack
- **Framework**: Next.js 15.1.4
- **Styling**: Tailwind CSS
- **UI Components**:
- Radix UI primitives
- Shadcn/ui components
- **State Management**: React Hook Form
- **Animations**: Framer Motion & Rombo
- **Development**: TypeScript
- **API Routes**: Firecrawl API Key & Next.js App Router
## API Routes
The application uses Next.js App Router API routes for serverless functionality:
- `/api/crawl/route.ts` - Initiates a new crawl job
- `/api/crawl/status/[id]/route.ts` - Gets the status of an ongoing crawl
- `/api/map/route.ts` - Generates site maps
- `/api/scrape/route.ts` - Handles individual page scraping
## Getting Started
1. Clone the repository
2. Install dependencies:
```bash
npm install
# or
yarn install
# or
pnpm install
```
3. Create a `.env` file with your Firecrawl API key:
```env
FIRECRAWL_API_KEY=your_api_key_here
```
4. Run the development server:
```bash
npm run dev
# or
yarn dev
# or
pnpm dev
```
5. Open [http://localhost:3000](http://localhost:3000) in your browser
## Key Dependencies
- **UI & Components**:
- @radix-ui/* - Headless UI components
- class-variance-authority - Component variants
- tailwind-merge - Tailwind class merging
- lucide-react - Icons
- next-themes - Theme management
- framer-motion - Animations
- rombo - Animations
- **Forms & Validation**:
- react-hook-form - Form handling
- @hookform/resolvers - Form validation
- zod - Schema validation
## How It Works
1. **Crawling**: Uses Firecrawl API to crawl documentation sites and generate sitemaps
2. **Processing**: Extracts content and converts it to markdown format
3. **Status Tracking**: Real-time updates on crawl progress
4. **Results**: Displays processed content ready for LLM consumption
## Credits
This project is a Next.js implementation inspired by the [devdocs-to-llm](https://github.com/alexfazio/devdocs-to-llm) Jupyter notebook by Alex Fazio. The original project demonstrated how to use Firecrawl API to crawl developer documentation and prepare it for LLM use.
The original Jupyter notebook implementation provides:
- Documentation crawling with Firecrawl API
- Content extraction and markdown conversion
- Export capabilities to Rentry.co and Google Docs
This Next.js version builds upon these capabilities by:
- Adding a modern web interface
- Implementing real-time crawl status tracking
- Providing a serverless architecture for Firecrawl API processing
- Adding dark/light theme support
- Making the tool more accessible through a user-friendly UI deployed on Vercel
- Github Actions for manual and scheduled scraping
- OpenAPI Specification for LLM Tool Calling
## License
See the [LICENSE](LICENSE) file for details.