https://github.com/nickytonline/search-docs


Host: GitHub
URL: https://github.com/nickytonline/search-docs
Owner: nickytonline
License: mit
Created: 2024-12-21T00:03:55.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-21T13:40:54.000Z (over 1 year ago)
Last Synced: 2025-06-09T11:02:23.470Z (about 1 year ago)
Language: TypeScript
Size: 107 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Documentation Crawler

A TypeScript-based web crawler specifically designed for documentation websites. It extracts content and stores it in a Turso database for further processing.

## Features

- Crawls documentation websites systematically
- Extracts meaningful content from main content areas
- Handles fragment identifiers and section titles
- Stores crawled content in a Turso database
- Includes rate limiting to be respectful to servers
- Skips irrelevant content (like "Skip to Content" links)
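
The features above can be illustrated with a minimal sketch. This is not the repository's actual implementation: the helper names (`splitFragment`, `shouldFollow`, `crawlSequentially`), the one-second default delay, and the same-origin check are all assumptions made for illustration.

```typescript
// Hypothetical helpers; names and behavior are illustrative, not taken from the repository.

/** Split a link into its page address and fragment identifier (if any). */
function splitFragment(
  rawUrl: string,
  baseUrl: string,
): { page: string; fragment: string | null } {
  const url = new URL(rawUrl, baseUrl);
  const fragment = url.hash ? url.hash.slice(1) : null;
  url.hash = ""; // drop the fragment so each page is only fetched once
  return { page: url.toString(), fragment };
}

/** Decide whether a link is worth following (assumed rules: same origin, not a skip link). */
function shouldFollow(href: string, linkText: string, baseUrl: string): boolean {
  const url = new URL(href, baseUrl);
  if (url.origin !== new URL(baseUrl).origin) return false;
  if (/^skip to (main )?content$/i.test(linkText.trim())) return false;
  return true;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Naive rate limiting: visit pages one at a time with a fixed pause in between. */
async function crawlSequentially(
  urls: string[],
  visit: (url: string) => Promise<void>,
  delayMs = 1000, // assumed default, tune for the target site
): Promise<void> {
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs);
    await visit(urls[i]);
  }
}
```

A fixed delay between sequential requests is the simplest form of rate limiting; a real crawler might also honor `robots.txt` or back off on error responses.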

## Prerequisites

- Node.js (v20 or higher recommended)
- npm
- A [Turso](https://turso.tech) database account

## Installation

1. Clone the repository
2. Install dependencies:

```bash
npm install
```

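3. Set up environment variables: copy `.env.example` to `.env` and fill in the values listed under [Environment Variables](#environment-variables).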

4. Initialize the database:

```bash
npm run init-db
```

5. Run the crawler (this might take a while):

```bash
npm run crawl
```

6. While the crawler runs, you can start the dev server in another terminal to begin searching the docs:

```bash
npm run dev
```

## Environment Variables

Copy `.env.example` to `.env` and configure the following variables:

| Variable             | Description                                                                   | Required |
|----------------------|-------------------------------------------------------------------------------|----------|
| `DOCS_BASE_URL`      | The base URL of the documentation website to crawl (e.g., `https://some-site.com`) | Yes      |
| `TURSO_DATABASE_URL` | Your Turso database URL                                                       | Yes      |
| `TURSO_AUTH_TOKEN`   | Authentication token for Turso database access                                | Yes      |
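
Since all three variables are required, it can help to fail fast when one is missing. The following is a minimal sketch of such a check; `loadEnv` and `CrawlerEnv` are hypothetical names, and the repository may validate its configuration differently (or not at all).

```typescript
// Variable names mirror the table above; the loader itself is a hypothetical helper.
const REQUIRED_VARS = [
  "DOCS_BASE_URL",
  "TURSO_DATABASE_URL",
  "TURSO_AUTH_TOKEN",
] as const;

type CrawlerEnv = Record<(typeof REQUIRED_VARS)[number], string>;

function loadEnv(source: Record<string, string | undefined>): CrawlerEnv {
  // Treat unset and blank values the same way: both count as missing.
  const missing = REQUIRED_VARS.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`,
    );
  }
  // Throws a TypeError if the base URL is not a valid absolute URL.
  new URL(source.DOCS_BASE_URL as string);
  return {
    DOCS_BASE_URL: source.DOCS_BASE_URL as string,
    TURSO_DATABASE_URL: source.TURSO_DATABASE_URL as string,
    TURSO_AUTH_TOKEN: source.TURSO_AUTH_TOKEN as string,
  };
}

// Typical usage in Node.js, after the .env file has been loaded:
//   const env = loadEnv(process.env);
```

Passing the source object in as a parameter, rather than reading `process.env` directly, keeps the check easy to exercise with plain objects.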
