https://github.com/outpoot/vyntr

Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com
crawler duckduckgo engine google python rust search tantivy web

Last synced: about 1 month ago
Host: GitHub
URL: https://github.com/outpoot/vyntr
Owner: outpoot
License: other
Created: 2025-03-02T16:47:19.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-05-01T11:11:20.000Z (about 2 months ago)
Last Synced: 2025-05-01T12:23:46.177Z (about 2 months ago)
Topics: crawler, duckduckgo, engine, google, python, rust, search, tantivy, web
Language: TypeScript
Homepage: https://vyntr.com
Size: 8.68 MB
Stars: 255
Watchers: 0
Forks: 20
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

README

        


Vyntr.com - the independent search engine.

[Privacy Policy](https://vyntr.com/legal/privacy) | [Terms of Service](https://vyntr.com/legal/terms) | [License](LICENSE) | [YouTube video](https://youtu.be/-DzzCA1mGow)

Vyntr is a search engine project with multiple components:

## Components

- [Genesis](genesis/README.md) - Web crawler and content analyzer

- [Pulse](pulse/README.md) - Search indexing system using Tantivy

- [Lexicon](lexicon/README.md) - WordNet-based dictionary lookup service

- [Website](website/README.md) - Frontend interface at vyntr.com

## Setup

1. Create a `.env` file in the root directory:

```bash
# Database
PRIVATE_DB_URL="postgresql://postgres:your_password@serverip:port/postgres"

# AWS S3/Compatible Storage
S3_ENDPOINT="https://s3.eu-central-1.amazonaws.com"
S3_REGION="eu-central-1"
S3_BUCKET="vyntr"
AWS_ACCESS_KEY_ID="your-key-id"
AWS_SECRET_ACCESS_KEY="your-secret-key"
```

2. Set up the database:

```bash
cd genesis/tools/database
docker compose up -d
```

3. Set up individual components:

- Genesis crawler: Follow [genesis setup](genesis/README.md)

- Lexicon service: Follow [lexicon setup](lexicon/README.md)

- Website: Follow [website setup](website/README.md)
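
Before starting the components, it can help to confirm that the `.env` file from step 1 defines every variable listed there. The following is a minimal sketch, not part of the Vyntr codebase: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles simple `KEY="value"` lines like the ones shown above.

```python
# Hypothetical helper, not part of Vyntr: checks a .env file for the
# variables listed in step 1 of the setup.
REQUIRED_KEYS = [
    "PRIVATE_DB_URL",
    "S3_ENDPOINT",
    "S3_REGION",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty, in listed order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when all six variables are set. The check says nothing about whether the values are valid credentials.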

## Pipeline

1. The `Genesis` crawler collects and analyzes web pages.

2. Data is stored in partitioned JSONL files in `S3`.

3. Content is cleaned through `dataset`.

4. Content is processed through the `embedding` tools (vector) or `Pulse` (full-text).

5. The website frontend provides the search interface.
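
To illustrate what "partitioned JSONL" in step 2 can look like, here is a rough sketch that splits a stream of page records into fixed-size JSONL partitions. This README does not describe the actual partitioning scheme, object key layout, or record fields used by Genesis, so the `crawl/part-NNNNN.jsonl` naming, the `max_per_file` parameter, and the `url`/`title` fields are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def partition_jsonl(records: Iterable[dict], max_per_file: int) -> Iterator[Tuple[str, str]]:
    """Yield (object_key, jsonl_body) pairs with at most max_per_file records each.

    The key layout is hypothetical; a real crawler might partition by
    date, domain, or hash instead.
    """
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    batch = []
    index = 0
    for record in records:
        # One JSON object per line is what makes the file JSONL.
        batch.append(json.dumps(record, ensure_ascii=False))
        if len(batch) == max_per_file:
            yield f"crawl/part-{index:05d}.jsonl", "\n".join(batch) + "\n"
            batch = []
            index += 1
    if batch:
        # Flush the final, possibly smaller, partition.
        yield f"crawl/part-{index:05d}.jsonl", "\n".join(batch) + "\n"


def read_jsonl(body: str) -> list:
    """Parse a JSONL body back into a list of records."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]
```

Each yielded body could then be uploaded to the configured bucket under its key. Because every line is an independent JSON object, later stages can process partitions in parallel and stream them line by line.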

## Requirements

- Python with [uv](https://github.com/astral-sh/uv) package manager

- Node.js

- PostgreSQL with pgvector

- Docker

- Bun runtime (for Lexicon service)

- Rust toolchain

## Dataset

The Vyntr dataset is not publicly available. For licensing inquiries, please contact [email protected].

You may also use the official API provided at https://vyntr.com/api.

## License

This project is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International** License (**CC BY-NC 4.0**). See the [LICENSE](LICENSE) file for details.

Individual components may have additional licensing requirements. See their respective directories for specific licensing information.

WordNet data used in Lexicon is subject to the [WordNet License](https://creativecommons.org/licenses/by/4.0/).
