https://github.com/outpoot/vyntr
Independent search engine. Includes web crawling, search indexing, dictionary API, and more. https://vyntr.com
- Host: GitHub
- URL: https://github.com/outpoot/vyntr
- Owner: outpoot
- License: other
- Created: 2025-03-02T16:47:19.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-04-21T15:00:35.000Z (6 days ago)
- Last Synced: 2025-04-21T15:35:12.487Z (6 days ago)
- Topics: crawler, duckduckgo, engine, google, python, rust, search, tantivy, web
- Language: TypeScript
- Homepage: https://vyntr.com
- Size: 8.79 MB
- Stars: 10
- Watchers: 0
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Vyntr.com - the independent search engine.
[Privacy Policy](https://vyntr.com/legal/privacy) | [Terms of Service](https://vyntr.com/legal/terms) | [License](LICENSE) | [YouTube video](https://youtu.be/-DzzCA1mGow)

Vyntr is a search engine project with multiple components:
## Components
- [Genesis](genesis/README.md) - Web crawler and content analyzer
- [Pulse](pulse/README.md) - Search indexing system using Tantivy
- [Lexicon](lexicon/README.md) - WordNet-based dictionary lookup service
- [Website](website/README.md) - Frontend interface at vyntr.com

## Setup
1. Create a `.env` file in the root directory:
```bash
# Database
PRIVATE_DB_URL="postgresql://postgres:your_password@serverip:port/postgres"

# AWS S3/Compatible Storage
S3_ENDPOINT="https://s3.eu-central-1.amazonaws.com"
S3_REGION="eu-central-1"
S3_BUCKET="vyntr"
AWS_ACCESS_KEY_ID="your-key-id"
AWS_SECRET_ACCESS_KEY="your-secret-key"
```

2. Set up the database:
```bash
cd genesis/tools/database
docker compose up -d
```

3. Set up individual components:
- Genesis crawler: Follow [genesis setup](genesis/README.md)
- Lexicon service: Follow [lexicon setup](lexicon/README.md)
- Website: Follow [website setup](website/README.md)

## Pipeline
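Step 2 of the list in this section stores crawl output as partitioned JSONL. As a minimal sketch of what that can look like, records may be hashed by URL into a fixed number of partition files, with one JSON object per line. The record fields (`url`, `title`), the `partition-NNNN.jsonl` naming, and the partition count are assumptions made for illustration, not the actual Genesis schema or layout.

```python
import hashlib
import json
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_key(url: str, num_partitions: int = NUM_PARTITIONS) -> str:
    """Map a URL to a hypothetical partition file name via a stable hash."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    index = int(digest, 16) % num_partitions
    return f"partition-{index:04d}.jsonl"


def to_jsonl_partitions(records: list[dict]) -> dict[str, str]:
    """Group records by partition and serialize each group as JSONL text."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        line = json.dumps(record, sort_keys=True)
        groups[partition_key(record["url"])].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in groups.items()}


def read_jsonl(text: str) -> list[dict]:
    """Parse JSONL text back into records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical crawl records; real records would carry more fields.
pages = [
    {"url": "https://example.com/", "title": "Example"},
    {"url": "https://example.org/docs", "title": "Docs"},
]
partitions = to_jsonl_partitions(pages)
restored = [r for text in partitions.values() for r in read_jsonl(text)]
```

In a real deployment each partition's text would be uploaded as an object to the bucket configured in `.env`. The upload is left out here to keep the sketch free of external dependencies.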
1. The `Genesis` crawler collects and analyzes web pages.
2. Data is stored in partitioned JSONL files in `S3`.
3. Content is cleaned through `dataset`.
4. Content is processed through the `embedding` tools (vector) or `Pulse` (full-text).
5. The website frontend provides the search interface.

## Requirements
- Python with [uv](https://github.com/astral-sh/uv) package manager
- Node.js
- PostgreSQL with pgvector
- Docker
- Bun runtime (for Lexicon service)
- Rust toolchain

## Dataset
The Vyntr dataset is not publicly available. For licensing inquiries, please contact [email protected].

You may also use the official API provided at https://vyntr.com/api.
## License

This project is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International** License (**CC BY-NC 4.0**). See the [LICENSE](LICENSE) file for details.
Individual components may have additional licensing requirements. See their respective directories for specific licensing information.
WordNet data used in Lexicon is subject to the [WordNet License](https://creativecommons.org/licenses/by/4.0/).