https://github.com/eja/wikilite
Offline Lexical and Semantic Wikipedia Search
- Host: GitHub
- URL: https://github.com/eja/wikilite
- Owner: eja
- License: gpl-3.0
- Created: 2024-12-09T11:01:38.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-02-25T09:55:23.000Z (3 months ago)
- Last Synced: 2025-03-27T06:51:19.767Z (about 2 months ago)
- Topics: sqlite3, wikipedia
- Language: Go
- Homepage:
- Size: 4.37 MB
- Stars: 8
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Wikilite
Wikilite is a tool that allows you to create a local SQLite database of Wikipedia articles, indexed with FTS5 for fast and efficient lexical searching and optional embeddings for semantic searching. Built with Go, Wikilite provides a command-line interface (CLI) and an optional web interface, enabling offline access, browsing, and searching of Wikipedia content.
## Features
* **Fast and Flexible Lexical Searching**: Leverages FTS5 (Full-Text Search 5) for efficient and fast keyword-based searching within the SQLite database. This is great for finding exact matches of words and phrases in your query.
* **Enhanced Semantic Search**: Integrates ANN quantization and text embeddings for powerful semantic search capabilities. This complements the FTS5 search by finding results that are semantically similar to your query, even if they lack exact keyword matches. It handles issues like misspellings, plurals/singulars, and different verb tenses.
* **Offline Access**: Access Wikipedia articles without an active internet connection.
* **Command-Line Interface (CLI)**: Search and query the database directly from your terminal.
* **Web Interface (Optional)**: Browse and search articles through a user-friendly web interface.
## Getting Started
1. **Clone the repository**: `git clone https://github.com/eja/wikilite.git`
2. **Build the Wikilite binary**: `make`
3. **Import Wikipedia data**: `./wikilite --wiki-import --db `
### Web Interface
1. **Start the web server**: `./wikilite --web --db `
2. **Access the web interface**: Open a web browser and navigate to `http://localhost:35248`
## API Overview
Wikilite provides a comprehensive RESTful API that supports both GET and POST methods for all endpoints. The main endpoints include:
* `/api/search`: Combined search across titles, content, and vectors (if enabled)
* `/api/search/title`: Search only article titles
* `/api/search/lexical`: Search title and article content
* `/api/search/semantic`: Vector-based semantic search
* `/api/search/distance`: Search word distance against the internal vocabulary
* `/api/article`: Retrieve complete articles by ID

All search endpoints support pagination through the `limit` parameter and return results in a consistent JSON format. For detailed API documentation and examples, please refer to the [API Documentation](API.md).
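As a rough illustration of calling one of the search endpoints listed above, the Go sketch below builds a GET URL and decodes the JSON reply. The base URL reuses the port shown in the Web Interface section and `limit` comes from this README. The `query` parameter name and the helpers `buildSearchURL` and `search` are assumptions made for this example; check API.md for the actual parameter names and response shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// buildSearchURL assembles a GET URL for a search endpoint.
// "limit" is documented in this README; the "query" parameter
// name is an assumption and may differ in the real API.
func buildSearchURL(base, endpoint, query string, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = endpoint
	q := url.Values{}
	q.Set("query", query)
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode() // keys are emitted in sorted order
	return u.String(), nil
}

// search performs the request and decodes the JSON body into a
// generic value, since the exact response schema is not shown here.
func search(rawURL string) (any, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	u, err := buildSearchURL("http://localhost:35248", "/api/search/title", "alan turing", 5)
	if err != nil {
		fmt.Println("bad URL:", err)
		return
	}
	fmt.Println("GET", u)

	res, err := search(u)
	if err != nil {
		fmt.Println("request failed (is the web server running?):", err)
		return
	}
	fmt.Printf("%v\n", res)
}
```

Swapping the endpoint argument for `/api/search/lexical` or `/api/search/semantic` should target the other search modes. The endpoints also accept POST, but the request body format is not described here, so this sketch sticks to GET.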
## Semantic Search Details
Wikilite utilizes text embeddings to power its semantic search capabilities. This means that instead of just looking for exact keyword matches (like FTS5 does), it searches for paragraphs that have a *similar meaning* to your query. This is particularly helpful in scenarios where:
* You have typos in your search query.
* You are using different wordings to express the same concept.
* The article uses synonyms or related terms instead of the precise words you searched for.

The semantic search acts as a powerful complement to FTS5, allowing you to get more relevant results even when your query doesn't match directly.
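To make the idea of "similar meaning" concrete, the sketch below ranks a few passages by cosine similarity between embedding vectors. This README does not state which similarity measure Wikilite uses, so cosine similarity is an assumption here, and the three-dimensional vectors are toy values standing in for real embeddings. The `passage` type and the `cosine` and `rank` helpers are hypothetical names for this example only.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// passage pairs a text snippet with its (toy) embedding vector.
type passage struct {
	Text string
	Vec  []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the
// vectors differ in length or either one has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank returns a copy of passages ordered from most to least
// similar to the query vector.
func rank(query []float64, passages []passage) []passage {
	out := append([]passage(nil), passages...)
	sort.SliceStable(out, func(i, j int) bool {
		return cosine(query, out[i].Vec) > cosine(query, out[j].Vec)
	})
	return out
}

func main() {
	// Toy vectors: a real system would obtain these from an embedding model.
	query := []float64{0.9, 0.1, 0.0}
	passages := []passage{
		{"History of medieval castles", []float64{0.0, 0.2, 0.9}},
		{"Automobiles and road vehicles", []float64{0.8, 0.3, 0.1}},
		{"Cooking with seasonal vegetables", []float64{0.1, 0.9, 0.2}},
	}
	for _, p := range rank(query, passages) {
		fmt.Printf("%.3f  %s\n", cosine(query, p.Vec), p.Text)
	}
}
```

Because the score depends on vector direction rather than shared keywords, a passage can rank highly without containing any of the query's exact words, which is the behavior the list above describes.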
## Pre-built Databases
Pre-built databases for several languages are also available on [Hugging Face](https://huggingface.co/datasets/eja/wikilite/tree/main). You can use these databases directly with Wikilite by downloading and decompressing them.
### Installing Pre-built Databases
You can install a pre-built database by using the `--setup` option from the command line. When you run this command, a list of available databases will be shown, allowing you to select and install the desired database along with the corresponding GGUF model. Note that all databases in the "lexical" folder are full-text search only and do not support semantic search.
#### Example command:
```bash
./wikilite --setup
```
## Acknowledgments
* **Wikipedia**: For providing the valuable data that powers Wikilite.
* **SQLite**: For providing the robust database engine that enables fast and efficient local data storage.
* **Ollama**: For enabling the internal generation of embeddings, enhancing the semantic search capabilities of Wikilite.