https://github.com/fless-lab/xfinder
A high-performance desktop search application (AI Powered)
- Host: GitHub
- URL: https://github.com/fless-lab/xfinder
- Owner: fless-lab
- Created: 2025-11-13T06:41:48.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2025-11-13T07:04:53.000Z (7 months ago)
- Last Synced: 2025-11-13T09:07:16.810Z (7 months ago)
- Language: Rust
- Size: 225 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# xfinder
**Advanced file search and retrieval system for Windows administrative environments (support for other operating systems is planned)**
## Overview
xfinder is a high-performance desktop search application designed for administrative users who need to locate files and information quickly across large document repositories. Built with Rust and native UI technologies, it provides enterprise-grade search capabilities in a lightweight package.
## Key Features
- **Fast Indexing**: Full-text search engine powered by Tantivy with sub-100ms query response time
- **Real-time Monitoring**: Automatic file system watching and index updates
- **Semantic Search**: AI-powered search understanding natural language queries
- **Email Integration**: Unified search across Outlook PST files, Thunderbird MBOX, and IMAP accounts
- **OCR Support**: Automatic text extraction from scanned PDFs and images (Tesseract 5)
- **Conversational Interface**: "Assist Me" mode providing contextual answers with verifiable sources
---
## Core Capabilities
### File Search
- Instant filename search with sub-100ms response for 100k+ files
- Fuzzy matching algorithm for typo-tolerant queries
- Advanced filtering by extension, date, size, and directory
- Global keyboard shortcut access (Ctrl+Shift+F)
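The typo-tolerant matching listed above can be illustrated with a small edit-distance filter. This is a minimal sketch, not xfinder's actual algorithm (the source does not specify one): it assumes plain Levenshtein distance over file name stems, and the names `edit_distance`, `fuzzy_filter`, and `max_typos` are hypothetical.

```rust
/// Levenshtein edit distance between two strings, computed over chars.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: turning i chars into the empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Keeps the file names whose stem is within `max_typos` edits of the query
/// (case-insensitive). `max_typos` is an assumed tuning parameter.
fn fuzzy_filter<'a>(query: &str, names: &[&'a str], max_typos: usize) -> Vec<&'a str> {
    let q = query.to_lowercase();
    names
        .iter()
        .copied()
        .filter(|name| {
            // Compare against the stem so the extension does not count as typos.
            let stem = name.rsplit_once('.').map_or(*name, |(s, _)| s);
            edit_distance(&q, &stem.to_lowercase()) <= max_typos
        })
        .collect()
}
```

With a budget of two edits, a query such as `reprot` would still match `report.pdf`. A linear scan like this is only meant to show the idea; at 100k+ files an index-backed approach would likely be needed to stay within the latency target.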
### Content Indexing
- Full-text search across document contents (SQLite FTS5)
- Automatic detection and indexing of scanned PDFs
- OCR text extraction from images (JPEG, PNG, TIFF)
- Configurable by directory and file type
- Multi-language support (French and English priority)
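To make "configurable by directory and file type" concrete, here is a hedged sketch of how such a filter might look. The `IndexConfig` struct, its fields, and `should_index` are hypothetical names for illustration; the source does not describe the real configuration format.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical indexing configuration: which directories and file types to index.
struct IndexConfig {
    /// Directories whose contents should be indexed.
    roots: Vec<PathBuf>,
    /// Allowed extensions, lowercase and without the leading dot.
    extensions: HashSet<String>,
}

impl IndexConfig {
    /// A path is indexed only if it lives under a configured root
    /// and has an allowed extension (compared case-insensitively).
    fn should_index(&self, path: &Path) -> bool {
        // `Path::starts_with` compares whole components, so "/data/docs-old"
        // is not treated as being inside "/data/docs".
        let in_root = self.roots.iter().any(|root| path.starts_with(root));
        let ext_ok = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| self.extensions.contains(&e.to_ascii_lowercase()));
        in_root && ext_ok
    }
}
```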
### Semantic Search
- Natural language query understanding
- Vector-based similarity search using compact embeddings (LEANN)
- Conversational "Assist Me" mode with source attribution
- 97% smaller index size compared to traditional vector databases
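The idea behind vector-based similarity search can be sketched with brute-force cosine similarity. This is an illustration of the general technique only: the source does not describe how LEANN stores or searches its index, so the functions `cosine_similarity` and `top_k` below are assumptions and do not reflect LEANN's API.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force top-k: scores every document embedding against the query
/// and returns (document index, score) pairs, best match first.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Sort by descending score; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

In practice the query and document vectors would be 384-dimensional embeddings, and an approximate index would replace the linear scan.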
### Email Search
- Outlook PST/MAPI integration
- Thunderbird MBOX parsing
- IMAP and Exchange server support
- Attachment indexing and search
### Real-time Updates
- File system monitoring via a watchdog component built on notify-rs
- Automatic index updates on file creation, modification, and deletion
- Intelligent handling of file moves and renames
- Scheduled indexing with configurable intervals
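The update rules above can be sketched as a mapping from file events to index operations. This is a simplified model under stated assumptions: `FileEvent` and `Index` are hypothetical types, the index is an in-memory map, and a real watcher such as notify-rs reports its own, richer event kinds.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical, simplified file events.
enum FileEvent {
    Created(PathBuf),
    Modified(PathBuf),
    Deleted(PathBuf),
    Renamed { from: PathBuf, to: PathBuf },
}

/// In-memory stand-in for the index: path -> number of times the file was (re)indexed.
#[derive(Default)]
struct Index {
    entries: HashMap<PathBuf, u32>,
}

impl Index {
    fn apply(&mut self, event: FileEvent) {
        match event {
            // New and changed files are both (re)indexed.
            FileEvent::Created(p) | FileEvent::Modified(p) => {
                *self.entries.entry(p).or_insert(0) += 1;
            }
            FileEvent::Deleted(p) => {
                self.entries.remove(&p);
            }
            FileEvent::Renamed { from, to } => {
                // Move the existing entry instead of re-extracting content.
                // If the source was never indexed, treat the target as a new file.
                let count = self.entries.remove(&from).unwrap_or(1);
                self.entries.insert(to, count);
            }
        }
    }
}
```

Treating a rename as a key move is what lets the index avoid re-running extraction or OCR on a file whose content has not changed.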
---
## Technology Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Language** | Rust | Memory safety, performance, concurrency |
| **UI Framework** | egui | Native, lightweight, GPU-accelerated |
| **Windowing** | winit | Cross-platform window management |
| **Rendering** | wgpu | Hardware-accelerated graphics |
| **Search Engine** | Tantivy | Lucene-like full-text search in Rust |
| **Database** | SQLite with FTS5 | Embedded, ACID-compliant, full-text capable |
| **Embeddings** | all-MiniLM-L6-v2 | Compact (80MB), 384 dimensions, primarily English-trained |
| **Vector Database** | LEANN | Ultra-compact indices (97% size reduction) |
| **OCR** | Tesseract 5 | Industry standard, offline, multi-language |
| **File Monitoring** | notify-rs | Cross-platform filesystem events |
| **Email Parsing** | mailparse, libpff | PST and MBOX format support |
**Binary Size**: ~8MB base + 110MB (OCR + ML models) = 118MB total
---
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                     UI Layer (egui)                     │
│    Search Interface | Configuration | Assist Me Mode    │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│                 Core Application (Rust)                 │
│                                                         │
│  File System Watchdog → Indexer → Content Extractor     │
│  Search Engine: Tantivy + SQLite FTS5 + LEANN           │
│  Email Parser: PST/MBOX/IMAP                            │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│                      Storage Layer                      │
│  tantivy_index/ | metadata.db (SQLite) | vectors.leann  │
└─────────────────────────────────────────────────────────┘
```
---
## Getting Started
### For Developers
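```bash
# Prerequisites: Rust toolchain (rustc and cargo) 1.70 or newer.
# Check the installed versions:
rustc --version
cargo --version

# Clone and build
git clone https://github.com/fless-lab/xfinder.git
cd xfinder
cargo build --release

# Run tests
cargo test

# Launch application (reuses the release build)
cargo run --release
```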
### For End Users (Future)
**Installation**
1. Download `xfinder-setup.msi` from the releases page.
2. Run the installer and follow the prompts.

**First use**
1. Launch xfinder.
2. Select directories to monitor.
3. Start indexing.
4. Search using Ctrl+Shift+F.
---
## Performance Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Search query (100k files) | <100ms | P95 latency |
| Indexing throughput | >1000 files/min | Average on SSD |
| OCR processing (A4 page) | <5s | PaddleOCR/Tesseract standard quality |
| Semantic search | <3s | Including embedding generation |
| Index size overhead | <5% of corpus | Metadata + vectors |
| Memory footprint (idle) | <100MB | Application only |
| Cold start time | <500ms | To main window display |
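
Because the search target is stated as a P95 latency, it may help to show how such a figure could be computed from measured query times. The sketch below uses the nearest-rank percentile method; the function names and the choice of method are assumptions, not the project's benchmark harness.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a percentile above 100.
fn percentile(samples: &[Duration], p: usize) -> Option<Duration> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Ceiling of p * n / 100, done in integers to avoid float rounding.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True if the P95 of the measured latencies is strictly below the target.
fn meets_target(samples: &[Duration], target: Duration) -> bool {
    percentile(samples, 95).map_or(false, |p95| p95 < target)
}
```

For the first row of the table, `meets_target(&latencies, Duration::from_millis(100))` would express the "<100ms at P95" check over a set of recorded query latencies.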
---
## Design Decisions
### Language Priority
Multi-language support with French and English as primary targets. OCR and semantic search models selected for optimal French performance.
### Vector Database
LEANN was selected for its 97% index size reduction compared to FAISS. Proof-of-concept validation is scheduled for Weeks 13-14.
### Email Parsing Strategy
- Primary: Windows MAPI API (requires Outlook installation)
- Fallback: libpff library for direct PST parsing
- Thunderbird: mailparse crate for MBOX files
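The primary/fallback strategy above amounts to a small decision function. The sketch below is one hypothetical way to express it: the `EmailBackend` enum, `choose_backend`, and the `outlook_installed` flag are illustrative names, and selecting by file extension is an assumption (Thunderbird mailbox files, for example, often have no extension and would need separate detection).

```rust
use std::path::Path;

/// Hypothetical parsing backends matching the strategy in the list above.
#[derive(Debug, PartialEq)]
enum EmailBackend {
    /// Windows MAPI, available only when Outlook is installed.
    Mapi,
    /// Direct PST parsing via libpff.
    Libpff,
    /// MBOX parsing via the mailparse crate.
    Mailparse,
}

/// Picks a backend from the file extension, falling back from MAPI to
/// libpff for PST files when Outlook is not available.
fn choose_backend(path: &Path, outlook_installed: bool) -> Option<EmailBackend> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pst" if outlook_installed => Some(EmailBackend::Mapi),
        "pst" => Some(EmailBackend::Libpff),
        "mbox" => Some(EmailBackend::Mailparse),
        _ => None,
    }
}
```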
### Network Drives
UNC path monitoring (`\\Server\Share`) is supported via the same watchdog mechanism as local drives.
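
Recognizing a UNC path before handing it to the watcher can be sketched with plain string handling. This is an illustrative helper only (`parse_unc` is a hypothetical name); it works on the string form so that it behaves the same on any platform, and it deliberately rejects the `\\?\` and `\\.\` prefixes, which are not network shares.

```rust
/// Splits a UNC path such as `\\Server\Share\dir\file` into (server, share).
/// Returns None for local paths, incomplete UNC paths, and the
/// `\\?\` / `\\.\` prefixes.
fn parse_unc(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix(r"\\")?;
    let mut parts = rest.split('\\');
    let server = parts.next().filter(|s| !s.is_empty())?;
    let share = parts.next().filter(|s| !s.is_empty())?;
    if server == "?" || server == "." {
        return None;
    }
    Some((server, share))
}
```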
### GPU Acceleration
Optional CUDA support for embedding generation provides a 10x speedup at the cost of 500MB of additional dependencies. It is disabled by default.
---
## Contributing
Project currently in active development. Contributions welcome after Phase 1 MVP completion.
## License
This software is licensed under a **Custom Non-Commercial License**.
### Permissions
- ✅ Free personal and non-commercial use
- ✅ Modification and distribution for non-commercial purposes
- ✅ Source code access and study
### Restrictions
- ❌ Commercial use prohibited without explicit written permission
- ❌ Sale or sublicensing of the software or derivative works
- ❌ Use in commercial products or paid services (SaaS)
For the complete license terms, see [LICENSE](LICENSE).
### Commercial Licensing
For commercial use inquiries, please contact:
**achilleatarmla@gmail.com**
## Project Status
**Current Phase**: Phase 1 - Core Search Implementation (Week 1)
**Last Updated**: 2025-11-12
**Version**: 0.1.0-alpha
---
Built with Rust for performance, security, and reliability.