https://github.com/imkrishsub/filefolio
A privacy-first document organization tool that uses local AI to automatically categorize, tag, and rename your PDF files. All processing happens on your machine with Ollama, keeping your documents completely private.
ai document-management fastapi local-first ocr ollama pdf privacy python sqlite
- Host: GitHub
- URL: https://github.com/imkrishsub/filefolio
- Owner: imkrishsub
- License: mit
- Created: 2025-12-23T11:12:50.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-12-23T11:46:57.000Z (6 months ago)
- Last Synced: 2025-12-25T01:32:22.431Z (6 months ago)
- Topics: ai, document-management, fastapi, local-first, ocr, ollama, pdf, privacy, python, sqlite
- Language: JavaScript
- Homepage:
- Size: 264 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# FileFolio
FileFolio helps privacy-conscious professionals keep large PDF collections searchable and organized using local AI. No cloud, no telemetry, all on your machine.

**Status:** Actively maintained and used on my own collection of 1,000+ PDFs. Expect breaking changes before v1.0, but I'm responsive to issues and feedback.
## Why FileFolio?
- You have hundreds of PDF bills, reports, or research papers scattered in folders.
- You care about privacy and do not want to upload them to cloud AI services.
- You still want smart search, auto-tagging, and reasonable file names.
FileFolio watches a folder, uses a local LLM via Ollama to analyze each PDF, and keeps everything searchable in one interface.
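As a rough illustration of the folder-watching idea, here is a minimal polling sketch. It is not FileFolio's actual sync code: the function name `find_new_pdfs` and the in-memory `seen` set are assumptions made for this example, and a real service would also need to persist state and handle files that are still being written.

```python
from pathlib import Path


def find_new_pdfs(folder: Path, seen: set[str]) -> list[Path]:
    """Return PDFs in `folder` that were not seen before, and mark them as seen."""
    new_files = []
    # Note: the "*.pdf" pattern is case-sensitive on most Linux filesystems.
    for path in sorted(folder.glob("*.pdf")):
        key = str(path.resolve())
        if key not in seen:
            seen.add(key)
            new_files.append(path)
    return new_files
```

Calling `find_new_pdfs` on a timer (for example every few seconds) yields each PDF once, the first time it shows up in the folder.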
## Features
- **Automatic organization** – watches a folder, imports new PDFs, extracts their text (falling back to OCR for scanned documents), and generates categories and tags
- **Privacy-first** – all processing happens locally with Ollama, no cloud services, no telemetry or analytics
- **Fast retrieval** – full-text search across content and metadata, plus thumbnail previews
- **Disaster-proof** – backup and restore your entire library via ZIP
- **Multi-language support** – UI available in multiple languages
- **Dark mode** – toggle between light and dark themes
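To make the ZIP backup feature concrete, the sketch below shows one way a library could be archived and restored with Python's standard `zipfile` module. The function names and the choice of which directories to include are assumptions for illustration; FileFolio's own backup format may differ.

```python
import zipfile
from pathlib import Path


def backup_library(source_dirs: list[Path], archive_path: Path) -> int:
    """Write every file under the given directories into one ZIP; return the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for directory in source_dirs:
            for path in sorted(directory.rglob("*")):
                if path.is_file():
                    # Keep the top-level directory name (e.g. "uploads/bill.pdf")
                    # so a restore recreates the same layout.
                    arcname = path.relative_to(directory.parent).as_posix()
                    archive.write(path, arcname=arcname)
                    count += 1
    return count


def restore_library(archive_path: Path, target_root: Path) -> None:
    """Unpack a backup archive under `target_root`."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_root)
```

The archive should be written outside the directories being backed up, otherwise it would try to include itself.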
## Prerequisites
- Python 3.10+
- [Ollama](https://ollama.ai) installed locally
- Poppler (for PDF processing)
  - macOS: `brew install poppler`
  - Ubuntu/Debian: `apt-get install poppler-utils`
  - Windows: Download from [poppler releases](https://github.com/oschwartz10612/poppler-windows/releases/)
- Tesseract (for OCR on scanned documents)
  - macOS: `brew install tesseract`
  - Ubuntu/Debian: `apt-get install tesseract-ocr`
  - Windows: Download from [Tesseract releases](https://github.com/UB-Mannheim/tesseract/wiki)
## Quick start
1. **Clone the repository**
```bash
git clone https://github.com/imkrishsub/filefolio.git
cd filefolio
```
2. **Create and activate virtual environment**
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies**
```bash
pip install -r requirements.txt
```
4. **Start Ollama** (in a separate terminal)
```bash
ollama serve
```
5. **Run the application**
```bash
python backend/main.py
```
6. **Open your browser**
Navigate to: http://127.0.0.1:8000
## Configuration
### Custom port
Set a custom port using the `PORT` environment variable:
```bash
PORT=8080 python backend/main.py
```
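For readers curious how a `PORT` override typically works, here is a small sketch of reading the variable with a fallback to the default port 8000 used above. The helper name `resolve_port` and the range check are assumptions for this example, not necessarily what `backend/main.py` does.

```python
import os


def resolve_port(default: int = 8000) -> int:
    """Read the PORT environment variable, falling back to `default` if it is unset."""
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port
```

Failing fast on an invalid value keeps a typo in `PORT` from silently starting the server on an unexpected port.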
## Testing
```bash
pytest
```
The suite covers the API and core functionality with unit, integration, and frontend tests.
## Project structure
```
filefolio/
├── backend/
│   ├── main.py              # FastAPI server
│   └── sync_service.py      # Folder sync service
├── frontend/
│   ├── static/
│   │   ├── app.js           # Frontend JavaScript
│   │   ├── style.css        # Styles
│   │   └── i18n.json        # Translations
│   └── templates/
│       └── index.html       # Main interface
├── tests/                   # Test suite
├── uploads/                 # PDF storage (created on first run)
├── thumbnails/              # Document thumbnails (created on first run)
├── data/                    # Database (created on first run)
├── setup.cfg                # Linting and tool configuration
├── pytest.ini               # Test configuration
└── requirements.txt
```
```
## How it works
1. **Upload** - Drag and drop a PDF file into the web interface, or sync a local folder to automatically import new files
2. **Extract** - Text is extracted from the PDF (with OCR fallback for scanned documents)
3. **Analyze** - A local LLM analyzes the content to determine category, tags, and suggest a filename
4. **Organize** - The document is saved with metadata in a local SQLite database
5. **Search** - Find documents by content, category, tags, or filename
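The Analyze step can be sketched roughly as follows, using Ollama's local `/api/generate` HTTP endpoint. This is an illustration under stated assumptions, not FileFolio's real prompt or parsing code: the model name, the prompt wording, the JSON keys, and the helper names (`build_prompt`, `parse_analysis`, `analyze`) are all hypothetical.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2"  # assumption: substitute any model you have pulled locally


def build_prompt(text: str, max_chars: int = 4000) -> str:
    """Ask for a JSON object describing the document, truncating long input."""
    return (
        "Classify this document. Reply with JSON only, using the keys "
        '"category" (string), "tags" (list of strings), and "filename" (string).\n\n'
        + text[:max_chars]
    )


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and normalise the suggested filename."""
    data = json.loads(raw)
    stem = re.sub(r"\.pdf$", "", str(data.get("filename", "")), flags=re.IGNORECASE)
    # Replace anything outside a conservative character set with underscores.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("._") or "untitled"
    return {
        "category": str(data.get("category", "Uncategorized")),
        "tags": [str(tag) for tag in data.get("tags", [])],
        "filename": f"{stem}.pdf",
    }


def analyze(text: str) -> dict:
    """Send the prompt to a running Ollama server and parse its reply."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(text), "stream": False, "format": "json"}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read())
    return parse_analysis(body["response"])
```

Keeping `parse_analysis` separate from the network call makes it easy to validate model output, which is not guaranteed to be well-formed, without a running Ollama server.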
## Tech stack
- **Backend**: FastAPI (Python)
- **Frontend**: Vanilla JavaScript
- **Database**: SQLite
- **AI/LLM**: Ollama
- **PDF Processing**: PyPDF, pdf2image, pytesseract
- **Styling**: Custom CSS
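To show how the PDF libraries listed above can fit together, here is a hedged sketch of text extraction with an OCR fallback. The `min_chars` threshold and all function names are assumptions for illustration; the project's own extraction logic may use a different heuristic.

```python
from typing import Callable


def extract_with_fallback(
    path: str,
    text_reader: Callable[[str], str],
    ocr_reader: Callable[[str], str],
    min_chars: int = 50,  # assumption: below this, treat the PDF as scanned
) -> str:
    """Use the embedded text layer if it looks usable, otherwise fall back to OCR."""
    text = text_reader(path).strip()
    if len(text) >= min_chars:
        return text
    return ocr_reader(path).strip()


def read_text_layer(path: str) -> str:
    """Extract embedded text with pypdf."""
    from pypdf import PdfReader  # imported lazily so the sketch loads without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def run_ocr(path: str) -> str:
    """Render pages with pdf2image (needs Poppler) and OCR them with pytesseract (needs Tesseract)."""
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, dpi=300)
    return "\n".join(pytesseract.image_to_string(image) for image in images)
```

A call such as `extract_with_fallback("scan.pdf", read_text_layer, run_ocr)` only pays the cost of rendering and OCR when the text layer is missing or nearly empty.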
## Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
## License
MIT License - see [LICENSE](LICENSE) file for details.