
https://github.com/vanities/docvault

Self-hosted personal finance and document workspace — multi-entity tax records, net worth across brokers/crypto/metals/real estate, Claude-parsed PDFs, Apple Health + DNA ingest, macro + crypto quant dashboards, and AI-generated strategy notes. One container, one volume, zero telemetry.

anthropic apple-health bun claude-api crypto-portfolio docker document-management genomics ghcr net-worth-tracker personal-finance portfolio-tracker privacy-first quant react self-hosted tax-records typescript vite


# DocVault

![DocVault — self-hosted personal finance, health, and tax workspace](./docs/docvault-hero.png)

Self-hosted personal finance and document workspace. One container holds your tax records, tracks net worth across brokers / crypto / metals / real estate, parses financial PDFs with Claude, ingests Apple Health data from an iOS Shortcut, and surfaces macro + crypto quant signals alongside AI-generated strategy notes — all without sending data off your machine.

![DocVault tax year overview](./docs/screenshots/tax-year.png)

_Screenshots in this README are captured against the included `demo-data/` fixtures — the numbers, entities, and Strategy entry are fabricated._

## Try It Locally (Demo Mode)

Want to poke around before connecting real data? The repo ships with a `demo-data/` directory and a second Vite config that points the dev server at it.

```bash
bun install
# Terminal 1 — demo backend on port 3006, reading demo-data/
DOCVAULT_DATA_DIR=./demo-data \
DOCVAULT_PORT=3006 \
DOCVAULT_PASSWORD=demo \
DOCVAULT_MASTER_KEY=$(openssl rand -base64 32) \
bun run server/index.ts
# Terminal 2 — demo frontend on port 5174, proxying /api to :3006
vp dev --config vite.demo.config.ts
```

Open `http://localhost:5174` and sign in with `admin` / `demo`. It's the full app on fake data, and your `./data/` stays untouched.

## Features

### Documents & Taxes

- **Multi-entity organization** — separate spaces for personal, LLCs, property, military, etc.
- **AI document parsing** — Claude Vision extracts structured data from W-2s, 1099s, K-1s, receipts, bank statements (~$0.003/page).
- **Auto file naming** — uploads are renamed to `{Source}_{Type}_{Date}.ext`.
- **Type-specific parsers** — 15 document-type parsers (W-2, 1099-NEC, K-1, statement, receipt, 1098, Koinly, Schedule C, etc.) normalize results into a single analytics module.
- **Federal tax summary** — Schedule C, K-1, capital gains, withholdings aggregated across all entities per year.
- **Solo 401(k) calculator** (IRS Pub 560 worksheet), estimated quarterly tracker, TN state view, mileage log, sales ledger, invoice tracking.
- **CPA package export** — one click to bundle an entity/year into a ZIP for your accountant.
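The `{Source}_{Type}_{Date}.ext` naming scheme can be sketched as a small pure function. This is a hypothetical helper, not DocVault's code: the `ParsedDocMeta` shape, the ISO `YYYY-MM-DD` date format, and the sanitization rule (runs of characters other than letters, digits, and hyphens collapse to a single hyphen) are all assumptions.

```typescript
// Hypothetical sketch of `{Source}_{Type}_{Date}.ext` naming.
// Assumptions: ISO dates (UTC) and non-alphanumeric runs collapsed to "-".
interface ParsedDocMeta {
  source: string; // issuer or employer name extracted by the parser
  type: string;   // document type, e.g. "W-2" or "1099-NEC"
  date: Date;     // statement or form date
}

// Collapse characters that are awkward in file names, then trim stray hyphens.
function sanitize(part: string): string {
  return part
    .trim()
    .replace(/[^A-Za-z0-9-]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function buildFileName(meta: ParsedDocMeta, originalName: string): string {
  const dot = originalName.lastIndexOf(".");
  // Keep the original extension, lower-cased; no dot means no extension.
  const ext = dot >= 0 ? originalName.slice(dot + 1).toLowerCase() : "";
  const isoDate = meta.date.toISOString().slice(0, 10); // YYYY-MM-DD
  const stem = [sanitize(meta.source), sanitize(meta.type), isoDate].join("_");
  return ext ? `${stem}.${ext}` : stem;
}

console.log(
  buildFileName(
    { source: "Acme Corp", type: "W-2", date: new Date(Date.UTC(2024, 0, 31)) },
    "scan0001.PDF",
  ),
); // prints: Acme-Corp_W-2_2024-01-31.pdf
```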

![Federal tax consolidation](./docs/screenshots/federal-tax.png)

_Screenshots: Solo 401(k) contribution calculator; estimated quarterly tax tracker._
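For checking the calculator's numbers by hand, the core of the Pub 560 self-employed worksheet can be sketched as follows. This is a simplified illustration, not DocVault's implementation. It assumes a 25% plan rate, which Pub 560's rate table converts to 20% for a self-employed owner (plan rate / (1 + plan rate)). It ignores catch-up contributions, approximates the "100% of compensation" cap with net earnings, and takes the year's IRS limits as inputs instead of hard-coding them. The `solo401k` function and `Solo401kLimits` shape are hypothetical.

```typescript
// Simplified Solo 401(k) sketch shaped like the IRS Pub 560 self-employed
// worksheet. Illustrative only; the year's IRS limits are passed in.
interface Solo401kLimits {
  electiveDeferral: number; // employee deferral limit for the year
  annualAdditions: number;  // overall section 415(c) limit
  compensationCap: number;  // max compensation that may be considered
}

interface Solo401kResult {
  netEarnings: number; // net profit minus deductible half of SE tax
  employerMax: number; // employer (profit-sharing) contribution
  employeeMax: number; // employee elective deferral
  totalMax: number;
}

const cents = (x: number): number => Math.round(x * 100) / 100;

function solo401k(
  netProfit: number,           // Schedule C net profit
  deductibleHalfSeTax: number, // from Schedule SE
  limits: Solo401kLimits,
  planRate = 0.25,
): Solo401kResult {
  // Pub 560 rate table: self-employed rate = plan rate / (1 + plan rate).
  const selfEmployedRate = planRate / (1 + planRate);
  const netEarnings = Math.max(0, netProfit - deductibleHalfSeTax);
  const employerMax = selfEmployedRate * Math.min(netEarnings, limits.compensationCap);
  const employeeMax = Math.min(limits.electiveDeferral, netEarnings);
  // Simplified overall cap: dollar limit or net earnings, whichever is lower.
  const totalMax = Math.min(employerMax + employeeMax, limits.annualAdditions, netEarnings);
  return {
    netEarnings: cents(netEarnings),
    employerMax: cents(employerMax),
    employeeMax: cents(employeeMax),
    totalMax: cents(totalMax),
  };
}

// 2024 limits: $23,000 deferral, $69,000 annual additions, $345,000 compensation.
const limits2024: Solo401kLimits = { electiveDeferral: 23000, annualAdditions: 69000, compensationCap: 345000 };
console.log(solo401k(100000, 7064.78, limits2024));
```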

### Net Worth & Portfolio

- **Unified portfolio view** across every account — brokerage, crypto, banks, metals, real estate.
- **Automatic daily snapshots** with a historical net worth chart.
- **Broker aggregation via SnapTrade** (Fidelity, Vanguard, Robinhood, etc.) with per-account history.
- **Crypto exchange balances** (Kraken, Coinbase, Gemini) + **Etherscan wallet scanning** across mainnet, Arbitrum, Optimism, Polygon, and Avalanche.
- **Precious metals** with live spot prices.
- **Real estate** with cost basis, equity, and property-level notes.
- **Bank balances + transactions** via SimpleFIN Bridge (16,000+ US institutions).

![Portfolio overview](./docs/screenshots/portfolio.png)

_Screenshots: crypto holdings; brokerage aggregation; bank balances and history; precious metals tracker; real estate with equity and mortgage amortization._
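Conceptually, a daily snapshot reduces every account's current value to one total plus a per-category breakdown. The sketch below is an assumption about how such a reducer could look. The `Holding` and `Snapshot` types, the category names, and counting real estate at its equity are hypothetical rather than DocVault's schema. Metals are valued as troy ounces times spot price, mirroring the live spot price feature.

```typescript
// Hypothetical snapshot reducer; DocVault's real schema may differ.
type Category = "brokerage" | "crypto" | "bank" | "metals" | "realEstate";

interface Holding {
  category: Category;
  value: number; // current USD value (assumed: equity for real estate)
}

interface Snapshot {
  date: string; // YYYY-MM-DD
  total: number;
  byCategory: Record<Category, number>;
}

// Value a metals position from troy ounces and a spot price per ounce.
function metalHolding(troyOz: number, spotPerOz: number): Holding {
  return { category: "metals", value: troyOz * spotPerOz };
}

function takeSnapshot(holdings: Holding[], date: Date): Snapshot {
  const byCategory: Record<Category, number> = {
    brokerage: 0,
    crypto: 0,
    bank: 0,
    metals: 0,
    realEstate: 0,
  };
  for (const h of holdings) byCategory[h.category] += h.value;
  const total = Object.values(byCategory).reduce((a, b) => a + b, 0);
  return { date: date.toISOString().slice(0, 10), total, byCategory };
}
```

Appending one `Snapshot` per day to a history file is then enough to draw a net worth chart over time.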

### Quant Dashboards & Strategy

- **Quant section** with 28 endpoints powering dashboards for crypto (BTC risk, hash ribbons, drawdown), macro (Sahm rule, yield curve, NFCI, fed stance, recession probability), housing, GDP & growth, commodities, VIX term structure, global markets, and an auto-generated macro event calendar.
- **Strategy history** — AI-generated investment strategy notes authored by Claude Code via a `/strategy` skill that reads your portfolio + current quant signals and saves a regime-aware recommendation. Entries render as expandable cards with a signal grid and full markdown analysis (tables, allocations, action plans). Dollar amounts and percentages in the markdown are obscured when "Blur financial numbers" is on.

_Screenshots: crypto quant dashboard (BTC risk metric); macro quant dashboard (Fed policy and business cycle); AI-generated strategy card._
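As an example of what one of these macro signals computes, the Sahm rule flags a likely recession when the three-month average of the unemployment rate is 0.50 percentage points or more above the lowest three-month average of the previous 12 months. The sketch below implements that textbook definition. It is not DocVault's endpoint code, and the input format (monthly rates in percent, oldest first) is an assumption.

```typescript
// Sahm rule sketch. `rates`: monthly unemployment rates in percent, oldest first.
// Returns the indicator per month (NaN where there is not enough history).
function sahmIndicator(rates: number[]): number[] {
  // Trailing three-month average, defined from index 2 onward.
  const avg3 = rates.map((_, i) =>
    i < 2 ? NaN : (rates[i] + rates[i - 1] + rates[i - 2]) / 3,
  );
  return avg3.map((current, i) => {
    // Needs 12 prior defined averages (indices i-12 .. i-1, all >= 2).
    if (i < 14) return NaN;
    const priorLow = Math.min(...avg3.slice(i - 12, i));
    return current - priorLow;
  });
}

const SAHM_THRESHOLD = 0.5; // percentage points

function sahmTriggered(rates: number[]): boolean {
  const values = sahmIndicator(rates);
  const latest = values[values.length - 1];
  // NaN (insufficient history) compares false, so it never triggers.
  return latest >= SAHM_THRESHOLD;
}
```

Feeding it a monthly series such as the BLS unemployment rate, `sahmTriggered(series)` reports whether the most recent month crosses the threshold.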

### Politics & Congressional Trading

In-house ingest of congressional and executive-branch disclosures — no external service, and **no API key for the trading data** (it's all public government filings).

- **Politician stock trades** — House & Senate Periodic Transaction Reports plus Trump's OGE-278-T disclosures, parsed into one normalized feed. Scanned / hand-filed PTRs that `pdftotext` can't read are recovered by **checkbox-form OCR** (gridline detection + per-cell pixel reading; poppler + tesseract are baked into the Docker image).
- **Options contract detail** — strike, expiry, call/put, and contract count are pulled from the filing's free-text `DESCRIPTION` field (e.g. Pelosi's _"Purchased 20 call options, strike $150, exp 1/15/27"_), not just the underlying ticker.
- **Consensus clustering** — surfaces when several members buy (or sell) the same ticker in the same direction within a window.
- **Copy-trade backtest** — a performance leaderboard: _"if you'd mirrored each politician's stock buys at the disclosed size, where would you be now?"_ Stock buys get real P&L (exact share counts when the filer states them, else estimated from the amount range and flagged); options report the underlying's move (the contract isn't priced). Recomputed daily; prices via yahoo-finance2 (keyless).
- **Filings archive** — every fetched PDF + extracted text + metadata is saved under the data dir, searchable and re-parseable without re-fetching.
- **Full-screen browse** — click a dashboard metric to open a full-screen view where you can search, filter, and see every trade, bill, executive action, or archived filing.
- **Bills & executive actions** — recent Congress.gov bills (the one piece needing a free [Congress.gov API key](https://api.congress.gov/sign-up/), set in Settings) + presidential executive actions (keyless, via the Federal Register).
- **Self-hosted member headshots** — portraits downloaded once and served from DocVault, not hot-linked.
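Consensus clustering can be illustrated with a sliding window over trades grouped by ticker and direction. The sketch below is hypothetical. The `Trade` shape, the 30-day window, the three-member threshold, and reporting only the busiest window per ticker/direction are illustrative choices, not values taken from DocVault.

```typescript
interface Trade {
  member: string;
  ticker: string;
  direction: "buy" | "sell";
  date: Date; // transaction date reported in the filing
}

interface Cluster {
  ticker: string;
  direction: "buy" | "sell";
  members: string[]; // distinct members inside the window
  start: Date;
  end: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// For each (ticker, direction), report the window of at most `windowDays`
// with the most distinct members, if it reaches `minMembers`.
function findConsensus(trades: Trade[], windowDays = 30, minMembers = 3): Cluster[] {
  const groups = new Map<string, Trade[]>();
  for (const t of trades) {
    const key = `${t.ticker}|${t.direction}`;
    const list = groups.get(key) ?? [];
    list.push(t);
    groups.set(key, list);
  }

  const clusters: Cluster[] = [];
  groups.forEach((group) => {
    const list = [...group].sort((a, b) => a.date.getTime() - b.date.getTime());
    let best: Cluster | null = null;
    let start = 0;
    for (let end = 0; end < list.length; end++) {
      // Slide the left edge until the window spans at most `windowDays`.
      while (list[end].date.getTime() - list[start].date.getTime() > windowDays * DAY_MS) start++;
      const members = Array.from(new Set(list.slice(start, end + 1).map((t) => t.member)));
      if (members.length >= minMembers && (best === null || members.length > best.members.length)) {
        best = {
          ticker: list[end].ticker,
          direction: list[end].direction,
          members,
          start: list[start].date,
          end: list[end].date,
        };
      }
    }
    if (best !== null) clusters.push(best);
  });
  return clusters;
}
```

Counting distinct members rather than trades keeps one prolific filer from producing a "consensus" alone.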

**Populate it:** the feed refreshes daily on a forward-only schedule (new filings only). To pull the current year's history in one pass, `POST /api/politics/backfill` — it runs server-side (poll `/api/politics/feed` for progress). Everything except the Bills stream works with zero credentials.

> ⚠️ Disclosures carry a ~45-day legal reporting lag and report dollar **ranges**, not exact sizes — the backtest is honest about both (estimates flagged, options labeled as the underlying's move).
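The range-based estimate can be sketched like this: take a point inside the disclosed dollar range, divide by the price on the transaction date, and flag the result unless the filer stated an exact share count. Using the range midpoint and the `DisclosedBuy` / `BuyEstimate` shapes are assumptions for illustration; the README says only that sizes are estimated from the amount range and flagged.

```typescript
interface DisclosedBuy {
  amountLow: number;    // lower bound of the disclosed range, USD
  amountHigh: number;   // upper bound of the disclosed range, USD
  exactShares?: number; // present when the filer stated the share count
}

interface BuyEstimate {
  shares: number;
  estimated: boolean; // true when derived from the range, not stated
}

function estimateShares(buy: DisclosedBuy, priceAtTrade: number): BuyEstimate {
  if (buy.exactShares !== undefined) {
    return { shares: buy.exactShares, estimated: false };
  }
  // Assumption: the midpoint of the range stands in for the dollar size.
  const midpoint = (buy.amountLow + buy.amountHigh) / 2;
  return { shares: midpoint / priceAtTrade, estimated: true };
}

// Mark-to-market P&L for mirroring one buy, carrying the estimate flag along.
function copyTradePnl(buy: DisclosedBuy, priceAtTrade: number, priceNow: number) {
  const { shares, estimated } = estimateShares(buy, priceAtTrade);
  return { pnl: shares * (priceNow - priceAtTrade), estimated };
}
```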

### AI Chat

Heavily inspired by [t3.chat](https://t3.chat) — a multi-thread Claude chat that can read across your entire vault. The sidebar lists every thread; the active conversation streams in the main panel with markdown rendering, image/PDF attachments, and tool calls shown as collapsible cards.

- **Claude OAuth subscription token** — paste the token from `claude setup-token` and chats are billed to your Claude.ai subscription instead of the API. Falls back to an API key if you'd rather pay per-token.
- **Voice input via Parakeet (or any OpenAI-compatible transcription service)** — point Settings → Chat & Voice at a `/audio/transcriptions` endpoint such as [parakeet-mlx](https://github.com/senstella/parakeet-mlx), faster-whisper-server, or lightning-whisper-mlx running on your LAN. Push-to-talk in the composer; audio never leaves your network.
- **Multi-thread sidebar** — threads persist locally; switch, rename-by-derivation, delete, or start fresh without losing context.
- **Tool-using agent** — Claude can list entities, read files, search documents, compute tax summaries, and tag/note files. Each tool call renders as an expandable card in the response.

![AI Chat — multi-thread conversation with tool calls and markdown rendering](./docs/screenshots/chat.png)
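OpenAI-compatible transcription servers accept a `multipart/form-data` POST carrying the audio under `file` and a `model` field, and reply with JSON containing `text`. The sketch below builds such a request with standard `fetch` types. It is not DocVault's client. The base URL (including whether your server expects a `/v1` prefix), the model id, and the `clip.webm` file name are placeholders that depend on the server you run.

```typescript
// Build (but do not send) a transcription request for an
// OpenAI-compatible `/audio/transcriptions` endpoint.
function buildTranscriptionRequest(
  baseUrl: string, // placeholder, e.g. a LAN host running parakeet-mlx
  audio: Blob,
  model: string,   // server-specific model id (placeholder)
  fileName = "clip.webm",
): Request {
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", model);
  const url = `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;
  // fetch derives the multipart Content-Type (with boundary) from the body.
  return new Request(url, { method: "POST", body: form });
}

async function transcribe(baseUrl: string, audio: Blob, model: string): Promise<string> {
  const res = await fetch(buildTranscriptionRequest(baseUrl, audio, model));
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  const body = (await res.json()) as { text: string };
  return body.text;
}
```

Splitting request construction from sending keeps the URL and form layout testable without a running server.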

### Health (Apple Health)

- **iOS Shortcut daily sync** — a one-tap shortcut pushes HealthKit data to DocVault's `/api/health/ingest` endpoint.
- **Multi-person support** — each member of the household gets their own snapshot.
- **Overview + per-segment dashboards** — activity, heart, sleep, workouts, body composition.
- **Automatic illness detection** — rolling baseline over wrist temperature, heart rate, and HRV flags probable illness windows.
- Running ROI (vs BTC/SPX), workout segment insights, sleep quality scoring, recovery scoring.

![Health overview](./docs/screenshots/health.png)

_Screenshots: activity (steps, energy, exercise, recovery score); heart (resting HR, HRV, recovery); sleep (stages, quality score, duration); workouts (counts, distance, streak); body composition (weight trend, BMI)._
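A rolling-baseline detector of the kind described can be sketched with per-metric z-scores against the preceding days. Everything specific here is an assumption for illustration, not DocVault's algorithm: the `DailyVitals` shape, the 28-day trailing baseline, the z-score threshold of 2, and the rule that flags a day only when wrist temperature and resting heart rate run high while HRV runs low.

```typescript
interface DailyVitals {
  date: string;       // YYYY-MM-DD
  wristTempC: number; // wrist temperature as exported (units assumed consistent)
  restingHr: number;  // bpm
  hrvMs: number;      // heart rate variability, ms
}

// Population z-score of `value` against `history`; 0 if history has no spread.
function zScore(value: number, history: number[]): number {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance = history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  const sd = Math.sqrt(variance);
  return sd === 0 ? 0 : (value - mean) / sd;
}

// Returns the dates flagged as probable illness.
function flagIllness(days: DailyVitals[], baselineDays = 28, threshold = 2): string[] {
  const flagged: string[] = [];
  for (let i = baselineDays; i < days.length; i++) {
    const base = days.slice(i - baselineDays, i); // trailing window, excludes today
    const today = days[i];
    const temp = zScore(today.wristTempC, base.map((d) => d.wristTempC));
    const hr = zScore(today.restingHr, base.map((d) => d.restingHr));
    const hrv = zScore(today.hrvMs, base.map((d) => d.hrvMs));
    if (temp >= threshold && hr >= threshold && hrv <= -threshold) flagged.push(today.date);
  }
  return flagged;
}
```

Requiring all three signals together trades some sensitivity for fewer false alarms after a single hot night or hard workout.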

### Backup, Sync, Observability

- **Encrypted backup/restore** — AES-256-GCM zip of all config + parsed data, downloadable on demand.
- **Scheduled auto-backup** runs before each Dropbox sync.
- **Dropbox sync** — rclone-based one-way push of every entity folder on a configurable schedule (default 15 min).
- **Custom sync paths** — drop `.docvault-dropbox-map.json` to map entities to specific Dropbox folders.
- **Portfolio snapshot scheduler** + **Dropbox sync scheduler** configurable from Settings.
- **System Status panel** (Settings → System Status) — scheduler timers, next-run times, last error, and a live log viewer.
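AES-256-GCM needs a 32-byte key, a unique nonce for every encryption, and the authentication tag at decryption time. A minimal sketch using the `node:crypto` module (which Bun also implements) is below. The container layout (12-byte IV, then 16-byte tag, then ciphertext) and the function names are assumptions, not DocVault's backup format; the README does not say how the backup key is derived or stored.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // recommended GCM nonce size
const TAG_BYTES = 16; // full-length GCM authentication tag

// Encrypt a buffer (e.g. a backup ZIP) as iv || tag || ciphertext.
function encryptBackup(zip: Buffer, key: Buffer): Buffer {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(IV_BYTES); // fresh nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(zip), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Decrypt; final() throws if the data or tag was altered or the key is wrong.
function decryptBackup(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, IV_BYTES);
  const tag = blob.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = blob.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// A 32-byte key could come from base64 such as `openssl rand -base64 32` output.
const demoKey = randomBytes(32);
const sealed = encryptBackup(Buffer.from("zip bytes"), demoKey);
console.log(decryptBackup(sealed, demoKey).toString()); // prints: zip bytes
```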

### Privacy

- **Blur financial numbers toggle** (Settings → Preferences) — obscures every dollar amount and percentage across the UI, including AI-generated markdown in the Strategy section.
- **Your data never leaves your machine** — no telemetry, no analytics, no remote database. One volume mount, one container.
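To apply the same masking to text outside the UI (for example, before sharing a Strategy note), a regex pass over dollar amounts and percentages is one option. This is a hypothetical text-level sketch. The README does not say how DocVault implements the toggle (it may blur rendered elements instead). The patterns cover forms like `$12,500`, `-$1.2M`, and `4.5 %`, but not every currency notation.

```typescript
// Hypothetical patterns: optionally signed $ amounts with thousands
// separators, decimals, and a K/M/B suffix; plus percentages.
const MONEY = /-?\$\s?\d+(?:,\d{3})*(?:\.\d+)?(?:[KMB]\b)?/g;
const PERCENT = /[-+]?\d+(?:\.\d+)?\s?%/g;

function maskFinancialNumbers(text: string, mask = "•••"): string {
  return text.replace(MONEY, mask).replace(PERCENT, mask);
}

console.log(maskFinancialNumbers("Move $12,500 (15%) into BTC."));
// prints: Move ••• (•••) into BTC.
```

Matching thousands groups as `,\d{3}` rather than any run of commas keeps punctuation such as "$500, then" intact.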

### Other

- Mileage log with address autocomplete.
- Filing deadline reminders with recurring support.
- Username/password auth with session cookies.
- Docker-ready: single container, auto-published to GHCR (amd64 + arm64).

## Quick Start

```bash
bun install
bun start
```

Frontend: `http://localhost:5173` — Backend: `http://localhost:3005`

### Storage Setup

```bash
mkdir -p data/personal data/my-llc data/property
# or symlink existing folders
ln -s ~/Documents/taxes data/personal
```

## Docker

```bash
docker run -p 3005:3005 \
  -v /path/to/documents:/data \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e DOCVAULT_PASSWORD=yourpassword \
  ghcr.io/vanities/docvault:latest
```

### Docker Compose

```yaml
services:
  docvault:
    image: ghcr.io/vanities/docvault:latest
    ports:
      - '3005:3005'
    volumes:
      - /path/to/documents:/data
    environment:
      - ANTHROPIC_API_KEY= # Required for AI parsing
      - DOCVAULT_USERNAME=admin # Default: admin
      - DOCVAULT_PASSWORD= # Required
    restart: unless-stopped
```

## Environment Variables

| Variable | Required | Description |
| -------------------------------- | -------------- | ------------------------------------------------------------------------- |
| `ANTHROPIC_API_KEY` | For AI parsing | Claude Vision API key |
| `DOCVAULT_USERNAME` | No | Login username (default: `admin`) |
| `DOCVAULT_PASSWORD` | Yes | Login password; server startup fails closed without this unless opted out |
| `DOCVAULT_ALLOW_UNAUTHENTICATED` | No | Explicit local/demo-only opt-out (`true`, `1`, or `yes`) |
| `DOCVAULT_DATA_DIR` | No | Data directory path (default: `./data`) |
| `DOCVAULT_PORT` | No | Backend port (default: `3005`) |
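The fail-closed behaviour described for `DOCVAULT_PASSWORD` and `DOCVAULT_ALLOW_UNAUTHENTICATED` amounts to a startup check like the one below. This is a sketch of the documented rules, not DocVault's source. The `resolveAuthConfig` name is hypothetical, and three details are assumptions: the opt-out value is trimmed and compared case-insensitively, and a set password takes precedence over the opt-out.

```typescript
type Env = Record<string, string | undefined>;

interface AuthConfig {
  enabled: boolean;
  username: string;
  password?: string;
}

const OPT_OUT_VALUES = new Set(["true", "1", "yes"]);

function resolveAuthConfig(env: Env): AuthConfig {
  const username = env["DOCVAULT_USERNAME"] || "admin"; // documented default
  const password = env["DOCVAULT_PASSWORD"];
  if (password) return { enabled: true, username, password };

  const optOut = (env["DOCVAULT_ALLOW_UNAUTHENTICATED"] ?? "").trim().toLowerCase();
  if (OPT_OUT_VALUES.has(optOut)) {
    // Explicit local/demo-only opt-out: run without a login.
    return { enabled: false, username };
  }
  // Fail closed: refuse to start without credentials.
  throw new Error(
    "DOCVAULT_PASSWORD is not set; set DOCVAULT_ALLOW_UNAUTHENTICATED=true only for local/demo use",
  );
}
```

At startup this would be called as `resolveAuthConfig(process.env)`, letting the thrown error abort the server before it binds a port.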

All integrations (SimpleFIN, SnapTrade, Etherscan, Kraken, Coinbase, Gemini, Dropbox) are configured through Settings and stored in `data/.docvault-settings.json`.

## Data Files

Everything lives in `DOCVAULT_DATA_DIR` as `.docvault-*.json` — mount one volume and the install is portable:

| File | Purpose |
| -------------------------------------- | --------------------------------- |
| `.docvault-config.json` | Entity definitions |
| `.docvault-settings.json` | API keys and integration config |
| `.docvault-parsed.json` | Cached AI parse results |
| `.docvault-metadata.json` | Document tags and notes |
| `.docvault-reminders.json` | Filing deadline reminders |
| `.docvault-portfolio-snapshots-*.json` | Yearly portfolio snapshot history |
| `.docvault-broker-cache.json` | Cached brokerage aggregation |
| `.docvault-crypto-cache.json` | Cached crypto balances |
| `.docvault-simplefin-cache.json` | Cached bank balances |
| `.docvault-gold.json` | Precious metals holdings |
| `.docvault-property.json` | Real estate portfolio |
| `.docvault-strategy-history.json` | Saved Strategy entries |
| `.docvault-health.json` | Apple Health ingested data |
| `.docvault-sync-status.json` | Dropbox sync status |
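Because every state file follows the `.docvault-*.json` convention, a backup or migration script can find them with a simple name filter. The sketch below is a hypothetical helper, not part of DocVault. It only lists the top level of the data directory, so the entity folders holding the documents themselves are left to whatever copies the volume.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// True for DocVault state files such as `.docvault-config.json`
// or a yearly `.docvault-portfolio-snapshots-*.json`.
function isDocvaultDataFile(name: string): boolean {
  return name.startsWith(".docvault-") && name.endsWith(".json");
}

// Paths of state files at the top level of the data directory, sorted.
function listDocvaultDataFiles(dataDir: string): string[] {
  return readdirSync(dataDir, { withFileTypes: true })
    .filter((entry) => entry.isFile() && isDocvaultDataFile(entry.name))
    .map((entry) => join(dataDir, entry.name))
    .sort();
}

// Example: listDocvaultDataFiles(process.env.DOCVAULT_DATA_DIR ?? "./data")
```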

## Tech Stack

| Layer | Technology |
| -------- | -------------------------------------------- |
| Frontend | React 19 + TypeScript + Tailwind CSS (Vite+) |
| Backend | Bun native server (`Bun.serve()`) |
| Storage | Local filesystem |
| AI | Anthropic Claude Vision API |
| Health | iOS Shortcuts + HealthKit ingest |
| CI/CD | GitHub Actions → GHCR |

## License

[GNU General Public License v3.0](LICENSE)