{"id":47143820,"url":"https://github.com/stradichenko/ocr-to-anki","last_synced_at":"2026-05-07T17:02:29.371Z","repository":{"id":326072944,"uuid":"1102378983","full_name":"stradichenko/ocr-to-anki","owner":"stradichenko","description":"A tool to ease the production of Anki cards for language learning.","archived":false,"fork":false,"pushed_at":"2026-05-03T21:50:42.000Z","size":800,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-03T22:31:04.560Z","etag":null,"topics":["anki","anki-cards","anki-flashcards","ankiconnect","llm-inference","local-llm","ocr"],"latest_commit_sha":null,"homepage":"","language":"Dart","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stradichenko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-23T11:02:05.000Z","updated_at":"2026-05-03T21:50:40.000Z","dependencies_parsed_at":"2026-03-12T23:01:17.991Z","dependency_job_id":null,"html_url":"https://github.com/stradichenko/ocr-to-anki","commit_stats":null,"previous_names":["stradichenko/ocr-to-anki"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/stradichenko/ocr-to-anki","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stradichenko%2Focr-to-anki","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stradichenko%2Focr-to-anki/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/stradichenko%2Focr-to-anki/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stradichenko%2Focr-to-anki/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stradichenko","download_url":"https://codeload.github.com/stradichenko/ocr-to-anki/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stradichenko%2Focr-to-anki/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32747354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anki","anki-cards","anki-flashcards","ankiconnect","llm-inference","local-llm","ocr"],"created_at":"2026-03-12T23:00:18.744Z","updated_at":"2026-05-07T17:02:29.326Z","avatar_url":"https://github.com/stradichenko.png","language":"Dart","funding_links":["https://www.patreon.com/8153512/join","https://github.com/sponsors/stradichenko"],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  OCR to Anki\n\u003c/h1\u003e\n\n\u003ch3 align=\"center\"\u003e\n\n![Build 
Status](https://img.shields.io/github/actions/workflow/status/stradichenko/ocr-to-anki/build.yml?branch=master\u0026label=build)\n![GitHub License](https://img.shields.io/github/license/stradichenko/ocr-to-anki)\n![GitHub Release](https://img.shields.io/github/v/release/stradichenko/ocr-to-anki)\n\n\u003c/h3\u003e\n\n\u003ch4 align=\"center\"\u003e\n  Consider supporting:\u003cbr\u003e\u003cbr\u003e\n  \u003ca href=\"https://www.patreon.com/8153512/join\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Patreon-F96854?style=for-the-badge\u0026logo=patreon\u0026logoColor=white\" alt=\"Patreon\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/sponsors/stradichenko\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/sponsor-30363D?style=for-the-badge\u0026logo=GitHub-Sponsors\u0026logoColor=#EA4AAA\" alt=\"GitHub Sponsors\"\u003e\n  \u003c/a\u003e\n\u003c/h4\u003e\n\n\u003ch4 align=\"center\"\u003e\n\n[![Share on X](https://img.shields.io/badge/-Share%20on%20X-gray?style=flat\u0026logo=x)](https://x.com/intent/tweet?text=OCR%20to%20Anki!%20Extract%20vocabulary%20from%20images%20and%20create%20flashcards%20offline%20with%20local%20AI.\u0026url=https://github.com/stradichenko/ocr-to-anki\u0026hashtags=Anki,OCR,LLM,llama)\n\n\u003c/h4\u003e\n\n## About\n\nCross-platform application for extracting vocabulary from images and creating\n[Anki](https://apps.ankiweb.net/) flashcards. Everything runs locally using\n[llama.cpp](https://github.com/ggerganov/llama.cpp) and the\n[Gemma 3 4B](https://ai.google.dev/gemma/docs/gemma3) model. 
No cloud\ndependencies, no API keys, fully offline.\n\nSupports **Linux, macOS, Windows, and Android**.\n\nThe application is composed of two layers: a Flutter GUI that provides the user\ninterface and a Python FastAPI backend that handles vision OCR and vocabulary\nenrichment through llama.cpp.\n\n| Layer | Technology | Purpose |\n|-------|-----------|---------|\n| Flutter GUI | Dart, Material 3 | Interface (Linux, macOS, Windows, Android) |\n| Python API | FastAPI, llama.cpp | Vision OCR and text enrichment backend |\n| Vision OCR | llama-mtmd-cli | Extract text from images (GPU accelerated) |\n| Text tasks | llama-server | Definitions, examples, vocabulary enrichment |\n| Model | Gemma 3 4B QAT Q4_0 | Single model for both vision and text |\n\n\u003ch3 align=\"center\"\u003e\n  \u003ca href=\"https://github.com/stradichenko/ocr-to-anki/releases\"\u003eDownload\u003c/a\u003e\n\u003c/h3\u003e\n\n## Installation\n\n### Prerequisites\n\n**Release binaries require no pre-installed dependencies.**\nOn first launch the app will automatically download:\n\n1. **Python runtime** (~30 MB) — a portable copy, cached locally\n2. **AI model** (~3.2 GB) — Gemma 3 4B, one-time download\n\nThe only system requirement is **GTK 3** on Linux.\n\n\u003e For building from source, have [Nix](https://zero-to-nix.com/start/install)\n\u003e installed with flakes enabled.\n\n### Download a release (Android)\n\n#### 1. Download the APK\n\nGrab the latest APK (`ocr-to-anki-vX.Y.Z-android-arm64.apk`) from the\n[releases page](https://github.com/stradichenko/ocr-to-anki/releases)\nand transfer it to your phone:\n\n- **Option A — Direct download on phone:** Open the releases page in your\n  mobile browser and tap the APK file.\n- **Option B — Transfer from PC:** Download on your computer, then transfer\n  via USB, Bluetooth, or cloud storage (Google Drive, Nextcloud, etc.).\n\n#### 2. 
Install the APK\n\n- Open the file manager on your phone, navigate to the APK, and tap it.\n- If prompted, allow **\"Install from unknown sources\"** for your file manager\n  or browser. This is a standard Android security prompt for apps outside the\n  Play Store.\n- Tap **Install** and wait for the process to complete.\n\n\u003e adb alternative (for developers):\n\u003e\n\u003e ```bash\n\u003e adb install ocr-to-anki-v0.2.0-android-arm64.apk\n\u003e ```\n\n#### 3. First launch setup\n\nOn first run the app performs a one-time setup:\n\n1. **Extracts native binaries** — The bundled `llama-server` and\n   `llama-mtmd-cli` are copied to the app's private storage (~100 MB).\n2. **Downloads the AI model** — The Gemma 3 4B model (~2.4 GB) and vision\n   projector (~812 MB) are downloaded directly to your device.\n\n\u003e **WiFi is required for the model download** unless you disabled\n\u003e \"WiFi-only downloads\" in Settings. The download supports resume, so if\n\u003e interrupted it will continue from where it left off.\n\u003e\n\u003e **Requirements:** Android 9+ (API 28), ARM64 device, ~4 GB free storage.\n\n### Download a release (Linux)\n\nGrab the latest tarball from the\n[releases page](https://github.com/stradichenko/ocr-to-anki/releases),\nextract, and run:\n\n```bash\ntar xzf ocr-to-anki-v0.2.0-linux-x86_64.tar.gz\ncd ocr-to-anki-v0.2.0-linux-x86_64\n\n# GTK3 is required at runtime.\n# On Ubuntu/Debian: sudo apt install libgtk-3-0\n# On Fedora:        sudo dnf install gtk3\n# On NixOS:         already available\n\n./run.sh\n```\n\n\u003e **First launch — fully automatic setup**\n\u003e\n\u003e On first run the app detects what's missing and guides you through two\n\u003e one-click downloads:\n\u003e\n\u003e 1. *\"Python runtime needed — Download Python\"* (~30 MB, if Python is not\n\u003e    already installed)\n\u003e 2. *\"Model download required — Download now\"* (~3.2 GB)\n\u003e\n\u003e Both downloads are cached locally and only happen once. 
After that the\n\u003e app starts instantly.\n\u003e\n\u003e You can also download the model manually with the bundled script:\n\u003e\n\u003e ```bash\n\u003e ./scripts/setup-llama-cpp.sh\n\u003e ```\n\n### Build from source\n\n```bash\ngit clone https://github.com/stradichenko/ocr-to-anki.git\ncd ocr-to-anki\n\n# 1. Download the model and vision projector (~3 GB total, one time)\nnix develop\n./scripts/setup-llama-cpp.sh\n\n# 2. Build the Flutter app\nnix develop .#flutter\ncd app\nflutter pub get\nflutter build linux --release\n\n# The binary is at: app/build/linux/x64/release/bundle/ocr_to_anki\n```\n\nFor a distributable tarball that bundles the backend source:\n\n```bash\nnix develop .#flutter --command ./scripts/build-flutter.sh linux\n# Output: output/release/ocr-to-anki-v0.1.0-linux-x86_64.tar.gz\n```\n\nOr as a pure Nix derivation:\n\n```bash\nnix build .#flutter-app\n./result/bin/ocr-to-anki\n```\n\n### Build for Android\n\nRequires the Android SDK and NDK:\n\n```bash\n# 1. Install Android NDK (via Android Studio or sdkmanager)\nexport ANDROID_NDK=$HOME/Android/Sdk/ndk/27.0.11718014\n\n# 2. Build llama.cpp native binaries for Android\ncd ocr-to-anki\n./scripts/build-llama-android.sh\n\n# 3. Build the Flutter APK\ncd app\nflutter pub get\nflutter build apk --release\n\n# The APK is at: app/build/app/outputs/flutter-apk/app-release.apk\n```\n\nThe build script cross-compiles `llama-server` and `llama-mtmd-cli` for ARM64\nand bundles them as Flutter assets. 
On first launch the app copies them to the\ndevice's private storage and sets executable permissions.\n\nSee [docs/building.md](docs/building.md) for macOS, Windows, and advanced build\noptions.\n\n### Model files\n\n| File | Size | Source |\n|------|------|--------|\n| gemma-3-4b-it-q4_0_s.gguf | ~2.4 GB | [stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small](https://huggingface.co/stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small) |\n| mmproj-model-f16-4B.gguf | ~812 MB | [stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small](https://huggingface.co/stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small) |\n\nBoth are downloaded by `./scripts/setup-llama-cpp.sh` via direct URL. No\nauthentication required.\n\nQuantization-Aware Training (QAT) produces roughly 15% better perplexity than\nstandard post-training Q4_0 quantization at the same size. The stduhpf repack\nalso fixes broken control token metadata.\n\n## Getting Started\n\n### Workflow\n\n1. Select context: handwritten or printed text, or highlighted words (pick\n   colour)\n2. Add images:\n   - **Desktop:** drag and drop, or use the file picker\n   - **Android:** tap **Camera** to take a photo, or **Gallery** to pick\n     existing photos (multi-select supported)\n3. Vision OCR: Gemma 3 extracts words from the image\n4. Enrich: the LLM generates definitions and example sentences\n5. Review: edit the generated cards before export\n6. Export:\n   - **Desktop:** send to Anki via AnkiConnect, or save as TSV/JSON\n   - **Android:** share TSV directly to AnkiDroid via the share sheet\n\n### Starting the backend\n\nThe Flutter app manages the backend process automatically. 
No manual server\nmanagement is needed.\n\n- **Desktop:** spawns the Python FastAPI server on startup\n- **Android:** extracts bundled `llama-server` and `llama-mtmd-cli` binaries,\n  downloads the model on first launch, then starts `llama-server` directly\n\nIf you prefer to run the backend separately on desktop:\n\n```bash\nnix develop\nPYTHONPATH=src uvicorn src.api.app:app --host 0.0.0.0 --port 8000\n```\n\n### Configuration\n\nEdit `config/settings.yaml` to customize the backend:\n\n```yaml\nai_backend:\n  type: 'llama_cpp'\n\nllama_cpp:\n  host: '127.0.0.1'\n  port: 8090\n  context_size: 4096\n  n_gpu_layers: -1\n  mmproj_offload: false  # set true when using OpenCL backend\n```\n\nMost settings are also available through the in-app Settings screen.\n\n## Building for other platforms\n\nFlutter desktop does not support cross-compilation. Each platform must be built\non its native OS. The CI/CD workflow at `.github/workflows/build.yml` handles\nthis using platform-specific runners.\n\n| Build host | Linux | macOS | Windows | Android             |\n|------------|-------|-------|---------|---------------------|\n| Linux      | yes   | no    | no      | yes (cross-compile) |\n| macOS      | no    | yes   | no      | no                  |\n| Windows    | no    | no    | yes     | no                  |\n\n### macOS\n\nRequires a Mac with Xcode installed:\n\n```bash\nnix develop .#flutter\ncd app \u0026\u0026 flutter pub get \u0026\u0026 flutter build macos --release\n```\n\n### Windows\n\nRequires Visual Studio 2022 with the \"Desktop development with C++\" workload:\n\n```powershell\ncd app\nflutter pub get\nflutter build windows --release\n```\n\n### CI/CD\n\nPush a version tag to trigger builds for all four platforms:\n\n```bash\ngit tag v0.2.0\ngit push origin v0.2.0\n```\n\nThis creates a draft GitHub Release with Linux, macOS, Windows, and Android artifacts.\nSee [docs/building.md](docs/building.md) for the full reference.\n\n## Building llama-mtmd-cli 
(vision)\n\nThe vision backend requires `llama-mtmd-cli` built with GPU support:\n\n```bash\n# OpenCL (recommended for Intel integrated GPUs)\nnix develop .#sycl\n./scripts/build-llama-mtmd-opencl.sh\n\n# Vulkan (fallback, see note below)\n./scripts/build-llama-mtmd-vulkan.sh\n```\n\nAuto-detection picks the best available backend: CUDA, Metal, OpenCL, Vulkan,\nthen CPU.\n\n### Intel iGPU: OpenCL vs Vulkan\n\n| Backend | Vision encoder | Encode time | Text gen | Binary |\n|---------|---------------|-------------|----------|--------|\n| OpenCL | correct | ~2 min (GPU) | 4.1 tok/s | llama-mtmd-cli-opencl |\n| Vulkan | corrupted | 0.4s (garbage) | 3.6 tok/s | llama-mtmd-cli |\n| CPU | correct | ~43 min | 0.7 tok/s | any binary with --no-mmproj-offload |\n\nOpenCL is roughly 20x faster than CPU vision and produces correct output. It\nrequires a one-line patch for Intel work group sizes, applied automatically by\nthe build script. See\n[patches/opencl-intel-workgroup-fix.patch](patches/opencl-intel-workgroup-fix.patch).\n\n\u003cdetails\u003e\n\u003csummary\u003eVulkan corruption details\u003c/summary\u003e\n\nOn Intel integrated GPUs (for example UHD Graphics CML GT2), the Vulkan compute\nbackend produces corrupted output from the SigLIP vision encoder. Text\ngeneration works fine on Vulkan; only the vision projector is affected.\n\nRoot cause: Intel Vulkan compute shaders produce f16 underflow and overflow in\nthe CLIP/SigLIP transformer. Debug embeddings show 75%+ of values saturate to\nexactly -1.0 (clamped NaN/inf). This is a\n[known class of bug on integrated GPUs](https://github.com/ggml-org/llama.cpp/issues/15034).\n\nIf you have a discrete NVIDIA GPU, Vulkan and CUDA both work fine. 
Set\n`mmproj_offload: true` in `config/settings.yaml`.\n\n\u003c/details\u003e\n\n## API Endpoints\n\n```\nGET  /health                  Backend status\nGET  /backends                Detected GPU hardware\nPOST /ocr/vision              Vision OCR (base64 image)\nPOST /ocr/vision/upload       Vision OCR (file upload)\nPOST /generate                Raw text generation\nPOST /enrich                  Vocabulary enrichment (definitions + examples)\nPOST /pipeline/image-to-cards Full pipeline: image to OCR to enrich to Anki cards\n```\n\n## Android Notes\n\n### Architecture\n\nOn Android the app does **not** use the Python FastAPI backend. Instead, it\nbundles native `llama-server` and `llama-mtmd-cli` binaries compiled for ARM64.\nThe Flutter app spawns these directly and communicates with `llama-server` over\nHTTP on `localhost:8090`. Vision OCR runs `llama-mtmd-cli` as a subprocess.\n\nThis avoids the need for a Python runtime on Android while keeping all\ninference fully local and offline.\n\n### Camera and Gallery\n\nThe Android home screen shows two prominent buttons:\n\n- **Camera** — opens the system camera to take a photo for OCR\n- **Gallery** — opens the photo picker (multi-select supported on Android 13+)\n\nNo runtime permissions are needed on Android 13+; the photo picker uses the\nsystem UI. Camera access is handled automatically by the `image_picker` plugin.\n\n### Model Download\n\nThe first time you launch the Android app, it downloads the Gemma 3 4B model\n(~2.4 GB) and vision projector (~812 MB) directly to the app's private storage.\nDownloads support resume, so if interrupted they will continue from where they\nleft off.\n\n### AnkiDroid Export\n\nOn Android, the \"Export to Anki\" button in the review screen generates a TSV\nfile and opens the Android share sheet. Select **AnkiDroid** from the share\nsheet to import the cards. 
AnkiDroid must be installed on the device.\n\n## Project Structure\n\n```\napp/                        Flutter GUI application\n  android/                  Android platform files\n    app/src/main/\n      AndroidManifest.xml   Permissions (camera, storage, internet)\n      assets/               Bundled native llama.cpp binaries (ARM64)\n  lib/\n    main.dart               Entry point and routing\n    models/                 Data models (AnkiNote, AppSettings, HighlightColor)\n    services/               Business logic\n      inference_service.dart        LLM inference (FastAPI or native)\n      highlight_detector.dart       HSV highlight colour detection\n      anki_export_service.dart      AnkiConnect / AnkiDroid / JSON export\n      backend_server_service.dart   Python backend process lifecycle\n      llama_cpp_android_service.dart Native binary management (Android)\n      model_download_service.dart   Resume-capable model downloads (Android)\n    database/               Drift (SQLite) local storage\n    providers/              Riverpod state management\n    screens/                Home, Processing, Review, Settings, History\nsrc/                        Python backend\n  api/\n    app.py                  FastAPI endpoints and lifespan hooks\n    models.py               Pydantic request/response models\n  backends/\n    auto_detect.py          GPU and backend auto detection\n    mtmd_cli.py             llama-mtmd-cli wrapper (vision, subprocess)\n    llama_cpp_server.py     llama-server wrapper (text, persistent HTTP)\n  preprocessing/\n    highlight_cropper.py    HSV highlight detection (Python reference)\n  workflows/                End to end pipelines\n  output/                   Anki export and JSON output\nconfig/\n  settings.yaml             All configuration\nscripts/                    Build and setup scripts\n  build-flutter.sh          Build Flutter for Linux/macOS/Windows\n  build-llama-android.sh    Cross-compile llama.cpp for Android ARM64\n  
bundle-backend.sh         Bundle Python backend with PyInstaller\n  setup-llama-cpp.sh        Download model and vision projector\n  build-llama-mtmd-*.sh     Build llama-mtmd-cli with various GPU backends\ndocs/\n  building.md               Full build and release documentation\n```\n\n## Nix Flake Outputs\n\n### Development shells\n\n```bash\nnix develop             # Default: Python backend development\nnix develop .#flutter   # Flutter app build and development\nnix develop .#cuda      # With CUDA toolkit\nnix develop .#sycl      # With Intel OneAPI/SYCL and OpenCL\n```\n\n### Packages\n\n```bash\nnix build .#flutter-app   # Flutter Linux desktop binary\nnix build .#backend       # Nix-wrapped Python backend\nnix build .#bundle        # Complete distribution (GUI + backend + launcher)\nnix build .#dockerImage   # Docker image for server deployment\n```\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstradichenko%2Focr-to-anki","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstradichenko%2Focr-to-anki","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstradichenko%2Focr-to-anki/lists"}