{"id":45803458,"url":"https://github.com/eqms/knowledgeimporter","last_synced_at":"2026-02-26T13:02:11.495Z","repository":{"id":340630520,"uuid":"1166901289","full_name":"eqms/knowledgeimporter","owner":"eqms","description":"KnowledgeImporter ist eine Desktop-Applikation zum Batch-Upload von Dokumenten in LangDock Knowledge Folders. Die App konvertiert PDF, DOCX, HTML und ODT automatisch zu Markdown und lädt sie anschließend hoch.  Gebaut mit Flet (Flutter für Python), bietet sie eine native Desktop-Oberfläche mit Fortschrittsanzeige, dateibasiertem Logging etc.","archived":false,"fork":false,"pushed_at":"2026-02-25T19:06:23.000Z","size":1053,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-25T21:30:36.417Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eqms.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-25T18:24:54.000Z","updated_at":"2026-02-25T19:06:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eqms/knowledgeimporter","commit_stats":null,"previous_names":["eqms/knowledgeimporter"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/eqms/knowledgeimporter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eqms%2Fknowledgeimporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eqms%2Fknowledgeimporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eqms%2Fknowledgeimporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eqms%2Fknowledgeimporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eqms","download_url":"https://codeload.github.com/eqms/knowledgeimporter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eqms%2Fknowledgeimporter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29860109,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T08:51:08.701Z","status":"ssl_error","status_checked_at":"2026-02-26T08:50:19.607Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-26T13:02:07.412Z","updated_at":"2026-02-26T13:02:11.484Z","avatar_url":"https://github.com/eqms.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# KnowledgeImporter\n\n\u003e **Language / Sprache**: [DE](#deutsche-dokumentation) | [EN](#english-documentation)\n\n[![Version](https://img.shields.io/badge/version-0.3.1-blue.svg)]()\n[![Python](https://img.shields.io/badge/python-3.10+-green.svg)]()\n[![License](https://img.shields.io/badge/license-MIT-green.svg)]()\n\n---\n\n## Deutsche Dokumentation\n\n### Projektübersicht\n\nKnowledgeImporter ist eine Desktop-Applikation zum Batch-Upload von Dokumenten in LangDock Knowledge Folders. Die App konvertiert PDF, DOCX, HTML und ODT automatisch zu Markdown und lädt sie anschließend hoch.\n\nGebaut mit [Flet](https://flet.dev/) (Flutter für Python), bietet sie eine native Desktop-Oberfläche mit Fortschrittsanzeige, dateibasiertem Logging und verschlüsselter API-Key-Speicherung.\n\n**Hauptfunktionen:**\n- **Batch-Upload** — Ordner auswählen und alle passenden Dateien in einen LangDock Knowledge Folder hochladen\n- **Dokument-Konvertierung** — PDF, DOCX, HTML und ODT werden automatisch zu Markdown konvertiert\n- **Replace-Modus** — optional bestehende Dateien im Zielordner vor dem erneuten Upload löschen\n- **Fortschrittsanzeige** — Echtzeit-Fortschrittsbalken mit Status pro Datei (Converting, Uploading, Success, Error)\n- **Dateibasiertes Logging** — jede Upload-Session erzeugt ein Logfile mit `[CONV]`, `[UPLOAD]`, `[OK]`, `[FAIL]` Einträgen\n- **Verschlüsselte Konfiguration** — API-Keys werden mit Fernet verschlüsselt gespeichert, Master Key im OS-Keyring\n- **Versionsanzeige** — aktuelle Version im Fenster-Titel der Applikation\n\n### Unterstützte Dateiformate\n\n| Format | Endung | Konvertierungs-Bibliothek |\n|--------|--------|--------------------------|\n| Markdown | `.md` | Nativ (keine Konvertierung) |\n| PDF | `.pdf` | [markitdown](https://github.com/microsoft/markitdown) via pdfminer.six |\n| Word | `.docx` | [markitdown](https://github.com/microsoft/markitdown) via mammoth |\n| HTML | `.html`, `.htm` | [markitdown](https://github.com/microsoft/markitdown) via BeautifulSoup4 |\n| OpenDocument | `.odt` | [odfdo](https://github.com/jdum/odfdo) via Paragraph-Extraktion |\n\n### Konvertierungs-Architektur\n\nNicht-Markdown-Dateien werden transparent innerhalb der `upload_batch()`-Methode des `UploadService` konvertiert:\n\n1. **Erkennung** — `ConversionService.needs_conversion(path)` prüft die Dateiendung gegen die Menge konvertierbarer Formate\n2. **Konvertierung** — `ConversionService.convert_file(path)` leitet an den passenden Konverter weiter:\n   - PDF/DOCX/HTML → `MarkItDown().convert()` (Microsofts markitdown-Bibliothek, optimiert für LLM Knowledge Bases)\n   - ODT → `odfdo.Document` Paragraph-Extraktion mit `get_formatted_text()`\n3. **Temp-Verzeichnis** — konvertierte `.md`-Dateien werden in ein `tempfile.mkdtemp()`-Verzeichnis geschrieben\n4. **Upload** — die konvertierte Datei wird mit dem ursprünglichen Dateinamen + `.md`-Endung hochgeladen (z.B. `report.pdf` → `report.md`)\n5. **Cleanup** — das Temp-Verzeichnis wird im `finally`-Block entfernt, was Aufräumen bei Erfolg und Fehler garantiert\n\nDie Konvertierung läuft im bestehenden `BackgroundWorker` Daemon-Thread. Progress-Callbacks nutzen den `\"converting\"`-Status, um blauen Status-Text in der UI anzuzeigen, bevor die `\"uploading\"`-Phase beginnt.\n\n#### Fehlerbehandlung\n\n- Schlägt die Konvertierung einer Datei fehl, wird sie übersprungen und als `failed` gezählt — der Batch fährt mit der nächsten Datei fort\n- `ConversionError(filename, reason)` liefert strukturiertes Error-Reporting\n- Bibliotheken werden lazy innerhalb der Konvertierungsmethoden importiert — fehlt markitdown oder odfdo, wird eine klare Fehlermeldung statt eines Import-Crashs erzeugt\n\n### Voraussetzungen\n\n- Python \u003e= 3.10\n- [UV](https://docs.astral.sh/uv/) Package Manager\n\n### Installation\n\n```bash\n# Repository klonen\ngit clone https://github.com/equitania/knowledgeimporter.git\n\n# In Projektverzeichnis wechseln\ncd knowledgeimporter\n\n# Virtuelle Umgebung erstellen und aktivieren\nuv venv \u0026\u0026 source .venv/bin/activate\n\n# Abhängigkeiten installieren\nuv pip install -e \".[dev]\"\n```\n\n### Verwendung\n\n```bash\n# Applikation starten (Produktions-Einstiegspunkt)\nknowledgeimporter\n\n# Entwicklung mit Hot-Reload\nflet run src/knowledgeimporter/main.py\n```\n\n#### Konfiguration\n\nDie Konfiguration erfolgt über die Settings-Ansicht in der App und wird in `~/.knowledgeimporter/config.json` gespeichert:\n\n| Parameter | Typ | Standard | Beschreibung |\n|-----------|-----|----------|--------------|\n| `langdock_api_key` | string | `\"\"` | LangDock API-Key (verschlüsselt gespeichert) |\n| `region` | string | `\"eu\"` | API-Region (`eu` oder `us`) |\n| `default_folder_id` | string | `\"\"` | UUID des Ziel-Knowledge-Folders |\n| `folder_name` | string | `\"\"` | Anzeigename des Ordners |\n| `file_patterns` | list | `[\"*.md\", \"*.pdf\", \"*.docx\", \"*.html\", \"*.htm\", \"*.odt\"]` | Datei-Muster für den Upload |\n| `replace_existing` | bool | `true` | Bestehende Dateien vor dem Upload löschen |\n\n### Architektur\n\n```\nsrc/knowledgeimporter/\n├── __init__.py              # Version (__version__)\n├── main.py                  # Einstiegspunkt: ft.app(target=main)\n├── app.py                   # KnowledgeImporterApp — Navigation, Config-Lifecycle\n├── models/\n│   └── config.py            # AppConfig (Pydantic) — file_patterns, API Key, Folder ID\n├── services/\n│   ├── converter.py         # ConversionService — PDF/DOCX/HTML/ODT → Markdown\n│   └── upload_service.py    # UploadService — Batch-Upload mit Konvertierungs-Integration\n├── utils/\n│   ├── storage.py           # Config-Persistenz, Fernet-Verschlüsselung, Keyring\n│   ├── upload_logger.py     # Dateibasiertes Upload-Logging mit Auto-Cleanup\n│   └── worker.py            # BackgroundWorker — Daemon-Thread mit Cancellation\n└── views/\n    ├── upload_view.py       # Upload-Ansicht — Ordner-Auswahl, Fortschritt, Log-Viewer\n    └── settings_view.py     # Einstellungen — API Key, Ordner, Muster\n```\n\n### Entwicklung\n\n```bash\n# Tests ausführen\npytest tests/ -v\n\n# Lint- und Format-Prüfung\nruff check src/ tests/ \u0026\u0026 ruff format src/ tests/ --check\n\n# Auto-Fix für Lint und Format\nruff check src/ tests/ --fix \u0026\u0026 ruff format src/ tests/\n\n# Distribution erstellen\nuv build\n```\n\n#### Code-Stil\n\n- **Formatter:** Ruff (Zeilenlänge 120)\n- **Linting:** Ruff mit Regeln E, W, F, I, B, C4, UP\n- **Target:** Python 3.10\n- **Commits:** `[ADD]`, `[CHG]`, `[FIX]` Prefix-Konvention\n\n### Abhängigkeiten\n\n| Paket | Zweck |\n|-------|-------|\n| `flet\u003e=0.80.5` | Desktop-UI-Framework (Flutter-basiert) |\n| `eq-chatbot-core\u003e=1.2.1` | LangDock API-Client (`LangDockKnowledgeManager`) |\n| `keyring\u003e=25.0.0` | OS-Keyring für Fernet Master Key |\n| `pydantic\u003e=2.10.0` | Config-Model-Validierung und Serialisierung |\n| `markitdown[pdf,docx]\u003e=0.1.5` | PDF/DOCX/HTML → Markdown Konvertierung |\n| `odfdo\u003e=3.20` | ODT → Markdown Konvertierung |\n\n### Changelog\n\n#### [0.3.1] - 25.02.2026\n- Version im Fenster-Titel der Applikation sichtbar\n- Zweisprachige README-Dokumentation (DE/EN)\n\n#### [0.3.0] - 25.02.2026\n- Dokument-Konvertierung: PDF, DOCX, HTML, ODT → Markdown\n- `ConversionService` mit markitdown und odfdo Integration\n- `\"converting\"` Status in UI und Logging (`[CONV]`)\n- Erweiterte Standard-Dateimuster: `*.md, *.pdf, *.docx, *.html, *.htm, *.odt`\n- `converted` Counter in Upload-Ergebnis und Log-Summary\n\n#### [0.2.1] - 2025\n- Log-UI durch dateibasiertes Logging ersetzt\n- App-Icon Dead Code entfernt\n- Threading-Fix, Spinner, Icon \u0026 Version Bump\n\n### Lizenz\n\nMIT — Equitania Software GmbH\n\n### Kontakt\n\n- **Unternehmen:** Equitania Software GmbH\n- **Website:** https://www.ownerp.com\n- **Repository:** https://github.com/equitania/knowledgeimporter\n\n---\n\n## English Documentation\n\n### Project Overview\n\nKnowledgeImporter is a desktop application for batch-uploading documents to LangDock Knowledge Folders. The app automatically converts PDF, DOCX, HTML, and ODT files to Markdown before uploading.\n\nBuilt with [Flet](https://flet.dev/) (Flutter for Python), it provides a native desktop UI with progress tracking, file-based logging, and encrypted API key storage.\n\n**Key Features:**\n- **Batch upload** — select a folder and upload all matching files to a LangDock Knowledge Folder\n- **Document conversion** — PDF, DOCX, HTML, and ODT files are automatically converted to Markdown\n- **Replace mode** — optionally delete existing files in the target folder before re-uploading\n- **Progress tracking** — real-time progress bar with per-file status (converting, uploading, success, error)\n- **File-based logging** — each upload session writes a timestamped log with `[CONV]`, `[UPLOAD]`, `[OK]`, `[FAIL]` entries\n- **Encrypted config** — API keys are Fernet-encrypted at rest, master key stored in OS keyring\n- **Version display** — current version shown in the application window title\n\n### Supported File Formats\n\n| Format | Extension | Conversion Library |\n|--------|-----------|-------------------|\n| Markdown | `.md` | Native (no conversion) |\n| PDF | `.pdf` | [markitdown](https://github.com/microsoft/markitdown) via pdfminer.six |\n| Word | `.docx` | [markitdown](https://github.com/microsoft/markitdown) via mammoth |\n| HTML | `.html`, `.htm` | [markitdown](https://github.com/microsoft/markitdown) via BeautifulSoup4 |\n| OpenDocument | `.odt` | [odfdo](https://github.com/jdum/odfdo) via paragraph extraction |\n\n### Conversion Architecture\n\nNon-Markdown files are transparently converted inside the `upload_batch()` method of `UploadService`:\n\n1. **Detection** — `ConversionService.needs_conversion(path)` checks the file extension against the set of convertible formats\n2. **Conversion** — `ConversionService.convert_file(path)` routes to the appropriate converter:\n   - PDF/DOCX/HTML → `MarkItDown().convert()` (Microsoft's markitdown library, optimized for LLM knowledge bases)\n   - ODT → `odfdo.Document` paragraph extraction with `get_formatted_text()`\n3. **Temp directory** — converted `.md` files are written to a `tempfile.mkdtemp()` directory\n4. **Upload** — the converted file is uploaded with the original stem + `.md` extension (e.g., `report.pdf` → `report.md`)\n5. **Cleanup** — the temp directory is removed in a `finally` block, guaranteeing cleanup on success and error\n\nThe conversion runs in the existing `BackgroundWorker` daemon thread. Progress callbacks use the `\"converting\"` status to show blue status text in the UI before the `\"uploading\"` phase begins.\n\n#### Error Handling\n\n- If a file fails conversion, it is skipped and counted as `failed` — the batch continues with the next file\n- `ConversionError(filename, reason)` provides structured error reporting\n- Libraries are lazily imported inside conversion methods — if markitdown or odfdo is missing, a clear error message is raised instead of an import crash\n\n### Prerequisites\n\n- Python \u003e= 3.10\n- [UV](https://docs.astral.sh/uv/) package manager\n\n### Installation\n\n```bash\n# Clone repository\ngit clone https://github.com/equitania/knowledgeimporter.git\n\n# Navigate to project directory\ncd knowledgeimporter\n\n# Create and activate virtual environment\nuv venv \u0026\u0026 source .venv/bin/activate\n\n# Install dependencies\nuv pip install -e \".[dev]\"\n```\n\n### Usage\n\n```bash\n# Start application (production entry point)\nknowledgeimporter\n\n# Development with hot-reload\nflet run src/knowledgeimporter/main.py\n```\n\n#### Configuration\n\nConfiguration is managed through the Settings view in the app and persisted in `~/.knowledgeimporter/config.json`:\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `langdock_api_key` | string | `\"\"` | LangDock API key (stored encrypted) |\n| `region` | string | `\"eu\"` | API region (`eu` or `us`) |\n| `default_folder_id` | string | `\"\"` | UUID of target Knowledge Folder |\n| `folder_name` | string | `\"\"` | Display name of the folder |\n| `file_patterns` | list | `[\"*.md\", \"*.pdf\", \"*.docx\", \"*.html\", \"*.htm\", \"*.odt\"]` | File patterns for upload |\n| `replace_existing` | bool | `true` | Delete existing files before upload |\n\n### Architecture\n\n```\nsrc/knowledgeimporter/\n├── __init__.py              # Version (__version__)\n├── main.py                  # Entry point: ft.app(target=main)\n├── app.py                   # KnowledgeImporterApp — navigation, config lifecycle\n├── models/\n│   └── config.py            # AppConfig (Pydantic) — file_patterns, API key, folder ID\n├── services/\n│   ├── converter.py         # ConversionService — PDF/DOCX/HTML/ODT → Markdown\n│   └── upload_service.py    # UploadService — batch upload with conversion integration\n├── utils/\n│   ├── storage.py           # Config persistence, Fernet encryption, keyring\n│   ├── upload_logger.py     # File-based upload logging with auto-cleanup\n│   └── worker.py            # BackgroundWorker — daemon thread with cancellation\n└── views/\n    ├── upload_view.py       # Upload screen — folder picker, progress, log viewer\n    └── settings_view.py     # Settings screen — API key, folder, patterns\n```\n\n### Development\n\n```bash\n# Run tests\npytest tests/ -v\n\n# Lint and format check\nruff check src/ tests/ \u0026\u0026 ruff format src/ tests/ --check\n\n# Auto-fix lint and format\nruff check src/ tests/ --fix \u0026\u0026 ruff format src/ tests/\n\n# Build distribution\nuv build\n```\n\n#### Code Style\n\n- **Formatter:** Ruff (line length 120)\n- **Linting:** Ruff with rules E, W, F, I, B, C4, UP\n- **Target:** Python 3.10\n- **Commits:** `[ADD]`, `[CHG]`, `[FIX]` prefix convention\n\n### Dependencies\n\n| Package | Purpose |\n|---------|---------|\n| `flet\u003e=0.80.5` | Desktop UI framework (Flutter-based) |\n| `eq-chatbot-core\u003e=1.2.1` | LangDock API client (`LangDockKnowledgeManager`) |\n| `keyring\u003e=25.0.0` | OS keyring for Fernet master key |\n| `pydantic\u003e=2.10.0` | Config model validation and serialization |\n| `markitdown[pdf,docx]\u003e=0.1.5` | PDF/DOCX/HTML → Markdown conversion |\n| `odfdo\u003e=3.20` | ODT → Markdown conversion |\n\n### Changelog\n\n#### [0.3.1] - 2026-02-25\n- Version displayed in application window title\n- Bilingual README documentation (DE/EN)\n\n#### [0.3.0] - 2026-02-25\n- Document conversion: PDF, DOCX, HTML, ODT → Markdown\n- `ConversionService` with markitdown and odfdo integration\n- `\"converting\"` status in UI and logging (`[CONV]`)\n- Extended default file patterns: `*.md, *.pdf, *.docx, *.html, *.htm, *.odt`\n- `converted` counter in upload result and log summary\n\n#### [0.2.1] - 2025\n- Replaced log UI with file-based logging\n- Removed app icon dead code\n- Threading fix, spinner, icon \u0026 version bump\n\n### License\n\nMIT — Equitania Software GmbH\n\n### Contact\n\n- **Company:** Equitania Software GmbH\n- **Website:** https://www.ownerp.com\n- **Repository:** https://github.com/equitania/knowledgeimporter\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feqms%2Fknowledgeimporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feqms%2Fknowledgeimporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feqms%2Fknowledgeimporter/lists"}