https://github.com/shubhankarreddy/markitdown-gui
Windows desktop workflow for Microsoft MarkItDown: batch convert files, folders, and URLs into reusable Markdown.
- Host: GitHub
- URL: https://github.com/shubhankarreddy/markitdown-gui
- Owner: shubhankarreddy
- License: MIT
- Created: 2026-04-25T10:18:17.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-25T10:39:08.000Z (about 2 months ago)
- Last Synced: 2026-04-25T12:21:03.988Z (about 2 months ago)
- Topics: desktop-app, document-conversion, markdown, markitdown, pyqt6, windows
- Language: Python
- Size: 417 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Roadmap: ROADMAP.md
README
# MarkItDown GUI
A practical Windows desktop workflow for [Microsoft MarkItDown](https://github.com/microsoft/markitdown). Drop in documents, folders, or URLs, batch-convert them locally, preview the Markdown, and keep every successful conversion in one predictable output folder.
Built for people preparing documents for AI: convert once, clean once, and reuse compact Markdown instead of repeatedly uploading raw PDFs, Office files, or exported reports.


## Why This Exists
`markitdown` is powerful, but real desktop work often starts with messy piles of PDFs, Office files, exported reports, web pages, and folders. The CLI is great for automation; this app focuses on the human workflow around it:
- Collect many sources into one visible queue.
- Convert locally by default, so private files are not uploaded anywhere.
- Preview rendered Markdown before reusing it.
- Auto-save clean `.md` files into a known folder.
- Retry failed items without rebuilding the whole batch.
- Open the source, saved file, or saved folder in one click.
The goal is simple: make document-to-Markdown conversion usable for research, docs, notes, GitHub wikis, knowledge bases, and AI/RAG preparation.
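The queue-and-save workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: `convert_batch` and `markitdown_converter` are hypothetical names, and the only upstream calls assumed are `MarkItDown().convert(...)` and its `text_content` result from the `markitdown` package. The converter is passed in as a plain function, so the batch logic can be tried without installing anything.

```python
from pathlib import Path
from typing import Callable


def markitdown_converter() -> Callable[[str], str]:
    """Return a converter backed by the markitdown package (imported lazily)."""
    from markitdown import MarkItDown  # assumes `pip install markitdown`

    engine = MarkItDown()
    return lambda source: engine.convert(source).text_content


def convert_batch(sources, output_dir, convert):
    """Convert each source, save it as .md, and record failures instead of stopping."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, failed = {}, {}
    for source in sources:
        try:
            text = convert(source)
            # Hypothetical naming rule: reuse the source file name with a .md suffix.
            target = out / (Path(source).stem + ".md")
            target.write_text(text, encoding="utf-8")
            saved[source] = target
        except Exception as exc:
            # Keep the failure visible so the item can be retried later.
            failed[source] = str(exc)
    return saved, failed
```

With `markitdown` installed, `convert_batch(paths, "output", markitdown_converter())` would run the whole batch; the `failed` dictionary is what a retry action could feed back in. Two sources with the same file name would overwrite each other in this sketch.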
## Save Tokens Before AI Uploads
Uploading a raw PDF, DOCX, PPTX, spreadsheet, or exported report directly into an AI tool is convenient, but it can be wasteful when you need to ask multiple questions about the same material. Each upload may force the model or platform to re-process layout, repeated headers, footers, page numbers, tables, metadata, and irrelevant sections before it reaches the actual content you care about.
MarkItDown GUI gives you a token-aware preparation step:
- Convert source files into plain Markdown once.
- Preview what was extracted before sending anything to an AI tool.
- Remove irrelevant pages, boilerplate, duplicate sections, or noisy extraction artifacts.
- Save the cleaned Markdown and reuse it across prompts, chats, docs, repositories, or RAG pipelines.
- Send only the specific Markdown section needed for the task instead of the whole raw file every time.
| Workflow | Direct raw upload | MarkItDown GUI first |
| --- | --- | --- |
| Input sent to AI | Whole PDF, DOCX, PPTX, spreadsheet, or export | Clean Markdown text you can inspect first |
| Repeated questions | Often re-upload or re-process the same file | Convert once, reuse the saved `.md` file |
| Token usage | Can include layout noise, headers, footers, metadata, duplicate text, and irrelevant pages | Can be reduced by trimming the Markdown to only the useful sections |
| Human control | Hard to see exactly what the model received | Preview, edit, copy, or save the exact text before using it |
| Privacy posture | Raw document may go directly to the AI platform | Local conversion by default; optional AI enrichment stays user-controlled |
| Best fit | One-off questions where convenience matters most | Research, docs, RAG prep, repeated prompts, and reusable knowledge bases |
The exact savings depend on the document and the AI platform, but the workflow is intentionally designed to reduce repeated file uploads, context-window waste, and token spend by turning messy source material into inspectable, reusable text.
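As a rough illustration of sending only the needed section, the sketch below pulls one heading's section out of a saved Markdown file and estimates its size. Both helper names are hypothetical, and the four-characters-per-token figure is only a common rule of thumb for English text, not a measurement for any particular model. The heading match is deliberately simple and does not skip `#` lines inside fenced code blocks.

```python
import re


def extract_section(markdown: str, heading: str) -> str:
    """Return the section under `heading`, up to the next heading of the same or a higher level."""
    lines = markdown.splitlines()
    start, level = None, 0
    for i, line in enumerate(lines):
        match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if not match:
            continue
        if start is None:
            if match.group(2).lower() == heading.lower():
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            return "\n".join(lines[start:i]).strip()
    return "" if start is None else "\n".join(lines[start:]).strip()


def rough_token_estimate(text: str) -> int:
    """Very rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4) if text else 0
```

Comparing `rough_token_estimate(full_text)` with `rough_token_estimate(extract_section(full_text, "Methods"))` gives a quick sense of how much smaller a trimmed prompt might be.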
## Features
- Drag and drop files or folders.
- Add URLs to the same batch queue.
- Background folder scanning for supported files.
- Token-aware Markdown prep for AI prompts, chat workflows, and RAG ingestion.
- Compact queue with status badges for ready, converting, saved, and failed items.
- Markdown raw view, rendered preview, and conversion details.
- Automatic save to a chosen output folder.
- Retry failed items and clear saved items.
- Optional OpenAI description enrichment for image-heavy documents.
- Dark Windows desktop interface with app icon.
- PyInstaller build and optional Inno Setup installer flow.
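The status badges and the retry and clear actions suggest a small state model behind the queue. The sketch below is one hypothetical way to represent it; the class and function names are illustrative and are not taken from `markitdown_app.py`.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    READY = "ready"
    CONVERTING = "converting"
    SAVED = "saved"
    FAILED = "failed"


@dataclass
class QueueItem:
    source: str
    status: Status = Status.READY
    error: str = ""


def retry_failed(queue):
    """Reset failed items to ready so only they are converted again; return how many were reset."""
    count = 0
    for item in queue:
        if item.status is Status.FAILED:
            item.status, item.error = Status.READY, ""
            count += 1
    return count


def clear_saved(queue):
    """Return a new queue without the items that were already saved."""
    return [item for item in queue if item.status is not Status.SAVED]
```

Keeping the error text on the item is what lets a failed entry stay visible until it is retried or removed.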
## Quick Start
### Download For Windows
Download the latest portable Windows build from [GitHub Releases](https://github.com/shubhankarreddy/markitdown-gui/releases/latest).
Unzip it, then run:
```text
MarkItDown.exe
```
The app is currently unsigned, so Windows SmartScreen may warn on first launch. That is expected for an early open-source build without code signing.
### Run From Source
Requirements:
- Windows 10 or later
- Python 3.11+
```powershell
git clone https://github.com/shubhankarreddy/markitdown-gui.git
cd markitdown-gui
.\setup.bat
.\venv\Scripts\python.exe .\markitdown_app.py
```
### Build The Windows App
```powershell
.\build.bat
```
The build output is written to:
```text
dist\MarkItDown\MarkItDown.exe
```
If Inno Setup 6 is installed, `build.bat` also creates an installer under `installer_output\`. Build artifacts are intentionally ignored by Git so the repository stays small.
## Supported Inputs
The app accepts files, folders, and `http`/`https` URLs. Folder drops are scanned for common document and media formats supported by MarkItDown, including:
- PDF, Word, PowerPoint, Excel, Outlook, text, CSV, JSON, XML, HTML, YAML, Markdown
- PNG, JPG, GIF, BMP
- MP3, WAV, MP4
- Jupyter notebooks and ZIP archives
Conversion quality still depends on the source file and MarkItDown's upstream parser support. The app keeps failures visible so they can be retried, inspected, or reported.
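Input handling of this kind usually comes down to two checks: whether a string is a web URL, and which files in a dropped folder have a supported extension. The sketch below shows one way to do both with the standard library. The suffix set is an illustrative subset chosen for this example, not the app's actual list, and the function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Illustrative subset only (assumption); the app's real extension list is not shown here.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".json", ".xml", ".html",
    ".md", ".png", ".jpg", ".mp3", ".wav", ".ipynb", ".zip",
}


def is_web_url(text: str) -> bool:
    """True only for http/https URLs that include a host name."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def scan_folder(folder):
    """Recursively collect files whose suffix is in the supported set (case-insensitive)."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

In a GUI, a scan like this would normally run on a worker thread so that large folders do not freeze the window.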
## Privacy
By default, files are converted locally on your machine. If you enable OpenAI description enrichment, document content needed for that enrichment may be sent to the configured OpenAI model. Keep that option off for private or sensitive documents unless you are comfortable with that workflow.
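One way to keep enrichment strictly opt-in is to build the converter's options from an explicit switch, so that no client is ever passed unless the user enabled it. The sketch below assumes the `llm_client` and `llm_model` keyword arguments that upstream MarkItDown documents for image descriptions; `converter_options` itself is a hypothetical helper, not part of this app or of MarkItDown.

```python
def converter_options(enrichment_enabled, llm_client=None, llm_model=None):
    """Build keyword arguments for MarkItDown(); empty unless enrichment is opted in."""
    if not enrichment_enabled:
        # Local-only conversion: no client is handed over, so nothing can be sent out.
        return {}
    if llm_client is None or not llm_model:
        raise ValueError("enrichment needs both a client and a model name")
    return {"llm_client": llm_client, "llm_model": llm_model}


# Hypothetical usage, assuming the markitdown and openai packages are installed:
#   from markitdown import MarkItDown
#   from openai import OpenAI
#   engine = MarkItDown(**converter_options(True, OpenAI(), "gpt-4o"))
```

With the switch off, the returned dictionary is empty and the converter is constructed exactly as it would be without any AI configuration.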
## Project Status
The current feature-complete app is the Python/PyQt6 Windows client in `markitdown_app.py`.
There is also an experimental WPF/native Windows prototype under `Native/`. It shows a possible future direction, but it does not yet have feature parity with the PyQt app.
## Tech Stack
- Python
- PyQt6
- Microsoft MarkItDown
- PyInstaller
- Inno Setup (optional)
## Repository Map
- `markitdown_app.py` - main desktop app
- `requirements.txt` - pinned runtime dependencies and MarkItDown extras
- `build.bat` - Windows build script
- `MarkItDown.spec` - PyInstaller app bundle configuration
- `assets/` - app icon assets
- `docs/screenshots/` - README screenshots
- `Native/` - experimental WPF/native client
- `DESIGN_BRIEF.md` - product and UX direction
- `ROADMAP.md` - planned improvements
- `CONTRIBUTING.md` - contributor guide
## Contributing
Contributions are welcome. The best first issues are usually practical workflow improvements: clearer error messages, better parser recovery, more robust preview behavior, installer polish, and accessibility fixes.
Before opening a pull request, check that both Python entry points still compile:
```powershell
.\venv\Scripts\python.exe -m py_compile .\markitdown_app.py .\Native\MarkItDown.Native\Backend\markitdown_backend.py
```
Please read [CONTRIBUTING.md](CONTRIBUTING.md) and keep changes focused. This project values practical improvements over cosmetic churn.
## Roadmap
Near-term priorities:
- Publish signed release installers.
- Improve automated smoke tests for common file types.
- Add richer conversion diagnostics for missing upstream dependencies.
- Continue the native WPF track once the PyQt workflow is stable.
See [ROADMAP.md](ROADMAP.md) for more detail.
## License
MIT. See [LICENSE](LICENSE).