https://github.com/ma08/omnishot
Agent-native screenshots for local and remote workspaces
- Host: GitHub
- URL: https://github.com/ma08/omnishot
- Owner: ma08
- License: mit
- Created: 2026-05-05T23:40:27.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-07T13:56:36.000Z (about 2 months ago)
- Last Synced: 2026-05-07T14:42:09.595Z (about 2 months ago)
- Topics: agents, ai, macos, ocr, s3, screenshots
- Language: Python
- Size: 23.6 MB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md
# Omnishot
Auto-renames and organizes macOS screenshots with local Apple models, then makes
the latest capture easy to paste anywhere: raw image bytes, path refs for remote
agents, signed S3 URLs, or public links, all behind simple keybindings.
Read the launch article on X/Twitter: [Making Screenshots Agent-Native in Remote Workspaces](https://x.com/curious_queue/status/2051832335973364102?s=20).
See the demo clips in the walkthrough thread on X/Twitter: [Omnishot screenshot routing demos](https://x.com/curious_queue/status/2052106783590961660?s=20).
This is a small macOS utility that turns a normal screenshot into a named,
routeable artifact:
- semantic local filename from Apple Vision OCR + Apple Foundation Models
- compact `screenshot-info` payload for remote agents over SSH
- direct image paste for chat apps, docs, Finder, Cursor, and VS Code
- S3/public links when a URL is the right transport
It is public as a reference implementation, not a polished product. The useful
parts to copy are the workflow shape, paste-mode contract, machine alias
convention, and agent instructions.
## Ask Your Agent To Adapt This Repo
Paste this into your coding agent before deciding what to copy:
```text
Read through https://github.com/ma08/omnishot and help me adapt
the screenshot workflow to my own machine setup.
Focus on reusable patterns, not copying the repo author's machine-specific config.
Ask me targeted questions about:
- where my screenshots land
- whether I use local or remote coding agents
- my SSH aliases between machines
- whether I need S3/public links or only local SSH transfer
- what apps I paste screenshots into most often
Then recommend the smallest useful version I should implement.
```
## Workflow At A Glance
```mermaid
flowchart TD
    A["macOS screenshot<br/>Cmd+Shift+3/4/5"] --> B["Detect + batch<br/>watcher.py"]
    B --> C["Understand locally<br/>Vision OCR + Apple Foundation Models"]
    C --> D["Semantic rename<br/>timestamp + content slug"]
    D --> E["Persist + upload<br/>SQLite history + S3 link"]
    E --> F["Ready to route<br/>clipboard + menubar shortcuts"]
    F --> G["Cmd+V / Cmd+Option+V<br/>path-ref"]
    F --> H["Cmd+Shift+Option+V<br/>image paste"]
    F --> I["Cmd+Control+Option+V<br/>S3 URL"]
    F --> N["Cmd+Control+Shift+Option+V<br/>public link"]
    G --> J["screenshot-info<br/>machine + path"]
    J --> K["Remote agents<br/>scp into task artifacts"]
    H --> L["Local apps<br/>chat/docs/Finder/editor"]
    I --> M["Web/social sharing<br/>URLs when needed"]
    N --> M
```
The core abstraction is:
```text
screenshot -> named artifact -> route to the current surface
```
S3 is one transport, not the whole point.
## Visual Walkthrough
The menu bar keeps the latest screenshot actions inspectable even when keyboard
shortcuts are faster.
The local naming and upload pipeline is observable, so failures are not hidden
inside a background watcher.
## Why It Exists
Taking screenshots is easy. Getting a screenshot into the right agent, on the
right machine, with a useful filename and durable task context, is still awkward.
Before:
```text
Screenshot 2026-02-23 at 9.43.31 PM.png
```
After:
```text
2026-02-23_21h43m40s_PST_cursor-settings-heavy-memory.png
```
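The "After" name follows a `date_time_timezone_slug.png` pattern. A minimal sketch of how such a name could be assembled is shown here; `slugify` and `semantic_filename` are hypothetical helpers, not functions from the repo, and in the real pipeline the description comes from Vision OCR and Apple Foundation Models rather than a hard-coded string.

```python
import re
from datetime import datetime, timedelta, timezone


def slugify(description: str, max_words: int = 6) -> str:
    """Lowercase the text, keep alphanumeric runs, join the first few with hyphens."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words[:max_words]) or "screenshot"


def semantic_filename(captured_at: datetime, description: str) -> str:
    """Build '<date>_<time>_<tz>_<slug>.png' from a timezone-aware datetime."""
    stamp = captured_at.strftime("%Y-%m-%d_%Hh%Mm%Ss_%Z")
    return f"{stamp}_{slugify(description)}.png"


# Fixed-offset zone used only to keep the example self-contained (assumption).
PST = timezone(timedelta(hours=-8), "PST")
captured = datetime(2026, 2, 23, 21, 43, 40, tzinfo=PST)
print(semantic_filename(captured, "Cursor settings: heavy memory"))
# prints 2026-02-23_21h43m40s_PST_cursor-settings-heavy-memory.png
```

Putting the timestamp first keeps files sorted chronologically in Finder, while the slug makes them searchable by content.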
Default agent payload:
```text
screenshot-info:
machine: work-mac
path: /Users/alex/Pictures/Screenshots/2026-05-05_08h36m35s_PDT_dashboard-error-state.png
```
A remote agent can then copy the image into its task folder:
```bash
scp 'work-mac:/Users/alex/Pictures/Screenshots/example.png' user_inputs/input_artifacts/
```
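On the agent side, the payload has to be parsed before the copy can run. The sketch below shows one way to do that, assuming the payload always contains `machine:` and `path:` lines as in the example above; `ScreenshotInfo`, `parse_screenshot_info`, and `scp_argv` are hypothetical names, not part of Omnishot.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotInfo:
    machine: str  # SSH alias of the machine that holds the file
    path: str     # absolute path on that machine


def parse_screenshot_info(text: str) -> ScreenshotInfo:
    """Extract the machine and path fields from a pasted payload."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("machine", "path"):
            fields[key] = value.strip()
    missing = {"machine", "path"} - fields.keys()
    if missing:
        raise ValueError(f"payload is missing: {sorted(missing)}")
    return ScreenshotInfo(fields["machine"], fields["path"])


def scp_argv(info: ScreenshotInfo, dest_dir: str = "user_inputs/input_artifacts/") -> list[str]:
    """Build the scp argument list; pass it to subprocess.run to perform the copy."""
    return ["scp", f"{info.machine}:{info.path}", dest_dir]


payload = """screenshot-info:
machine: work-mac
path: /Users/alex/Pictures/Screenshots/2026-05-05_08h36m35s_PDT_dashboard-error-state.png
"""
info = parse_screenshot_info(payload)
print(shlex.join(scp_argv(info)))
```

Returning an argument list rather than a shell string avoids local quoting problems when the list is handed to `subprocess.run`.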
## Quick Start
```bash
git clone https://github.com/ma08/omnishot.git
cd omnishot
uv sync
# Required for uploads and URL paste modes
export OMNISHOT_BUCKET=your-bucket-name
# Optional, recommended on supported macOS versions
./scripts/build-swift.sh
# Recommended runtime
uv run omnishot menubar
```
## Paste Modes
| Shortcut | Payload |
|----------|---------|
| `Cmd+V` | configured default after capture, `path-ref` by default |
| `Cmd+Option+V` | latest path reference (`screenshot-info`) |
| `Cmd+Shift+Option+V` | latest image directly |
| `Cmd+Control+Option+V` | latest S3 URL |
| `Cmd+Control+Shift+Option+V` | latest public link |
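
The table amounts to a small dispatch from shortcut to payload type. The sketch below illustrates that contract for the text-based modes; only the `path-ref` name comes from this README, while the other mode identifiers, the `Capture` record, and `text_payload` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PasteMode(Enum):
    PATH_REF = "path-ref"        # name used in this README
    IMAGE = "image"              # hypothetical identifier
    S3_URL = "s3-url"            # hypothetical identifier
    PUBLIC_LINK = "public-link"  # hypothetical identifier


# Shortcut-to-mode mapping copied from the table; plain Cmd+V uses the configured default.
SHORTCUTS = {
    "Cmd+Option+V": PasteMode.PATH_REF,
    "Cmd+Shift+Option+V": PasteMode.IMAGE,
    "Cmd+Control+Option+V": PasteMode.S3_URL,
    "Cmd+Control+Shift+Option+V": PasteMode.PUBLIC_LINK,
}


@dataclass
class Capture:
    machine: str
    path: str
    s3_url: Optional[str] = None
    public_url: Optional[str] = None


def text_payload(capture: Capture, mode: PasteMode) -> str:
    """Return the text to paste; image mode has no text form and is rejected here."""
    if mode is PasteMode.PATH_REF:
        return f"screenshot-info:\nmachine: {capture.machine}\npath: {capture.path}"
    if mode is PasteMode.S3_URL and capture.s3_url:
        return capture.s3_url
    if mode is PasteMode.PUBLIC_LINK and capture.public_url:
        return capture.public_url
    raise ValueError(f"no text payload available for mode {mode.value!r}")


latest = Capture("work-mac", "/Users/alex/Pictures/Screenshots/example.png")
print(text_payload(latest, SHORTCUTS["Cmd+Option+V"]))
```

Failing loudly when a URL has not been produced yet is one design option; a real tool might instead fall back to the path reference.
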
Configure the machine token with:
```bash
export BOT_MACHINE_SSH_ALIAS=work-mac
```
or pass `--ssh-host-hint work-mac`.
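A sketch of how the two configuration sources could be combined is shown below. The precedence (flag over environment variable) and the hostname fallback are assumptions, not documented Omnishot behavior, and `resolve_machine_token` is a hypothetical helper.

```python
import os
import socket
from typing import Mapping, Optional


def resolve_machine_token(
    cli_hint: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the machine token: CLI hint first, then the env var, then the hostname."""
    env = os.environ if env is None else env
    for candidate in (cli_hint, env.get("BOT_MACHINE_SSH_ALIAS")):
        if candidate and candidate.strip():
            return candidate.strip()
    # Assumed fallback: the local hostname may not be a valid SSH alias elsewhere.
    return socket.gethostname()


print(resolve_machine_token("work-mac", {}))
print(resolve_machine_token(None, {"BOT_MACHINE_SSH_ALIAS": "lab-mac"}))
```

Whatever the precedence, the token only helps if the remote machine has a matching `Host` entry in its SSH config.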
## Agent Instructions
The compact payload works best when your global agent instructions teach agents
what to do with it:
- treat `machine` as an SSH alias
- copy the referenced image into the active task's `user_inputs/input_artifacts/`
- index the artifact before relying on it
Reference implementation: [botfiles PR #23](https://github.com/ma08/botfiles/pull/23)
adds this behavior to global Codex/Claude instructions.
## Demo Clips
The full walkthrough thread is on X: [Omnishot screenshot routing demos](https://x.com/curious_queue/status/2052106783590961660?s=20).
GitHub strips inline MP4 players from README Markdown, so each card uses a
short animated preview and links to a browser-playable hosted clip.
- Capture + remote retrieval: Mac screenshot -> path-ref paste -> VM copies image into task artifacts.
- Paste routes montage: path-ref, signed S3 URL, public URL, picker access, and link checks.
- Menu bar controls: latest image, path-ref, S3 URL, and public link actions.
- Langfuse trace: pipeline observability for screenshot naming and upload.
## Docs
- [Usage](docs/usage.md) - setup, modes, shortcuts, launchd, CLI options
- [Agent instructions](docs/agent-instructions.md) - reusable `screenshot-info`
handling for Codex/Claude-style agents
- [Architecture](docs/architecture.md) - pipeline diagram and data flow
- [Tech stack](docs/tech-stack.md) - exact implementation details and citations
- [Troubleshooting](docs/troubleshooting.md) - permissions, S3, clipboard, prompt evals
- [Contributing](CONTRIBUTING.md) - local checks and PR expectations
## Architecture At A Glance
```mermaid
flowchart TD
    A["macOS screenshot folder<br/>~/Pictures/Screenshots"] --> B["watcher.py<br/>detect new PNGs + batch monitors"]
    B --> C["DescribeImage.swift<br/>Vision OCR + Apple Foundation Models"]
    C --> D["enrich.py<br/>semantic filename + fallback logic"]
    D --> E["Renamed local PNG"]
    E --> F["history.py<br/>SQLite recent captures"]
    E --> G["upload.py<br/>S3 object + presigned/public links"]
    F --> H["menubar.py<br/>history UI + shortcuts"]
    G --> H
    H --> I["paste.py<br/>path-ref text"]
    H --> J["AppKit pasteboard<br/>image/file paste"]
    H --> K["S3 URL/public link"]
    I --> L["Remote coding agents<br/>scp into task artifacts"]
    J --> M["Local app surfaces<br/>chat/docs/Finder/editor"]
    K --> N["Web/social surfaces<br/>shareable URL"]
```
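The first stage of the diagram, detecting new PNGs and batching them, can be sketched with the standard library alone. This is not the code in `watcher.py`; it assumes that "batch monitors" means grouping the files that a single multi-display capture writes at nearly the same time, and both `new_pngs` and `group_batches` are hypothetical names.

```python
from pathlib import Path


def new_pngs(folder: Path, seen: set[str]) -> list[tuple[str, float]]:
    """Return (name, mtime) for PNGs not seen before, and remember them."""
    found = []
    for path in folder.glob("*.png"):
        if path.name not in seen:
            seen.add(path.name)
            found.append((path.name, path.stat().st_mtime))
    return found


def group_batches(
    captures: list[tuple[str, float]], window_seconds: float = 2.0
) -> list[list[str]]:
    """Group files whose modification times fall within a short window of each other."""
    batches: list[list[str]] = []
    last_time = None
    for name, mtime in sorted(captures, key=lambda item: item[1]):
        if last_time is not None and mtime - last_time <= window_seconds:
            batches[-1].append(name)   # same capture event
        else:
            batches.append([name])     # start a new batch
        last_time = mtime
    return batches


sample = [("left.png", 100.0), ("right.png", 100.4), ("later.png", 110.0)]
print(group_batches(sample))
# prints [['left.png', 'right.png'], ['later.png']]
```

A polling loop would call `new_pngs` on an interval and hand each batch to the naming step; a filesystem-event API is the usual alternative when polling latency matters.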
## License
MIT