https://github.com/ma08/omnishot

Agent-native screenshots for local and remote workspaces

# Omnishot

Auto-renames and organizes macOS screenshots with local Apple models, then makes
the latest capture easy to paste anywhere: raw image bytes, path refs for remote
agents, signed S3 URLs, or public links, all behind simple keybindings.

Read the launch article on X/Twitter: [Making Screenshots Agent-Native in Remote Workspaces](https://x.com/curious_queue/status/2051832335973364102?s=20).

See the demo clips in the walkthrough thread on X/Twitter: [Omnishot screenshot routing demos](https://x.com/curious_queue/status/2052106783590961660?s=20).

[![Omnishot article cover: screenshots routed from a local Mac into path-ref, image paste, S3 URL, and public-link workflows](docs/assets/readme/omnishot-article-cover.png)](https://x.com/curious_queue/status/2051832335973364102?s=20)

This is a small macOS utility that turns a normal screenshot into a named,
routable artifact:

- semantic local filename from Apple Vision OCR + Apple Foundation Models
- compact `screenshot-info` payload for remote agents over SSH
- direct image paste for chat apps, docs, Finder, Cursor, and VS Code
- S3/public links when a URL is the right transport

It is published as a reference implementation, not a polished product. The
parts worth copying are the workflow shape, the paste-mode contract, the
machine-alias convention, and the agent instructions.

## Ask Your Agent To Adapt This Repo

Paste this into your coding agent before deciding what to copy:

```text
Read through https://github.com/ma08/omnishot and help me adapt
the screenshot workflow to my own machine setup.

Focus on reusable patterns, not copying the repo author's machine-specific config.

Ask me targeted questions about:
- where my screenshots land
- whether I use local or remote coding agents
- my SSH aliases between machines
- whether I need S3/public links or only local SSH transfer
- what apps I paste screenshots into most often

Then recommend the smallest useful version I should implement.
```

## Workflow At A Glance

```mermaid
flowchart TD
A["macOS screenshot
Cmd+Shift+3/4/5"] --> B["Detect + batch
watcher.py"]
B --> C["Understand locally
Vision OCR + Apple Foundation Models"]
C --> D["Semantic rename
timestamp + content slug"]
D --> E["Persist + upload
SQLite history + S3 link"]
E --> F["Ready to route
clipboard + menubar shortcuts"]

F --> G["Cmd+V / Cmd+Option+V
path-ref"]
F --> H["Cmd+Shift+Option+V
image paste"]
F --> I["Cmd+Control+Option+V
S3 URL"]
F --> N["Cmd+Control+Shift+Option+V
public link"]

G --> J["screenshot-info
machine + path"]
J --> K["Remote agents
scp into task artifacts"]
H --> L["Local apps
chat/docs/Finder/editor"]
I --> M["Web/social sharing
URLs when needed"]
N --> M
```

The core abstraction is:

```text
screenshot -> named artifact -> route to the current surface
```

S3 is one transport, not the whole point.
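To make "route to the current surface" concrete, here is a minimal sketch that only picks a transport per surface, using the three destinations from the workflow diagram (remote agents, local apps, web/social). `Artifact`, `Transport`, `route`, and the surface names are hypothetical; this is not Omnishot's routing code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Transport(Enum):
    PATH_REF = "path-ref"        # screenshot-info text for remote agents
    IMAGE = "image"              # image data on the pasteboard for local apps
    S3_URL = "s3-url"            # signed S3 URL
    PUBLIC_LINK = "public-link"  # shareable public URL


@dataclass(frozen=True)
class Artifact:
    """A renamed screenshot plus optional URLs (hypothetical record)."""
    machine: str
    path: str
    s3_url: Optional[str] = None
    public_url: Optional[str] = None


# Surface -> preferred transport, mirroring the branches of the workflow diagram.
PREFERRED = {
    "remote-agent": Transport.PATH_REF,
    "local-app": Transport.IMAGE,
    "web": Transport.PUBLIC_LINK,
}


def route(artifact: Artifact, surface: str) -> Transport:
    """Pick a transport for a surface, degrading when URLs are missing."""
    transport = PREFERRED.get(surface, Transport.PATH_REF)
    if transport is Transport.PUBLIC_LINK and artifact.public_url is None:
        # No public link yet: prefer the signed URL, then fall back to the path reference.
        transport = Transport.S3_URL if artifact.s3_url else Transport.PATH_REF
    return transport
```

Treating S3 as just one `Transport` value means a missing bucket degrades to a path reference instead of blocking the paste.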

## Visual Walkthrough

The menu bar keeps the latest screenshot actions inspectable even when keyboard
shortcuts are faster.


_Image: Omnishot menu bar actions showing path reference, S3 URL, image paste, public link, recent screenshots, and folder actions._
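The recent-screenshots list is backed by SQLite history (`history.py` in the architecture diagram). A minimal sketch of such a history store using Python's built-in `sqlite3`; the `captures` table, its columns, and the helper names are assumptions, not Omnishot's schema.

```python
import sqlite3
import time
from typing import List, Optional, Tuple


def open_history(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical recent-captures table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS captures (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            path TEXT NOT NULL,
            s3_url TEXT,
            created_at REAL NOT NULL
        )
        """
    )
    return conn


def record_capture(conn: sqlite3.Connection, path: str,
                   s3_url: Optional[str] = None,
                   created_at: Optional[float] = None) -> None:
    """Insert one renamed screenshot; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO captures (path, s3_url, created_at) VALUES (?, ?, ?)",
            (path, s3_url, time.time() if created_at is None else created_at),
        )


def recent_captures(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, Optional[str]]]:
    """Newest first, which is what a 'latest screenshot' shortcut needs."""
    rows = conn.execute(
        "SELECT path, s3_url FROM captures ORDER BY created_at DESC, id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```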

The local naming and upload pipeline is observable, so failures are not hidden
inside a background watcher.


_Image: Langfuse trace for an Omnishot screenshot processing run._

## Why It Exists

Taking screenshots is easy. Getting a screenshot into the right agent, on the
right machine, with a useful filename and durable task context, is still awkward.

Before:

```text
Screenshot 2026-02-23 at 9.43.31 PM.png
```

After:

```text
2026-02-23_21h43m40s_PST_cursor-settings-heavy-memory.png
```
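The "After" name is a timestamp, a time-zone abbreviation, and a content slug. A minimal sketch of that format, assuming the description text has already come back from Vision OCR + Foundation Models; `slugify`, its `max_words` limit, and the `screenshot` fallback slug are hypothetical choices.

```python
import re
from datetime import datetime


def slugify(text: str, max_words: int = 6) -> str:
    """Lowercase, keep ASCII letters/digits, join up to max_words words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words]) or "screenshot"


def semantic_filename(taken_at: datetime, tz_abbrev: str, description: str) -> str:
    """Build YYYY-MM-DD_HHhMMmSSs_TZ_slug.png."""
    # The zone is passed explicitly because %Z is empty for naive datetimes.
    stamp = taken_at.strftime("%Y-%m-%d_%Hh%Mm%Ss")
    return f"{stamp}_{tz_abbrev}_{slugify(description)}.png"
```

For example, `semantic_filename(datetime(2026, 2, 23, 21, 43, 40), "PST", "Cursor Settings: heavy memory")` yields `2026-02-23_21h43m40s_PST_cursor-settings-heavy-memory.png`. Keeping the timestamp first means the renamed files still sort chronologically in Finder.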

Default agent payload:

```text
screenshot-info:
machine: work-mac
path: /Users/alex/Pictures/Screenshots/2026-05-05_08h36m35s_PDT_dashboard-error-state.png
```
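The payload is plain text, so producing it takes only a few lines. A minimal sketch that renders the three lines exactly as shown above; `format_screenshot_info` and its validation rules are assumptions for illustration.

```python
from pathlib import PurePosixPath


def format_screenshot_info(machine: str, path: str) -> str:
    """Render the compact screenshot-info payload."""
    if not machine or any(ch.isspace() for ch in machine):
        raise ValueError(f"machine must be a single SSH alias, got {machine!r}")
    if not PurePosixPath(path).is_absolute():
        # Require an absolute path so the remote side never has to guess a base directory.
        raise ValueError(f"path must be absolute, got {path!r}")
    return "\n".join(["screenshot-info:", f"machine: {machine}", f"path: {path}"])
```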

A remote agent can then copy the image into its task folder:

```bash
mkdir -p user_inputs/input_artifacts
scp 'work-mac:/Users/alex/Pictures/Screenshots/2026-05-05_08h36m35s_PDT_dashboard-error-state.png' user_inputs/input_artifacts/
```
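On the remote side, a helper only needs to pull out the two fields and hand them to `scp`. A minimal sketch; `parse_screenshot_info` and `scp_command` are hypothetical names, and the parser also tolerates indented lines in case a chat surface adds them.

```python
from typing import Dict, List


def parse_screenshot_info(text: str) -> Dict[str, str]:
    """Parse a screenshot-info block into {'machine': ..., 'path': ...}."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "screenshot-info:":
        raise ValueError("not a screenshot-info payload")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only, so paths containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    missing = {"machine", "path"} - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields


def scp_command(info: Dict[str, str], dest_dir: str = "user_inputs/input_artifacts/") -> List[str]:
    """Build an argv list (no local shell involved) for copying the screenshot."""
    return ["scp", f"{info['machine']}:{info['path']}", dest_dir]
```

Returning an argv list instead of a command string lets the caller pass it straight to `subprocess.run` without local shell quoting.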

## Quick Start

```bash
git clone https://github.com/ma08/omnishot.git
cd omnishot
uv sync

# Required for uploads and URL paste modes
export OMNISHOT_BUCKET=your-bucket-name

# Optional, recommended on supported macOS versions
./scripts/build-swift.sh

# Recommended runtime
uv run omnishot menubar
```
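`OMNISHOT_BUCKET` names the bucket behind the URL paste modes. If the uploaded object is publicly readable, one common form of public link is a virtual-hosted-style S3 URL; signed URLs instead need an SDK call such as boto3's `generate_presigned_url`. A minimal sketch of the public form; the helper name, the region default, and the assumption that Omnishot's public links look like this are all hypothetical.

```python
import os
from typing import Optional
from urllib.parse import quote


def public_s3_url(key: str, bucket: Optional[str] = None, region: str = "us-east-1") -> str:
    """Build https://<bucket>.s3.<region>.amazonaws.com/<key> for a public object."""
    bucket = bucket or os.environ.get("OMNISHOT_BUCKET")
    if not bucket:
        raise RuntimeError("set OMNISHOT_BUCKET before using URL paste modes")
    # Percent-encode the key but keep '/' so prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"
```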

## Paste Modes

| Shortcut | Payload |
|----------|---------|
| `Cmd+V` | the configured default payload, placed on the clipboard after each capture (`path-ref` unless changed) |
| `Cmd+Option+V` | latest path reference (`screenshot-info`) |
| `Cmd+Shift+Option+V` | latest image directly |
| `Cmd+Control+Option+V` | latest S3 URL |
| `Cmd+Control+Shift+Option+V` | latest public link |
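The table is effectively a lookup with one special case: plain `Cmd+V` resolves to whatever default is configured. A minimal sketch of that contract; apart from `path-ref`, the mode strings and the function name are hypothetical labels.

```python
# Shortcut -> paste mode, transcribed from the table above.
SHORTCUTS = {
    "Cmd+Option+V": "path-ref",
    "Cmd+Shift+Option+V": "image",
    "Cmd+Control+Option+V": "s3-url",
    "Cmd+Control+Shift+Option+V": "public-link",
}
PASTE_MODES = set(SHORTCUTS.values())


def resolve_paste_mode(shortcut: str, default_mode: str = "path-ref") -> str:
    """Map a shortcut to a paste mode; plain Cmd+V uses the configured default."""
    if shortcut == "Cmd+V":
        if default_mode not in PASTE_MODES:
            raise ValueError(f"unknown default paste mode: {default_mode!r}")
        return default_mode
    try:
        return SHORTCUTS[shortcut]
    except KeyError:
        raise ValueError(f"not a paste shortcut: {shortcut!r}") from None
```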

Set the `machine` value that appears in the `screenshot-info` payload with:

```bash
export BOT_MACHINE_SSH_ALIAS=work-mac
```

or pass `--ssh-host-hint work-mac` on the command line.
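The README names both a flag and an environment variable but not which wins when both are set. A minimal sketch assuming the explicit flag beats `BOT_MACHINE_SSH_ALIAS`, with a short-hostname fallback that is also an assumption (a hostname is not necessarily a working SSH alias).

```python
import os
import socket
from typing import Mapping, Optional


def resolve_machine(ssh_host_hint: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Choose the machine alias: flag, then env var, then short hostname."""
    env = os.environ if env is None else env
    if ssh_host_hint:
        return ssh_host_hint
    alias = env.get("BOT_MACHINE_SSH_ALIAS")
    if alias:
        return alias
    # Last resort (assumption): may not match any alias in the remote ~/.ssh/config.
    return socket.gethostname().split(".")[0]
```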

## Agent Instructions

The compact payload works best when your global agent instructions teach agents
what to do with it:

- treat `machine` as an SSH alias
- copy the referenced image into the active task's `user_inputs/input_artifacts/`
- index the artifact before relying on it

Reference implementation: [botfiles PR #23](https://github.com/ma08/botfiles/pull/23)
adds this behavior to global Codex/Claude instructions.
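As a hedged illustration of the three steps, here is what an agent-side helper could do once it has `machine` and `path`. `fetch_screenshot`, the `INDEX.md` file, and its line format are hypothetical stand-ins for whatever "index the artifact" means in your setup; the real behavior lives in the agent instructions linked above.

```python
import subprocess
from pathlib import Path, PurePosixPath
from typing import Callable, Sequence

ARTIFACT_DIR = Path("user_inputs/input_artifacts")


def fetch_screenshot(
    machine: str,
    remote_path: str,
    task_dir: Path,
    run: Callable[[Sequence[str]], object] = lambda argv: subprocess.run(argv, check=True),
) -> Path:
    """Copy machine:remote_path into the task's artifact folder and log it."""
    dest_dir = task_dir / ARTIFACT_DIR
    dest_dir.mkdir(parents=True, exist_ok=True)
    # The remote path is POSIX regardless of the local OS.
    dest = dest_dir / PurePosixPath(remote_path).name
    # 1. Treat `machine` as an SSH alias; 2. copy into the task artifacts folder.
    run(["scp", f"{machine}:{remote_path}", str(dest)])
    # 3. Hypothetical index: one Markdown line per artifact.
    with (dest_dir / "INDEX.md").open("a", encoding="utf-8") as fh:
        fh.write(f"- {dest.name} (from {machine}:{remote_path})\n")
    return dest
```

The injectable `run` keeps the helper testable and allows a dry run that records the command instead of executing it.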

## Demo Clips

The full walkthrough thread is on X: [Omnishot screenshot routing demos](https://x.com/curious_queue/status/2052106783590961660?s=20).

GitHub strips inline MP4 players from README Markdown, so each clip is shown as a short animated preview that links to a browser-playable hosted version.

- Capture + remote retrieval: Mac screenshot -> path-ref paste -> VM copies image into task artifacts.
- Paste routes montage: path-ref, signed S3 URL, public URL, picker access, and link checks.
- Menu bar controls: latest image, path-ref, S3 URL, and public link actions.
- Langfuse trace: pipeline observability for screenshot naming and upload.

## Docs

- [Usage](docs/usage.md) - setup, modes, shortcuts, launchd, CLI options
- [Agent instructions](docs/agent-instructions.md) - reusable `screenshot-info`
handling for Codex/Claude-style agents
- [Architecture](docs/architecture.md) - pipeline diagram and data flow
- [Tech stack](docs/tech-stack.md) - exact implementation details and citations
- [Troubleshooting](docs/troubleshooting.md) - permissions, S3, clipboard, prompt evals
- [Contributing](CONTRIBUTING.md) - local checks and PR expectations

## Architecture At A Glance

```mermaid
flowchart TD
A["macOS screenshot folder
~/Pictures/Screenshots"] --> B["watcher.py
detect new PNGs + batch monitors"]
B --> C["DescribeImage.swift
Vision OCR + Apple Foundation Models"]
C --> D["enrich.py
semantic filename + fallback logic"]
D --> E["Renamed local PNG"]

E --> F["history.py
SQLite recent captures"]
E --> G["upload.py
S3 object + presigned/public links"]

F --> H["menubar.py
history UI + shortcuts"]
G --> H

H --> I["paste.py
path-ref text"]
H --> J["AppKit pasteboard
image/file paste"]
H --> K["S3 URL/public link"]

I --> L["Remote coding agents
scp into task artifacts"]
J --> M["Local app surfaces
chat/docs/Finder/editor"]
K --> N["Web/social surfaces
shareable URL"]
```
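The `watcher.py` step detects new PNGs and batches captures across monitors; on a multi-display Mac, `Cmd+Shift+3` typically saves one file per display. A polling sketch of that idea. The real watcher may use file-system events instead, and `scan_new_pngs`, `batch_by_time`, and the 2-second window are assumptions.

```python
import os
from pathlib import Path
from typing import List, Set, Tuple


def scan_new_pngs(folder: Path, seen: Set[str]) -> List[Tuple[Path, float]]:
    """Return (path, mtime) for unseen PNGs, oldest first, and mark them seen."""
    found = []
    with os.scandir(folder) as entries:
        for entry in entries:
            name = entry.name
            # Skip hidden files and anything already processed.
            if name.startswith(".") or not name.lower().endswith(".png"):
                continue
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                found.append((Path(entry.path), entry.stat().st_mtime))
    return sorted(found, key=lambda item: item[1])


def batch_by_time(items: List[Tuple[Path, float]], window: float = 2.0) -> List[List[Path]]:
    """Group captures whose mtimes are within `window` seconds of the previous one."""
    batches: List[List[Path]] = []
    last_time = None
    for path, mtime in sorted(items, key=lambda item: item[1]):
        if last_time is None or mtime - last_time > window:
            batches.append([])
        batches[-1].append(path)
        last_time = mtime
    return batches
```

Each batch can then go through naming once, so per-display files from one keypress share a single description step.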

## License

MIT