{"id":50063789,"url":"https://github.com/sam-siavoshian/agent-notch","last_synced_at":"2026-05-22T22:00:36.175Z","repository":{"id":358312106,"uuid":"1240892334","full_name":"sam-siavoshian/agent-notch","owner":"sam-siavoshian","description":"Agent Notch — macOS computer-use agent that lives in the notch. Voice in, screen context in, Claude Sonnet drives the mouse.","archived":false,"fork":false,"pushed_at":"2026-05-18T08:00:11.000Z","size":9951,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T04:28:15.732Z","etag":null,"topics":["ai-agent","anthropic","apple-silicon","automation","claude","computer-use","desktop-agent","gemini","hackathon","llm","macos","macos-app","multimodal","notch","ocr","screen-capture","swift","swiftui","voice-assistant","whisperkit"],"latest_commit_sha":null,"homepage":"https://agent-notch.vercel.app/","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sam-siavoshian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-16T17:44:48.000Z","updated_at":"2026-05-18T08:00:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sam-siavoshian/agent-notch","commit_stats":null,"previous_names":["sam-siavoshian/tritonhacks2026","sam-siavoshian/agent-notch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sam-siavoshian/agent-notch","repository_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sam-siavoshian%2Fagent-notch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sam-siavoshian%2Fagent-notch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sam-siavoshian%2Fagent-notch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sam-siavoshian%2Fagent-notch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sam-siavoshian","download_url":"https://codeload.github.com/sam-siavoshian/agent-notch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sam-siavoshian%2Fagent-notch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33372736,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-22T21:56:13.512Z","status":"ssl_error","status_checked_at":"2026-05-22T21:56:10.769Z","response_time":265,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","anthropic","apple-silicon","automation","claude","computer-use","desktop-agent","gemini","hackathon","llm","macos","macos-app","multimodal","notch","ocr","screen-capture","swift","swiftui","voice-assistant","whisperkit"],"created_at":"2026-05-21T21:18:07.468Z","updated_at":"2026-05-22T22:00:36.169Z","avatar_url":"https://github.com/sam-siavoshian.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Agent Notch\n\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![macOS 14+](https://img.shields.io/badge/macOS-14%2B-black.svg)](#)\n[![Swift 5.10](https://img.shields.io/badge/swift-5.10-orange.svg)](#)\n[![Built with Claude Haiku 4.5](https://img.shields.io/badge/computer--use-Claude%20Haiku%204.5-7c3aed.svg)](https://www.anthropic.com/)\n\nmacOS computer-use agent that lives in the notch.\n\nLong-press the cursor companion, talk, Claude Haiku 4.5 drives the mouse. The notch shows what it is doing. Screen context (OCR + Gemini) builds a persistent UI map; Mercury 2 distills it into a brief before every agent turn.\n\n\u003e Requires an M-series MacBook with a physical notch. macOS 14+.\n\n---\n\n## Quick start\n\n```bash\nbrew install xcodegen\nbash scripts/setup-signing.sh\nxcodegen generate\nopen AgentNotch.xcodeproj\n```\n\nSet `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENAI_API_KEY`, and `OPENROUTER_API_KEY` in the Xcode scheme env. 
Build, run, grant the three permissions, long-press the cursor.\n\n---\n\n## Keys\n\n| Var | Used by |\n|---|---|\n| `ANTHROPIC_API_KEY` | Claude Haiku 4.5 computer-use agent |\n| `GEMINI_API_KEY` | Continuous background screen observer (Gemini Flash Lite) |\n| `OPENAI_API_KEY` | Voice transcription (Whisper) + TTS |\n| `OPENROUTER_API_KEY` | Mercury 2 context selector + ActiveTaskUpdater |\n| `ANTHROPIC_NOTCH_DEMO_PROMPT` | Optional. Hardcoded transcript for mic-less demos. |\n\nNever committed. Set via Xcode scheme env or enter in the in-app Settings UI — keys are stored in the macOS Keychain (`com.agentnotch.app`). Env var wins over Keychain if both are set.\n\n---\n\n## Permissions\n\nOnboarding asks for three:\n\n- **Accessibility** — long-press detection + click/keystroke synthesis\n- **Screen Recording** — context capture\n- **Microphone** — voice in\n\n---\n\n## Controls\n\n- Hover notch, it opens\n- Drag down to open, drag up to close\n- `⌘D` toggles open/closed\n- Long-press cursor companion, talk, release, agent fires\n- Tabs: Home (status), Settings (reasoning, color, prefs), Spotify, Calendar\n- `⌘⇧I` opens the Dev Tools window\n\n---\n\n## Why a signing script?\n\nAd-hoc signing changes the cdhash every build. macOS TCC keys permission grants by cdhash, so grants vanish on every rebuild. 
`scripts/setup-signing.sh` wires your Apple Development cert in so grants stick.\n\nOpen Xcode → Settings → Accounts → Manage Certificates → + → Apple Development if the script cannot find one.\n\n---\n\n## Layout\n\n```\nApp/                  app entry, entitlements\nCore/                 shared types, settings, secrets\nFeatures/Notch/       notch UI + tabs\nFeatures/Cursor/      cursor companion + long-press\nFeatures/Context/     screen capture, OCR, Gemini observer, Mercury selector, event pipeline\nFeatures/Agent/       Whisper + IntentRouter + Claude Haiku 4.5 computer-use loop\nFeatures/Calendar/    EventKit calendar tab\nFeatures/Music/       Spotify tab\nFeatures/Onboarding/  first-launch permissions\n```\n\nDetails in [`AGENTS.md`](AGENTS.md) and [`CLAUDE.md`](CLAUDE.md). Product spec in [`PRD.md`](PRD.md). Pipeline walkthrough in [`AGENT_PIPELINE.md`](AGENT_PIPELINE.md).\n\n---\n\n## Context system\n\nThe agent doesn't start from zero when you talk to it. It already knows what app you're in, what you've been working on, and who the people and things you mention actually are.\n\nTwo paths run in parallel — a quiet background observer that learns your apps over time, and a fast foreground path that pulls it all together the moment you long-press.\n\n### Background — always watching, politely\n\n- **Dirty detector** — only \"looks\" when the screen genuinely changed (perceptual hash + pixel diff). Self-tunes its sensitivity in noisy environments.\n- **Gemini observer** — turns each meaningful frame into structured understanding: app, surface, controls, narrative, content. Throttled to one call every 8s. Captures verbatim — names, URLs, paths — never paraphrases.\n- **Surface memory** — per-app UI map built up over many observations. Knows where the Send button lives in Slack, what your Discord DMs look like, where settings hide in Figma. Self-prunes after 30 days.\n- **Capture story log** — chronological narrative of your day, persisted across restarts. 
Daily-rotated JSONL.\n\n### Foreground — sub-second on long-press\n\n- **L2 snapshot** — 0.4s parallel capture of right-now: frontmost app, full UI tree, OCR'd text, selection, clipboard, cursor, app-specific blob.\n- **App adapters** — browser URL (credentials stripped), terminal cwd, IDE file + project root. Each on a strict timeout.\n- **Selector** — assembles the live snapshot + UI memory + recent story + active task + recipes + resources, calls Mercury 2, gets a structured brief back in ~600ms. Local fallback if Mercury times out.\n- **Resolved references** — \"her\", \"that doc\", \"the repo\" mapped to concrete entities before the action model starts.\n- **Cross-app routing** — \"DM phone1k\" while in Brave opens Discord, because that's where phone1k actually lives. Decided deterministically from surface memory, independent of the model.\n\n### Underneath\n\n- **Event pipeline** — keystrokes (burst-batched), focus changes, copy/paste, dwells flow through a single ingest point.\n- **Recipe learner** — repeated action sequences get promoted to reusable recipes after the 3rd occurrence.\n- **Active task tracker** — rolling sense of what you're working on; refreshes itself when it drifts from reality.\n- **Resource index** — recently touched URLs/files/channels, so \"the thing I just had open\" resolves after you switch apps.\n\n### Privacy by architecture\n\n- Password managers (1Password, Bitwarden, Keychain) — nothing logged.\n- Secure input fields anywhere on the system — typed text dropped.\n- URL credentials — stripped before storage.\n- Clipboard taint — paste from a never-log app is dropped wherever it lands.\n- Agent never logs its own UI. Single kill switch pauses all collection.\n\n### Visibility\n\n`⌘⇧I` opens the Dev Tools window. Live observation stream, browser of the per-app UI memory, every Mercury request/response, structured brief Claude actually saw, full long-press timeline. 
Nothing is a black box.\n\nCode lives in [`Features/Context/`](Features/Context/). The full architecture map is in [`CLAUDE.md`](CLAUDE.md).\n\n---\n\n## Stack\n\nSwift, SwiftUI, Claude Haiku 4.5 (computer-use), Mercury 2 via OpenRouter (context selector), Gemini Flash Lite (background screen observer), OpenAI Whisper + TTS, Vision OCR, CGEvent, ScreenCaptureKit, XcodeGen.\n\nBuilt at TritonHacks 2026.\n\n---\n\n## Contributing\n\nPRs welcome. Conventions live in [`AGENTS.md`](AGENTS.md). Run `xcodegen generate` after pulling. Keep diffs minimal and stay out of `vendored/`.\n\n## License\n\n[MIT](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsam-siavoshian%2Fagent-notch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsam-siavoshian%2Fagent-notch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsam-siavoshian%2Fagent-notch/lists"}