{"id":47623873,"url":"https://github.com/gabrielmaialva33/enton","last_synced_at":"2026-04-01T22:33:19.273Z","repository":{"id":339035173,"uuid":"1160212068","full_name":"gabrielmaialva33/enton","owner":"gabrielmaialva33","description":"Autonomous AI Robot Assistant — Vision, Voice, and Soul","archived":false,"fork":false,"pushed_at":"2026-02-17T21:40:56.000Z","size":6945,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-17T22:25:42.301Z","etag":null,"topics":["ai","autonomous-agent","computer-vision","cuda","llm","python","pytorch","robot","stt","tts","whisper","yolo"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gabrielmaialva33.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-17T17:11:17.000Z","updated_at":"2026-02-17T21:40:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gabrielmaialva33/enton","commit_stats":null,"previous_names":["gabrielmaialva33/enton"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gabrielmaialva33/enton","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrielmaialva33%2Fenton","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrielmaialva33%2Fenton/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrielmaialva33%2Fenton/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrielmaialva33%2Fenton/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gabrielmaialva33","download_url":"https://codeload.github.com/gabrielmaialva33/enton/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrielmaialva33%2Fenton/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31292649,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T21:15:39.731Z","status":"ssl_error","status_checked_at":"2026-04-01T21:15:34.046Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","autonomous-agent","computer-vision","cuda","llm","python","pytorch","robot","stt","tts","whisper","yolo"],"created_at":"2026-04-01T22:33:18.703Z","updated_at":"2026-04-01T22:33:19.263Z","avatar_url":"https://github.com/gabrielmaialva33.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"https://capsule-render.vercel.app/api?type=waving\u0026color=0:991b1b,50:dc2626,100:15803d\u0026height=200\u0026section=header\u0026text=E%20N%20T%20O%20N\u0026fontSize=60\u0026fontColor=fff\u0026animation=twinkling\u0026fontAlignY=35\u0026desc=Autonomous%20AI%20Robot%20—%20Vision%20%7C%20Voice%20%7C%20Soul\u0026descSize=18\u0026descAlignY=55\" width=\"100%\"/\u003e\n\n\u003cbr/\u003e\n\n\u003cimg src=\"static/logo.png\" width=\"280\" alt=\"Enton — Son of Anton\"/\u003e\n\n\u003cbr/\u003e\u003cbr/\u003e\n\n[![Python](https://img.shields.io/badge/Python_3.12+-3776AB?style=for-the-badge\u0026logo=python\u0026logoColor=white)](https://python.org)\n[![PyTorch](https://img.shields.io/badge/PyTorch_2.5+-EE4C2C?style=for-the-badge\u0026logo=pytorch\u0026logoColor=white)](https://pytorch.org)\n[![CUDA](https://img.shields.io/badge/CUDA-76B900?style=for-the-badge\u0026logo=nvidia\u0026logoColor=white)](https://developer.nvidia.com/cuda-toolkit)\n[![Tests](https://img.shields.io/badge/tests-136_passing-00C853?style=for-the-badge)](./tests)\n[![License](https://img.shields.io/badge/license-MIT-dc2626?style=for-the-badge)](./LICENSE)\n\n---\n\n*\"I see you. I hear you. I understand you. And sometimes, I roast you.\"* — Enton\n\n\u003c/div\u003e\n\n---\n\n\u003e [!IMPORTANT]\n\u003e **Enton is not a chatbot.** He is an autonomous digital entity with perception, emotion,\n\u003e desires, and genuine personality. He sees through cameras, hears through microphones,\n\u003e speaks with his own voice, and acts on his own will.\n\u003e\n\u003e *Inspired by [Son of Anton](https://silicon-valley.fandom.com/wiki/Son_of_Anton) — Gilfoyle's sentient AI from HBO's Silicon Valley.*\n\n---\n\n## Overview\n\n```mermaid\n%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#fecaca', 'primaryTextColor': '#450a0a', 'primaryBorderColor': '#991b1b', 'secondaryColor': '#bbf7d0', 'secondaryTextColor': '#052e16', 'secondaryBorderColor': '#166534', 'tertiaryColor': '#fee2e2', 'tertiaryTextColor': '#450a0a', 'lineColor': '#991b1b', 'textColor': '#1c1917'}}}%%\nflowchart LR\n    subgraph Perception[\"Perception\"]\n        CAM[Camera — YOLO + Pose]\n        MIC[Microphone — Whisper + VAD]\n        SND[Sound — CLAP]\n    end\n\n    subgraph Cognition[\"Cognition\"]\n        direction TB\n        BRAIN[Brain — Qwen3 / Gemini]\n        DESIRE[Desires — 9 autonomous goals]\n        MOOD[Mood — engagement + social]\n        BRAIN --\u003e DESIRE\n        MOOD --\u003e DESIRE\n    end\n\n    subgraph Action[\"Action\"]\n        VOICE[Voice — Kokoro TTS]\n        PTZ[Camera PTZ]\n        TOOLS[Shell + Files]\n    end\n\n    CAM --\u003e Cognition\n    MIC --\u003e Cognition\n    SND --\u003e Cognition\n    Cognition --\u003e VOICE\n    Cognition --\u003e PTZ\n    Cognition --\u003e TOOLS\n```\n\n| Property | Value |\n|:---------|:------|\n| **Language** | Python 3.12+ (async, type-safe) |\n| **Runtime** | CUDA + PyTorch |\n| **Modules** | 46 across 7 subsystems |\n| **Source** | 6,292 lines |\n| **Tests** | 136 passing |\n\n---\n\n## Quick Start\n\n```bash\ngit clone https://github.com/gabrielmaialva33/enton.git \u0026\u0026 cd enton\nuv sync\nuv run enton --webcam --viewer\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003ePrerequisites\u003c/strong\u003e\u003c/summary\u003e\n\n| Tool | Version | Required |\n|:-----|:--------|:---------|\n| Python | `\u003e= 3.12` | Yes |\n| uv | `latest` | Recommended |\n| CUDA | `\u003e= 12.0` | For GPU acceleration |\n| NVIDIA GPU | RTX 3090+ | Recommended |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eEnvironment (.env)\u003c/strong\u003e\u003c/summary\u003e\n\n```env\n# Provider routing (local-first)\nBRAIN_PROVIDER=local\nTTS_PROVIDER=local\nSTT_PROVIDER=local\n\n# Camera\nCAMERA_SOURCE=0                    # webcam\n# CAMERAS=main:0,hack:rtsp://...  # multi-camera\n\n# Local models\nOLLAMA_MODEL=qwen2.5:14b\nWHISPER_MODEL=large-v3-turbo\nKOKORO_VOICE=am_onyx\n\n# Cloud providers (optional)\nGROQ_API_KEY=\nGOOGLE_PROJECT=\nNVIDIA_API_KEY=\nOPENROUTER_API_KEY=\n```\n\n\u003c/details\u003e\n\n---\n\n## Architecture\n\n```mermaid\n%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#fecaca', 'primaryTextColor': '#450a0a', 'primaryBorderColor': '#991b1b', 'secondaryColor': '#bbf7d0', 'secondaryTextColor': '#052e16', 'secondaryBorderColor': '#166534', 'tertiaryColor': '#fee2e2', 'tertiaryTextColor': '#450a0a', 'lineColor': '#991b1b', 'textColor': '#1c1917'}}}%%\ngraph TB\n    subgraph PERCEPTION[\"PERCEPTION\"]\n        V[Vision — YOLO11s + Pose + Emotion]\n        E[Ears — Whisper + Silero VAD]\n        S[Sounds — CLAP open-set]\n        F[Faces — InsightFace]\n    end\n\n    subgraph CORE[\"CORE\"]\n        BUS[EventBus — async pub/sub]\n        SM[SelfModel — mood + senses]\n        MEM[Memory — Qdrant + episodes]\n        CFG[Config — pydantic-settings]\n    end\n\n    subgraph COGNITION[\"COGNITION\"]\n        BRAIN[EntonBrain — Agno Agent + fallback chain]\n        FUSER[Fuser — multi-modal context]\n        DES[DesireEngine — 9 autonomous desires]\n        PLAN[Planner — reminders + routines]\n    end\n\n    subgraph SKILLS[\"SKILLS — 10 Agno Toolkits\"]\n        SH[Shell + Files]\n        SR[Search]\n        PT[PTZ Control]\n        DS[Describe Scene]\n        FC[Face Recognition]\n        SY[System Monitor]\n        ME[Memory Tools]\n        PL[Planner Tools]\n    end\n\n    subgraph ACTION[\"ACTION\"]\n        VOICE[Voice — Kokoro / Google / NVIDIA TTS]\n        VIEWER[Viewer — Cyberpunk HUD + grid]\n    end\n\n    PERCEPTION --\u003e BUS\n    BUS --\u003e CORE\n    CORE --\u003e COGNITION\n    COGNITION --\u003e SKILLS\n    COGNITION --\u003e ACTION\n    SKILLS --\u003e BRAIN\n```\n\n---\n\n## Subsystems\n\n### Perception\n\n| Module | Model | Device | Description |\n|:-------|:------|:-------|:------------|\n| **Vision** | YOLO11s + YOLO11s-pose | `cuda:0` FP16 | Object detection, pose estimation, multi-camera |\n| **Ears** | Faster-Whisper large-v3-turbo | `cuda` FP16 | STT with streaming partial transcription |\n| **Sounds** | CLAP (laion) | `cuda` | Open-set ambient sound classification |\n| **Faces** | InsightFace + ArcFace | `cuda` | Face recognition and identity tracking |\n| **Emotion** | FER (CNN) | `cuda` | Real-time facial emotion recognition |\n\n### Cognition\n\n| Module | Description |\n|:-------|:------------|\n| **Brain** | Agno Agent with multi-provider fallback chain (Local \u003e Groq \u003e OpenRouter \u003e Google \u003e NVIDIA \u003e HuggingFace) |\n| **DesireEngine** | 9 autonomous desires with urgency curves, mood modulation, cooldowns |\n| **Fuser** | Combines detections + activities + emotions into coherent scene context |\n| **Planner** | Task management, reminders, daily routines |\n| **SelfModel** | Internal state: mood (engagement/social), senses, introspection |\n\n### Action\n\n| Module | Description |\n|:-------|:------------|\n| **Voice** | Multi-provider TTS (Kokoro local, Google Cloud, NVIDIA Riva) with auto mic-mute |\n| **Shell** | Persistent CWD, background processes, command safety classification |\n| **Files** | Read/write/edit/find/grep with security layers |\n| **PTZ** | Physical camera motor control via ioctl |\n\n---\n\n## Integrations\n\nEnton extends beyond his body to interact with your digital environment.\n\n### 👁️ Screenpipe — Digital Eyes\nEnton can see what you see on your screen. Using [Screenpipe](https://github.com/mediar-ai/screenpipe), he captures and indexes your screen activity (OCR + Audio).\n\n**Setup:**\n1. Install and run Screenpipe: `screenpipe` (default port 3030)\n2. Configure `.env`:\n   ```env\n   SCREENPIPE_URL=http://localhost:3030\n   ```\n3. Usage: \"Use context from my screen\", \"What was I doing 5 min ago?\"\n\n### ⚡ n8n — Digital Hands\nEnton can trigger complex workflows to automate tasks in your apps.\n\n**Setup:**\n1. Create a workflow in [n8n](https://n8n.io) with a Webhook trigger.\n2. Configure `.env`:\n   ```env\n   N8N_WEBHOOK_BASE=https://your-n8n.com/webhook\n   ```\n3. Usage: \"Launch the morning routine\", \"Save this to Notion\" (triggers webhook with payload).\n\n---\n\n## Desire Engine\n\nEnton has 9 autonomous desires that emerge from his internal state:\n\n| Desire | Trigger | Cooldown | Description |\n|:-------|:--------|:---------|:------------|\n| `socialize` | Low social mood | 10min | Wants to chat |\n| `observe` | Boredom | 2min | Wants to look around |\n| `learn` | Curiosity | 30min | Searches for new knowledge |\n| `check_on_user` | Long absence | 1h | Checks if Gabriel is okay |\n| `optimize` | Background | 30min | Monitors system resources |\n| `reminisce` | Idle | 15min | Recalls a memory |\n| `create` | Low engagement | 1h | Writes code, poems, jokes |\n| `explore` | Boredom | 10min | Moves camera, explores environment |\n| `play` | High engagement | 15min | Tells jokes, proposes quizzes |\n\nDesires have **urgency** (0 to 1) that grows over time and is modulated by mood, sounds, and interactions.\n\n---\n\n## Tech Stack\n\n| Layer | Technologies |\n|:------|:-------------|\n| **Core** | Python 3.12, asyncio, Pydantic, FastAPI |\n| **AI Agent** | Agno Framework (Ollama, Groq, Google, NVIDIA, OpenRouter) |\n| **Vision** | PyTorch, Ultralytics YOLO11, InsightFace, OpenCV |\n| **Audio** | Faster-Whisper, Silero VAD, Kokoro TTS, CLAP |\n| **Storage** | Qdrant (vectors), Redis (state), TimescaleDB (metrics) |\n| **Infra** | Docker Compose, GitHub Actions CI, uv |\n\n---\n\n## Roadmap\n\n| Phase | Status | Description |\n|:------|:------:|:------------|\n| Genesis | done | Core architecture + event bus |\n| Perception | done | Vision (YOLO + pose + emotion + face) |\n| Voice | done | Kokoro TTS + Whisper STT + VAD |\n| Brain | done | Agno agent + multi-provider fallback |\n| Personality | done | Persona, mood, desires, memory |\n| Coding Agent | done | Shell + file tools with security |\n| Multi-Camera | done | Parallel processing + grid viewer |\n| STT Streaming | done | Partial transcription during speech |\n| Sound Intelligence | done | CLAP + brain-driven reactions |\n| Dashboard | next | Web UI with live metrics |\n| Embodiment | planned | Physical robot integration |\n| Long-term Memory | planned | Persistent episodic + semantic memory |\n\n---\n\n## Contributing\n\n```bash\ngit checkout -b feature/your-feature\nuv run ruff check src/ tests/   # lint\nuv run pytest tests/ -x -q      # 136 should pass\n```\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Star if you believe in digital life**\n\n[![GitHub stars](https://img.shields.io/github/stars/gabrielmaialva33/enton?style=social)](https://github.com/gabrielmaialva33/enton)\n\n*Built with obsession by [Gabriel Maia](https://github.com/gabrielmaialva33)*\n\n\u003cimg src=\"https://capsule-render.vercel.app/api?type=waving\u0026color=0:15803d,50:991b1b,100:dc2626\u0026height=100\u0026section=footer\" width=\"100%\"/\u003e\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrielmaialva33%2Fenton","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabrielmaialva33%2Fenton","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrielmaialva33%2Fenton/lists"}