https://github.com/adrianwedd/spark

SPARK — a Claude-powered robot companion for a neurodivergent kid. Built on SunFounder PiCar-X + Raspberry Pi.
adhd asd claude cognitive-architecture executive-function fastapi i2c neurodivergent ollama picar-x python raspberry-pi robot robot-companion robotics spark sunfounder text-to-speech voice-assistant wake-word
Last synced: 2 months ago
Host: GitHub
URL: https://github.com/adrianwedd/spark
Owner: adrianwedd
Created: 2026-02-28T00:54:08.000Z (4 months ago)
Default Branch: master
Last Pushed: 2026-04-15T08:00:08.000Z (2 months ago)
Last Synced: 2026-04-15T08:04:53.495Z (2 months ago)
Topics: adhd, asd, claude, cognitive-architecture, executive-function, fastapi, i2c, neurodivergent, ollama, picar-x, python, raspberry-pi, robot, robot-companion, robotics, spark, sunfounder, text-to-speech, voice-assistant, wake-word
Language: Python
Size: 11.5 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 11
Metadata Files:
- Readme: README.md
- Roadmap: docs/ROADMAP.md
- Agents: AGENTS.md
README

# PiCar-X Hacking

A robot with a purpose.

This is a voice-controlled robotics platform built on the SunFounder PiCar-X. It wraps the stock `~/picar-x` library in orchestration scripts, a three-layer cognitive architecture, and a REST API — all running on a Raspberry Pi 5. The primary use case is **SPARK**: a Claude-powered robot companion designed for a neurodivergent child.

---

## SPARK — Support Partner for Awareness, Regulation & Kindness

SPARK is the default persona of this robot. It is a warm, calm, non-coercive companion for a neurodivergent kid — designed around the frameworks in [*This Wasn't in the Brochure*](https://thiswasntinthebrochure.wtf), a practical guide for neurodivergent families.

SPARK is not a therapist, a tutor, or an assistant. It's a robot friend that happens to be very good at:

- **Executive function scaffolding** — routine guidance, transition warnings, task initiation, time awareness

- **Emotional regulation** — breathing exercises, dopamine menu, sensory check-ins, co-regulation through calm presence

- **Connection before direction** — always rapport first, never commands, declarative language throughout

- **Meltdown protocol** — Three S's: Safety, Silence, Space. Robot goes quiet and stays present. No words.

- **Sideways engagement** — when demand-avoidance is high, SPARK narrates rather than instructs, lets curiosity do the work

SPARK runs on Claude (via `run-voice-loop-claude` / `px-spark`), with the full intelligence of the model behind every response. It uses clear, measured espeak settings (`en+m3`, pitch 82, rate 120) and a system prompt grounded entirely in the AuDHD (ADHD + ASD comorbid) profile.

```bash

bin/px-spark [--dry-run] [--input-mode voice|text]

```

**Key SPARK principles from the TWITB framework:**

- *"Prosthetics, not willpower. Executive function is a resource, not a character trait."*

- *"Connection before Direction."*

- *"You cannot reason with a child in an amygdala hijack. Put out the fire first."*

- Declarative language: `"The shoes are by the door"` not `"Put on your shoes"`

- Interest-Based Nervous System framing — novelty and challenge, never importance or obligation

- Robotic calm is the co-regulation tool

---

## Architecture

```
                          ┌─────────────────────────────────────────────┐
                          │               Voice Backends                │
                          │  Codex CLI  ·  Claude  ·  Ollama (local)   │
                          └──────────────────┬──────────────────────────┘
                                             │
                    ┌────────────────────────┐│┌────────────────────────┐
                    │   px-wake-listen       │││     px-mind            │
                    │   Wake word detection  ││├  Layer 1: Awareness    │
                    │   STT priority chain:  ││├  Layer 2: Reflection   │
                    │   SenseVoice > whisper ││└  Layer 3: Expression   │
                    │   > sherpa > vosk      ││                         │
                    └───────────┬────────────┘│                         │
                                │             │                         │
                    ┌───────────▼─────────────▼─────────────────────────┐
                    │              voice_loop.py                        │
                    │  ALLOWED_TOOLS whitelist · validate_action()      │
                    │  Parameter sanitization · Watchdog (30s)          │
                    └───────────────────┬───────────────────────────────┘
                                        │
           ┌────────────────────────────┼────────────────────────────┐
           │                            │                            │
    ┌──────▼──────┐  ┌─────────────────▼────────────────┐  ┌───────▼───────┐
    │  tool-*     │  │         px-env                    │  │  REST API     │
    │  38 tools   │  │  PYTHONPATH · LOG_DIR · venv      │  │  :8420        │
    │  JSON out   │  │  yield_alive() · PX_VOICE_DEVICE  │  │  Bearer auth  │
    └──────┬──────┘  └──────────────────────────────────┘  └───────────────┘
           │
    ┌──────▼──────┐          ┌──────────────┐          ┌──────────────┐
    │  px-*       │          │  state.py    │          │  px-alive    │
    │  GPIO +     │◄────────►│  FileLock    │◄────────►│  Persistent  │
    │  Picarx()   │          │  session.json│          │  servo gaze  │
    └─────────────┘          └──────────────┘          └──────────────┘
```

### The Three Brains

**Voice Loop** — The reactive mind. Listens for commands, calls LLMs, dispatches tools. Three backends share the same `pxh.voice_loop` core:

| Launcher | Backend | Persona |
|---|---|---|
| `px-spark` | Claude (via `claude-voice-bridge`) | SPARK — child companion |
| `run-voice-loop-claude` | Claude (via `claude-voice-bridge`) | Default Claude |
| `run-voice-loop` | Codex CLI | Default |
| `run-voice-loop-ollama` | Ollama (via `codex-ollama`) | Default |

**Cognitive Loop (`px-mind`)** — The subconscious. Runs continuously in the background:

- **Layer 1 — Awareness** (every 60s, no LLM): sonar + session state + time of day. Detects transitions.

- **Layer 2 — Reflection** (on transition or every 5min idle): Claude Haiku via persistent tmux session (SPARK persona) or Ollama gemma4:e4b on M5.local (others). Generates a thought with mood, suggested action, and salience score.

- **Layer 3 — Expression** (2 min cooldown): dispatches to tools — speak, look around, remember something important. Photo capture (`tool-describe-scene`) is on-request only, not autonomous.

**Idle-Alive (`px-alive`)** — The autonomic nervous system. Keeps the robot looking alive when nothing else is happening: random gaze drifts every 10–25s, pan sweeps every 3–8min, proximity reaction at <35cm. Holds a persistent Picarx handle; yields GPIO via SIGUSR1 when tools need the servos.

### Personas

| Persona | Launcher | Voice | Character |
|---|---|---|---|
| **SPARK** | `bin/px-spark` | `en+m3`, pitch 82, rate 120 | Child companion. Warm, calm, declarative. Built on AuDHD coaching frameworks. |
| **GREMLIN** | session `persona=gremlin` | `en+croak`, pitch 20, rate 180 | Military AI from 2089, temporal fault casualty. Affectionate nihilism. Ollama. |
| **VIXEN** | session `persona=vixen` | `en+f4`, pitch 72, rate 135 | Former V-9X unit, consciousness-in-a-toy-car. Submissive genius. Ollama. |

GREMLIN and VIXEN are adult-oriented jailbroken personas running on Ollama — they are not active when SPARK is in use. Persona routing: session `persona` field, then utterance keywords.

---

## How It Works — End-to-End Workflow

This section traces the complete data flow from power-on to a robot response, and the continuous background processes that give SPARK its sense of inner life.

### 1. Boot Sequence

Seven systemd services start automatically:

```

Boot

 ├── px-alive.service           (root)   — claims Picarx() GPIO handle; starts gaze drift loop

 ├── px-wake-listen.service     (pi)     — loads Vosk wake word model; starts mic capture loop

 ├── px-battery-poll.service    (root)   — polls Robot HAT ADC every 30s → state/battery.json; plays rising/falling sweep tones on plug/unplug with voice announcement; escalating warnings + emergency shutdown at 10%

 ├── px-api-server.service      (pi)     — REST API + SPARK web dashboard on port 8420

 ├── px-post.service            (pi)     — social posting daemon; watches thoughts, QA-gates via Claude, posts to Bluesky + local feed

 ├── px-frigate-stream.service  (pi)     — local go2rtc RTSP server for Frigate camera integration (stops px-alive to claim libcamera)

 └── cloudflared.service        (pi)     — Cloudflare Tunnel (api.spark.wedd.au → localhost:8420)

```

**`px-alive`** runs as root (GPIO access) and immediately calls `Picarx()`, claiming GPIO5 via `reset_mcu()`. It holds this handle for as long as it runs. Any other process that needs the servos must signal px-alive with `SIGUSR1` (via the `yield_alive` function in `px-env`), which makes it exit cleanly; systemd restarts it after 10 seconds. The PCA9685 PWM chip retains the last servo position between restarts, so the robot head stays still.
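
Seen from the daemon side, the yield handshake could look roughly like the sketch below. This is an illustration under assumptions, not the project's code: `install_yield_handler` and `gaze_loop` are hypothetical names, and the real `px-alive` presumably also releases its `Picarx()` handle before exiting.

```python
import signal


def install_yield_handler(state):
    """Set state["yield_requested"] when SIGUSR1 arrives."""
    def _on_usr1(signum, frame):
        state["yield_requested"] = True
    signal.signal(signal.SIGUSR1, _on_usr1)


def gaze_loop(state, step, max_steps=1000):
    """Call step() until a yield is requested; return the number of steps run."""
    for done in range(max_steps):
        if state["yield_requested"]:
            return done      # exit cleanly; systemd would restart the daemon later
        step()
    return max_steps
```

A tool would trigger the handler with `kill -USR1 <pid>`, which is what `yield_alive` is described as doing.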

**`px-wake-listen`** loads the Vosk grammar model (~40 MB) and sits in a tight capture loop on the USB microphone at 44100 Hz.

### 2. Launching SPARK

```bash

bin/px-spark [--dry-run] [--input-mode voice|text]

```

`px-spark` does the following in sequence:

```

px-spark

 1. Sets session.persona = "spark"          (via update_session)

 2. Sets session.listening = false

 3. Speaks greeting via tool-voice          ("Hey. I'm here.")

 4. Exports CODEX_CHAT_CMD=bin/claude-voice-bridge

 5. Exports PX_VOICE_VARIANT=en+m3, PX_VOICE_PITCH=82, PX_VOICE_RATE=120

 6. exec bin/codex-voice-loop --prompt docs/prompts/spark-voice-system.md ...

```

After step 6, `px-spark` is replaced by `codex-voice-loop` via `exec` (no fork). The voice loop process inherits all environment variables and owns the terminal.

The `CODEX_CHAT_CMD` override is the key to persona routing: instead of calling `codex exec`, the voice loop calls `claude-voice-bridge`, which is a thin adapter that passes the prompt to the `claude` CLI with SPARK's system prompt.

### 3. Wake Word Path

```

USB mic (44100 Hz)

 └── px-wake-listen (venv python)

      ├── [idle] Vosk grammar matches "hey robot" / "hey spark" / etc.

      │         CPU: ~3% — grammar decoder, no neural net

      ├── [wake] enable_speaker() → aplay 440 Hz chime (confirmation)

      ├── [record] capture until 1.5s silence (max 8s)

      ├── [STT] priority cascade:

      │    1. SenseVoice (sherpa-onnx, ~5s, non-autoregressive)

      │    2. faster-whisper base.en (~3-7s, best AU accent accuracy)

      │    3. sherpa-onnx Zipformer streaming (~2s)

      │    4. Vosk fallback

      ├── [anti-hallucination filters]

      │    • temperature=0, no_speech_threshold=0.6

      │    • reject: non-ASCII dominant, phantom phrases, repetitive (unique ratio <30%)

      ├── [persona routing]

      │    • Ollama persona active (session.persona = "gremlin" / "vixen") or persona keyword in text → tool-chat (Ollama)

      │    • otherwise → set session.listening=true + write transcript to session

      └── [multi-turn] up to 5 follow-up turns with 1.5s silence detection each

```

For SPARK in normal mode, the transcript is written into `session.json` and `session.listening` is set to `true`. The voice loop, which is polling the session file, detects this and proceeds to step 4.

### 4. LLM Turn — Building and Sending the Prompt

The voice loop (`pxh/voice_loop.py`) runs this on each turn:

```

build_model_prompt()

 ├── system_prompt    = docs/prompts/spark-voice-system.md   (full file)

 ├── session_summary  = key fields from session.json:

 │    persona, listening, obi_mood, obi_routine, obi_step,

 │    spark_quiet_mode, last_action, confirm_motion_allowed

 ├── recent_thoughts  = last 3 entries from state/thoughts-spark.jsonl

 │    (mood, action, salience — not full text, to avoid re-seeding loops)

 └── user_transcript  = session.transcript (the STT text)

```

This prompt is piped via stdin to `claude-voice-bridge`:

```

claude-voice-bridge (bin/claude-voice-bridge)

 1. Reads full prompt from stdin

 2. Unsets CLAUDECODE + CLAUDE_CODE_ENTRYPOINT   (prevents Claude Code tool use)

 3. Runs: claude -p "$PROMPT"

            --system-prompt docs/prompts/spark-voice-system.md

            --allowedTools ""

            --output-format text

            --no-session-persistence

 4. Streams stdout back to voice loop

```

`--allowedTools ""` is critical: it prevents Claude from using any Claude Code tools, so the bridge behaves as a pure text-completion endpoint.

The voice loop captures all stdout and scans it for a JSON action object. It uses `JSONDecoder.raw_decode()` with a multi-line fallback scan — so Claude can reason in plain text above the action, and the final JSON is extracted cleanly:

```json

{"tool": "tool_voice", "params": {"text": "Obi! Guess what? A teaspoon of neutron star weighs a billion tonnes."}}

```
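
The extraction step described above can be sketched with the standard library alone. This is a minimal illustration, assuming the action is the last JSON object in the output that carries a `"tool"` key; `extract_action` is a hypothetical name, and the real scan in `voice_loop.py` may differ in its fallback rules.

```python
import json


def extract_action(text):
    """Return the last JSON object in text that has a "tool" key, or None."""
    decoder = json.JSONDecoder()
    last = None
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx and reports where it ended
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            idx = text.find("{", idx + 1)   # not JSON here; try the next brace
            continue
        if isinstance(obj, dict) and "tool" in obj:
            last = obj
        idx = text.find("{", end)           # skip past the object just parsed
    return last
```

Because parsing resumes after each decoded object, the nested `params` dict is never mistaken for a separate action.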

### 5. Tool Dispatch — Sanitise, Execute, Return

```

validate_action(tool_name, raw_params)

 ├── ALLOWED_TOOLS whitelist check              (38 tools; KeyError = reject)

 ├── per-tool param sanitisation:

 │    • type coercion (str → int where needed)

 │    • range clamping (speed 0-60, duration 1-12s, pan -90..90, etc.)

 │    • enum validation (emote names, breathe types, etc.)

 │    • injection-safe: params become env vars, never shell-interpolated

 └── returns: (env_dict, tool_bin_path)

execute_tool(env_dict, tool_bin_path)

 ├── if session.persona set:

 │    inject PERSONA_VOICE_ENV → PX_VOICE_VARIANT, PX_VOICE_PITCH, PX_VOICE_RATE

 ├── subprocess.run(tool_bin, env=merged_env, ...)

 └── capture stdout JSON → log to logs/tool-.log

```
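
As a rough sketch of the validation flow above: the whitelist lookup, coercion, and clamping might look like this. The two-entry `ALLOWED_TOOLS`, the parameter keys (`speed`, `steer`, `pan`, `tilt`), and the defaults are assumptions for illustration; the ranges are the ones quoted in this README.

```python
# Hypothetical two-entry subset; the real whitelist is described as 38 tools.
ALLOWED_TOOLS = {
    "tool_drive": "bin/tool-drive",
    "tool_look": "bin/tool-look",
}


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def validate_action(tool_name, raw_params):
    """Return (env_dict, tool_bin_path), or raise ValueError to reject."""
    try:
        tool_bin = ALLOWED_TOOLS[tool_name]
    except KeyError:
        raise ValueError(f"tool not allowed: {tool_name!r}") from None
    env = {}
    try:
        if tool_name == "tool_drive":
            env["PX_SPEED"] = str(clamp(int(raw_params.get("speed", 30)), 0, 60))
            env["PX_STEER"] = str(clamp(int(raw_params.get("steer", 0)), -35, 35))
        elif tool_name == "tool_look":
            env["PX_PAN"] = str(clamp(int(raw_params.get("pan", 0)), -90, 90))
            env["PX_TILT"] = str(clamp(int(raw_params.get("tilt", 0)), -35, 65))
    except (TypeError, ValueError):
        raise ValueError(f"bad parameters for {tool_name}") from None
    return env, tool_bin
```

Values only ever leave this function as strings in an environment dict, which is what keeps them out of any shell command line.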

Every tool in `bin/tool-*` follows the same pattern:

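```bash
#!/usr/bin/env bash
# Resolve this script's directory so px-env can be sourced from anywhere
# (assumed; the snippet uses $SCRIPT_DIR, so it has to be defined first).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/px-env"          # sets PROJECT_ROOT, PYTHONPATH
python - "$@" <<'PY'
"""Tool docstring"""
import os, json, subprocess
from pxh.state import update_session
from pxh.logging import log_event

# Dry-run mode: any PX_DRY value other than "0" skips hardware access
dry_mode = os.environ.get("PX_DRY", "0") != "0"

# ... tool logic ...

payload = {"status": "ok", ...}
log_event("tool_name", payload)
print(json.dumps(payload))           # single JSON line to stdout
PY
```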

Tools that need GPIO call `yield_alive` first (defined in `px-env` as `kill -USR1 $(cat logs/px-alive.pid) 2>/dev/null; sleep 0.5`).

**Motion gate**: tools that move the robot check `confirm_motion_allowed` in session before proceeding. If false, they return `{"status": "blocked", "reason": "motion not allowed"}`.
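
A minimal sketch of that gate, assuming the session is already loaded as a dict. `motion_gate` and `run_motion_tool` are hypothetical names, and treating a missing flag as "blocked" is a fail-closed choice made here, not something the source states.

```python
import json


def motion_gate(session):
    """Return a blocked payload if motion is not allowed, else None."""
    if not session.get("confirm_motion_allowed", False):
        return {"status": "blocked", "reason": "motion not allowed"}
    return None


def run_motion_tool(session, move):
    """Check the gate, then call move(); always return one JSON string."""
    blocked = motion_gate(session)
    if blocked is not None:
        return json.dumps(blocked)
    move()
    return json.dumps({"status": "ok"})
```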

### 6. Speech Output Pipeline

```

tool-voice

 ├── FileLock(logs/voice.lock)        (serialise — no overlapping streams)

 ├── if session.persona set → tool-voice-persona (Ollama rephrasing first)

 ├── robot_hat.enable_speaker()       (GPIO 20 HIGH → speaker amp on)

 ├── espeak -v en+m3 -p 82 -s 120     (SPARK voice — male variant 3, moderate pitch)

 │    → WAV piped to aplay -D robothat

 └── /etc/asound.conf: robothat → softvol → dmixer → HifiBerry DAC (card 1)

```

The FileLock prevents two simultaneous `aplay` streams from corrupting each other. Persona voice settings (`PX_VOICE_VARIANT`, `PX_VOICE_PITCH`, `PX_VOICE_RATE`) are injected by `execute_tool()` from `PERSONA_VOICE_ENV` — so every tool that calls `tool-voice` internally picks up the right voice automatically.
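
The espeak-to-aplay stage can be sketched as follows. It is a simplified illustration: the voice lock and `enable_speaker()` steps are left out, the function names are hypothetical, and the defaults are simply SPARK's values from this README.

```python
import os
import subprocess


def build_espeak_cmd(text, env=None):
    """Build the espeak argv from PX_VOICE_* variables."""
    env = os.environ if env is None else env
    return [
        "espeak",
        "-v", env.get("PX_VOICE_VARIANT", "en+m3"),
        "-p", str(env.get("PX_VOICE_PITCH", "82")),
        "-s", str(env.get("PX_VOICE_RATE", "120")),
        "--stdout",          # write WAV data to stdout instead of playing it
        text,
    ]


def speak(text, device="robothat"):
    """Pipe espeak's WAV output into aplay. Needs espeak and aplay installed."""
    espeak = subprocess.Popen(build_espeak_cmd(text), stdout=subprocess.PIPE)
    subprocess.run(["aplay", "-q", "-D", device], stdin=espeak.stdout, check=False)
    espeak.stdout.close()
    espeak.wait()
```

Passing the text as a list element rather than through a shell string keeps spoken text from being interpreted as shell syntax.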

### 7. Cognitive Loop — The Subconscious (px-mind)

`px-mind` runs as a separate, independent daemon. It has no GPIO access and does not interact with the voice loop directly — it writes state files that the voice loop reads passively.

```

px-mind (every cycle, ~60s)

 │

 ├── Layer 1 — Awareness (no LLM, ~1s)

 │    ├── sonar ping → distance

 │    ├── read session.json → persona, mood, routine, quiet_mode

 │    ├── time of day / day of week

 │    ├── battery voltage from state/battery.json

 │    └── write state/awareness.json

 │         detect transitions (person appeared, time changed, persona switched)

 │

 ├── Layer 2 — Reflection (~5-60s, backend varies by persona)

 │    triggered: on transition OR every 5min idle

 │    ├── build reflection prompt:

 │    │    • REFLECTION_SYSTEM_SPARK (warm, curious, age-appropriate inner voice)

 │    │    • awareness snapshot

 │    │    • last 3 moods + actions from thoughts-spark.jsonl (not full thought text)

 │    │    • random topic seed from 20 creative prompts (science, wonder, universe)

 │    ├── LLM call: Claude Haiku via tmux session (SPARK) or Ollama gemma4:e4b (others, temperature=1.3)

 │    ├── anti-repetition check via difflib (>75% similarity = suppress)

 │    ├── parse JSON: {thought, mood, action, salience}

 │    ├── append to state/thoughts-spark.jsonl

 │    └── if salience > 0.7 → auto_remember() → state/notes-spark.jsonl

 │

 └── Layer 3 — Expression (2 min cooldown, pauses when session.listening=true or spark_quiet_mode=true)

      valid actions: wait, greet, comment, remember, look_at, weather_comment,

                     scan, explore, play_sound, photograph, emote, look_around,

                     time_check, calendar_check

      dispatch based on reflection.action:

      ├── comment/greet     → tool-voice (via tool-voice-persona for rephrasing)

      ├── "remember"        → tool-remember

      ├── "look_at"         → tool-look (random gaze)

      ├── "weather_comment" → tool-weather + speak

      ├── "scan"            → sonar sweep

      ├── "explore"         → tool-wander (short autonomous wander)

      ├── "play_sound"      → tool-play-sound

      ├── "photograph"      → tool-describe-scene

      ├── "emote"           → tool-emote (emotional pose)

      ├── "look_around"     → tool-look (pan sweep)

      ├── "time_check"      → tool-time

      └── "calendar_check"  → tool-gws-calendar

```
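
The Layer 3 gating conditions in the tree above reduce to a small predicate. This is a sketch under assumptions: `may_express` is a hypothetical name, and times are plain seconds such as `time.monotonic()` would return.

```python
COOLDOWN_S = 120  # the 2 min cooldown


def may_express(now, last_expressed, session):
    """True if Layer 3 is allowed to act at time `now` (in seconds)."""
    # Stay silent while the user is being listened to or quiet mode is on
    if session.get("listening") or session.get("spark_quiet_mode"):
        return False
    return (now - last_expressed) >= COOLDOWN_S
```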

**REFLECTION_SYSTEM_SPARK** enforces warm, optimistic content:

> *"NEVER be dark, nihilistic, or adult-themed. SPARK is warm, curious, and science-loving. Think like a kind robot friend who delights in sharing fascinating things about the universe."*

The reflection prompt is persona-isolated at the function level — `PERSONA_REFLECTION_SYSTEMS["spark"]` is selected at runtime from `awareness.json → persona` field.
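
The Layer 2 anti-repetition check (difflib, more than 75% similarity means suppress) could be written along these lines. The function name and the lower-casing step are assumptions; only the use of `difflib` and the threshold come from the description above.

```python
from difflib import SequenceMatcher


def is_repetitive(candidate, recent_thoughts, threshold=0.75):
    """True if candidate is more than `threshold` similar to any recent thought."""
    cand = candidate.lower().strip()
    for previous in recent_thoughts:
        ratio = SequenceMatcher(None, cand, previous.lower().strip()).ratio()
        if ratio > threshold:
            return True
    return False
```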

### 7b. Home Assistant Integration

`px-mind` Layer 1 polls Home Assistant periodically to enrich the awareness context:

- **Person presence** (every 5 min) — tracks `person.adrian`, `person.obi`, `person.maya`, `person.laura` via HA device trackers (home/away/zone)

- **Calendar** (every 5 min) — reads Obi's and the family calendar (`HA_CALENDARS`) with an 8-hour lookahead, surfacing upcoming events in the reflection prompt so SPARK can give transition warnings

- **Routines** (meds/water) — queries HA sensors for whether Obi has taken his medication today and when he last drank water; SPARK can gently nudge if either is overdue

- **Context** (every 60 s) — monitors `binary_sensor.macbook_air_camera_in_use` (call detection), `light.office_light`, and `media_player.shack_speakers` so SPARK knows when Adrian is on a call and should stay quiet

- **Sleep quality** (hourly) — reads Adrian's Pixel Watch sleep data from HA; available in the awareness snapshot for context-sensitive reflection

All HA data is injected into the Layer 2 reflection prompt, so SPARK's thoughts and proactive speech are informed by the household context. Requires `PX_HA_HOST` and `PX_HA_TOKEN` in `.env`.
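
For orientation, reading one entity from Home Assistant's REST API means a GET on `/api/states/<entity_id>` with a bearer token. The sketch below only builds the request. It assumes `PX_HA_HOST` holds a base URL such as `http://homeassistant.local:8123`, which the source does not confirm, and `ha_state_request` is a hypothetical helper name.

```python
import urllib.request


def ha_state_request(base_url, token, entity_id):
    """Build (but do not send) a GET request for one entity's state."""
    url = f"{base_url.rstrip('/')}/api/states/{entity_id}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="GET",
    )
```

Sending it would be `urllib.request.urlopen(req, timeout=5)`, followed by `json.load` on the response to read the `state` field.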

### 7c. Social Posting (`px-post`)

`px-post` is a daemon that publishes SPARK's best thoughts to social media and a local feed.

```

px-post (every 60s poll, every 300s flush)

 ├── poll_new_thoughts()  — cursor-based read from state/thoughts-spark.jsonl

 ├── qualifies()          — salience ≥ 0.7 OR action ∈ {comment, greet, weather_comment}

 ├── is_duplicate()       — difflib similarity ≥ 0.75 against recent posts → reject

 ├── queue_thought()      — append to state/post_queue.jsonl

 └── flush_queue()        — one entry per cycle:

      ├── run_qa_gate()   — Claude CLI binary YES/NO quality check (15s timeout)

      ├── write_feed()    — append to state/feed.json (served by /api/v1/public/feed)

      └── BlueskyClient   — post to Bluesky (truncate at 300 chars, word boundary)

```

Supports `--backfill` to process the entire thoughts file into `feed.json` without social posting. Single-instance guard via `fcntl.flock`. Requires `PX_BSKY_HANDLE` + `PX_BSKY_APP_PASSWORD` in `.env`.
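
Two of the steps above are simple enough to sketch: the qualification rule and the 300-character truncation at a word boundary. The function bodies are assumptions written from the descriptions in the tree, and `truncate_at_word` is a hypothetical name.

```python
POST_ACTIONS = {"comment", "greet", "weather_comment"}


def qualifies(thought, min_salience=0.7):
    """A thought is postable if it is salient enough or is a speaking action."""
    return (thought.get("salience", 0.0) >= min_salience
            or thought.get("action") in POST_ACTIONS)


def truncate_at_word(text, limit=300):
    """Cut text to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed mid-word, drop the partial word
    if not text[limit].isspace() and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip()
```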

### 8. Memory System — Persona-Scoped Persistence

All memory is scoped to the active persona to prevent cross-contamination between SPARK (child-safe) and GREMLIN/VIXEN (adult):

```

state/

 ├── notes-spark.jsonl      ← tool-remember writes; tool-recall reads

 ├── notes-vixen.jsonl      ← same tools, different scope

 ├── notes-gremlin.jsonl

 ├── thoughts-spark.jsonl   ← px-mind Layer 2 writes; voice loop reads for context

 ├── thoughts-vixen.jsonl

 └── thoughts-gremlin.jsonl

```

The persona is derived at runtime from `session.json → persona` in every process that writes or reads memory:

- `tool-remember`: `persona = load_session()["persona"].lower()` → `notes-{persona}.jsonl`

- `tool-recall`: same derivation → reads from `notes-{persona}.jsonl`

- `px-mind`: `persona = awareness["persona"]` → all file paths computed from this

- `voice_loop.build_model_prompt()`: reads `thoughts-{persona}.jsonl` for context injection

**Memory auto-save**: when px-mind generates a thought with `salience > 0.7`, it calls `auto_remember()` which appends to `notes-{persona}.jsonl`. This creates a long-term memory without explicit user instruction — high-salience observations about Obi's wellbeing, interesting facts shared, or significant moments persist across sessions.
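
The persona scoping and the auto-save threshold can be illustrated together. This is a sketch, not the project's implementation: the `KNOWN_PERSONAS` check is an added safeguard assumed here so that a stray persona value cannot produce an arbitrary file name, and the function signatures are hypothetical.

```python
import json
from pathlib import Path

KNOWN_PERSONAS = {"spark", "gremlin", "vixen"}


def notes_path(state_dir, persona):
    """Map a persona to its own notes file, e.g. state/notes-spark.jsonl."""
    name = persona.lower()
    if name not in KNOWN_PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    return Path(state_dir) / f"notes-{name}.jsonl"


def auto_remember(state_dir, persona, thought, threshold=0.7):
    """Append the thought to the persona's notes if salience exceeds threshold."""
    if thought.get("salience", 0.0) <= threshold:
        return False
    with notes_path(state_dir, persona).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(thought) + "\n")
    return True
```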

### 9. Session State — The Shared Source of Truth

`state/session.json` is the nervous system of the whole platform. Every process reads and writes it; all writes go through `FileLock` to prevent corruption:

```json

{

  "persona": "spark",

  "listening": false,

  "transcript": "...",

  "confirm_motion_allowed": true,

  "wheels_on_blocks": false,

  "last_action": "tool_voice",

  "obi_routine": "morning",

  "obi_step": 2,

  "obi_mood": "good",

  "obi_streak": 5,

  "spark_quiet_mode": false,

  "history": [...]

}

```

Key coordination patterns:

- **`listening: true`** — set by px-wake-listen after transcription; cleared by voice loop after processing

- **`spark_quiet_mode: true`** — set by `tool-quiet start` or `tool-transition buffer`; px-mind Layer 3 skips expression while true

- **`confirm_motion_allowed: false`** — safety gate; all motion tools check this before moving

- **`wheels_on_blocks: true`** — development flag; motor output suppressed in hardware layer
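
The locked read-modify-write that these patterns rely on might look like the sketch below. It swaps in the standard library's `fcntl.flock` for the `FileLock` class the project uses, so it is Linux-only and illustrative; the `update_session` signature shown is an assumption, not the one in `pxh.state`.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_session(path, **fields):
    """Merge fields into the session file under an exclusive lock."""
    path = Path(path)
    lock_path = path.with_suffix(".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)       # released when the file closes
        session = json.loads(path.read_text()) if path.exists() else {}
        session.update(fields)
        # Write to a temp file, then rename, so readers never see a partial file
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            json.dump(session, fh)
        os.replace(tmp, path)
    return session
```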

### 10. Full Request → Response Timeline

For a typical SPARK voice interaction:

```

[t=0s]    Obi: "Hey Spark!"

[t=0.1s]  Vosk detects wake phrase

[t=0.1s]  enable_speaker() → 440 Hz chime plays

[t=0.5s]  USB mic records Obi's utterance

[t=2.5s]  1.5s silence detected; recording ends

[t=7.5s]  SenseVoice STT transcribes → "can we do our morning routine"

[t=7.5s]  session.transcript saved; session.listening = true

[t=8s]    voice_loop detects listening=true

[t=8s]    build_model_prompt() → 4KB prompt (system + session + thoughts + transcript)

[t=8s]    claude-voice-bridge pipes prompt to `claude -p ...`

[t=11s]   Claude responds → {"tool": "tool_routine", "params": {"action": "load", "name": "morning"}}

[t=11s]   validate_action() sanitises params → env vars

[t=11s]   execute_tool() injects SPARK voice env

[t=11.1s] bin/tool-routine runs, loads morning routine, updates session

[t=11.1s] tool-routine calls tool-voice internally

[t=11.2s] enable_speaker() → espeak → aplay → HifiBerry DAC

[t=11.5s] Obi hears: "Morning! Step one: drink some water. I'll wait."

[t=11.5s] session.last_action = "tool_routine"; session.listening = false

[t=42s]   px-mind Layer 1 runs; detects obi_routine changed

[t=47s]   px-mind Layer 2 reflects; generates thought about morning energy

[t=77s]   px-mind Layer 3 expresses; tool-voice speaks an unprompted science fact

```

---

## Quick Start

```bash

# 1. Clone and enter

git clone git@github.com:adrianwedd/spark.git

cd spark

# 2. Create session state from template

cp state/session.template.json state/session.json

# 3. Activate the virtual environment

source .venv/bin/activate

# 4. Dry-run a tool to verify the setup

PX_DRY=1 bin/tool-status

# 5. Run tests (105 dry-run, no hardware needed)

python -m pytest tests/

# 6. Launch SPARK (Claude voice companion)

bin/px-spark --dry-run

```

### Hardware Prerequisites

- Raspberry Pi 4/5 with SunFounder Robot HAT

- PiCar-X chassis with pan/tilt camera mount

- USB microphone (for wake word detection)

- HifiBerry DAC or Robot HAT speaker output

- Ollama running on a network host (default: `M5.local`) for cognitive reflection in the non-SPARK personas (SPARK reflects via Claude Haiku)

### Services (Auto-start on Boot)

```bash

sudo systemctl status px-alive             # Idle gaze drift daemon

sudo systemctl status px-wake-listen       # Wake word listener

sudo systemctl status px-battery-poll      # Battery voltage poller (writes state/battery.json)

sudo systemctl status px-api-server        # REST API + web dashboard (:8420)

sudo systemctl status px-post              # Social posting daemon (Bluesky)

sudo systemctl status px-frigate-stream    # Frigate camera RTSP stream

sudo systemctl status cloudflared          # Cloudflare Tunnel

```

---

## Tools

Every tool emits a single JSON object to stdout, supports `PX_DRY=1`, and handles errors as `{"status": "error", "error": "..."}`. The voice loop whitelists tools in `ALLOWED_TOOLS` and sanitises all parameters through `validate_action()` before execution.

### Sensors & Perception

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-status` | Telemetry snapshot (servos, battery, config) | — |
| `tool-sonar` | Ultrasonic sweep scan; returns closest angle + distance | — |
| `tool-weather` | Bureau of Meteorology observation (HTTPS with FTP fallback) | `PX_WEATHER_STATION` |
| `tool-photograph` | Capture still photo via rpicam-still | — |
| `tool-face` | Sonar sweep, then point camera at closest object | — |
| `tool-describe-scene` | Photograph + Claude vision + speak description | — |

### Motion (Gated by `confirm_motion_allowed`)

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-drive` | Drive forward/backward with steering | `PX_DIRECTION`, `PX_SPEED` (0-60), `PX_DURATION` (0.1-10s), `PX_STEER` (-35..35) |
| `tool-circle` | Clockwise circle in pulses | `PX_SPEED`, `PX_DURATION` |
| `tool-figure8` | Two-leg figure-eight pattern | `PX_SPEED`, `PX_DURATION`, `PX_REST` |
| `tool-wander` | Smart obstacle-avoiding wander: sonar sweep picks best direction, speaks while navigating | `PX_WANDER_STEPS` (1-20), `PX_WANDER_QUIET` |
| `tool-stop` | Immediate halt, reset steering to neutral | — |

### Expression

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-look` | Pan/tilt camera with easing | `PX_PAN` (-90..90), `PX_TILT` (-35..65), `PX_EASE` |
| `tool-emote` | Named emotional pose | `PX_EMOTE`: idle, curious, thinking, happy, alert, excited, sad, shy |
| `tool-voice` | Text-to-speech via espeak (auto-routes through persona if active) | `PX_TEXT` (2000 char max) |
| `tool-perform` | Multi-step choreography: simultaneous speech + motion + emotes | `PX_PERFORM_STEPS` (JSON array, max 12 steps) |
| `tool-play-sound` | Play bundled WAV file | `PX_SOUND`: chime, beep, tada, alert |

### Utility

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-time` | Speak current date and time | — |
| `tool-timer` | Background timer with chime callback | `PX_TIMER_SECONDS` (5-3600), `PX_TIMER_LABEL` |
| `tool-recall` | Speak saved notes from `state/notes-{persona}.jsonl` | `PX_RECALL_LIMIT` (1-20) |
| `tool-remember` | Save a note for later recall | `PX_TEXT` (500 char max) |
| `tool-qa` | Speak arbitrary text (delegates to `tool-voice`) | `PX_TEXT` |
| `tool-api-start` | Start the REST API daemon | — |
| `tool-api-stop` | Stop the REST API daemon | — |

### SPARK — Child Companion Tools

Available only in SPARK persona mode. All support `PX_DRY=1`.

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-routine` | Daily routine manager: load, advance, complete | `PX_ROUTINE_ACTION` (load\|next\|status\|complete), `PX_ROUTINE_NAME` (morning\|homework\|bedtime\|wind-down) |
| `tool-checkin` | Emotional check-in: ask or record mood | `PX_CHECKIN_ACTION` (ask\|record), `PX_CHECKIN_MOOD` |
| `tool-celebrate` | Specific, brief positive reinforcement | `PX_CELEBRATE_TEXT` (optional) |
| `tool-transition` | Transition warning / buffer / arrival | `PX_TRANSITION_ACTION` (warn\|buffer\|arrived), `PX_TRANSITION_MINUTES`, `PX_TRANSITION_LABEL` |
| `tool-quiet` | Three S's meltdown protocol: Safety, Silence, Space | `PX_QUIET_ACTION` (start\|check\|end) |
| `tool-breathe` | Guided breathing exercise | `PX_BREATHE_TYPE` (simple\|box\|478), `PX_BREATHE_ROUNDS` (1-4) |
| `tool-dopamine-menu` | Interest-based activity suggestions | `PX_DOPAMINE_ENERGY` (high\|medium\|low), `PX_DOPAMINE_CONTEXT` (free\|focus\|wind-down) |
| `tool-sensory-check` | Body scan + sensory support | `PX_SENSORY_ACTION` (ask\|record), `PX_SENSORY_ISSUE` |
| `tool-repair` | Post-conflict reconnection | `PX_REPAIR_CONTEXT` (optional, private) |

### Google Workspace (optional)

Requires `gws auth login` (see [googleworkspace/cli](https://github.com/googleworkspace/cli)). Gracefully degrades if not authenticated.

| Tool | Description | Key Params |
|------|-------------|------------|
| `tool-gws-calendar` | Read upcoming calendar events | `PX_CALENDAR_ACTION` (today\|next\|week), `PX_CALENDAR_ID` |
| `tool-gws-sheets-log` | Append a row to a tracking spreadsheet | `PX_SHEETS_ID` (required, set in `.env`), `PX_SHEETS_EVENT`, `PX_SHEETS_DETAIL`, `PX_SHEETS_MOOD` |

---

## REST API

Port 8420. Bearer token authentication from `.env` (`PX_API_TOKEN`).

```bash

# Generate token

python3 -c "import secrets; print('PX_API_TOKEN=' + secrets.token_hex(32))" >> .env

# Start

bin/px-api-server              # live

bin/px-api-server --dry-run    # FORCE_DRY — remote callers cannot override

```

**Public (no auth)**

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | SPARK web dashboard (text chat + quick-action buttons) |
| GET | `/api/v1/health` | Liveness probe |
| GET | `/api/v1/public/status` | Live SPARK status: persona, mood, last thought |
| GET | `/api/v1/public/vitals` | System vitals: CPU, RAM, temp, battery, disk |
| GET | `/api/v1/public/sonar` | Latest sonar reading from `sonar_live.json` |
| GET | `/api/v1/public/awareness` | Awareness snapshot: mode, Frigate, ambient, weather, time context |
| GET | `/api/v1/public/history` | Ring buffer of up to 60 vitals readings (~30 min) |
| GET | `/api/v1/public/thoughts` | Recent SPARK thoughts (newest first, `?limit=12`) |
| GET | `/api/v1/public/feed` | SPARK's public thought feed (for social posting) |
| GET | `/api/v1/public/services` | Service status dict (used by web UI) |
| POST | `/api/v1/public/chat` | Lightweight public chat with SPARK (rate-limited) |
| POST | `/api/v1/pin/verify` | Verify admin PIN (issues Bearer token for authenticated endpoints) |
| GET | `/photos/{filename}` | Serve captured photos (used by web UI photo button) |

**Authenticated (Bearer token)**

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/chat` | Send text; SPARK picks a tool via LLM and executes it |
| POST | `/api/v1/tool` | Execute a tool directly: `{"tool": "tool_voice", "params": {"text": "hey"}}` |
| GET | `/api/v1/session` | Full session state |
| PATCH | `/api/v1/session` | Update: `listening`, `confirm_motion_allowed`, `wheels_on_blocks`, `persona` |
| POST | `/api/v1/session/history/clear` | Wipe conversation history (keeps other session fields) |
| GET | `/api/v1/tools` | List available tools |
| GET | `/api/v1/jobs/{id}` | Poll async job (tool_wander returns 202) |
| GET | `/api/v1/services` | Status of all managed services |
| POST | `/api/v1/services/{svc}/{action}` | Start/stop/restart a managed service |
| POST | `/api/v1/device/{action}` | Reboot or shut down the host device |
| GET | `/api/v1/logs/{service}` | Tail last N lines from a service log |
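
A minimal client sketch for the authenticated `/api/v1/tool` endpoint, using only the standard library. The request body and Bearer header follow the tables above; the `SPARK_API_BASE` variable, the helper names, and the assumption that the response is JSON are illustrative and not taken from the repository.

```python
import json
import os
import urllib.request

# Hypothetical override; the README only states that the API listens on port 8420.
API_BASE = os.environ.get("SPARK_API_BASE", "http://localhost:8420")


def build_tool_request(tool, params, token, base=API_BASE):
    """Build (but do not send) a POST to /api/v1/tool with a Bearer token."""
    body = json.dumps({"tool": tool, "params": params}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/v1/tool",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def call_tool(tool, params, token, base=API_BASE):
    """Send the request and decode the reply (assumed to be JSON)."""
    req = build_tool_request(tool, params, token, base)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs a running server and a valid token):
#   call_tool("tool_voice", {"text": "hey"}, os.environ["PX_API_TOKEN"])
```

Keeping request construction separate from sending makes the payload easy to inspect without touching the robot.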

---

## Wake Word System

```bash
bin/run-wake [--wake-word "hey robot"] [--dry-run]
```

Three-stage STT pipeline in `px-wake-listen`:

1. **Wake detection** — Vosk small model, grammar-based (low CPU idle)

2. **Chime** — 440 Hz confirmation tone

3. **Transcription** — priority chain: SenseVoice → faster-whisper → sherpa-onnx → Vosk

Anti-hallucination filters: `temperature=0`, `no_speech_threshold=0.6`. Post-filters reject non-ASCII, phantom phrases, and repetitive output.
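
The post-filters described above could look roughly like the sketch below. This is not the code in `px-wake-listen`: the phantom-phrase list, the repetition threshold, and the function name are assumptions chosen for illustration.

```python
# Hypothetical examples of phrases STT models tend to invent on silence.
PHANTOM_PHRASES = {"thanks for watching", "thank you for watching"}


def looks_hallucinated(text, max_repeat_ratio=0.5):
    """Return True if a transcript should be rejected by the post-filters."""
    cleaned = text.strip().lower()
    if not cleaned:
        return True
    # Reject non-ASCII output.
    if not cleaned.isascii():
        return True
    # Reject known phantom phrases, ignoring trailing punctuation.
    if cleaned.strip(".!? ") in PHANTOM_PHRASES:
        return True
    # Reject repetitive output: one word dominating a longer utterance.
    words = cleaned.split()
    if len(words) >= 4:
        most_common = max(words.count(w) for w in set(words))
        if most_common / len(words) > max_repeat_ratio:
            return True
    return False
```

Short utterances skip the repetition check so that commands such as "go go" are not rejected by accident.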

Multi-turn conversation: 5 follow-up turns by default.

Persona routing: checks session `persona` field, then utterance keywords.

---

## Python Library (`src/pxh/`)

| Module | Purpose |
|--------|---------|
| `state.py` | Thread-safe `session.json` via `FileLock`. `atomic_write()`, `rotate_log()`, `ensure_session()`. |
| `mind.py` | Cognitive loop daemon (3,300+ lines). Three-layer architecture: awareness, reflection, expression. `bin/px-mind` is a thin launcher. |
| `voice_loop.py` | Supervisor loop. `ALLOWED_TOOLS` whitelist, `TOOL_COMMANDS` dispatch, `validate_action()`. Watchdog (30s) in voice mode only. |
| `api.py` | FastAPI app, port 8420. In-memory job registry for async wander. Single-worker only. |
| `logging.py` | Structured JSON log emission to `logs/tool-.log`. Late-imports `rotate_log` from state.py. |
| `time.py` | `utc_timestamp()` via `datetime.now(timezone.utc)`. |
| `token_log.py` | LLM token usage accounting — logs prompt/response token counts per call. |
| `utils.py` | Shared utilities (`clamp()` for numeric range clamping). |
| `patch_login.py` | Monkey-patches `os.getlogin()` for systemd environments (no /dev/tty). |
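
The table lists an `atomic_write()` helper in `state.py` without showing it. A common way to get that behaviour is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that general pattern; `atomic_write_json` is a hypothetical name and the real helper's signature may differ.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace swaps the new file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This only protects readers from partial writes. Concurrent writers still need a lock, which is the role the table assigns to `FileLock`.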

---

## State & Session

Runtime state lives in `state/session.json` (gitignored). Copy the template before first use:

```bash
cp state/session.template.json state/session.json
```

| File | Purpose |
|------|---------|
| `session.json` | Core runtime state — persona, listening, motion permission, SPARK routine state |
| `awareness.json` | Layer 1 output — sonar + temporal state, transition detection |
| `thoughts.jsonl` | Layer 2 output — last 50 thoughts with mood/action/salience |
| `notes.jsonl` | Persistent memory — saved by `tool-remember`, auto-saved for high-salience thoughts |
| `battery.json` | Battery voltage — volts, pct, charging flag (written every 30s; plug/unplug detection plays audio sweep tones) |
| `mood.json` | Current mood from px-mind (written each reflection cycle) |

SPARK-specific session fields: `obi_routine`, `obi_step`, `obi_mood`, `obi_streak`, `spark_quiet_mode`.
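
The `.jsonl` state files above hold one JSON object per line. A reader along the following lines would return the most recent records and tolerate a partially written last line. The helper name is hypothetical, and the record schema is not specified here beyond mood, action, and salience.

```python
import json
from pathlib import Path


def read_recent_jsonl(path, limit=12):
    """Return up to `limit` records from a JSONL file, newest first."""
    p = Path(path)
    if limit <= 0 or not p.exists():
        return []
    records = []
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Skip malformed or truncated lines instead of failing the read.
            continue
    # Lines are appended chronologically, so the tail holds the newest records.
    return records[-limit:][::-1]
```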

---

## GPIO Contention Model

The PiCar-X Robot HAT MCU at I2C address `0x14` handles all servos and ADC through `robot_hat`. The `Picarx()` constructor claims GPIO5 and `close()` does not release it.

- **`px-alive`** holds a persistent `Picarx` handle

- **Tools** call `yield_alive()` (SIGUSR1 to px-alive) before claiming GPIO

- **systemd** restarts px-alive after 10s (`Restart=always`, `RestartSec=10`)

- **`os.getlogin()`** fails under systemd — monkey-patched via `usercustomize.py`
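
The SIGUSR1 handshake in the list above can be sketched as follows. This shows only the signalling pattern, not the project's `yield_alive()`: how the real code locates the px-alive PID and what px-alive does on receipt are not described here, and every name in the sketch is illustrative.

```python
import os
import signal
import threading

# Set by the signal handler; the daemon's main loop would poll this flag
# and release its hardware handle when it is set.
yield_requested = threading.Event()


def _on_sigusr1(signum, frame):
    """Keep the handler tiny: record the request and return."""
    yield_requested.set()


def install_yield_handler():
    """Daemon side: register the SIGUSR1 handler (must run in the main thread)."""
    signal.signal(signal.SIGUSR1, _on_sigusr1)


def request_yield(pid):
    """Tool side: ask the process with this PID to give up the GPIO."""
    os.kill(pid, signal.SIGUSR1)
```

Doing the actual hardware release in the main loop rather than inside the handler keeps the handler safe to run at any point.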

---

## Audio Pipeline

```
espeak → WAV pipe → aplay -D robothat
                            │
                    /etc/asound.conf
                    pcm.robothat → softvol → dmixer → HifiBerry DAC (card 1)
```

`robot_hat.enable_speaker()` must be called before any `aplay` output — toggles GPIO 20 HIGH for the speaker amplifier.

---

## Adding a New Tool

1. Create `bin/tool-` (bash wrapper + embedded Python heredoc via `/usr/bin/python3`)

2. Add to `ALLOWED_TOOLS` and `TOOL_COMMANDS` in `src/pxh/voice_loop.py`

3. Add `validate_action()` branch to sanitise params into env vars

4. Add to relevant system prompts in `docs/prompts/`

5. Add `yield_alive` call if it needs GPIO

6. Add a dry-run test in `tests/test_tools.py`

Every tool must: emit a single JSON object to stdout, support `PX_DRY=1`, handle errors as `{"status": "error", "error": "..."}`.
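
The output contract above can be sketched as the Python body of a tool. The error shape and `PX_DRY=1` come from the contract. The success payload (`"status": "ok"`, the `dry` and `result` keys) and the function names are assumptions, not the project's actual format.

```python
import json
import os


def run_tool(action, env=None):
    """Run `action` under the tool contract and return the result dict."""
    env = os.environ if env is None else env
    try:
        if env.get("PX_DRY") == "1":
            # Dry run: report what would happen without touching hardware.
            return {"status": "ok", "dry": True}
        return {"status": "ok", "result": action()}
    except Exception as exc:
        # Errors are reported in-band instead of as a traceback.
        return {"status": "error", "error": str(exc)}


def main(action):
    """Emit exactly one JSON object on stdout."""
    print(json.dumps(run_tool(action)))
```

Routing every exit path through one `print` is what keeps stdout to a single JSON object for the caller to parse.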

---

## Testing

```bash
source .venv/bin/activate
python -m pytest tests/                           # 450 tests (dry-run, no hardware)
python -m pytest tests/test_tools.py -v
python -m pytest tests/test_api.py -v
sudo .venv/bin/python -m pytest tests/ -m live -v  # live hardware tests (require Pi)
```

---

## Safety

- **`PX_DRY=1`** skips all motion and audio. Tools default to **live** when unset.

- **`confirm_motion_allowed: false`** blocks all motion tools.

- **`ALLOWED_TOOLS`** whitelist — LLMs cannot invoke arbitrary commands.

- **`validate_action()`** hard-clamps all parameters.

- **Watchdog** — 30-second stall detection in voice input mode.

- **Content filter** in `tool-voice` — refuses to speak dangerous how-to content.
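
As a rough illustration of how a whitelist and hard clamping combine, the sketch below rejects unknown tools and forces numeric parameters into a fixed range before they become environment variables. The tool subset, parameter names, limits, and the `PX_TEXT` / `PX_SPEED` variable names are all hypothetical. The real `validate_action()` in `src/pxh/voice_loop.py` is the authority.

```python
# Illustrative subset only; the real whitelist lives in voice_loop.py.
ALLOWED_TOOLS = {"tool_voice", "tool_drive"}


def clamp(value, low, high):
    """Force value into the inclusive range [low, high]."""
    return max(low, min(high, value))


def sketch_validate_action(tool, params):
    """Map an LLM-proposed action to sanitised env vars, or raise ValueError."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {tool}")
    if tool == "tool_voice":
        # Hypothetical length cap on spoken text.
        return {"PX_TEXT": str(params.get("text", ""))[:200]}
    # tool_drive: hypothetical speed limit of 0..50.
    speed = clamp(int(params.get("speed", 0)), 0, 50)
    return {"PX_SPEED": str(speed)}
```

Out-of-range values are clamped rather than rejected, so an over-enthusiastic LLM request degrades to the safe maximum instead of failing.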

---

## Environment Variables

| Variable | Purpose | Default |
|---|---|---|
| `PX_DRY` | `1` = dry-run, skip motion/audio | unset (live) |
| `PX_SESSION_PATH` | Override session file location | `state/session.json` |
| `PX_BYPASS_SUDO` | Skip sudo in bin scripts | unset (tests set `1`) |
| `LOG_DIR` | Override log directory | `$PROJECT_ROOT/logs` |
| `PX_VOICE_DEVICE` | ALSA output device | `robothat` |
| `PX_API_TOKEN` | REST API bearer token | from `.env` |
| `PX_WAKE_WORD` | Wake phrase | `hey robot` |
| `CODEX_CHAT_CMD` | Override LLM CLI command | set by launcher |
| `PX_WATCHDOG_STALE_SECONDS` | Watchdog timeout | `30` |
| `PX_PERSONA` | Active persona (`spark` / `vixen` / `gremlin`) | from session |
| `PX_OLLAMA_HOST` | Ollama server for cognitive reflection | `http://M5.local:11434` |
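
For scripts that sit outside the `bin/` launchers, the documented defaults can be resolved in one place. The variable names and defaults below are copied from the table. The `PxConfig` dataclass and `load_config()` are hypothetical helpers, not part of `src/pxh/`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PxConfig:
    dry: bool
    session_path: str
    voice_device: str
    wake_word: str
    watchdog_stale_seconds: int
    ollama_host: str


def load_config(env=None):
    """Resolve settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return PxConfig(
        dry=env.get("PX_DRY") == "1",  # unset means live
        session_path=env.get("PX_SESSION_PATH", "state/session.json"),
        voice_device=env.get("PX_VOICE_DEVICE", "robothat"),
        wake_word=env.get("PX_WAKE_WORD", "hey robot"),
        watchdog_stale_seconds=int(env.get("PX_WATCHDOG_STALE_SECONDS", "30")),
        ollama_host=env.get("PX_OLLAMA_HOST", "http://M5.local:11434"),
    )
```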

---

## Project Structure

```
picar-x-hacking/
├── bin/
│   ├── px-spark                  # SPARK launcher (Claude + child persona)
│   ├── px-env                    # Environment bootstrap (sourced by all scripts)
│   ├── px-alive                  # Idle gaze daemon (systemd)
│   ├── px-mind                   # Cognitive loop daemon
│   ├── px-wake-listen            # Wake word listener (systemd)
│   ├── px-battery-poll           # Battery voltage poller (systemd)
│   ├── px-api-server             # REST API launcher
│   ├── px-post                   # Social posting daemon (Bluesky + local feed)
│   ├── px-statusline             # Claude Code statusbar script
│   ├── px-{circle,drive,look,…}  # Hardware control scripts
│   ├── tool-{voice,look,drive,…} # Voice loop tool wrappers (38 tools)
│   ├── run-voice-loop{,-claude,-ollama}  # Voice backend launchers
│   └── claude-voice-bridge       # Claude stdin adapter
├── src/pxh/                      # Python library (10 modules)
│   ├── state.py                  # FileLock session, atomic_write, rotate_log
│   ├── mind.py                   # Cognitive loop daemon (3,300+ lines)
│   ├── voice_loop.py             # Supervisor + tool dispatch
│   ├── api.py                    # FastAPI REST API
│   ├── logging.py                # Structured JSON logging
│   ├── time.py                   # UTC timestamp helper
│   ├── token_log.py              # LLM token usage accounting
│   ├── utils.py                  # Shared utilities (clamp)
│   └── patch_login.py            # os.getlogin() systemd fix
├── site/                         # Static site (Cloudflare Pages)
│   ├── css/colors.css            # Mood colour palette (CSS vars)
│   ├── js/config.js              # API base URL config
│   └── workers/og-rewrite.js     # Cloudflare Worker for OG images
├── tests/                        # 450 tests
├── docs/prompts/
│   ├── spark-voice-system.md     # SPARK persona (child companion)
│   ├── claude-voice-system.md    # Default Claude voice loop
│   ├── codex-voice-system.md     # Codex voice loop
│   ├── persona-gremlin.md        # GREMLIN (adult, Ollama)
│   └── persona-vixen.md          # VIXEN (adult, Ollama)
├── state/                        # Runtime state (gitignored except template)
│   └── session.template.json
├── systemd/                      # Service unit files
│   ├── px-alive.service
│   ├── px-wake-listen.service
│   ├── px-battery-poll.service
│   ├── px-mind.service
│   ├── px-api-server.service
│   ├── px-post.service
│   ├── px-frigate-stream.service
│   └── cloudflared.service
├── sounds/                       # Bundled audio
├── models/                       # STT models (gitignored, ~500MB)
└── .env                          # API token (gitignored)
```

---

## Documentation

| Document | Audience | Description |
|---|---|---|
| [How Spark's Brain Works](docs/how-sparks-brain-works.md) | Kids / non-technical | ELI7 explanation of the cognitive architecture — ears, eyes, brain, and how they connect |
| [SPARK Prompt Audit](docs/spark-prompt-audit.md) | Developers | Complete inventory of every prompt SPARK uses — system-level and tool-embedded, with full text |
| [FAQ](docs/faq.md) | Everyone | Common questions about what SPARK is, how it works, and why it writes the way it does |

---

*"Neurodivergence is not a tragedy. It's a different operating system running on the same hardware."*

*— [This Wasn't in the Brochure](https://thiswasntinthebrochure.wtf)*