https://github.com/sleep3r/toodles

🐩 Telegram bot wrapping gemini-cli — real-time streaming, voice transcription, file sharing & per-topic sessions. Built in Rust.
gemini rust telegram

Host: GitHub
URL: https://github.com/sleep3r/toodles
Owner: sleep3r
License: mit
Created: 2026-03-16T22:37:53.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-24T14:43:01.000Z (2 months ago)
Last Synced: 2026-06-26T03:30:15.225Z (1 day ago)
Topics: gemini, rust, telegram
Language: Rust
Homepage:
Size: 546 KB
Stars: 3
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# toodles

Telegram × Gemini CLI — streamed responses, voice, file & photo sharing, local transcription

Quick Start · Features · Config · Architecture

---

A Telegram bot written in Rust that wraps [`gemini-cli`](https://github.com/google-gemini/gemini-cli), letting you chat with Gemini AI directly from Telegram — with real-time streaming, voice transcription, photo & file analysis, and per-topic session isolation.

## ✨ Features

| | Feature | Details |
|---|---|---|
| 💬 | **Real-time streaming** | In-place draft updates while the model is generating, then final formatted commit |
| ⏳ | **Instant feedback** | Immediate startup placeholder (`Подключаю Gemini-сессию…`, Russian for "Connecting Gemini session…") on cold starts |
| 🛑 | **Stop generation** | Inline "🛑 Stop" button to cancel generation mid-stream |
| 📝 | **Smart message splitting** | Long responses auto-split into multiple Telegram messages at newline boundaries — no truncation |
| ⚠️ | **Error feedback** | Session startup and runtime errors are surfaced to the user (no silent failure) |
| 📷 | **Photo analysis** | Send photos (including albums) — batched via aggregator and analyzed by Gemini Vision |
| 📄 | **Document handling** | Send files (PDF, XLSX, etc.) — downloaded and forwarded to gemini-cli for processing |
| 📎 | **File sharing** | Gemini can send files back via the `ATTACH_FILE:` protocol |
| 🧩 | **Message aggregation** | Sequential messages within 1.5s are batched into a single prompt — handles albums, forwarded batches, and split messages |
| 🔥 | **Warm session pool** | Keeps prewarmed ACP sessions to reduce first-response latency (`WARM_SESSION_POOL_SIZE`) |
| ♻️ | **Session startup retries** | Automatic retry with backoff when ACP initialization fails transiently |
| 🎙 | **Voice messages** | Transcribed locally via **Parakeet V3** or cloud via **OpenAI Whisper** |
| 🧠 | **Local transcription** | Offline, no API keys — NVIDIA Parakeet ONNX (int8, ~478 MB) |
| 📌 | **Forum topics** | Each Telegram topic gets an isolated gemini-cli session |
| 🏷️ | **Thread auto-title** | First message sets topic title; later updates use recent-context summaries |
| 🔄 | **Session management** | `/new` starts fresh, `/status` shows active count |
| 🔒 | **Access control** | Optional user allowlist via `ALLOWED_USER_IDS` |
| 🖥️ | **macOS background service** | launchd targets keep the bot running 24/7 with auto-restart |
| 🧙 | **Setup wizard** | Interactive `--setup` generates `.env` with guided prompts |
| 🎨 | **Customisable prompt** | System prompt configurable via `SYSTEM_PROMPT` in `.env` |
| ✅ | **CI-gated** | `check + fmt + clippy + test` on every push/PR |
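
The smart message splitting row can be illustrated with a minimal sketch. It assumes the Telegram Bot API limit of 4096 characters per text message and prefers to break at the last newline that fits; the function name `split_message` and the hard-cut fallback for overlong lines are illustrative assumptions, not necessarily how toodles does it.

```rust
/// Split `text` into chunks of at most `limit` characters, preferring to
/// break at the last newline that fits. Falls back to a hard cut when no
/// usable newline exists inside the window.
/// (Hypothetical helper; toodles' real splitter may differ.)
fn split_message(text: &str, limit: usize) -> Vec<String> {
    assert!(limit > 0, "limit must be positive");
    let mut chunks = Vec::new();
    let mut rest = text;
    while rest.chars().count() > limit {
        // Byte offset just past the first `limit` characters.
        let cut = rest
            .char_indices()
            .nth(limit)
            .map(|(i, _)| i)
            .unwrap_or(rest.len());
        // Prefer the last newline inside the window; otherwise hard-cut.
        let (head, tail) = match rest[..cut].rfind('\n') {
            Some(nl) if nl > 0 => (&rest[..nl], &rest[nl + 1..]),
            _ => (&rest[..cut], &rest[cut..]),
        };
        chunks.push(head.to_string());
        rest = tail;
    }
    if !rest.is_empty() {
        chunks.push(rest.to_string());
    }
    chunks
}
```

Calling `split_message(&response, 4096)` would yield one string per outgoing Telegram message. Counting characters rather than bytes keeps multi-byte text (Cyrillic, emoji) from being cut mid-character.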

## 🚀 Quick Start

### Prerequisites

- **Rust** ≥ 1.70 — [rustup.rs](https://rustup.rs)
- **gemini-cli** — `npm install -g @google/gemini-cli && gemini`
- **Telegram bot token** — [@BotFather](https://t.me/BotFather)
- **ffmpeg** — `brew install ffmpeg` *(required for voice messages)*
- *(Optional)* **OpenAI API key** — for cloud Whisper fallback

### Install & Run

```sh
git clone https://github.com/sleep3r/toodles
cd toodles

# Option A: Interactive setup wizard (recommended)
make setup

# Option B: Manual config
cp .env.example .env
$EDITOR .env

# Run
make run # debug
make release # optimized build
make run-release # run optimized

# Optional: install as macOS launchd service (24/7)
make service-install
```

### Run as macOS background service (launchd)

```sh
make service-install # build release + install + start
make service-status # check launchd state
make service-logs # tail bot logs
```

`service-install` copies your project `.env` into `~/.config/toodles/service.env`
so launchd can read secrets consistently.

After code changes:

```sh
make service-update # rebuild release + restart service
```

If you change `.env`, run `make service-update` to sync it into the service env file.

Stop / remove service:

```sh
make service-stop
make service-uninstall
```

Optional overrides (passed as Make variables):

```sh
make LAUNCHD_LABEL=com.alex.toodles service-install
make TOODLES_ENV_FILE=/path/to/.env service-install
make LAUNCHD_WORKDIR=/Users/alexander service-install
```

## 💬 How It Works

```
┌───────────┐        ┌──────────┐        ┌──────────────┐
│ Telegram  │───────▶│ toodles  │───────▶│  gemini-cli  │
│   user    │◀─ edit │  (Rust)  │◀─ pipe │  subprocess  │
└───────────┘  msg   └──────────┘  stdout└──────────────┘
```

1. User sends a message (text, photo, document, or voice)
2. Messages are aggregated within a 1.5s window (handles albums and split messages)
3. On cold start, a startup status is shown while ACP session is created (or grabbed from warm pool)
4. A draft placeholder with 🛑 **Stop** is attached and updated during generation
5. User can press **Stop** at any time — generation is cancelled via `CancellationToken`
6. Final response is committed with Markdown→Telegram HTML formatting and plain-text fallback
7. Subsequent messages reuse the same topic/chat session automatically
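
Step 2 (aggregation) can be sketched as a pure function over timestamped messages. This is a simplified model, not the bot's actual aggregator: it assumes messages arrive sorted by time, that each new message restarts the 1.5s window (debounce semantics), and that batched texts are joined with newlines. The `Incoming` type and `batch_by_debounce` name are hypothetical.

```rust
use std::time::Duration;

/// A received message, reduced to what batching needs (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Incoming {
    /// Arrival time, measured from an arbitrary starting point.
    at: Duration,
    text: String,
}

/// Group messages into prompts: a message joins the current batch when it
/// arrives within `window` of the previous message, otherwise it starts a
/// new batch. Assumes `messages` is sorted by arrival time.
fn batch_by_debounce(messages: &[Incoming], window: Duration) -> Vec<String> {
    let mut prompts: Vec<String> = Vec::new();
    let mut last_at: Option<Duration> = None;
    for msg in messages {
        match last_at {
            Some(prev) if msg.at.saturating_sub(prev) <= window => {
                let current = prompts
                    .last_mut()
                    .expect("a batch exists once last_at is set");
                current.push('\n');
                current.push_str(&msg.text);
            }
            _ => prompts.push(msg.text.clone()),
        }
        last_at = Some(msg.at);
    }
    prompts
}
```

In the live bot the same decision would be driven by a timer rather than recorded timestamps, but the grouping rule is the same: a gap longer than the window closes the batch.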

## 🎙 Voice Transcription

toodles supports two transcription backends:

```
┌────────────────────┐     ┌──────────────┐     ┌───────────┐
│  Telegram Voice    │────▶│   ffmpeg     │────▶│ Parakeet  │──── text
│  (OGG Opus)        │     │ (16kHz f32)  │     │  V3 🦜    │
└────────────────────┘     └──────────────┘     └─────┬─────┘
                                                      │ fallback
                                                ┌─────▼─────┐
                                                │  OpenAI   │
                                                │ Whisper 🌐│
                                                └───────────┘
```

| Mode | Latency | Cost | Setup |
|---|---|---|---|
| **Local** (Parakeet V3) | ~2-5s | Free | `--setup` downloads 478 MB model |
| **Cloud** (Whisper API) | ~1-3s | ~$0.006/min | Requires `OPENAI_API_KEY` |

If both are enabled, local transcription is tried first with automatic cloud fallback.
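
The local-first, cloud-fallback order can be sketched with two optional backends passed as closures. Everything named here (`Backend`, `Transcriber`, `transcribe_with_fallback`) is a hypothetical stand-in for illustration; the real engines are Parakeet V3 and the Whisper API, and the actual code is presumably async.

```rust
/// Which backend produced the transcript.
#[derive(Debug, PartialEq)]
enum Backend {
    Local,
    Cloud,
}

/// A transcription backend: 16 kHz f32 samples in, text or error out.
/// (Hypothetical signature for this sketch.)
type Transcriber<'a> = &'a dyn Fn(&[f32]) -> Result<String, String>;

/// Try the local backend first (if enabled), then the cloud backend
/// (if enabled). Returns the first success, or all collected errors.
fn transcribe_with_fallback(
    samples: &[f32],
    local: Option<Transcriber<'_>>,
    cloud: Option<Transcriber<'_>>,
) -> Result<(Backend, String), String> {
    let mut errors: Vec<String> = Vec::new();
    if let Some(run) = local {
        match run(samples) {
            Ok(text) => return Ok((Backend::Local, text)),
            Err(e) => errors.push(format!("local: {e}")),
        }
    }
    if let Some(run) = cloud {
        match run(samples) {
            Ok(text) => return Ok((Backend::Cloud, text)),
            Err(e) => errors.push(format!("cloud: {e}")),
        }
    }
    if errors.is_empty() {
        Err("no transcription backend enabled".to_string())
    } else {
        Err(errors.join("; "))
    }
}
```

Collecting both error messages means the user-facing failure can say why each backend was rejected, which fits the README's "no silent failure" goal.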

## ⚙️ Configuration

All configuration is managed through environment variables or `.env`:

```sh
# Required
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...

# Access control (leave empty for unrestricted)
ALLOWED_USER_IDS=123456789,987654321

# Gemini CLI
GEMINI_CLI_PATH=gemini # path to binary
GEMINI_CLI_COMMAND=gemini --acp # optional full ACP command
GEMINI_WORKING_DIR=/path/to/project # optional cwd
GEMINI_YOLO=true # optional auto-approve mode
DRAFT_MODE=verbose # compact | verbose draft UX
THREAD_RENAME_EVERY=4 # 0 disables auto-rename
WARM_SESSION_POOL_SIZE=1 # 0 disables the warm session pool

# Optional: read additional settings from TOML
TOODLES_CONFIG=~/.config/toodles/config.toml

# System prompt — customise the bot's personality
SYSTEM_PROMPT=You are a helpful AI assistant. Keep answers concise.

# Voice — cloud (optional fallback)
OPENAI_API_KEY=sk-...

# Voice — local (recommended)
USE_LOCAL_TRANSCRIPTION=true
MODELS_DIR=~/.toodles/models

# Logging
RUST_LOG=info
```

> **💡 Tip:** Run `make setup` to generate this interactively!
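
As an illustration of how a value like `ALLOWED_USER_IDS` could be interpreted, here is a minimal sketch that parses the comma-separated list and treats an empty value as unrestricted. The function names are hypothetical and the bot's real config code may handle errors or whitespace differently.

```rust
use std::collections::HashSet;

/// Parse a comma-separated list of Telegram user IDs.
/// Returns `Ok(None)` for an empty value, meaning "unrestricted".
/// (Hypothetical helper, not taken from toodles' config.rs.)
fn parse_allowed_user_ids(raw: &str) -> Result<Option<HashSet<u64>>, String> {
    let ids = raw
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| {
            s.parse::<u64>()
                .map_err(|e| format!("invalid user id {s:?}: {e}"))
        })
        .collect::<Result<HashSet<u64>, String>>()?;
    Ok(if ids.is_empty() { None } else { Some(ids) })
}

/// With no allowlist everyone is allowed; otherwise only listed users.
fn is_allowed(allowlist: &Option<HashSet<u64>>, user_id: u64) -> bool {
    allowlist.as_ref().map_or(true, |ids| ids.contains(&user_id))
}
```

Rejecting a malformed ID at startup, rather than silently skipping it, avoids accidentally running with a narrower or wider allowlist than intended.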

### Optional TOML config

You can also keep settings in `~/.config/toodles/config.toml`:

```toml
bot_token = "123456:ABC-DEF..."
gemini_cli_command = "gemini --acp"
gemini_working_dir = "/path/to/project"
gemini_yolo = true
draft_mode = "verbose"
thread_rename_every = 4
warm_session_pool_size = 1
```

You can copy `config.example.toml` as a starting point.

## 🤖 Bot Commands

| Command | Description |
|---|---|
| `/start` | Get started 👋 |
| `/new` | Start fresh 🔄 |
| `/status` | Bot status 📊 |
| `/thread` | Create forum thread 🧵 |
| `/help` | Show commands 💡 |

`/thread` works in forum-enabled supergroups where the bot has topic-management rights.
You can call `/thread` from both the main chat and existing topics; Toodles creates a new topic in the same group.
The first user message in a topic sets its initial title, then Toodles refreshes the title every `THREAD_RENAME_EVERY` messages using the recent message context.
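
The title-refresh rule can be written down as a small decision function. This sketch assumes a 1-based message counter per topic and that `THREAD_RENAME_EVERY=0` disables all automatic titling, including the initial one; both are assumptions, and `TitleAction` and `title_action` are hypothetical names.

```rust
/// What to do with the topic title after a given message.
#[derive(Debug, PartialEq)]
enum TitleAction {
    /// First message in the topic: derive the initial title from it.
    SetInitial,
    /// Periodic refresh from the recent message context.
    Refresh,
    /// Leave the title unchanged.
    Keep,
}

/// Decide the title action for the `message_index`-th message (1-based)
/// in a topic, given the configured `rename_every` interval.
fn title_action(message_index: u64, rename_every: u64) -> TitleAction {
    if rename_every == 0 || message_index == 0 {
        return TitleAction::Keep; // auto-rename disabled, or no message yet
    }
    if message_index == 1 {
        return TitleAction::SetInitial;
    }
    if message_index % rename_every == 0 {
        TitleAction::Refresh
    } else {
        TitleAction::Keep
    }
}
```

With `THREAD_RENAME_EVERY=4`, this model sets the title on message 1 and refreshes it on messages 4, 8, 12, and so on.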

## 🧯 Cold-Start Tuning

If the first response sometimes takes too long:

- Set `WARM_SESSION_POOL_SIZE=1` (or `2`) to keep prewarmed ACP sessions ready.
- Keep `GEMINI_WORKING_DIR` on a local SSD path (avoid slow network mounts).
- Check bot logs for repeated ACP initialize retries; transient failures are retried automatically.
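
For readers unfamiliar with the retry pattern mentioned in the last bullet, here is a minimal, synchronous sketch of retry with exponential backoff. The attempt count, the doubling factor, and the name `retry_with_backoff` are assumptions for illustration; the bot itself is async and would use an async sleep instead of `std::thread::sleep`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, sleeping between failures and
/// doubling the delay each time. `op` receives the 1-based attempt number.
/// Returns the first success or the last error.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

If the logs show every attempt failing, the cause is unlikely to be transient, and the working directory or the `gemini` installation is worth checking instead.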

If `/thread` fails with "not enough rights to create a topic", grant the bot admin permission to manage topics.

## 📐 Architecture

```
src/
├── main.rs           — entry point, dispatcher, bot commands
├── config.rs         — Config from env + optional TOML (single gemini profile)
├── session.rs        — ACP session lifecycle + per-chat/topic session mapping
├── aggregator.rs     — message batching with debounce window + file guard ownership
├── telegram_api.rs   — raw Telegram API (sendMessageDraft), global HTTP client
├── setup.rs          — interactive setup wizard (--setup)
├── transcription.rs  — Parakeet V3 engine + model download
└── handlers/
    ├── mod.rs        — CancelRegistry, inline stop button, draft streaming, message splitting, Markdown→HTML
    ├── message.rs    — text message handler (with aggregation)
    ├── document.rs   — document/file handler (download + aggregate + query)
    ├── photo.rs      — photo handler (download + aggregate albums + query)
    └── voice.rs      — voice handler (transcribe → query)

**Session lifecycle:**

```mermaid
stateDiagram-v2
[*] --> New: /new or first message
New --> Ready: session created
Ready --> Query: user message
Query --> Placeholder: ⏳ + 🛑 Stop button
Placeholder --> Streaming: line-by-line via BufReader
Streaming --> Cancelled: user clicks 🛑
Cancelled --> Ready: ⬛ Generation stopped
Streaming --> Ready: response committed (Markdown)
Ready --> [*]: /new (reset)
```
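
The diagram above can also be read as a transition table. The sketch below encodes it as a Rust enum with a `step` function that rejects transitions the diagram does not show. The state names follow the diagram; the event names and the `Closed` state (standing in for the final `[*]`) are hypothetical and not taken from `session.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    New,
    Ready,
    Query,
    Placeholder,
    Streaming,
    Cancelled,
    Closed, // stands in for the diagram's final [*]
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    SessionCreated,
    UserMessage,
    PlaceholderSent,   // placeholder with the Stop button is posted
    OutputStarted,     // first streamed output arrives
    StopPressed,       // user clicks the Stop button
    StopAcknowledged,  // "Generation stopped" notice is shown
    ResponseCommitted, // final formatted response is sent
    Reset,             // /new
}

/// Return the next state, or `None` if the diagram has no such transition.
fn step(state: State, event: Event) -> Option<State> {
    use Event::*;
    use State::*;
    match (state, event) {
        (New, SessionCreated) => Some(Ready),
        (Ready, UserMessage) => Some(Query),
        (Query, PlaceholderSent) => Some(Placeholder),
        (Placeholder, OutputStarted) => Some(Streaming),
        (Streaming, StopPressed) => Some(Cancelled),
        (Cancelled, StopAcknowledged) => Some(Ready),
        (Streaming, ResponseCommitted) => Some(Ready),
        (Ready, Reset) => Some(Closed),
        _ => None,
    }
}
```

Returning `None` for unlisted pairs makes illegal sequences, such as a user message arriving before the session exists, explicit rather than silently ignored.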

Each chat or forum topic maps to an isolated ACP session. Queries are serialised per session via `tokio::sync::Mutex` and a per-session queue. Startup uses retries and an optional warm pool (`WARM_SESSION_POOL_SIZE`) to reduce first-token latency. During generation, the bot updates one placeholder message (draft UX), supports inline cancellation via `CancellationToken`, and commits a final Markdown→Telegram HTML response with plain-text fallback. Long responses are split across multiple Telegram messages at newline boundaries. Sequential messages and photo albums are aggregated via a 1.5s debounce window. Temporary files (photos, documents) are kept alive via `Arc` until the query completes.
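
The last point, keeping temporary files alive via `Arc`, relies on a common Rust idiom: a guard type deletes the file in `Drop`, and every task that needs the file holds a clone of the `Arc`. The sketch below shows the idiom with a hypothetical `TempFile` type; the guard in `aggregator.rs` may be structured differently.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Owns a temporary file on disk and removes it when dropped.
/// (Hypothetical guard type for this sketch.)
struct TempFile {
    path: PathBuf,
}

impl TempFile {
    fn new(path: PathBuf) -> Self {
        TempFile { path }
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFile {
    fn drop(&mut self) {
        // Best-effort cleanup; a missing file is not an error here.
        let _ = fs::remove_file(&self.path);
    }
}

/// Hand out a second owner of the same file, e.g. for a running query.
/// The file is deleted only after every owner has been dropped.
fn share(guard: &Arc<TempFile>) -> Arc<TempFile> {
    Arc::clone(guard)
}
```

Because the deletion happens when the last `Arc` clone is dropped, the download handler can return early while the query still reads the file, and no explicit cleanup call is needed on cancellation.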

## 🛠 Makefile

```sh
make help # show all targets
make build # debug build
make release # optimized build
make run # run (debug)
make run-release # run (release)
make setup # interactive setup wizard
make test # run tests
make lint # clippy
make fmt # format code
make clean # clean artifacts
make service-install # install/start launchd service
make service-sync-env # copy .env into launchd service env
make service-update # rebuild + restart launchd service
make service-stop # stop launchd service
make service-status # print launchd status
make service-logs # tail service logs
make service-uninstall # remove launchd service
```

## 📄 License

MIT — see [LICENSE](LICENSE).
