# toodles

Telegram × Gemini CLI: streamed responses, voice, file & photo sharing, local transcription

Quick Start · Features · Config · Architecture


---

A Telegram bot written in Rust that wraps [`gemini-cli`](https://github.com/google-gemini/gemini-cli), letting you chat with Gemini AI directly from Telegram, with real-time streaming, voice transcription, photo & file analysis, and per-topic session isolation.

## ✨ Features

| | Feature | Details |
|---|---|---|
| 💬 | **Real-time streaming** | In-place draft updates while the model is generating, then a final formatted commit |
| ⏳ | **Instant feedback** | Immediate startup placeholder (`Подключаю Gemini-сессию…`, "Connecting Gemini session…") on cold starts |
| 🛑 | **Stop generation** | Inline "🛑 Stop" button to cancel generation mid-stream |
| 📏 | **Smart message splitting** | Long responses are auto-split into multiple Telegram messages at newline boundaries, without truncation |
| ⚠️ | **Error feedback** | Session startup and runtime errors are surfaced to the user (no silent failure) |
| 📷 | **Photo analysis** | Send photos (including albums); they are batched via the aggregator and analyzed by Gemini Vision |
| 📄 | **Document handling** | Send files (PDF, XLSX, etc.); they are downloaded and forwarded to gemini-cli for processing |
| 📎 | **File sharing** | Gemini can send files back via the `ATTACH_FILE:` protocol |
| 🧩 | **Message aggregation** | Sequential messages within 1.5s are batched into a single prompt, which handles albums, forwarded batches, and split messages |
| 🔥 | **Warm session pool** | Keeps prewarmed ACP sessions to reduce first-response latency (`WARM_SESSION_POOL_SIZE`) |
| ♻️ | **Session startup retries** | Automatic retry with backoff when ACP initialization fails transiently |
| 🎙 | **Voice messages** | Transcribed locally via **Parakeet V3** or in the cloud via **OpenAI Whisper** |
| 🧠 | **Local transcription** | Offline, no API keys: NVIDIA Parakeet ONNX (int8, ~478 MB) |
| 📌 | **Forum topics** | Each Telegram topic gets an isolated gemini-cli session |
| 🏷️ | **Thread auto-title** | First message sets the topic title; later updates use recent-context summaries |
| 🔄 | **Session management** | `/new` starts fresh, `/status` shows the active session count |
| 🔒 | **Access control** | Optional user allowlist via `ALLOWED_USER_IDS` |
| 🖥️ | **macOS background service** | launchd targets keep the bot running 24/7 with auto-restart |
| 🧙 | **Setup wizard** | Interactive `--setup` generates `.env` with guided prompts |
| 🎨 | **Customisable prompt** | System prompt configurable via `SYSTEM_PROMPT` in `.env` |
| ✅ | **CI-gated** | `check + fmt + clippy + test` on every push/PR |
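The `ATTACH_FILE:` protocol is only named in the table, so the following is a minimal sketch of how such a directive could be pulled out of a model response. It assumes one path per line after the prefix; that wire format, and the function name `extract_attachments`, are assumptions for illustration and not toodles' actual implementation.

```rust
use std::path::PathBuf;

/// Directive prefix named in the feature table. The exact wire format
/// (one path per line after the prefix) is an assumption of this sketch.
const ATTACH_PREFIX: &str = "ATTACH_FILE:";

/// Splits a model response into the visible text and the requested attachments.
pub fn extract_attachments(response: &str) -> (String, Vec<PathBuf>) {
    let mut text_lines = Vec::new();
    let mut files = Vec::new();
    for line in response.lines() {
        match line.trim().strip_prefix(ATTACH_PREFIX) {
            // A directive with a non-empty path: remember the file.
            Some(path) if !path.trim().is_empty() => files.push(PathBuf::from(path.trim())),
            // A directive with an empty path: drop it silently.
            Some(_) => {}
            // Ordinary text: keep it for the Telegram message.
            None => text_lines.push(line),
        }
    }
    (text_lines.join("\n"), files)
}
```

Stripping the directive lines from the visible text keeps the protocol out of the message the user sees; whether the real bot does this is not stated in this README.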

## 🚀 Quick Start

### Prerequisites

- **Rust** ≥ 1.70: [rustup.rs](https://rustup.rs)
- **gemini-cli**: `npm install -g @google/gemini-cli && gemini`
- **Telegram bot token**: [@BotFather](https://t.me/BotFather)
- **ffmpeg**: `brew install ffmpeg` *(required for voice messages)*
- *(Optional)* **OpenAI API key**: for cloud Whisper fallback

### Install & Run

```sh
git clone https://github.com/sleep3r/toodles
cd toodles

# Option A: Interactive setup wizard (recommended)
make setup

# Option B: Manual config
cp .env.example .env
$EDITOR .env

# Run
make run # debug
make release # optimized build
make run-release # run optimized

# Optional: install as macOS launchd service (24/7)
make service-install
```

### Run as macOS background service (launchd)

```sh
make service-install # build release + install + start
make service-status # check launchd state
make service-logs # tail bot logs
```

`service-install` copies your project `.env` into `~/.config/toodles/service.env`
so launchd can read secrets consistently.

After code changes:

```sh
make service-update # rebuild release + restart service
```

If you change `.env`, run `make service-update` to sync it into the service env file.

Stop / remove service:

```sh
make service-stop
make service-uninstall
```

Optional overrides (passed as Make variables):

```sh
make LAUNCHD_LABEL=com.alex.toodles service-install
make TOODLES_ENV_FILE=/path/to/.env service-install
make LAUNCHD_WORKDIR=/Users/alexander service-install
```

## 💬 How It Works

```
┌───────────┐        ┌──────────┐        ┌──────────────┐
│ Telegram  │───────▶│ toodles  │───────▶│  gemini-cli  │
│   user    │◀─ edit │  (Rust)  │◀─ pipe │  subprocess  │
└───────────┘   msg  └──────────┘  stdout└──────────────┘
```

1. User sends a message (text, photo, document, or voice)
2. Messages are aggregated within a 1.5s window (handles albums and split messages)
3. On a cold start, a startup status is shown while the ACP session is created (or taken from the warm pool)
4. A draft placeholder with a 🛑 **Stop** button is attached and updated during generation
5. The user can press **Stop** at any time; generation is cancelled via `CancellationToken`
6. Final response is committed with Markdown→Telegram HTML formatting and plain-text fallback
7. Subsequent messages reuse the same topic/chat session automatically
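
Step 2 can be illustrated with a small, offline sketch of the batching rule. It assumes a gap-based debounce (a new batch starts when more than 1.5s passes since the previous message) and works on recorded arrival times rather than a live timer; `batch_messages` is a hypothetical name, and the real aggregator in `aggregator.rs` may behave differently.

```rust
use std::time::Duration;

/// Debounce window from the README: messages closer together than this
/// are merged into one prompt.
const WINDOW: Duration = Duration::from_millis(1500);

/// Groups `(arrival_time, text)` pairs into batches. A new batch starts
/// whenever the gap since the previous message exceeds `WINDOW`.
/// Arrival times are offsets from an arbitrary start and must be sorted.
pub fn batch_messages(messages: &[(Duration, &str)]) -> Vec<String> {
    let mut batches: Vec<Vec<&str>> = Vec::new();
    let mut last_seen: Option<Duration> = None;
    for &(at, text) in messages {
        let starts_new_batch = match last_seen {
            Some(prev) => at.saturating_sub(prev) > WINDOW,
            None => true,
        };
        if starts_new_batch {
            batches.push(Vec::new());
        }
        // The first message always opens a batch, so `last_mut` is `Some` here.
        if let Some(current) = batches.last_mut() {
            current.push(text);
        }
        last_seen = Some(at);
    }
    // Each batch becomes a single prompt, one original message per line.
    batches.into_iter().map(|parts| parts.join("\n")).collect()
}
```

With arrivals at 0 ms, 1000 ms, 2000 ms and 4000 ms, the first three messages form one prompt and the fourth starts a new one, because only the last gap exceeds 1.5s.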

## 🎙 Voice Transcription

toodles supports two transcription backends:

```
┌────────────────────┐     ┌──────────────┐     ┌───────────┐
│  Telegram Voice    │────▶│    ffmpeg    │────▶│ Parakeet  │──── text
│    (OGG Opus)      │     │ (16kHz f32)  │     │  V3 🦜    │
└────────────────────┘     └──────────────┘     └─────┬─────┘
                                                      │ fallback
                                                ┌─────▼─────┐
                                                │  OpenAI   │
                                                │ Whisper 🌐│
                                                └───────────┘
```
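
The ffmpeg step in the diagram (OGG Opus in, 16 kHz f32 out) could be invoked roughly as below. The flags `-ar`, `-ac`, `-f f32le` and `-y` are standard ffmpeg options, but downmixing to mono, writing to a file rather than a pipe, and the helper name `ffmpeg_decode_command` are assumptions of this sketch; the command toodles actually runs is not shown in this README.

```rust
use std::path::Path;
use std::process::Command;

/// Builds an ffmpeg invocation that decodes `input` (for example a Telegram
/// OGG/Opus voice note) into raw 16 kHz 32-bit float PCM at `output`.
/// Downmixing to mono (`-ac 1`) is an assumption of this sketch.
pub fn ffmpeg_decode_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-y") // overwrite the output file if it already exists
        .arg("-i")
        .arg(input)
        .args(["-ar", "16000"]) // resample to 16 kHz
        .args(["-ac", "1"]) // single channel
        .args(["-f", "f32le"]) // raw little-endian f32 samples, no container
        .arg(output);
    cmd
}
```

Calling `.status()` or `.output()` on the returned `Command` would run ffmpeg; the builder itself only assembles the arguments, which makes it easy to inspect before execution.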

| Mode | Latency | Cost | Setup |
|---|---|---|---|
| **Local** (Parakeet V3) | ~2-5s | Free | `--setup` downloads 478 MB model |
| **Cloud** (Whisper API) | ~1-3s | ~$0.006/min | Requires `OPENAI_API_KEY` |

If both are enabled, local transcription is tried first with automatic cloud fallback.
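
The local-first, cloud-fallback order can be sketched as follows. Both backends are modelled as optional closures so the example stays self-contained; the function name, the closure signatures, and the `String` error type are illustrative assumptions, not toodles' API.

```rust
/// Tries the local backend first and falls back to the cloud backend when
/// the local one is disabled or fails. Returns the collected error messages
/// if no backend produced a transcript.
pub fn transcribe_with_fallback<L, C>(
    audio: &[f32],
    local: Option<L>,
    cloud: Option<C>,
) -> Result<String, String>
where
    L: Fn(&[f32]) -> Result<String, String>,
    C: Fn(&[f32]) -> Result<String, String>,
{
    let mut errors: Vec<String> = Vec::new();
    if let Some(local) = local {
        match local(audio) {
            Ok(text) => return Ok(text),
            Err(e) => errors.push(format!("local: {e}")),
        }
    }
    if let Some(cloud) = cloud {
        match cloud(audio) {
            Ok(text) => return Ok(text),
            Err(e) => errors.push(format!("cloud: {e}")),
        }
    }
    if errors.is_empty() {
        Err("no transcription backend enabled".to_string())
    } else {
        Err(errors.join("; "))
    }
}
```

Keeping both error messages means a failed voice message can report why the local model and the cloud API each failed, which matches the "no silent failure" goal in the feature list.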

## ⚙️ Configuration

All configuration is managed through environment variables or `.env`:

```sh
# Required
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...

# Access control (leave empty for unrestricted)
ALLOWED_USER_IDS=123456789,987654321

# Gemini CLI
GEMINI_CLI_PATH=gemini # path to binary
GEMINI_CLI_COMMAND=gemini --acp # optional full ACP command
GEMINI_WORKING_DIR=/path/to/project # optional cwd
GEMINI_YOLO=true # optional auto-approve mode
DRAFT_MODE=verbose # compact | verbose draft UX
THREAD_RENAME_EVERY=4 # 0 disables auto-rename
WARM_SESSION_POOL_SIZE=1 # 0 disables the prewarmed session pool

# Optional: read additional settings from TOML
TOODLES_CONFIG=~/.config/toodles/config.toml

# System prompt: customise the bot's personality
SYSTEM_PROMPT=You are a helpful AI assistant. Keep answers concise.

# Voice: cloud (optional fallback)
OPENAI_API_KEY=sk-...

# Voice: local (recommended)
USE_LOCAL_TRANSCRIPTION=true
MODELS_DIR=~/.toodles/models

# Logging
RUST_LOG=info
```
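
As an illustration of the `ALLOWED_USER_IDS` semantics described above (comma-separated IDs, empty meaning unrestricted), here is a minimal parsing sketch. The function names and the choice of `u64` for user IDs are assumptions; the actual parsing lives in `config.rs` and may differ.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parses a comma-separated `ALLOWED_USER_IDS` value into a set of IDs.
/// Returns `None` for an empty value, meaning access is unrestricted.
pub fn parse_allowed_user_ids(raw: &str) -> Result<Option<HashSet<u64>>, ParseIntError> {
    let ids = raw
        .split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty()) // tolerate trailing commas and blanks
        .map(str::parse::<u64>)
        .collect::<Result<HashSet<u64>, _>>()?;
    Ok(if ids.is_empty() { None } else { Some(ids) })
}

/// Returns true when `user_id` may use the bot.
pub fn is_allowed(allowlist: &Option<HashSet<u64>>, user_id: u64) -> bool {
    allowlist.as_ref().map_or(true, |ids| ids.contains(&user_id))
}
```

Rejecting a malformed ID with an error, rather than skipping it, is a deliberate choice in this sketch: a typo in the allowlist should fail at startup instead of quietly locking someone out.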

> **💡 Tip:** Run `make setup` to generate this interactively!

### Optional TOML config

You can also keep settings in `~/.config/toodles/config.toml`:

```toml
bot_token = "123456:ABC-DEF..."
gemini_cli_command = "gemini --acp"
gemini_working_dir = "/path/to/project"
gemini_yolo = true
draft_mode = "verbose"
thread_rename_every = 4
warm_session_pool_size = 1
```

You can copy `config.example.toml` as a starting point.

## 🤖 Bot Commands

| Command | Description |
|---|---|
| `/start` | Get started 👋 |
| `/new` | Start fresh 🔄 |
| `/status` | Bot status 📊 |
| `/thread` | Create forum thread 🧵 |
| `/help` | Show commands 💡 |

`/thread` works in forum-enabled supergroups where the bot has topic-management rights.
You can call `/thread` from both the main chat and existing topics; Toodles creates a new topic in the same group.
The first user message in a topic sets its initial title, then Toodles refreshes the title every `THREAD_RENAME_EVERY` messages using the recent message context.

## 🧯 Cold-Start Tuning

If the first response sometimes takes too long:

- Set `WARM_SESSION_POOL_SIZE=1` (or `2`) to keep prewarmed ACP sessions ready.
- Keep `GEMINI_WORKING_DIR` on a local SSD path (avoid slow network mounts).
- Check bot logs for repeated ACP initialize retries; transient failures are retried automatically.
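
The automatic retry mentioned in the last bullet follows the usual retry-with-backoff pattern. The sketch below is a blocking, standard-library-only version with a doubling delay; the bot itself is async, and its attempt count, delays, and function names are not documented here, so all of those are assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay after each failure. `op` receives the 1-based attempt number and
/// always runs at least once. Returns the first success or the last error.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2); // exponential backoff
                attempt += 1;
            }
        }
    }
}
```

An async variant would replace `std::thread::sleep` with the runtime's sleep so that other chats are not blocked while one session is retrying.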

If `/thread` fails with "not enough rights to create a topic", grant the bot admin permission to manage topics.

## 📐 Architecture

```
src/
├── main.rs           - entry point, dispatcher, bot commands
├── config.rs         - Config from env + optional TOML (single gemini profile)
├── session.rs        - ACP session lifecycle + per-chat/topic session mapping
├── aggregator.rs     - message batching with debounce window + file guard ownership
├── telegram_api.rs   - raw Telegram API (sendMessageDraft), global HTTP client
├── setup.rs          - interactive setup wizard (--setup)
├── transcription.rs  - Parakeet V3 engine + model download
└── handlers/
    ├── mod.rs        - CancelRegistry, inline stop button, draft streaming, message splitting, Markdown→HTML
    ├── message.rs    - text message handler (with aggregation)
    ├── document.rs   - document/file handler (download + aggregate + query)
    ├── photo.rs      - photo handler (download + aggregate albums + query)
    └── voice.rs      - voice handler (transcribe → query)
```

**Session lifecycle:**

```mermaid
stateDiagram-v2
[*] --> New: /new or first message
New --> Ready: session created
Ready --> Query: user message
Query --> Placeholder: ⏳ + 🛑 Stop button
Placeholder --> Streaming: line-by-line via BufReader
Streaming --> Cancelled: user clicks 🛑
Cancelled --> Ready: ⬛ Generation stopped
Streaming --> Ready: response committed (Markdown)
Ready --> [*]: /new (reset)
```
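
The diagram above can also be read as a transition table. The sketch below encodes it as a pure function; the state names come from the diagram, while the event names, the extra `Closed` state standing in for the terminal `[*]` node, and the function itself are hypothetical and not taken from `session.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { New, Ready, Query, Placeholder, Streaming, Cancelled, Closed }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    SessionCreated,
    UserMessage,
    PlaceholderShown,
    OutputStarted,
    StopPressed,
    StopConfirmed,
    ResponseCommitted,
    Reset,
}

/// Returns the next state, or `None` if the event is not valid in `state`.
pub fn transition(state: State, event: Event) -> Option<State> {
    use Event::*;
    use State::*;
    match (state, event) {
        (New, SessionCreated) => Some(Ready),
        (Ready, UserMessage) => Some(Query),
        (Query, PlaceholderShown) => Some(Placeholder),
        (Placeholder, OutputStarted) => Some(Streaming),
        (Streaming, StopPressed) => Some(Cancelled),
        (Cancelled, StopConfirmed) => Some(Ready),
        (Streaming, ResponseCommitted) => Some(Ready),
        (Ready, Reset) => Some(Closed), // `/new` ends this session
        _ => None,
    }
}
```

Returning `None` for unlisted pairs makes invalid sequences, such as pressing Stop before streaming has started, explicit instead of silently ignored.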

Each chat or forum topic maps to an isolated ACP session. Queries are serialised per session via `tokio::sync::Mutex` and a per-session queue. Startup uses retries and an optional warm pool (`WARM_SESSION_POOL_SIZE`) to reduce first-token latency. During generation, the bot updates one placeholder message (draft UX), supports inline cancellation via `CancellationToken`, and commits a final Markdown→Telegram HTML response with plain-text fallback. Long responses are split across multiple Telegram messages at newline boundaries. Sequential messages and photo albums are aggregated via a 1.5s debounce window. Temporary files (photos, documents) are kept alive via `Arc` until the query completes.
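
Splitting at newline boundaries, as described above, can be sketched like this. The 4096-character limit is Telegram's documented maximum message length; counting it in Rust `char`s is a simplification, and `split_message` is a hypothetical name rather than the function in `handlers/mod.rs`.

```rust
/// Telegram's documented per-message text limit, counted here in `char`s.
pub const TELEGRAM_LIMIT: usize = 4096;

/// Splits `text` into chunks of at most `limit` characters, cutting at the
/// last newline that fits and hard-splitting only when one line is too long.
pub fn split_message(text: &str, limit: usize) -> Vec<String> {
    assert!(limit > 0, "limit must be positive");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Byte offset just past the `limit`-th character, or the end of the text.
        let cut = rest.char_indices().nth(limit).map_or(rest.len(), |(i, _)| i);
        if cut == rest.len() {
            chunks.push(rest.to_string());
            break;
        }
        let window = &rest[..cut];
        let (head, tail) = if rest[cut..].starts_with('\n') {
            // The window ends exactly at a line break: consume the newline.
            (window, &rest[cut + 1..])
        } else if let Some(nl) = window.rfind('\n').filter(|&nl| nl > 0) {
            // Cut at the last newline inside the window and drop it.
            (&rest[..nl], &rest[nl + 1..])
        } else {
            // No usable newline: fall back to a hard cut.
            (window, &rest[cut..])
        };
        chunks.push(head.to_string());
        rest = tail;
    }
    chunks
}
```

Working with `char_indices` rather than raw byte offsets keeps every cut on a UTF-8 character boundary, so non-ASCII replies cannot cause a slicing panic.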

## 🛠 Makefile

```sh
make help # show all targets
make build # debug build
make release # optimized build
make run # run (debug)
make run-release # run (release)
make setup # interactive setup wizard
make test # run tests
make lint # clippy
make fmt # format code
make clean # clean artifacts
make service-install # install/start launchd service
make service-sync-env # copy .env into launchd service env
make service-update # rebuild + restart launchd service
make service-stop # stop launchd service
make service-status # print launchd status
make service-logs # tail service logs
make service-uninstall # remove launchd service
```

## 📄 License

MIT. See [LICENSE](LICENSE).