https://github.com/sleep3r/toodles
Telegram bot wrapping gemini-cli: real-time streaming, voice transcription, file sharing & per-topic sessions. Built in Rust.
- Host: GitHub
- URL: https://github.com/sleep3r/toodles
- Owner: sleep3r
- License: mit
- Created: 2026-03-16T22:37:53.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-24T14:43:01.000Z (2 months ago)
- Last Synced: 2026-06-26T03:30:15.225Z (1 day ago)
- Topics: gemini, rust, telegram
- Language: Rust
- Homepage:
- Size: 546 KB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
toodles
Telegram × Gemini CLI: streamed responses, voice, file & photo sharing, local transcription
Quick Start · Features · Config · Architecture
---
A Telegram bot written in Rust that wraps [`gemini-cli`](https://github.com/google-gemini/gemini-cli), letting you chat with Gemini AI directly from Telegram, with real-time streaming, voice transcription, photo & file analysis, and per-topic session isolation.
## Features
| Feature | Details |
|---|---|
| **Real-time streaming** | In-place draft updates while the model is generating, then a final formatted commit |
| **Instant feedback** | Immediate startup placeholder (`Подключаю Gemini-сессию…`, Russian for "Connecting Gemini session…") on cold starts |
| **Stop generation** | Inline "Stop" button to cancel generation mid-stream |
| **Smart message splitting** | Long responses auto-split into multiple Telegram messages at newline boundaries, with no truncation |
| **Error feedback** | Session startup and runtime errors are surfaced to the user (no silent failure) |
| **Photo analysis** | Send photos (including albums); they are batched via the aggregator and analyzed by Gemini Vision |
| **Document handling** | Send files (PDF, XLSX, etc.); they are downloaded and forwarded to gemini-cli for processing |
| **File sharing** | Gemini can send files back via the `ATTACH_FILE:` protocol |
| **Message aggregation** | Sequential messages within 1.5s are batched into a single prompt; handles albums, forwarded batches, and split messages |
| **Warm session pool** | Keeps prewarmed ACP sessions to reduce first-response latency (`WARM_SESSION_POOL_SIZE`) |
| **Session startup retries** | Automatic retry with backoff when ACP initialization fails transiently |
| **Voice messages** | Transcribed locally via **Parakeet V3** or in the cloud via **OpenAI Whisper** |
| **Local transcription** | Offline, no API keys: NVIDIA Parakeet ONNX (int8, ~478 MB) |
| **Forum topics** | Each Telegram topic gets an isolated gemini-cli session |
| **Thread auto-title** | First message sets topic title; later updates use recent-context summaries |
| **Session management** | `/new` starts fresh, `/status` shows active count |
| **Access control** | Optional user allowlist via `ALLOWED_USER_IDS` |
| **macOS background service** | launchd targets keep the bot running 24/7 with auto-restart |
| **Setup wizard** | Interactive `--setup` generates `.env` with guided prompts |
| **Customisable prompt** | System prompt configurable via `SYSTEM_PROMPT` in `.env` |
| **CI-gated** | `check + fmt + clippy + test` on every push/PR |
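
To make the "smart message splitting" row concrete, here is a minimal sketch of splitting at newline boundaries. It is not taken from the toodles source: the function name `split_message` and the constant `TELEGRAM_MAX_CHARS` are hypothetical, and the 4096-character figure is the text limit documented for Telegram's `sendMessage`. Lines longer than the limit are hard-split, which is an assumption about how such a case might be handled.

```rust
/// Text limit documented for a single Telegram message, in characters.
/// (Hypothetical constant name; not taken from the toodles source.)
const TELEGRAM_MAX_CHARS: usize = 4096;

/// Split `text` into chunks of at most `max_chars` characters, breaking
/// right after a newline whenever possible. A single line longer than
/// `max_chars` is hard-split so that nothing is truncated.
fn split_message(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length of `current` in chars

    for line in text.split_inclusive('\n') {
        let line_len = line.chars().count();
        if current_len + line_len <= max_chars {
            // The line still fits into the chunk being built.
            current.push_str(line);
            current_len += line_len;
            continue;
        }
        // The line does not fit: close the current chunk first.
        if !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if line_len <= max_chars {
            current.push_str(line);
            current_len = line_len;
        } else {
            // Overlong line: emit full-size pieces, keep the remainder open.
            let chars: Vec<char> = line.chars().collect();
            for piece in chars.chunks(max_chars) {
                let s: String = piece.iter().collect();
                if piece.len() == max_chars {
                    chunks.push(s);
                } else {
                    current_len = piece.len();
                    current = s;
                }
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Concatenating the returned chunks reproduces the input exactly, so each chunk can be sent as its own message in order.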
## Quick Start
### Prerequisites
- **Rust** ≥ 1.70: [rustup.rs](https://rustup.rs)
- **gemini-cli**: `npm install -g @google/gemini-cli && gemini`
- **Telegram bot token**: [@BotFather](https://t.me/BotFather)
- **ffmpeg**: `brew install ffmpeg` *(required for voice messages)*
- *(Optional)* **OpenAI API key**: for cloud Whisper fallback
### Install & Run
```sh
git clone https://github.com/sleep3r/toodles
cd toodles
# Option A: Interactive setup wizard (recommended)
make setup
# Option B: Manual config
cp .env.example .env
$EDITOR .env
# Run
make run # debug
make release # optimized build
make run-release # run optimized
# Optional: install as macOS launchd service (24/7)
make service-install
```
### Run as macOS background service (launchd)
```sh
make service-install # build release + install + start
make service-status # check launchd state
make service-logs # tail bot logs
```
`service-install` copies your project `.env` into `~/.config/toodles/service.env`
so launchd can read secrets consistently.
After code changes:
```sh
make service-update # rebuild release + restart service
```
If you change `.env`, run `make service-update` to sync it into the service env file.
Stop / remove service:
```sh
make service-stop
make service-uninstall
```
Optional overrides (passed as Make variables):
```sh
make LAUNCHD_LABEL=com.alex.toodles service-install
make TOODLES_ENV_FILE=/path/to/.env service-install
make LAUNCHD_WORKDIR=/Users/alexander service-install
```
## How It Works
```
┌───────────┐        ┌──────────┐        ┌──────────────┐
│ Telegram  │───────▶│ toodles  │───────▶│  gemini-cli  │
│   user    │◀── edit│  (Rust)  │◀── pipe│  subprocess  │
└───────────┘   msg  └──────────┘  stdout└──────────────┘
```
1. User sends a message (text, photo, document, or voice)
2. Messages are aggregated within a 1.5s window (handles albums and split messages)
3. On cold start, a startup status is shown while an ACP session is created (or taken from the warm pool)
4. A draft placeholder with a **Stop** button is attached and updated during generation
5. User can press **Stop** at any time; generation is cancelled via `CancellationToken`
6. Final response is committed with Markdown→Telegram HTML formatting and plain-text fallback
7. Subsequent messages reuse the same topic/chat session automatically
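
The aggregation in step 2 can be pictured with a small, dependency-free model. This is only a sketch under stated assumptions: `batch_messages` and `DEBOUNCE_WINDOW` are hypothetical names, arrival times are passed in explicitly instead of coming from a timer, a gap of exactly 1.5s is treated as still inside the window, and batched texts are joined with newlines. The actual `aggregator.rs` may differ on all of these points.

```rust
use std::time::Duration;

/// The 1.5s window mentioned above (hypothetical constant name).
const DEBOUNCE_WINDOW: Duration = Duration::from_millis(1500);

/// Group messages into prompts. Each message carries its arrival time
/// (measured from an arbitrary start). A message joins the current batch
/// when it arrives within `window` of the previous message; otherwise it
/// starts a new batch. Inputs are assumed to be sorted by arrival time.
fn batch_messages(messages: &[(Duration, &str)], window: Duration) -> Vec<String> {
    let mut batches: Vec<Vec<&str>> = Vec::new();
    let mut last_arrival: Option<Duration> = None;

    for &(arrived_at, text) in messages {
        let same_batch = match last_arrival {
            Some(prev) => arrived_at.saturating_sub(prev) <= window,
            None => false,
        };
        if same_batch {
            batches
                .last_mut()
                .expect("a batch exists once a message has been seen")
                .push(text);
        } else {
            batches.push(vec![text]);
        }
        // Debounce: the window restarts with every new message.
        last_arrival = Some(arrived_at);
    }

    // Each batch becomes one prompt; joining with newlines is an assumption.
    batches.into_iter().map(|batch| batch.join("\n")).collect()
}
```

With arrivals at 0 ms, 500 ms, 1900 ms and 4000 ms, the first three messages form one prompt (each gap is at most 1.5s) and the fourth starts a new one.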
## Voice Transcription
toodles supports two transcription backends:
```
┌────────────────────┐     ┌──────────────┐     ┌───────────┐
│  Telegram Voice    │────▶│   ffmpeg     │────▶│ Parakeet  │───▶ text
│  (OGG Opus)        │     │ (16kHz f32)  │     │    V3     │
└────────────────────┘     └──────────────┘     └─────┬─────┘
                                                      │ fallback
                                                ┌─────▼─────┐
                                                │  OpenAI   │
                                                │  Whisper  │
                                                └───────────┘
```
| Mode | Latency | Cost | Setup |
|---|---|---|---|
| **Local** (Parakeet V3) | ~2-5s | Free | `--setup` downloads 478 MB model |
| **Cloud** (Whisper API) | ~1-3s | ~$0.006/min | Requires `OPENAI_API_KEY` |
If both are enabled, local transcription is tried first with automatic cloud fallback.
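
The local-first, cloud-fallback order can be sketched as follows. The `Transcriber` trait, the `Stub` backend and `transcribe_with_fallback` are hypothetical names made up for this illustration, and errors are plain strings for brevity; none of this is claimed to match `transcription.rs`.

```rust
/// Hypothetical abstraction over a transcription backend
/// (for example local Parakeet or cloud Whisper).
trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String>;
}

/// Stand-in backend that always returns a fixed result; it exists only
/// to exercise the fallback logic without any model or network access.
struct Stub(Result<&'static str, &'static str>);

impl Transcriber for Stub {
    fn transcribe(&self, _samples: &[f32]) -> Result<String, String> {
        self.0.map(str::to_string).map_err(str::to_string)
    }
}

/// Try the local backend first, then the cloud backend. Either backend
/// may be disabled (`None`). The error of the last attempted backend is
/// returned when every enabled backend fails.
fn transcribe_with_fallback(
    local: Option<&dyn Transcriber>,
    cloud: Option<&dyn Transcriber>,
    samples: &[f32],
) -> Result<String, String> {
    let mut last_error = String::from("no transcription backend is enabled");
    for backend in [local, cloud].iter().flatten() {
        match backend.transcribe(samples) {
            Ok(text) => return Ok(text),
            Err(err) => last_error = err,
        }
    }
    Err(last_error)
}
```

Keeping both backends behind one trait means the voice handler only needs to know the order of preference, not how each backend works.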
## Configuration
All configuration is managed through environment variables or `.env`:
```sh
# Required
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
# Access control (leave empty for unrestricted)
ALLOWED_USER_IDS=123456789,987654321
# Gemini CLI
GEMINI_CLI_PATH=gemini # path to binary
GEMINI_CLI_COMMAND=gemini --acp # optional full ACP command
GEMINI_WORKING_DIR=/path/to/project # optional cwd
GEMINI_YOLO=true # optional auto-approve mode
DRAFT_MODE=verbose # compact | verbose draft UX
THREAD_RENAME_EVERY=4 # 0 disables auto-rename
WARM_SESSION_POOL_SIZE=1             # 0 disables the warm session pool
# Optional: read additional settings from TOML
TOODLES_CONFIG=~/.config/toodles/config.toml
# System prompt: customise the bot's personality
SYSTEM_PROMPT=You are a helpful AI assistant. Keep answers concise.
# Voice: cloud (optional fallback)
OPENAI_API_KEY=sk-...
# Voice: local (recommended)
USE_LOCAL_TRANSCRIPTION=true
MODELS_DIR=~/.toodles/models
# Logging
RUST_LOG=info
```
> **Tip:** Run `make setup` to generate this interactively!
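
As an illustration of how a value such as `ALLOWED_USER_IDS=123456789,987654321` might be interpreted, here is a small sketch. The function names `parse_allowed_user_ids` and `is_allowed` are hypothetical, as is the choice of `u64` for user IDs. The only behaviour taken from the config above is that an empty value means unrestricted access.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parse a comma-separated list of Telegram user IDs.
/// Returns `Ok(None)` for an empty value, meaning "unrestricted".
/// Whitespace around entries and stray commas are ignored.
fn parse_allowed_user_ids(raw: &str) -> Result<Option<HashSet<u64>>, ParseIntError> {
    let ids = raw
        .split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(str::parse::<u64>)
        .collect::<Result<HashSet<_>, _>>()?;
    Ok(if ids.is_empty() { None } else { Some(ids) })
}

/// Check a sender against the allowlist; `None` lets everyone through.
fn is_allowed(allowlist: &Option<HashSet<u64>>, user_id: u64) -> bool {
    allowlist
        .as_ref()
        .map_or(true, |ids| ids.contains(&user_id))
}
```

A value such as `12,abc` produces a parse error rather than silently dropping the bad entry, which seems the safer default for an access-control setting.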
### Optional TOML config
You can also keep settings in `~/.config/toodles/config.toml`:
```toml
bot_token = "123456:ABC-DEF..."
gemini_cli_command = "gemini --acp"
gemini_working_dir = "/path/to/project"
gemini_yolo = true
draft_mode = "verbose"
thread_rename_every = 4
warm_session_pool_size = 1
```
You can copy `config.example.toml` as a starting point.
## Bot Commands
| Command | Description |
|---|---|
| `/start` | Get started |
| `/new` | Start fresh |
| `/status` | Bot status |
| `/thread` | Create forum thread |
| `/help` | Show commands |
`/thread` works in forum-enabled supergroups where the bot has topic-management rights.
You can call `/thread` from both the main chat and existing topics; Toodles creates a new topic in the same group.
The first user message in a topic sets its initial title, then Toodles refreshes the title every `THREAD_RENAME_EVERY` messages using the recent message context.
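
One way to read the rename cadence is sketched below. The function name `should_refresh_title` is hypothetical, and two details are assumptions rather than documented behaviour: that refreshes land on message counts divisible by `THREAD_RENAME_EVERY`, and that the first message still sets a title when the value is `0`.

```rust
/// Decide whether the topic title should be (re)generated after the
/// `message_count`-th user message in a topic (counting from 1).
///
/// Assumptions, not confirmed by the toodles source:
/// - the first message always sets the initial title;
/// - later refreshes happen when the count is a multiple of `rename_every`;
/// - `rename_every == 0` disables those later refreshes only.
fn should_refresh_title(message_count: u32, rename_every: u32) -> bool {
    match message_count {
        0 => false, // nothing has been sent yet
        1 => true,  // initial title
        n => rename_every != 0 && n % rename_every == 0,
    }
}
```

Under these assumptions, `THREAD_RENAME_EVERY=4` would trigger a title update on messages 1, 4, 8, 12 and so on.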
## Cold-Start Tuning
If the first response sometimes takes too long:
- Set `WARM_SESSION_POOL_SIZE=1` (or `2`) to keep prewarmed ACP sessions ready.
- Keep `GEMINI_WORKING_DIR` on a local SSD path (avoid slow network mounts).
- Check bot logs for repeated ACP initialize retries; transient failures are retried automatically.
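
The automatic retries mentioned in the last bullet follow the general retry-with-backoff pattern. The sketch below is a blocking, standard-library-only illustration of that pattern, with a delay that doubles after each failure; `retry_with_backoff` is a hypothetical name, and the bot's actual attempt count, delays and async implementation are not documented here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `operation` up to `max_attempts` times. After each failure except
/// the last, sleep for the current delay and then double it.
/// `operation` receives the 1-based attempt number.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut operation: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "at least one attempt is required");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match operation(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2); // exponential backoff
                attempt += 1;
            }
        }
    }
}
```

Returning the final error instead of swallowing it matches the "no silent failure" behaviour described in the feature list: once retries are exhausted, the caller can report the problem to the user.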
If `/thread` fails with "not enough rights to create a topic", grant the bot admin permission to manage topics.
## Architecture
```
src/
├── main.rs            # entry point, dispatcher, bot commands
├── config.rs          # Config from env + optional TOML (single gemini profile)
├── session.rs         # ACP session lifecycle + per-chat/topic session mapping
├── aggregator.rs      # message batching with debounce window + file guard ownership
├── telegram_api.rs    # raw Telegram API (sendMessageDraft), global HTTP client
├── setup.rs           # interactive setup wizard (--setup)
├── transcription.rs   # Parakeet V3 engine + model download
└── handlers/
    ├── mod.rs         # CancelRegistry, inline stop button, draft streaming, message splitting, Markdown→HTML
    ├── message.rs     # text message handler (with aggregation)
    ├── document.rs    # document/file handler (download + aggregate + query)
    ├── photo.rs       # photo handler (download + aggregate albums + query)
    └── voice.rs       # voice handler (transcribe → query)
```
**Session lifecycle:**
```mermaid
stateDiagram-v2
[*] --> New: /new or first message
New --> Ready: session created
Ready --> Query: user message
Query --> Placeholder: placeholder + Stop button
Placeholder --> Streaming: line-by-line via BufReader
Streaming --> Cancelled: user clicks Stop
Cancelled --> Ready: Generation stopped
Streaming --> Ready: response committed (Markdown)
Ready --> [*]: /new (reset)
```
Each chat or forum topic maps to an isolated ACP session. Queries are serialised per session via `tokio::sync::Mutex` and a per-session queue. Startup uses retries and an optional warm pool (`WARM_SESSION_POOL_SIZE`) to reduce first-token latency. During generation, the bot updates one placeholder message (draft UX), supports inline cancellation via `CancellationToken`, and commits a final Markdown→Telegram HTML response with plain-text fallback. Long responses are split across multiple Telegram messages at newline boundaries. Sequential messages and photo albums are aggregated via a 1.5s debounce window. Temporary files (photos, documents) are kept alive via `Arc` until the query completes.
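
The per-chat/topic mapping can be illustrated with a small registry. This is a sketch, not the contents of `session.rs`: `SessionKey`, `Session` and `SessionRegistry` are hypothetical names, the `Session` struct is a stand-in for a real ACP session handle, and `std::sync::Mutex` is used in place of `tokio::sync::Mutex` so the example needs no external crates.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// (chat id, optional forum topic id). Hypothetical key layout.
type SessionKey = (i64, Option<i32>);

/// Stand-in for an ACP session handle; it only records prompts.
#[derive(Default)]
struct Session {
    history: Vec<String>,
}

/// Maps each chat or forum topic to its own session. Locking a session's
/// mutex serialises the queries that target that session.
#[derive(Default)]
struct SessionRegistry {
    sessions: Mutex<HashMap<SessionKey, Arc<Mutex<Session>>>>,
}

impl SessionRegistry {
    /// Return the session for `key`, creating it on first use.
    fn get_or_create(&self, key: SessionKey) -> Arc<Mutex<Session>> {
        let mut sessions = self.sessions.lock().unwrap();
        Arc::clone(sessions.entry(key).or_default())
    }

    /// Model of `/new`: forget the session so the next message starts fresh.
    /// Returns whether a session existed.
    fn reset(&self, key: SessionKey) -> bool {
        self.sessions.lock().unwrap().remove(&key).is_some()
    }

    /// Model of the active count shown by `/status`.
    fn active_count(&self) -> usize {
        self.sessions.lock().unwrap().len()
    }
}
```

Handing out `Arc<Mutex<Session>>` keeps the registry lock short-lived: the map is locked only for the lookup, while a long-running query holds just its own session's lock.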
## Makefile
```sh
make help # show all targets
make build # debug build
make release # optimized build
make run # run (debug)
make run-release # run (release)
make setup # interactive setup wizard
make test # run tests
make lint # clippy
make fmt # format code
make clean # clean artifacts
make service-install # install/start launchd service
make service-sync-env # copy .env into launchd service env
make service-update # rebuild + restart launchd service
make service-stop # stop launchd service
make service-status # print launchd status
make service-logs # tail service logs
make service-uninstall # remove launchd service
```
## License
MIT. See [LICENSE](LICENSE).