https://github.com/andersonby/vv-agent

A lightweight agent framework for production runtime. Cycle-based execution with pluggable LLM backends, tool dispatch, memory compression, and distributed scheduling.
https://github.com/andersonby/vv-agent
agent-framework
Last synced: 17 days ago
JSON representation
A lightweight agent framework for production runtime. Cycle-based execution with pluggable LLM backends, tool dispatch, memory compression, and distributed scheduling.
Host: GitHub
URL: https://github.com/andersonby/vv-agent
Owner: AndersonBY
Created: 2026-02-28T06:01:15.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-05-31T04:04:43.000Z (17 days ago)
Last Synced: 2026-05-31T06:08:16.106Z (17 days ago)
Topics: agent-framework
Language: Python
Homepage:
Size: 1.89 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md
Awesome Lists containing this project

README

          # vv-agent

[中文文档](README_ZH.md)

A lightweight agent framework extracted from VectorVein's production runtime. Cycle-based execution with pluggable LLM backends, tool dispatch, memory compression, and distributed scheduling.

## Architecture

```

Agent / RunConfig / ModelSettings

└── Runner

    └── AgentRuntime

        ├── CycleRunner          # single LLM turn: context -> completion -> tool calls

        ├── ToolCallRunner       # tool dispatch, directive convergence

        ├── RuntimeHookManager   # before/after hooks

        ├── MemoryManager        # automatic history compression

        └── ExecutionBackend     # inline, thread, or Celery scheduling

```

The public SDK entry points are exported from `vv_agent`: `Agent`, `Runner`,

`RunConfig`, `ModelSettings`, `function_tool`, `Session`, typed `RunEvent`

objects, and the interactive session API for desktop/runtime integrations.

Runtime internals still use `RuntimeTask` (`AgentTask` during the remaining

internal migration), `AgentResult`, `Message`, `CycleRecord`, and `ToolCall`.

Task completion is tool-driven: the agent calls `task_finish` or `ask_user` to signal terminal states. No implicit "last message = answer" heuristics.

## Setup

```bash

cp local_settings.example.py local_settings.py

# Fill in your API keys and endpoints in local_settings.py

```

```bash

uv sync --dev

uv run pytest

```

## Quick Start

### CLI

```bash

uv run vv-agent --prompt "Summarize this framework" --backend moonshot --model kimi-k2.6

# With per-cycle logging

uv run vv-agent --prompt "Summarize this framework" --backend moonshot --model kimi-k2.6 --verbose

```

CLI flags: `--settings-file`, `--backend`, `--model`, `--verbose`.

### Programmatic SDK

```python

from vv_agent import Agent, RunConfig, Runner, function_tool

@function_tool

def read_order(order_id: str) -> str:

    """Read order information."""

    return "order details"

agent = Agent(

    name="ops",

    instructions="Check facts first, then answer.",

    model="kimi-k2.6",

    tools=[read_order],

)

result = Runner.run_sync(agent, "Analyze order 123", run_config=RunConfig(

    default_backend="moonshot",

))

print(result.status, result.final_output)

```

`Agent.output_type` can coerce JSON final output into `dict`, `list`,

dataclasses, or Pydantic-style models. Decorated tools may accept a leading

`ToolContext` parameter; it is passed at invocation time and omitted from the

tool JSON schema.

### Streaming And Sessions

`RunConfig.workspace` controls the workspace for a run. `RunConfig.session`

accepts `MemorySession`, `SQLiteSession`, or `RedisSession` to persist message

history across runs.

```python

from vv_agent import Agent, MemorySession, RunConfig, Runner

agent = Agent(name="assistant", instructions="Remember context.", model="kimi-k2.6")

session = MemorySession("thread-001")

config = RunConfig(

    default_backend="moonshot",

    workspace="./workspace/thread-001",

    session=session,

)

Runner.run_sync(agent, "Inspect the project", run_config=config)

for event in Runner.stream_sync(agent, "Continue and report progress", run_config=config):

    if event.type == "assistant_delta":

        print(event.delta, end="")

```

The lower-level `AgentRuntime` API remains available for backend integrations

that need direct cycle-loop control.

Install Redis support with `uv sync --extra redis` or inject a Redis-compatible

client when constructing `RedisSession`.

### Interactive Sessions

Use `Runner` for one-shot runs, streamed runs, and conversation history managed

by `RunConfig.session`. Use `InteractiveAgentClient` when the host application

needs a stateful, bidirectional runtime session with stable session ids,

runtime listeners, queued steering prompts, follow-up turns, cancellation, and

shared tool state.

```python

from pathlib import Path

from vv_agent import (

    AgentSessionOptions,

    InteractiveAgentClient,

    InteractiveAgentDefinition,

)

from vv_agent.runtime.backends import ThreadBackend

client = InteractiveAgentClient(

    options=AgentSessionOptions(

        settings_file=Path("local_settings.py"),

        default_backend="moonshot",

        workspace=Path("./workspace/thread-001"),

        execution_backend=ThreadBackend(max_workers=4),

    )

)

session = client.create_session(

    session_id="thread-001",

    agent=InteractiveAgentDefinition(

        description="Operate in the user's workspace and report progress.",

        model="kimi-k2.6",

        no_tool_policy="finish",

    ),

)

unsubscribe = session.subscribe(lambda event, payload: print(event, payload))

try:

    run = session.prompt("Inspect the workspace")

    print(run.result.status, run.result.final_answer)

finally:

    unsubscribe()

```

Interactive sessions are additive to the normal SDK facade; they do not

reintroduce the old 0.1 `AgentSDKClient` or `AgentSDKOptions` names.

### Agent As Tool, Handoff, And Policy

Use `agent.as_tool()` when a child agent should return a result to the parent

agent and let the parent continue. Use `handoff()` when control should transfer

to the target agent and the target output should finish the run.

```python

from vv_agent import Agent, RunConfig, Runner, ToolPolicy, handoff

from vv_agent.constants import TASK_FINISH_TOOL_NAME

researcher = Agent(name="researcher", instructions="Collect facts.", model="kimi-k2.6")

writer = Agent(

    name="writer",

    instructions="Write from research.",

    model="kimi-k2.6",

    tools=[researcher.as_tool(name="research", description="Collect facts.")],

)

triage = Agent(

    name="triage",

    instructions="Transfer writing tasks.",

    model="kimi-k2.6",

    handoffs=[handoff(agent=writer, description="Use for writing.")],

)

result = Runner.run_sync(

    triage,

    "Write a short report.",

    run_config=RunConfig(

        default_backend="moonshot",

        tool_policy=ToolPolicy(allowed_tools=[TASK_FINISH_TOOL_NAME, "transfer_to_writer"]),

    ),

)

```

Tools can request approval with `@function_tool(needs_approval=True)`. By

default the run enters `WAIT_USER` before the tool body is called and emits a

`ToolApprovalRequestedEvent`. `ToolPolicy(approval="never")` disables that

approval gate for trusted runs.

### Guardrails And Tracing

Input guardrails run before the model provider is called. Output guardrails run

after a final output is available. Trace processors receive lightweight run and

tool spans.

```python

from vv_agent import Agent, GuardrailResult, RunConfig, Runner, input_guardrail

@input_guardrail

def reject_empty(ctx, input_text: str) -> GuardrailResult:

    del ctx

    if not input_text.strip():

        return GuardrailResult.block("input is required")

    return GuardrailResult.allow()

agent = Agent(

    name="assistant",

    instructions="Answer carefully.",

    model="kimi-k2.6",

    input_guardrails=[reject_empty],

)

result = Runner.run_sync(

    agent,

    "Summarize this project.",

    run_config=RunConfig(default_backend="moonshot", tracing={"workflow_name": "summary"}),

)

```

### Shell Runtime Configuration (Windows)

`bash` runtime defaults are a **startup/session configuration**, not tool-call arguments.

- Run defaults: pass `bash_shell`, `windows_shell_priority`, and `bash_env`

  through `RunConfig.metadata`.

- Per-agent defaults: put the same keys in `Agent.metadata`.

- Recommended Windows priority: `["git-bash", "powershell", "cmd"]`

- On Windows, bash-tool child processes default `PYTHONUTF8=1` and `PYTHONIOENCODING=utf-8` unless already overridden via the parent environment or `bash_env`.

- On Windows, bash-tool child processes are launched with hidden-console flags so GUI hosts can run `bash` / `powershell` commands without flashing a terminal window.

- `Runner.run_sync(...)` and `Runner.stream_sync(...)` both inherit compiled

  shell metadata.

- The `bash` tool schema description includes a runtime shell hint (resolved shell kind + invocation prefix), so the model sees which shell command style is expected before calling the tool.

- The runtime shell hint is frozen per task/session-run to keep tool schemas stable across cycles and preserve LLM prompt cache efficiency.

- Runner/CLI-generated runtime tasks attach structured `system_prompt_sections`

  metadata to the system message when prompt sections are available, so

  Anthropic prompt-cache breakpoints can keep the stable prompt prefix hot while

  treating current time and session-memory blocks as volatile.

```python

from vv_agent import Agent, RunConfig, Runner

agent = Agent(

    name="desktop",

    instructions="Desktop helper",

    model="kimi-k2.6",

    metadata={"bash_env": {"HTTP_PROXY": "http://127.0.0.1:7890"}},

)

result = Runner.run_sync(

    agent,

    "Check the workspace.",

    run_config=RunConfig(

        default_backend="moonshot",

        metadata={

            "windows_shell_priority": ["git-bash", "powershell", "cmd"],

            "bash_env": {"PIP_INDEX_URL": "https://pypi.tuna.tsinghua.edu.cn/simple"},

        },

    ),

)

```

## Execution Backends

The cycle loop is delegated to a pluggable `ExecutionBackend`.

| Backend | Use case |

|---------|----------|

| `InlineBackend` | Default. Synchronous, single-process. |

| `ThreadBackend` | Thread pool. Non-blocking `submit()` returns a `Future`. |

| `CeleryBackend` | Distributed. Each cycle dispatched as an independent Celery task. |

### CeleryBackend

Two modes:

- **Inline fallback** (no `RuntimeRecipe`): cycles run in-process, same as `InlineBackend`.

- **Distributed** (with `RuntimeRecipe`): each cycle is a Celery task. Workers rebuild the `AgentRuntime` from the recipe and load state from a shared `StateStore` (SQLite or Redis).

```python

from vv_agent.runtime.backends.celery import CeleryBackend, RuntimeRecipe, register_cycle_task

register_cycle_task(celery_app)

recipe = RuntimeRecipe(

    settings_file="local_settings.py",

    backend="moonshot",

    model="kimi-k2.6",

    workspace="./workspace",

)

backend = CeleryBackend(celery_app=app, state_store=store, runtime_recipe=recipe)

runtime = AgentRuntime(llm_client=llm, tool_registry=registry, execution_backend=backend)

```

Install celery extras: `uv sync --extra celery`.

### Cancellation and Streaming

```python

from vv_agent.runtime import CancellationToken, ExecutionContext

# Cancel from another thread

token = CancellationToken()

ctx = ExecutionContext(cancellation_token=token)

result = runtime.run(task, ctx=ctx)

def on_stream_event(event: dict) -> None:

    if event.get("event") == "assistant_delta":

        print(event.get("content_delta", ""), end="")

# Stream LLM output events, including assistant deltas and tool progress

ctx = ExecutionContext(stream_callback=on_stream_event)

result = runtime.run(task, ctx=ctx)

```

### Runtime Log Payloads

`tool_result` runtime events carry full tool output in `content` and any structured tool payload in `metadata` (no implicit truncation of `content`).

`content_preview` and `assistant_preview` are still emitted for UI convenience.

If you need shorter previews for logs/transport, configure an explicit preview limit:

```python

from vv_agent import RunConfig

config = RunConfig(

    log_preview_chars=220,  # optional: enable preview truncation explicitly

)

```

## Workspace Backends

Workspace file I/O is delegated to a pluggable `WorkspaceBackend` protocol. All built-in file tools (`read_file`, `write_file`, `list_files`, etc.) go through this abstraction.

`list_files` includes built-in safety defaults for large workspaces:

- Returns at most `500` paths per call by default (`max_results` can tune this, with hard cap).

- Uses `ripgrep` (`rg`) for fast local traversal when available, with automatic fallback to Python walk.

- `workspace_grep` also uses `rg` for local workspaces (with Python fallback), defaults to smart-case matching (lowercase patterns are case-insensitive; patterns with uppercase stay case-sensitive), and skips hidden/common dependency roots unless explicitly included.

- `workspace_grep` returns model-facing grep text in `ToolExecutionResult.content`, while structured matches/counts live in `ToolExecutionResult.metadata`.

- When listing from workspace root, common dependency/cache roots (for example `node_modules`, `.venv`, `.git`) are summarized instead of expanded.

- You can still inspect those paths explicitly by setting `path` to that directory (or by setting `include_ignored=true`).

- Supports `scan_limit` to stop early on very large trees; when triggered, response sets `count_is_estimate=true`.

| Backend | Use case |

|---------|----------|

| `LocalWorkspaceBackend` | Default. Reads/writes to a local directory with path-escape protection. |

| `MemoryWorkspaceBackend` | Pure in-memory dict storage. Great for testing and sandboxed runs. |

| `S3WorkspaceBackend` | S3-compatible object storage (AWS S3, Aliyun OSS, MinIO, Cloudflare R2). |

```python

from vv_agent.workspace import LocalWorkspaceBackend, MemoryWorkspaceBackend

# Explicit local backend

runtime = AgentRuntime(

    llm_client=llm,

    tool_registry=registry,

    workspace_backend=LocalWorkspaceBackend(Path("./workspace")),

)

# In-memory backend for testing

runtime = AgentRuntime(

    llm_client=llm,

    tool_registry=registry,

    workspace_backend=MemoryWorkspaceBackend(),

)

```

### S3WorkspaceBackend

Install the optional S3 dependency: `uv pip install 'vv-agent[s3]'`.

```python

from vv_agent.workspace import S3WorkspaceBackend

backend = S3WorkspaceBackend(

    bucket="my-bucket",

    prefix="agent-workspace",

    endpoint_url="https://oss-cn-hangzhou.aliyuncs.com",  # or None for AWS

    aws_access_key_id="...",

    aws_secret_access_key="...",

    addressing_style="virtual",  # "path" for MinIO

)

```

### Custom Backend

Implement the `WorkspaceBackend` protocol (8 methods) to plug in any storage:

```python

from vv_agent.workspace import WorkspaceBackend

class MyBackend:

    def list_files(self, base: str, glob: str) -> list[str]: ...

    def read_text(self, path: str) -> str: ...

    def read_bytes(self, path: str) -> bytes: ...

    def write_text(self, path: str, content: str, *, append: bool = False) -> int: ...

    def file_info(self, path: str) -> FileInfo | None: ...

    def exists(self, path: str) -> bool: ...

    def is_file(self, path: str) -> bool: ...

    def mkdir(self, path: str) -> None: ...

```

## Modules

| Module | Description |

|--------|-------------|

| `vv_agent.runtime.AgentRuntime` | Top-level state machine (completed / wait_user / max_cycles / failed) |

| `vv_agent.runtime.CycleRunner` | Single LLM turn and cycle record construction |

| `vv_agent.runtime.ToolCallRunner` | Tool execution with directive convergence |

| `vv_agent.runtime.RuntimeHookManager` | Hook dispatch (before/after LLM, tool call, memory compact) |

| `vv_agent.runtime.StateStore` | Checkpoint persistence protocol (`InMemoryStateStore` / `SqliteStateStore` / `RedisStateStore`) |

| `vv_agent.memory.MemoryManager` | Context compression when history exceeds threshold |

| `vv_agent.workspace` | Pluggable file storage: `LocalWorkspaceBackend`, `MemoryWorkspaceBackend`, `S3WorkspaceBackend` |

| `vv_agent.tools` | Built-in tools plus `function_tool`, `FunctionTool`, and structured tool outputs |

| `vv_agent` | Public SDK: `Agent`, `Runner`, `RunConfig`, `ModelSettings`, tools, sessions, typed events |

| `vv_agent.sdk` | Legacy migration internals; new user code should not use this as the main entry point |

| `vv_agent.skills` | Agent Skills support (`SKILL.md` parsing, validation, unified normalization, prompt rendering with budget management, `activate_skill` tool) |

| `vv_agent.llm.VVLlmClient` | Unified LLM interface via `vv-llm` (endpoint rotation, retry, streaming) |

| `vv_agent.config` | Model/endpoint/key resolution from `local_settings.py` |

## Memory Compaction

`MemoryManager` now measures context size in tokens and compacts history when a model-derived auto-compaction threshold is exceeded.

- Task-level knobs:

  - `memory_compact_threshold` (default `128000`, legacy fallback only when token counting is unavailable)

  - `memory_threshold_percentage` (warning threshold percentage, default `90`)

- Compile mapping:

  - `AgentCompiler` forwards stable agent/run metadata into `RuntimeTask`.

  - Runtime-only compaction knobs remain metadata-backed until promoted into

    stable public fields.

- Token budget model:

  - `effective_context_window = model_context_window - reserved_output_tokens`

  - `autocompact_threshold = effective_context_window - autocompact_buffer_tokens`

  - Defaults come from `vv-llm` model metadata when available, otherwise fall back to `200000 / 16000 / 13000`

- Effective-length strategy (backend-aligned):

  - If previous cycle token usage exists:

    - `effective_length = previous_prompt_tokens + token_count(recent_tool_messages)`

  - Otherwise fallback to:

    - `vv_llm.chat_clients.utils.get_message_token_counts(...)`

    - If tokenizer resolution fails, use a local CJK-aware estimate

- Compaction pipeline:

  1. Preemptive microcompact: clear old large tool results when usage crosses `microcompact_trigger_ratio`

  2. Session Memory extraction: persist key facts before full summarization so they survive later compactions

  3. Structural cleanup (stale tool calls, orphan tool messages, assistant-no-tool collapse, old tool result artifactization)

  4. If still over threshold, generate a compressed memory summary that preserves original user messages, file operations, current work state, and resolved errors

  5. If the provider still returns prompt-too-long, retry with forced compaction once, then progressively stronger emergency tail-dropping

  6. After full compaction, re-inject relevant workspace files into `` under a bounded token budget

- Session Memory behavior:

  - Stored in `workspace/.memory/session//session_memory.json` by default

  - Scoped to the current session when `metadata.session_id` is present; otherwise scoped to the current `task_id`

  - New sessions/tasks start without inherited Session Memory from previous sessions/tasks

  - Injected into the first system message on every cycle as ``

  - Extraction reuses the configured memory summary backend/model

  - Full compaction resets transcript tracking but preserves persisted memory entries

  - Sub-tasks disable Session Memory by default to avoid parent/child memory-file contamination

### Runtime metadata keys

Pass these via `Agent.metadata` or `RunConfig.metadata`; the compiler forwards

them into `RuntimeTask.metadata`:

- `memory_keep_recent_messages`

- `model_context_window`

- `reserved_output_tokens`

- `autocompact_buffer_tokens`

- `microcompact_trigger_ratio`

- `microcompact_keep_recent_cycles`

- `microcompact_min_result_length`

- `microcompact_compactable_tools`

- `include_memory_warning`

- `session_memory_enabled` / `enable_session_memory`

- `session_memory_min_tokens`

- `session_memory_max_tokens`

- `session_memory_min_text_messages`

- `session_memory_storage_dir`

- `tool_result_compact_threshold`

- `tool_result_keep_last`

- `tool_result_excerpt_head`

- `tool_result_excerpt_tail`

- `tool_calls_keep_last`

- `assistant_no_tool_keep_last`

- `tool_result_artifact_dir`

- `summary_event_limit`

### Memory summary model selection priority

Priority is strict:

1. `RuntimeTask.metadata`

   - `memory_summary_backend` / `memory_summary_model`

   - aliases: `compress_memory_summary_backend` / `compress_memory_summary_model`

   - aliases: `memory_compress_backend` / `memory_compress_model`

2. `local_settings.py` constants

   - `DEFAULT_USER_MEMORY_SUMMARIZE_BACKEND` / `DEFAULT_USER_MEMORY_SUMMARIZE_MODEL`

   - aliases: `DEFAULT_MEMORY_SUMMARIZE_BACKEND` / `DEFAULT_MEMORY_SUMMARIZE_MODEL`

   - aliases: `VV_AGENT_MEMORY_SUMMARY_BACKEND` / `VV_AGENT_MEMORY_SUMMARY_MODEL`

3. Fallback

   - runtime `default_backend` + current task `model`

## Built-in Tools

`list_files`, `file_info`, `read_file`, `write_file`, `file_str_replace`, `workspace_grep`, `compress_memory`, `todo_write`, `task_finish`, `ask_user`, `bash`, `read_image`, `create_sub_task`, `sub_task_status`.

Custom tools can be registered via `ToolRegistry.register()`.

The `bash` tool supports two background paths:

- Explicit background: pass `run_in_background=true`, receive a `session_id` immediately, then poll with `check_background_command`.

- Timeout handoff: if a foreground command reaches `timeout`, it is moved into a background session instead of failing immediately. The tool returns a `session_id`, and the session emits terminal background-command events when that process completes, fails, or times out.

## Sub-agents

Use `Agent.as_tool()` when the parent agent should call a child agent and then

continue. Use `handoff()` when the child agent should take over and finish the

run. Legacy `create_sub_task` tools still exist in the runtime while migration

continues, but they are no longer the primary public SDK contract.

Each delegated sub-task now runs in a real `AgentSession` (session id defaults to the sub-task id). Tool payloads include `session_id`, and runtime events include stable identifiers (`task_id` / `session_id`) so host apps can subscribe, persist, and stream sub-task progress independently, including `sub_agent_assistant_delta` and `sub_agent_tool_call_progress` events.

Batch mode in `create_sub_task` dispatches valid sub-task items through the runtime execution backend's `parallel_map`, so synchronous batches run concurrently when the backend supports parallel execution.

Use `sub_task_status` to query legacy runtime sub-task states, inspect

lightweight progress snapshots (`detail_level=snapshot`), or send follow-up

messages to running/completed sub-tasks.

Before a completed sub-task is resumed, the runtime now sanitizes the saved session transcript: empty assistant turns, thinking-only turns, orphaned tool results, and unresolved tail tool calls are removed so the next follow-up prompt resumes from a coherent history.

Sub-task runtime metadata now includes `task_id`, `session_id`, and `browser_scope_key` for each sub-agent run, so session-scoped tools (for example, browser controllers) stay isolated across parallel sub-tasks.

Host apps can interrupt a currently running sub-agent by calling `vv_agent.runtime.engine.steer_sub_agent_session(session_id=..., prompt=...)`.

When a sub-agent uses a different model from the parent, the runtime needs `settings_file` and `default_backend` to resolve the LLM client.

## Examples

The `examples/` directory now contains public SDK cookbook scripts plus a small

set of lower-level runtime integration examples. See

[`examples/README.md`](examples/README.md) for the full list.

```bash

uv run python examples/01_quick_start.py

uv run python examples/24_workspace_backends.py

```

## Testing

```bash

uv run pytest                              # unit tests (no network)

uv run ruff check .                        # lint

uv run ty check                            # type check

V_AGENT_RUN_LIVE_TESTS=1 uv run pytest -m live   # integration tests (needs real LLM)

```

Environment variables for live tests:

| Variable | Default | Description |

|----------|---------|-------------|

| `V_AGENT_LOCAL_SETTINGS` | `local_settings.py` | Settings file path |

| `V_AGENT_LIVE_BACKEND` | `moonshot` | LLM backend |

| `V_AGENT_LIVE_MODEL` | `kimi-k2.6` | Model name |

| `V_AGENT_ENABLE_BASE64_KEY_DECODE` | - | Set `1` to enable base64 API key decoding |
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/andersonby/vv-agent

Awesome Lists containing this project

README