https://github.com/runablehq/agents-e2b-connection-issue

Reproduction of agents-e2b connections issue

# 🐛 Agents SDK - Minimal Reproduction Issue

![agents-header](https://github.com/user-attachments/assets/f6d99eeb-1803-4495-9c5e-3cf07a37b402)

**⚠️ ISSUE REPRODUCTION**: This repository demonstrates a critical issue with the [`agents`](https://www.npmjs.com/package/agents) SDK when deployed to Cloudflare Workers. Subsequent requests in the same chat room stall at unpredictable steps, while the same code works without problems locally. The reproduction also makes a plain `fetch` call to check whether network calls in general stop working once the agent stalls.

## 🚨 The Problem

- **Environment**: Cloudflare Workers (production deployment)
- **Affected**: Agents SDK chat flow with multiple requests per session
- **Status**: ❌ Broken in Workers, ✅ Works locally

### Issue Description

We're migrating a Node.js project using the Agents SDK to Cloudflare Workers. The application uses:
- One agent per user serving multiple chat sessions
- SQLite for internal session mapping
- E2B sandbox integration
- Similar flow to the official chat bot example

### Reproduction Steps

1. Deploy this starter to Cloudflare Workers
2. Send the first chat message → ✅ **Works perfectly**
3. Send multiple messages in the same chat room but in different sessions → ❌ **Hangs indefinitely** after some requests (about 4 in this minimal reproduction, but only 1-2 in our production environment)
4. In production, subsequent requests stall at random steps:
   - Authentication
   - E2B sandbox loading
   - Tool initialization
   - Network calls, which never resolve or time out

### Expected vs Actual Behavior

| Environment | First Request | Subsequent Requests |
|-------------|---------------|--------------------|
| **Local Development** | ✅ Works | ✅ Works |
| **Cloudflare Workers** | ✅ Works | ❌ Hangs randomly |

### Current Flow (onChatMessage)

```typescript
// This flow works locally but fails on Workers after the first request.
// Steps performed in onChatMessage, in order:
// 1. Authentication
// 2. Load E2B sandbox
// 3. Billing service check
// 4. Initialize tools
// 5. Return stream from Agents SDK  ← hangs here or at an earlier step
```
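
Because the flow above hangs silently, one way to find the stalling step is to race each step against a timer so that a stall surfaces as a labeled error. The sketch below is a minimal illustration, not code from this repository: `withTimeout`, the step names, and the 10-second budget are all hypothetical.

```typescript
// Hypothetical helper: rejects with a labeled error if `work` does not settle within `ms`.
// Note that the underlying work is not cancelled; it is only abandoned by the caller.
async function withTimeout<T>(label: string, work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`step "${label}" exceeded ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // always clear so the timer does not outlive the step
  }
}

// Possible usage inside a handler such as onChatMessage (placeholder step functions):
//   const user = await withTimeout("authentication", authenticate(request), 10_000);
//   const sandbox = await withTimeout("load E2B sandbox", loadSandbox(user), 10_000);
```

With this in place, a hung request would fail with the name of the step that stalled instead of never resolving, which should make the random failure points easier to compare across requests.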

## 🔍 How to Reproduce

### Prerequisites
- Cloudflare account
- OpenAI API key
- This exact starter template

### Setup Instructions

1. **Clone and Install**:
   ```bash
   npx create-cloudflare@latest --template cloudflare/agents-starter
   cd agents-starter
   npm install
   ```

2. **Configure Environment**:
   ```bash
   # Create .dev.vars file
   echo "OPENAI_API_KEY=your_openai_api_key" > .dev.vars
   ```

3. **Test Locally** (this works):
   ```bash
   npm start
   # Open browser, send multiple messages → All work fine
   ```

4. **Deploy to Workers** (this breaks):
   ```bash
   npm run deploy
   # Visit deployed URL, send first message → Works
   # Keep sending messages → Hangs indefinitely after a few requests (~4 in this reproduction)
   ```

### 🔧 Technical Details

**Architecture**:
- One agent instance per user
- Multiple chat sessions mapped via SQLite
- Hyperdrive connection to PostgreSQL (GCP)
- E2B sandbox integration
- Streaming responses via Agents SDK

**Failure Pattern**:
- ✅ First request after deployment: Always succeeds
- ❌ Subsequent requests: Hang at unpredictable steps
- 🔄 No timeout or error - requests just never resolve
- 🏠 Local development: No issues whatsoever

**Affected Components**:
- Authentication flow
- E2B sandbox initialization
- Tool system setup
- Agents SDK streaming
- Network calls in general

## 📊 Issue Analysis

### What We Know

- **Timing**: Issue started recently (wasn't happening before)
- **Scope**: Only affects Cloudflare Workers deployment
- **Pattern**: First request always works, subsequent ones fail
- **Randomness**: Failure occurs at different steps unpredictably
- **No Errors**: Requests don't time out or throw errors, they just hang
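
Since hung requests neither time out nor throw, giving every outbound call an explicit deadline is one way to turn a silent hang into an observable error. The following is a sketch under assumptions: `fetchWithDeadline` and the `FetchLike` type are hypothetical names, and the fetch implementation is injected so the helper can be exercised without a network. `AbortController` is a standard web API available in both Workers and Node.js.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike<R> = (url: string, init: { signal: AbortSignal }) => Promise<R>;

// Hypothetical helper: aborts the request if no response has arrived within `ms`.
// The deadline only covers the wait for the response, not reading its body.
async function fetchWithDeadline<R>(
  fetchImpl: FetchLike<R>,
  url: string,
  ms: number,
): Promise<R> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // stop the timer whether the call succeeded or failed
  }
}

// Possible usage with the global fetch:
//   const response = await fetchWithDeadline(fetch, "https://example.com/health", 5_000);
```

If the extra `fetch` call in this reproduction were wrapped this way, an abort error after the deadline would confirm that the network call itself is stalling rather than some later step.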

### Suspected Causes

1. **Workers Runtime Differences**:
   - Different event loop behavior
   - Request/response lifecycle differences
   - Memory or state management issues
   - Max allowed connections

2. **Agents SDK Integration**:
   - Potential Workers-specific compatibility issue
   - State persistence between requests
   - Streaming response handling

3. **External Dependencies**:
   - E2B sandbox connection pooling
   - Hyperdrive connection management
   - SQLite state between requests
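
The "max allowed connections" suspicion can be made concrete. Cloudflare's Workers limits documentation describes a cap of six simultaneous open connections per invocation, with further connection attempts waiting until an earlier one closes. If connections are never released (for example, because a response body is left unread), later calls would wait indefinitely, which would look like the hang described here. Whether that is the cause in this repository is unconfirmed. The sketch below only models the queueing behaviour; `ConnectionLimiter` is a hypothetical class, not part of the Workers runtime or the Agents SDK.

```typescript
// Hypothetical model of a per-invocation connection cap; not a runtime API.
class ConnectionLimiter {
  active = 0;
  private readonly limit: number;
  private queue: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Number of callers currently waiting for a slot.
  get waiting(): number {
    return this.queue.length;
  }

  // Resolves once a slot is available; stays pending while all slots are held.
  acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.queue.push(resolve));
  }

  // Frees a slot and hands it straight to the oldest waiter, if any.
  release(): void {
    const next = this.queue.shift();
    if (next) next();
    else if (this.active > 0) this.active--;
  }
}

const limiter = new ConnectionLimiter(6);
for (let i = 0; i < 8; i++) void limiter.acquire(); // eight calls, none released
console.log(limiter.active, limiter.waiting); // prints "6 2"
```

In this model the seventh and eighth callers never proceed because nothing calls `release()`. If the real failure follows the same pattern, making sure every response body is consumed or cancelled would be the first thing to check.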

### 🧪 Debugging Steps Taken

- [x] Confirmed local development works perfectly
- [x] Verified first request always succeeds in Workers
- [x] Identified random failure points in subsequent requests
- [x] Ruled out API key or authentication issues
- [ ] Need investigation into Workers-specific behavior
- [ ] Need Agents SDK team input on Workers compatibility
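
To pin down the random failure points noted in the checklist, it may help to record which steps have started but not finished, and to log that set on the next request or from a periodic check. This is a minimal sketch assuming a hypothetical `StepTracer` helper; the step names mirror the chat flow, but none of this is code from the repository.

```typescript
// Hypothetical helper that tracks steps which started but have not finished.
class StepTracer {
  private inFlight = new Map<number, string>();
  private nextId = 0;

  // Marks a step as started and returns a function that marks it finished.
  start(name: string): () => void {
    const id = this.nextId++;
    this.inFlight.set(id, name);
    return () => {
      this.inFlight.delete(id);
    };
  }

  // Names of steps still running, oldest first (Map preserves insertion order).
  pending(): string[] {
    return Array.from(this.inFlight.values());
  }
}

const tracer = new StepTracer();
const doneAuth = tracer.start("authentication");
tracer.start("load E2B sandbox"); // never marked finished, as in a hung request
doneAuth();
console.log(tracer.pending()); // only "load E2B sandbox" remains
```

Logging `tracer.pending()` at the start of each new request would show which step the previous request was stuck in, without relying on that request ever returning.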

## 📁 Project Structure

```
├── src/
│   ├── app.tsx        # Chat UI implementation
│   ├── server.ts      # ⚠️ Main agent logic (where issues occur)
│   ├── tools.ts       # Tool definitions (hangs during init)
│   ├── utils.ts       # Helper functions
│   └── styles.css     # UI styling
├── wrangler.jsonc     # Workers configuration
└── .dev.vars.example  # Environment template
```

### 🔍 Key Files for Investigation

- **`src/server.ts`**: Contains the main chat flow that hangs
- **`src/tools.ts`**: Tool initialization that sometimes fails
- **`wrangler.jsonc`**: Workers configuration that might affect behavior
- **Network calls**: Any external API calls that hang in Workers

## 🛠️ Help Needed

### For Cloudflare Team

1. **Workers Runtime Investigation**:
   - Are there known issues with persistent connections in Workers?
   - How should long-running agent sessions be handled?
   - Any Workers-specific considerations for the Agents SDK?

2. **Debugging Assistance**:
   - Best practices for debugging hanging requests in Workers
   - Logging/monitoring recommendations for this type of issue
   - Workers-specific profiling tools

### For Agents SDK Team

1. **Workers Compatibility**:
   - Is the Agents SDK fully tested on Cloudflare Workers?
   - Any known limitations or required configurations?
   - Recommended patterns for multi-request agent sessions?

2. **State Management**:
   - How should agent state persist between requests in Workers?
   - Are there Workers-specific initialization patterns?
   - Connection pooling best practices?

### For Community

1. **Similar Issues**:
   - Has anyone experienced similar hanging request issues?
   - Any workarounds or solutions found?
   - Alternative deployment patterns that work?

2. **Testing Help**:
   - Can others reproduce this issue with the same setup?
   - Different Workers configurations to try?
   - Alternative agent architectures that work reliably?

---

## 📋 Original Template Information

### Features

- 💬 Interactive chat interface with AI
- 🛠️ Built-in tool system with human-in-the-loop confirmation
- 📅 Advanced task scheduling (one-time, delayed, and recurring via cron)
- 🌓 Dark/Light theme support
- ⚡️ Real-time streaming responses
- 🔄 State management and chat history
- 🎨 Modern, responsive UI

### Customization Guide

#### Adding New Tools

Add new tools in `tools.ts` using the tool builder:

```typescript
import { tool } from "ai";
import { z } from "zod";

// Example of a tool that requires confirmation
const searchDatabase = tool({
  description: "Search the database for user records",
  parameters: z.object({
    query: z.string(),
    limit: z.number().optional(),
  }),
  // No execute function = requires confirmation
});

// Example of an auto-executing tool
const getCurrentTime = tool({
  description: "Get current server time",
  parameters: z.object({}),
  execute: async () => new Date().toISOString(),
});
```

#### Use a different AI model provider

The starting implementation uses the [`ai-sdk`](https://sdk.vercel.ai/docs/introduction) and [OpenAI provider](https://sdk.vercel.ai/providers/ai-sdk-providers/openai), but you can use alternatives like [`workers-ai-provider`](https://sdk.vercel.ai/providers/community-providers/cloudflare-workers-ai) or [`anthropic`](https://sdk.vercel.ai/providers/ai-sdk-providers/anthropic).

## Learn More

- [`agents`](https://github.com/cloudflare/agents/blob/main/packages/agents/README.md)
- [Cloudflare Agents Documentation](https://developers.cloudflare.com/agents/)
- [Cloudflare Workers Documentation](https://developers.cloudflare.com/workers/)

## License

MIT