https://github.com/milistu/callm

Keep callm and process thousands of requests without (rate) limits

anthropic api claude cohere deepseek gemini google llm openai parallel requests voyageai


Host: GitHub
URL: https://github.com/milistu/callm
Owner: milistu
License: mit
Created: 2025-09-13T13:06:48.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-12-17T18:04:03.000Z (6 months ago)
Last Synced: 2025-12-21T02:15:34.694Z (6 months ago)
Topics: anthropic, api, claude, cohere, deepseek, gemini, google, llm, openai, parallel, requests, voyageai
Language: Python
Homepage:
Size: 189 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE


          


  
# callm

Keep callm and process thousands of requests without (rate) limits

Installation • Quick Start • Providers • Examples • Contributing



---

## 😌 Why callm?

Building LLM-powered applications often means processing **thousands of API requests**. You've probably experienced:

| Problem | Without callm | With callm |
|---------|---------------|------------|
| **Rate limit errors** | Constant 429 errors, manual sleep/retry | Automatic RPM & TPM throttling |
| **Retry logic** | Write custom backoff for each project | Built-in exponential backoff with jitter |
| **Token tracking** | No visibility into usage | Real-time token consumption metrics |
| **Boilerplate code** | Copy-paste the same async code everywhere | One function call, any provider |
| **Waiting for batch APIs** | Provider batch APIs take up to 24 hours | Results in minutes, not hours |
| **Multiple SDKs** | Install openai, anthropic, cohere, ... | One library, all providers |

**Stop rewriting the same parallel processing code.** callm handles the infrastructure so you can focus on your application.

> *Testing multiple providers? Just swap the provider class—no new dependencies, no code changes. Find what works best for your use case.*

## Installation

```bash
pip install callm-py
```

**From source:**

```bash
git clone https://github.com/milistu/callm.git
cd callm
pip install -e .
```

## Quick Start

Process 1,000 product descriptions to extract structured data—in under a minute:

```python
import asyncio

from callm import process_requests, RateLimitConfig
from callm.providers import OpenAIProvider

# Configure your provider
provider = OpenAIProvider(
    api_key="sk-...",
    model="gpt-5-mini",
    request_url="https://api.openai.com/v1/responses",
)

# Your data processing requests
products = [
    {"id": 1, "description": "Nike Air Max 90 - Classic sneakers in white/black, size 10"},
    {"id": 2, "description": "Sony WH-1000XM5 Wireless Headphones - Noise cancelling, 30hr battery"},
    # ... thousands more
]

requests = [
    {
        "input": f"Extract brand, category, and key features from: {p['description']}",
        "metadata": {"product_id": p["id"]},
    }
    for p in products
]


async def main():
    results = await process_requests(
        provider=provider,
        requests=requests,
        rate_limit=RateLimitConfig(
            max_requests_per_minute=5_000,    # Stay under your tier limit
            max_tokens_per_minute=2_000_000,
        ),
    )

    print(f"Processed {results.stats.successful} requests in {results.stats.duration_seconds:.1f}s")
    print(f"Tokens used: {results.stats.total_input_tokens + results.stats.total_output_tokens:,}")

    # Access results
    for result in results.successes:
        print(f"Product {result.metadata['product_id']}: {result.response}")


asyncio.run(main())
```
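To see why a batch of this size can plausibly finish in well under a minute, it helps to work out the floor that the two limits impose. The helper below is a back-of-the-envelope sketch and not part of callm: `min_duration_seconds` is a hypothetical name, and the 500 tokens per request is an illustrative assumption. It ignores network latency, retries, and any initial burst allowance.

```python
def min_duration_seconds(
    n_requests: int,
    tokens_per_request: float,
    max_requests_per_minute: float,
    max_tokens_per_minute: float,
) -> float:
    """Steady-state lower bound on wall-clock time implied by RPM and TPM limits."""
    # Minutes needed if only the request-count limit applied.
    request_bound = n_requests / max_requests_per_minute
    # Minutes needed if only the token limit applied.
    token_bound = n_requests * tokens_per_request / max_tokens_per_minute
    # The tighter of the two limits determines the floor.
    return 60.0 * max(request_bound, token_bound)


# 1,000 requests at an assumed 500 tokens each, with the Quick Start limits.
estimate = min_duration_seconds(1_000, 500, 5_000, 2_000_000)
print(f"At least {estimate:.0f}s")
```

With these numbers the token limit is the binding one (0.25 minutes versus 0.2 minutes for the request limit), so raising RPM alone would not shorten the run.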

## Features

- **Precise Rate Limiting** — Token buckets for RPM and TPM, respects provider limits

- **Smart Retries** — Exponential backoff with jitter, automatic 429/5xx handling

- **Usage Tracking** — Metrics for input tokens and output tokens

- **Flexible I/O** — Process from Python lists or JSONL files, output to memory or disk

- **Structured Outputs** — Support for Pydantic models and JSON schemas

- **Provider Agnostic** — Same API across OpenAI, Anthropic, Gemini, DeepSeek, and more
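
The token-bucket idea behind the rate limiting feature can be sketched in a few lines. This is a generic illustration of the technique, not callm's actual implementation: the `TokenBucket` class, the `can_send` helper, and the injectable `clock` argument are all hypothetical names introduced here.

```python
import time


class TokenBucket:
    """Generic token bucket that refills continuously up to a per-minute capacity."""

    def __init__(self, capacity_per_minute: float, clock=time.monotonic):
        self.capacity = float(capacity_per_minute)
        self.refill_per_second = self.capacity / 60.0
        self.available = self.capacity  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def refill(self) -> None:
        """Credit the budget earned since the last call, capped at capacity."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_second)


def can_send(rpm: TokenBucket, tpm: TokenBucket, estimated_tokens: float) -> bool:
    """Consume from both buckets only if the request fits in both budgets."""
    rpm.refill()
    tpm.refill()
    if rpm.available >= 1 and tpm.available >= estimated_tokens:
        rpm.available -= 1
        tpm.available -= estimated_tokens
        return True
    return False  # caller should wait and try again later
```

Checking both buckets before consuming from either avoids spending request budget on a call that the token budget would reject anyway.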

## Supported Providers

  

    

      


| Provider | Supported APIs |
|----------|----------------|
| OpenAI | Chat, Responses, Embeddings |
| Anthropic | Messages API |
| Gemini | Generate, Embeddings |
| DeepSeek | Chat Completions |
| Cohere | Embed API |
| Voyage AI | Embeddings |

    

  

## Examples

Explore real-world use cases in the [`examples/`](examples/) directory:

| Use Case | Description |
|----------|-------------|
| [**Data Extraction**](examples/data_extraction/) | Extract structured data from product listings, invoices |
| [**Embeddings**](examples/embeddings/) | Generate embeddings for RAG and semantic search |
| [**Evaluation**](examples/evaluation/) | Multi-judge consensus evaluation |
| [**Synthetic Data**](examples/synthetic_data/) | Generate training data and evaluation sets |
| [**Classification**](examples/classification/) | Sentiment analysis, content moderation |
| [**Translation**](examples/translation/) | Dataset translation for multilingual evaluation |

### Processing Modes

callm supports four processing modes depending on your input source and output destination:

| Input | Output | Best For |
|-------|--------|----------|
| Python list | In-memory | Small batches, interactive use |
| Python list | JSONL file | Medium batches, need persistence |
| JSONL file | JSONL file | Large batches, low memory |
| JSONL file | In-memory | Loading saved requests, testing |

```python
# Each snippet uses `await`, so run it inside an async function
# (for example the `main()` coroutine from Quick Start).

# 1. List → Memory (small batches)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
)
# Access: results.successes, results.failures

# 2. List → File (persist results)
results = await process_requests(
    provider=provider,
    requests=my_list,
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 3. File → File (large batches, low memory)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
    output_path="results.jsonl",
)

# 4. File → Memory (reload saved requests)
results = await process_requests(
    provider=provider,
    requests="input.jsonl",
    rate_limit=rate_limit,
)
```
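
For the file-based modes, the input needs to exist as JSONL, meaning one JSON object per line. The sketch below uses only the standard library; `write_jsonl` and `read_jsonl` are hypothetical helpers, not callm functions, and it assumes that each line carries the same request dict shape as in Quick Start, which the README does not state explicitly.

```python
import json
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield one parsed object per non-empty line without loading the whole file."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Request dicts in the shape used by Quick Start (assumed to carry over to files).
requests = [
    {"input": "Extract brand from: Nike Air Max 90", "metadata": {"product_id": 1}},
    {"input": "Extract brand from: Sony WH-1000XM5", "metadata": {"product_id": 2}},
]
```

Calling `write_jsonl("input.jsonl", requests)` would produce a file suitable for `requests="input.jsonl"` under that assumption, and `read_jsonl("results.jsonl")` is one way to iterate over a large output file lazily.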

### Configuration

```python
from callm import RateLimitConfig, RetryConfig

# Rate limiting (required)
rate_limit = RateLimitConfig(
    max_requests_per_minute=1000,
    max_tokens_per_minute=100_000,
)

# Retry behavior (optional, sensible defaults)
retry = RetryConfig(
    max_attempts=5,
    base_delay_seconds=0.5,
    max_delay_seconds=15.0,
    jitter=0.1,
)

# Uses `await`, so run this inside an async function (see Quick Start)
results = await process_requests(
    provider=provider,
    requests=requests,
    rate_limit=rate_limit,
    retry=retry,
)
```
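
To give a feel for what the retry settings imply, here is one common way exponential backoff with jitter is computed. This is a sketch of the general technique rather than callm's code: `backoff_delay` is a hypothetical function, and treating `jitter` as a fractional spread around the capped delay is an assumption, since the README does not define its units.

```python
import random


def backoff_delay(
    attempt: int,
    base_delay_seconds: float = 0.5,
    max_delay_seconds: float = 15.0,
    jitter: float = 0.1,
    rng=random.random,
) -> float:
    """Delay before retry number `attempt` (1-based)."""
    # Double the delay on each attempt, but never exceed the cap.
    capped = min(max_delay_seconds, base_delay_seconds * 2 ** (attempt - 1))
    # Assumption: `jitter` is a fraction, so 0.1 spreads the delay by +/-10%
    # to keep many clients from retrying in lockstep.
    spread = 1.0 + jitter * (2.0 * rng() - 1.0)
    return capped * spread


for attempt in range(1, 6):
    print(attempt, round(backoff_delay(attempt), 2))
```

Under these assumptions the nominal delays for five attempts are 0.5, 1, 2, 4, and 8 seconds, and the 15-second cap would only apply from a sixth attempt onward.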

## API Reference

### `process_requests()`

Main function for parallel API request processing.

| Parameter | Type | Description |
|-----------|------|-------------|
| `provider` | `BaseProvider` | Provider instance (OpenAI, Anthropic, etc.) |
| `requests` | `list[dict] \| str` | List of request dicts or path to JSONL file |
| `rate_limit` | `RateLimitConfig` | RPM and TPM limits |
| `retry` | `RetryConfig` | Optional retry configuration |
| `output_path` | `str` | Optional path for output JSONL (enables streaming) |
| `errors_path` | `str` | Optional path for error JSONL |
| `logging_level` | `int` | Logging verbosity (default: 20/INFO) |

**Returns:** `ProcessingResults` with `successes`, `failures`, and `stats`.

## Contributing

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

```bash
# Setup development environment
git clone https://github.com/milistu/callm.git
cd callm
uv sync --dev
uv run pre-commit install

# Run tests
uv run nox
```

## License

MIT License - see [LICENSE](LICENSE) for details.

---



Built with 🧡 for engineers who process data at scale
