https://github.com/the-crypt-keeper/cascade
Async Generative Workflows Manager
- Host: GitHub
- URL: https://github.com/the-crypt-keeper/cascade
- Owner: the-crypt-keeper
- Created: 2024-12-17T22:05:24.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-01-17T21:27:47.000Z (5 months ago)
- Last Synced: 2025-05-08T19:05:41.021Z (28 days ago)
- Language: Python
- Size: 181 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Cascade
Cascade is a Python asyncio-based streaming pipeline system for content generation tasks. It enables the construction of idempotent, parallel processing pipelines through a simple async Python API.
## Key Features
- **Simple API**: Pipelines are defined using Python
- **Streaming Architecture**: Steps process items asynchronously through named streams
- **Flexible Step Types**: Source, Transform, and Sink steps for different processing needs
- **Idempotent Processing**: Work is tracked through cascade IDs, ensuring each item is processed exactly once
- **Load Balancing**: Multiple consumers can process items from a stream with configurable weights
- **Parallel Processing**: Multiple workers can process items from a stream in parallel

## Usage
First, set the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables.
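For example, with placeholder values (any OpenAI-compatible endpoint works; substitute your own server URL and key):

```shell
# Placeholder values, for illustration only.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-your-key-here"
```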
Next, run the pipeline with uv (which will handle creating the venv for you):
```bash
uv run your_pipeline.py
```

## Example Pipelines
TODO: document `code-challenege`, `logo-gen`, `example-simple-image`, `world-builder`
## Roadmap (TODOs)
- [ ] Pipeline multiple-input flows (theoretically supported, but likely doesn't work right)
- [ ] More Source Steps
- [ ] VLM Inference Step (Image+Text to Text), maybe with the ollama API

## How It Works
### Streams and Steps
In the Cascade pipeline, data flows through Steps via named Streams.
Steps can:
- Produce data (Source steps)
- Consume and transform data (Transform steps)
- Consume and export data (Sink steps)

Multiple Steps can consume from the same Stream with configurable load balancing strategies.
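As a rough mental model only (not Cascade's actual API; the real `Stream` and step classes in `cascade_base.py` add load balancing and idempotency on top), a source/transform/sink chain over named streams behaves like asyncio tasks connected by queues:

```python
import asyncio

# A rough mental model only: Cascade's real Stream/Step classes
# (cascade_base.py) add load balancing and idempotency on top.

async def source(out: asyncio.Queue):
    for i in range(3):
        await out.put(f"item-{i}")      # produce data
    await out.put(None)                 # signal completion

async def transform(inp: asyncio.Queue, out: asyncio.Queue):
    while (item := await inp.get()) is not None:
        await out.put(item.upper())     # consume and transform
    await out.put(None)

async def sink(inp: asyncio.Queue, results: list):
    while (item := await inp.get()) is not None:
        results.append(item)            # consume and export

async def main() -> list:
    raw, cooked, results = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(source(raw), transform(raw, cooked), sink(cooked, results))
    return results

print(asyncio.run(main()))  # ['ITEM-0', 'ITEM-1', 'ITEM-2']
```

Each Cascade step plays one of these three roles; the named streams take the place of the queues.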
### Cascade IDs
The core concept in Cascade is the cascade ID, which tracks the lineage of each piece of data through the pipeline. Cascade IDs are built up as data flows through steps:
```
source_step:count=0 # Initial generation
source_step:count=0/transform_step # After transformation
[branch1:count=0|branch2:count=1]/merge # Merged from multiple sources
```

This ID system ensures idempotency and enables tracing of data lineage.
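The ID grammar above can be sketched with a few helpers (hypothetical names, for illustration; these are not Cascade's internal functions):

```python
# Hypothetical helpers illustrating the cascade ID grammar shown above.

def derive(parent_id: str, step: str) -> str:
    """Append a step to an existing cascade ID."""
    return f"{parent_id}/{step}"

def merge(parent_ids: list[str], step: str) -> str:
    """Combine multiple lineages into one merged ID."""
    return f"[{'|'.join(parent_ids)}]/{step}"

def seen_before(cascade_id: str, processed: set[str]) -> bool:
    """Idempotency check: each ID is processed exactly once."""
    if cascade_id in processed:
        return True
    processed.add(cascade_id)
    return False

print(derive("source_step:count=0", "transform_step"))
# source_step:count=0/transform_step
print(merge(["branch1:count=0", "branch2:count=1"], "merge"))
# [branch1:count=0|branch2:count=1]/merge
```

Because an ID encodes its full lineage, a re-run can skip any ID it has already stored.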
## Architecture
- `cascade_base.py`:
  - **Cascade**: Main pipeline class for constructing and running pipelines
  - **Stream**: Handles message passing between steps with fair load balancing
  - **CascadeManager**: Coordinates steps and streams, tracks pipeline completion
  - **SQLiteStorage**: Provides persistent storage and idempotency checking
- `cascade_steps.py`: Provides the core step implementations:
  - Source Steps:
    - [StepIdeaSource](#stepideasource): Generates data by sampling from configured sources
  - Transform Steps:
    - [StepExpandTemplate](#stepexpandtemplate): Expands Jinja2 templates
    - [StepLLMCompletion](#stepllmcompletion): Processes text through language models
    - [StepJSONParser](#stepjsonparser): Parses and transforms JSON data
    - [StepText2Image](#steptext2image): Generates images from text descriptions
  - Sink Steps:
    - [StepJSONSink](#stepjsonsink): Exports cascade histories to JSON files
    - [StepConsoleSink](#stepconsolesink): Outputs messages to console

### Source Steps
Source steps generate initial data into the pipeline. They have no input streams and one or more output streams.
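For instance, the schema-driven sampling that StepIdeaSource documents below behaves roughly like this (a hypothetical re-implementation, not Cascade's code):

```python
import random

def sample_schema(schema: dict, rng: random.Random) -> dict:
    # Hypothetical sketch of StepIdeaSource's schema rules:
    # 'constant' yields a fixed value; 'sample' draws 'count' items,
    # unwrapping single picks unless 'always_array' is set.
    out = {}
    for field, rule in schema.items():
        if 'constant' in rule:
            out[field] = rule['constant']
        else:
            n = rule.get('count', 1)
            picks = rng.sample(rule['sample'], n)   # sample without replacement
            if n == 1 and not rule.get('always_array', False):
                picks = picks[0]
            out[field] = picks
    return out

rng = random.Random(0)
scenario = sample_schema({
    'word':  {'sample': ['alpha', 'beta', 'gamma']},
    'tags':  {'sample': ['x', 'y', 'z'], 'count': 2},
    'label': {'constant': 'demo'},
}, rng)
print(scenario)
```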
#### StepIdeaSource
Generates data by sampling from configured sources according to a schema.

Streams:
- **output**: Produces generated data samples

Parameters:
- **count**: Number of scenarios to generate (default: 1)
- **schema**: Dictionary defining data generation rules
  - Each key defines a field to generate
  - Values can be:
    - **sample**: List to sample from
    - **count**: Number of items to sample
    - **always_array**: Always return as array even if count=1
    - **constant**: Fixed value to use

Example:
```python
await cascade.step(StepIdeaSource(
name='generate_scenario',
streams={'output': 'vars'},
params={
'count': 5,
'schema': {
'random_words': {
'sample': word_list,
'count': 3,
'always_array': True
},
'constant_value': {
'constant': 'fixed string'
}
}
}
))
```

### Transform Steps
Transform steps process input data and produce transformed output. They support parallel processing through the `parallel` parameter.
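Conceptually, `parallel` runs several workers competing for items on the same input stream, something like this self-contained sketch (not Cascade's actual scheduler):

```python
import asyncio

async def worker(wid: int, inp: asyncio.Queue, results: list):
    # Each worker competes for items on the shared input stream.
    while (item := await inp.get()) is not None:
        results.append((wid, item * 2))
    await inp.put(None)  # re-queue the sentinel so sibling workers also stop

async def main(parallel: int = 2) -> list:
    inp, results = asyncio.Queue(), []
    for i in range(5):
        inp.put_nowait(i)
    inp.put_nowait(None)
    await asyncio.gather(*(worker(w, inp, results) for w in range(parallel)))
    return sorted(r for _, r in results)

print(asyncio.run(main()))  # [0, 2, 4, 6, 8]
```

With `parallel: 2`, two such workers share one input stream, so a slow item does not block the whole pipeline.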
#### StepExpandTemplate
Expands Jinja2 templates using input data.

Streams:
- **input**: Receives data for template variables
- **output**: Produces expanded template text

Parameters:
- **template**: Jinja2 template string to expand

Example:
```python
await cascade.step(StepExpandTemplate(
name='expand_template',
streams={
'input': 'vars:1',
'output': 'prompts'
},
params={
'template': "Template using {{variable}}"
}
))
```

#### StepLLMCompletion
Processes text through language models.

Streams:
- **input**: Receives prompts for completion
- **output**: Produces model responses

Parameters:
- **model**: *Required* Name of model to use
- **tokenizer**: Optional tokenizer name to use text-completion instead of chat-completion endpoint
- **sampler**: Dictionary of sampling parameters
  - **temperature**: Sampling temperature
  - **max_tokens**: Maximum tokens to generate
  - Additional model-specific parameters
- **parallel**: Number of parallel workers (default: 1)
- **schema_mode**: JSON generation mode (default: "none")
  - **none**: No structured output
  - **openai-schema**: Use OpenAI function schema
  - **openai-json**: Force JSON object output
  - **vllm**: Use vLLM guided JSON
  - **llama**: Use Llama JSON schema
- **schema_json**: JSON schema for structured generation modes

Example:
```python
await cascade.step(StepLLMCompletion(
name='generate',
streams={
'input': 'prompts:1',
'output': 'responses'
},
params={
'model': 'gpt-4',
'parallel': 2,
'schema_mode': 'openai-schema',
'schema_json': {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
},
'sampler': {
'temperature': 0.7,
'max_tokens': 1024
}
}
))
```

#### StepJSONParser
Parses and transforms JSON data.

Streams:
- **input**: Receives JSON text to parse
- **output**: Produces parsed JSON objects

Parameters:
- **first_key**: Extract value of first key only
- **explode_list**: Split list field into separate outputs
- **explode_keys**: List of keys to output separately

Example:
```python
await cascade.step(StepJSONParser(
name='parse_json',
streams={
'input': 'responses:1',
'output': 'parsed'
},
params={
'first_key': True,
'explode_list': 'items',
'explode_keys': ['key1', 'key2']
}
))
```

#### StepText2Image
Generates images from text descriptions using Stable Diffusion.

Streams:
- **input**: Receives text prompts
- **output**: Produces generated images

Parameters:
- **api_url**: URL of Stable Diffusion API
- **width**: Image width (default: 512)
- **height**: Image height (default: 512)
- **steps**: Number of diffusion steps (default: 20)

Example:
```python
await cascade.step(StepText2Image(
name='generate_image',
streams={
'input': 'prompts:1',
'output': 'images'
},
params={
'api_url': 'http://localhost:7860',
'width': 768,
'height': 768,
'steps': 30
}
))
```

### Sink Steps
Sink steps consume data from the pipeline and perform final processing. They have one or more input streams but no outputs.
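A sink is conceptually just a consumer with side effects. For instance, exporting a record to a JSON file (an illustrative sketch with a hypothetical `export_json` helper, not StepJSONSink's actual code):

```python
import json
import tempfile
from pathlib import Path

def export_json(record: dict, output_dir: str) -> Path:
    # Illustrative only: derive a filesystem-safe filename from the
    # cascade ID and write the record as pretty-printed JSON.
    safe = record['cascade_id'].replace('/', '_').replace(':', '=')
    path = Path(output_dir) / f"{safe}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

record = {
    'cascade_id': 'source:count=0/transform',
    'history': {'source': {'generated': 'data'}, 'transform': 'processed result'},
}
out = export_json(record, tempfile.mkdtemp())
print(out.name)  # source=count=0_transform.json
```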
#### StepJSONSink
Exports complete cascade histories to JSON files.

Streams:
- **input**: Receives data to export as JSON

Parameters:
- **output_dir**: Directory to write JSON files (default: '.')

Example:
```python
await cascade.step(StepJSONSink(
name='export_json',
streams={
'input': 'final_output:1'
},
params={
'output_dir': 'output/results'
}
))
```

Output format:
```json
{
"cascade_id": "source:count=0/transform",
"history": {
"source": {"generated": "data"},
"transform": "processed result"
}
}
```

#### StepConsoleSink
Outputs messages directly to console.

Streams:
- **input**: Receives messages to print

Parameters: None
Example:
```python
await cascade.step(StepConsoleSink(
name='console',
streams={
'input': 'responses:1'
}
))
```