https://github.com/Saganaki22/ComfyUI-Foundation-1

ComfyUI custom nodes for Foundation-1 | Structured Text-to-Sample Diffusion for Music Production
https://github.com/Saganaki22/ComfyUI-Foundation-1
audiogen audiogeneration comfyui comfyui-custom-node comfyui-nodes foundation-1 python3 stability-ai text-to-sample texttoaudio
Last synced: 5 days ago
JSON representation
ComfyUI custom nodes for Foundation-1 | Structured Text-to-Sample Diffusion for Music Production
Host: GitHub
URL: https://github.com/Saganaki22/ComfyUI-Foundation-1
Owner: Saganaki22
License: other
Created: 2026-03-17T05:25:57.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-17T07:09:07.000Z (3 months ago)
Last Synced: 2026-03-17T21:48:54.757Z (3 months ago)
Topics: audiogen, audiogeneration, comfyui, comfyui-custom-node, comfyui-nodes, foundation-1, python3, stability-ai, text-to-sample, texttoaudio
Language: Python
Homepage: https://huggingface.co/RoyalCities/Foundation-1
Size: 479 KB
Stars: 7
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project

awesome-comfyui - **Foundation-1** - 1 | structured text-to-sample generation for music production (Workflows (5027) sorted by GitHub Stars)
README

          


  





  ComfyUI-Foundation-1


  


    ComfyUI custom nodes for


    Foundation-1 — Structured Text-to-Sample Diffusion for Music Production

  

  

    

    

    

    

    

  





---

## Overview

**Foundation-1** is a structured text-to-sample diffusion model for music production. It understands instrument identity, timbre, FX, musical notation, BPM, bar count, and key as separate composable controls — enabling precise, predictable synthesis of musical loops.

This ComfyUI wrapper provides native node-based integration with:

- **Structured prompting** with instrument, timbre, FX, and notation tags

- **Tempo-synced generation** with BPM and bar count controls

- **Key-aware synthesis** with full western key support

- **Native progress bars** and interruption support

> **Companion Video:** [Watch the Foundation-1 overview and design philosophy](https://www.youtube.com/watch?v=O2iBBWeWaL8)

https://github.com/user-attachments/assets/6c00c56d-ea46-4feb-9bf8-876ecb2487b2

---

## Features

- Structured Text-to-Sample — Generate musical loops from structured text prompts

- **Audio-to-Audio Variations** — Connect any audio input to create variations/interpretations guided by your prompt

- Tempo-Synced Duration — Automatic duration calculation from BPM and bar count

- 24 Musical Keys — Full western key support (major and minor)

- Native ComfyUI Integration — AUDIO noodle outputs, progress bars, interruption support

- Optimized Performance — Support for SDPA, FlashAttention 2, SageAttention

- Smart Auto-Download — Model weights auto-downloaded from HuggingFace on first use

- Smart Caching — Optional model offloading to CPU RAM between runs

---

## Requirements

- **GPU:** NVIDIA GPU with **8GB VRAM minimum** (CUDA required)

  - Typical VRAM usage: **~7GB** during generation

  - Generation speed: **~20 it/s** (iterations per second) with default sampler

- **CPU/MPS:** Not supported — Foundation-1 uses Flash Attention which is CUDA-only

- **Python:** 3.10+

- **CUDA:** 11.8+

- **Flash Attention:** Required (comes with PyTorch 2.0+ SDPA)

- **SageAttention:** Optional but recommended (tested on 2.2.0)

> [!NOTE]

> **Attention Requirements:**

> - **Minimum:** Flash Attention 2 (built into PyTorch 2.0+ SDPA backend)

> - **Recommended:** SageAttention 2.2.0+ for better performance

> [!IMPORTANT]

> **First Run Requires Internet**

> - The T5 text encoder (~900MB) is downloaded automatically from HuggingFace on first use

> - Model weights (~3GB) are also downloaded on first use

> - Subsequent runs work offline once everything is cached

---

## Installation

Click to expand installation methods

### Method 1: ComfyUI Manager (Recommended)

1. Open ComfyUI Manager

2. Search for "Foundation-1"

3. Click Install

4. Restart ComfyUI

### Method 2: Manual Installation

```bash

cd ComfyUI/custom_nodes

git clone https://github.com/saganaki22/ComfyUI-Foundation-1.git

cd ComfyUI-Foundation-1

python install.py

```

> **Note:** The `install.py` script handles all dependency installation. See the [Dependency Details](#dependency-details) section below for what gets installed and why.

### ComfyUI Manager "Install" Button Not Working?

If the Manager install button still does nothing after the fixes in v0.1.4, the nuclear option is:

```bash

cd ComfyUI/custom_nodes

git clone https://github.com/Saganaki22/ComfyUI-Foundation-1.git

cd ComfyUI-Foundation-1

python install.py

```

---

## Dependency Details

Click to expand dependency installation details

All dependencies are automatically installed at ComfyUI startup. You do **not** need to run `pip install` manually.

### Already included in ComfyUI

These packages are typically already present in ComfyUI environments:

```

torch

torchaudio      # Required for Foundation-1 audio processing

numpy

safetensors

transformers    # T5 text encoder

huggingface_hub # Model downloads

```

If `torchaudio` is missing for some reason, install it manually:

```bash

pip install torchaudio

```

### Normal pip installs (auto-installed)

These packages are installed normally:

```

einops>=0.7.0

alias-free-torch>=0.0.6

ema-pytorch>=0.2.3

einops-exts>=0.0.3

```

### Special installs with --no-deps

These packages require special handling:

| Package | Install Command | Reason |

|---------|-----------------|--------|

| `stable-audio-tools` | `pip install stable-audio-tools --no-deps` | Avoids `pandas==2.0.2` which has no Python 3.13 wheel and fails to build from source |

| `k-diffusion` | `pip install k-diffusion==0.1.1 --no-deps --target ./k_diffusion_files/` | Installed to private directory to avoid conflicts with ComfyUI's bundled k_diffusion and the `clip->pkg_resources` import chain issue |

### What NOT to install manually

> [!WARNING]

> **Do NOT run these commands:**

> ```bash

> pip install stable-audio-tools      # WRONG - will pull pandas==2.0.2

> pip install k-diffusion             # WRONG - will conflict with ComfyUI's version

> ```

>

> These are handled automatically at startup with the correct flags.

### Optional packages

- `sageattention` — Install manually for better performance: `pip install sageattention`

### Installing SageAttention (Recommended)

```bash

pip install sageattention

```

Tested and working with SageAttention 2.2.0.

---

## Quick Start

### Basic Workflow

1. **Add Model Loader**

   - Add `Foundation-1 Model Loader` node

   - Model auto-downloads from [RoyalCities/Foundation-1](https://huggingface.co/RoyalCities/Foundation-1) on first use

   - Select attention type (auto/sdpa/flash/sageattention)

2. **Add Generator**

   - Add `Foundation-1 Generate` node

   - Connect model output from loader

   - Enter tags: `Synth Lead, Warm, Bright, Melody`

   - Select BPM, bars, and key

3. **Run!**

   - Execute the workflow

   - Audio output ready for ComfyUI audio nodes

---

## Node Reference

### Foundation-1 Model Loader

Loads a Foundation-1 checkpoint and prepares it for generation.

**Inputs:**

| Parameter | Type | Description |

|-----------|------|-------------|

| `model` | dropdown | Foundation-1 checkpoint (auto-downloaded on first run) |

| `attention` | dropdown | Attention mechanism: `auto`, `sdpa`, `flash_attention_2`, `sageattention` |

**Outputs:**

| Output | Type | Description |

|--------|------|-------------|

| `model` | FOUNDATION1_MODEL | Loaded model for generator node |

---

### Foundation-1 Generate

Generates a tempo-synced musical loop. Optionally accepts an audio input for variation generation.

**Required Inputs:**

| Parameter | Type | Default | Description |

|-----------|------|---------|-------------|

| `model` | FOUNDATION1_MODEL | — | Connect from Model Loader |

| `tags` | STRING | `Synth Lead, Warm, ...` | Instrument, timbre, FX, notation tags |

| `bpm` | dropdown | `140 BPM` | Tempo (100-150 BPM options) |

| `bars` | dropdown | `8 Bars` | Loop length (4 or 8 bars) |

| `key` | dropdown | `E minor` | Musical key (24 options) |

| `steps` | INT | 250 | Diffusion steps (10-500) |

| `cfg_scale` | FLOAT | 7.0 | Classifier-free guidance (1.0-15.0) |

| `seed` | INT | 0 | Generation seed |

| `sampler_type` | dropdown | `dpmpp-3m-sde` | Diffusion sampler |

| `sigma_min` | FLOAT | 0.3 | Minimum noise level |

| `sigma_max` | FLOAT | 500.0 | Maximum noise level |

| `unload_after_generate` | BOOLEAN | False | Offload to CPU RAM after generation |

| `torch_compile` | BOOLEAN | False | Enable torch.compile (first run slower) |

**Optional Inputs (Audio Variation):**

| Parameter | Type | Default | Description |

|-----------|------|---------|-------------|

| `audio` | AUDIO | None | Input audio for variation — connect from LoadAudio, previous generation, etc. |

| `init_noise_level` | FLOAT | 0.7 | Variation strength (0.01–1.0). Lower = closer to input, higher = more creative |

**Outputs:**

| Output | Type | Description |

|--------|------|-------------|

| `audio` | AUDIO | Generated audio waveform |

How Audio Variation Works

Connect any AUDIO output (e.g., from a `LoadAudio` node, or a previous `Foundation-1 Generate` output) to the optional `audio` input. The model will use this as a starting point and create a variation guided by your prompt tags, BPM, bars, and key.

**`init_noise_level` controls the variation strength:**

- **0.1–0.3** — Output stays close to the input audio

- **0.5–0.75** — Balanced musical variations (recommended)

- **0.9–1.0** — Maximum creative freedom, output may differ significantly from input

Leave the `audio` input disconnected for standard text-to-audio generation.

---

## Prompt Tags

Click to expand tag reference

Foundation-1 uses structured tags for precise control over generation. Tags should describe:

- **Instrument** — e.g., `Synth Lead`, `Piano`, `Guitar`, `Drums`

- **Timbre** — e.g., `Warm`, `Bright`, `Dark`, `Rich`, `Clean`

- **FX** — e.g., `Reverb`, `Delay`, `Distortion`, `Chorus`

- **Notation** — e.g., `Arp`, `Chord`, `Melody`, `Bassline`

- **Character** — e.g., `Spacey`, `Intimate`, `Wide`, `Thick`

**Example prompts:**

```

Synth Lead, Warm, Wide, Bright, Clean, Melody

Piano, Soft, Intimate, Reverb, Chord Progression

Drums, Punchy, Tight, Kick, Snare, Hi-Hat

Bass, Deep, Sub, Rolling, Groove

```

> **Note:** BPM, Bars, and Key are controlled via dropdowns — do not include them in the tags field.

### 📋 Full Tag Reference

For the complete list of supported tags, see the **[Master Tag Reference Sheet](./Master_Tag_Reference.md)**.

### Tag Distribution Charts

Click to expand tag distribution charts



  

  Instrument Sub-Family Coverage





  

  Timbre Descriptor Coverage





  

  FX Descriptor Coverage



---

## Musical Keys

Click to expand supported keys

**Major Keys:**

C major, C# major, D major, Eb major, E major, F major, F# major, G major, Ab major, A major, Bb major, B major

**Minor Keys:**

C minor, C# minor, D minor, D# minor, E minor, F minor, F# minor, G minor, G# minor, A minor, Bb minor, B minor

---

## Duration Calculation

Duration is automatically calculated from BPM and bars:

```

duration (seconds) = round(bars x 4 / BPM x 60)

```

**Examples:**

| BPM | Bars | Duration |

|-----|------|----------|

| 100 | 8 | 19s |

| 120 | 4 | 8s |

| 140 | 8 | 14s |

| 150 | 4 | 6s |

**Maximum duration:** 20 seconds (model limit)

---

## File Structure

```

ComfyUI/

├── models/

│   └── stable_audio/

│       └── Foundation-1/              # Auto-downloaded

│           ├── Foundation_1.safetensors

│           └── model_config.json

└── custom_nodes/

    └── ComfyUI-Foundation-1/

        ├── __init__.py

        ├── nodes/

        │   ├── __init__.py

        │   ├── loader_node.py

        │   ├── generate_node.py

        │   └── model_cache.py

        ├── k_diffusion_files/         # Private k-diffusion install

        ├── pyproject.toml

        ├── requirements.txt

        └── README.md

```

---

## Parameters Explained

Click to expand parameter details

| Parameter | Description | Recommended |

|-----------|-------------|-------------|

| **attention** | Attention mechanism | `auto` (SageAttention if available, else SDPA) |

| **steps** | Diffusion steps | `250` (training default), `100-150` for faster results |

| **cfg_scale** | Classifier-free guidance | `7.0` (training default), `6-8` for balance |

| **sampler_type** | Diffusion sampler | `dpmpp-3m-sde` (recommended, best quality), `k-dpm-fast` (fastest, needs fewer steps) |

| **sigma_min** | Min noise level | `0.3` (default) |

| **sigma_max** | Max noise level | `500.0` (default) — note: when using audio variation, this is internally overridden by `init_noise_level` |

| **audio** | Optional input audio for variations | Connect any AUDIO output, or leave disconnected for text-to-audio |

| **init_noise_level** | Variation strength | `0.5-0.75` (balanced), `0.1-0.3` (close to input), `1.0` (max variation) |

| **unload_after_generate** | Offload to CPU RAM | `True` to free VRAM between runs |

| **torch_compile** | torch.compile optimization | `True` (first run slow, subsequent faster) |

---

## Troubleshooting

Click to expand troubleshooting guide

### "No module named 'stable_audio_tools'"?

This means `stable-audio-tools` was not installed. This can happen if you cloned the repo manually without running `install.py`, or if your pip environment is different from ComfyUI's.

Fix:

```bash

pip install stable-audio-tools --no-deps

```

Then restart ComfyUI.

> [!WARNING]

> You **must** use `--no-deps`. Running `pip install stable-audio-tools` without it will pull in `pandas==2.0.2` which breaks on Python 3.13+.

### Model Not Downloading?

Manually download from [RoyalCities/Foundation-1](https://huggingface.co/RoyalCities/Foundation-1):

```bash

pip install -U huggingface_hub

huggingface-cli download RoyalCities/Foundation-1 --local-dir ComfyUI/models/stable_audio/Foundation-1

```

Only these two files are required:

- `Foundation_1.safetensors` (~3GB model weights)

- `model_config.json` (model configuration)

### Dependency Installation Failed?

The `__init__.py` auto-installs dependencies at startup. If it fails, install manually:

**Normal pip installs:**

```bash

pip install einops>=0.7.0

pip install alias-free-torch

pip install ema-pytorch

pip install einops-exts

```

**Special installs with `--no-deps` (required!):**

These packages MUST be installed with `--no-deps` or they will break your ComfyUI environment:

```bash

# stable-audio-tools --no-deps avoids pandas==2.0.2 (no Python 3.13 wheel)

pip install stable-audio-tools --no-deps

# k-diffusion must go to private folder (avoids conflict with ComfyUI's bundled version)

pip install k-diffusion==0.1.1 --no-deps --target ComfyUI/custom_nodes/ComfyUI-Foundation-1/k_diffusion_files/

```

> [!WARNING]

> **Do NOT run:**

> ```bash

> pip install stable-audio-tools    # WRONG - pulls pandas==2.0.2

> pip install k-diffusion           # WRONG - conflicts with ComfyUI

> ```

### What Goes in k_diffusion_files/?

The `k_diffusion_files/` folder is created automatically by the auto-installer. It contains a private copy of `k-diffusion` that's loaded at runtime via `importlib` — this prevents conflicts with ComfyUI's own bundled `k_diffusion` and avoids the `clip→pkg_resources` import chain issue.

If this folder is missing or corrupted, the node will re-download `k-diffusion==0.1.1` automatically on next startup.

### Out of Memory?

- Enable `unload_after_generate=True` to offload to CPU RAM

- Reduce `steps` (100-150 still gives good results)

- Close other GPU applications

### Slow Generation?

- Install SageAttention: `pip install sageattention`

- Enable `torch_compile=True` (first run is slower, subsequent runs faster)

- Use `dpmpp-2m-sde` sampler (slightly faster than `dpmpp-3m-sde`)

### k_diffusion Conflicts?

Foundation-1 installs k-diffusion to a private directory (`k_diffusion_files/`) to avoid conflicts with ComfyUI's bundled version. Never install k-diffusion to site-packages manually.

---

## 🔗 Important Links

### 🤗 HuggingFace

- **Model:** [RoyalCities/Foundation-1](https://huggingface.co/RoyalCities/Foundation-1)

### 📄 Code

- **Inference Engine:** [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools)

### 🌐 Community

- **Companion Video:** [Foundation-1 Overview](https://www.youtube.com/watch?v=O2iBBWeWaL8)

---

## 📄 License



This model is licensed under the **Stability AI Community License**:

- ✅ **Non-commercial use** — permitted

- ✅ **Limited commercial use** — entities with annual revenues below USD $1M

- ⚠️ **Revenue exceeding USD $1M** — refer to the repository license file for full terms

Model weights from [RoyalCities/Foundation-1](https://huggingface.co/RoyalCities/Foundation-1) are subject to the same license.

---

## ⚠️ Usage Disclaimer

Foundation-1 is intended for music production, creative applications, and legitimate purposes. Please use responsibly and ethically. We do not hold any responsibility for any illegal usage. Please refer to your local laws regarding generated content.

---



    Structured Text-to-Sample Diffusion for Music Production
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Saganaki22/ComfyUI-Foundation-1

Awesome Lists containing this project

README

ComfyUI-Foundation-1