https://github.com/da0101/prompt2midi

Generate MIDI patterns from natural language prompts using ChatGPT and Python. Connects seamlessly with Ableton Live via a custom AU plugin.

# prompt2midi

Local-first AI co-producer for Ableton Live.

prompt2midi analyzes a reference track, turns the musical evidence into producer-readable structure, generates editable MIDI, creates an original inspired loop package, and prepares a Suno-ready prompt. The core path is designed to run locally: JUCE is the Ableton-facing client, Node owns the job API, and Python owns audio analysis and generation helpers.

The product goal is not to copy songs. It is a pre-production tool for making new material from reference traits: tempo, key, groove, drum feel, bass movement, arrangement energy, and production language.

prompt2midi is open source. Producers, artists, engineers, researchers, and tool builders are welcome to contribute.

There are two main ways to use it:

- **DAW inspiration starter:** generate editable ideas, MIDI, arrangement notes, and prompts that a producer can continue shaping in Ableton, Logic, FL Studio, Bitwig, or any other DAW.
- **Pre-Suno tool:** turn an idea or reference into a cleaner prompt, structure guide, and optional proxy package that an artist can finish in Suno or continue developing locally.

## Current Status

The repo already contains a working local vertical slice:

- JUCE plugin UI for WAV/MP3 selection, prompt entry, job polling, result display, and prompt copy.
- Local Node backend on `127.0.0.1:47321`.
- Python WAV analysis with BPM, key, energy, loudness, spectral features, genre/groove hints, chords, structure, stems, transcription, composition, and optional audio generation.
- Deterministic inspired-loop generator that writes `bass.mid`, `drums.mid`, `chords.mid`, `melody.mid`, `full_loop.mid`, `summary.json`, and `prompt.txt`.
- Optional model paths for Basic Pitch, Demucs, CLAP genre detection, Gemini Suno prompt generation, ACE-Step, and MusicGen.

This is still an MVP/research codebase. Some outputs are already useful in production, but model transcription, stem splitting, and MIDI mapping are still weak in places and should be treated as editable evidence, not finished arrangements. The next major production-readiness work is better stem separation, source-aware MIDI mapping, and full JUCE AU/VST integration.

## Branch Flow

- `develop` is the default integration branch for daily work.
- Feature branches start from `develop` and merge back into `develop`.
- `main` is release-only. When `develop` is ready to ship, merge `develop` into `main` and create a version tag.
- Do not open normal feature PRs directly into `main`.

## Why It Exists

Producers often know what they like about a reference record but cannot quickly turn that into reusable production material. prompt2midi closes that gap:

1. Drop in a reference track.
2. Extract musical facts and evidence.
3. Generate original MIDI parts that match the useful traits, not the exact song.
4. Generate a clear AI-music prompt for Suno or similar tools.
5. Optionally render a local reference-inspired sample before uploading anything elsewhere.
6. Finish the idea either inside a DAW or inside Suno.

When a generation service rejects direct artist/song prompts, the correct workflow is not to bypass the filter. prompt2midi uses the reference to identify neutral production traits, then creates new musical assets and a prompt that avoids artist imitation, copied hooks, lyrics, and vocal likeness.

Example framing:

- Avoid: `make a Michael Jackson song` or `copy the bassline from this record`.
- Prefer: `1980s pop-funk feel, tight dance groove, bright chord stabs, syncopated bass movement, crisp drums, original melody, no copied lyrics, no vocal imitation`.

The tool is designed to reduce copying risk by creating original material and by describing musical traits instead of requesting a clone. It does not guarantee legal clearance, does not replace rights review, and should not be used to bypass copyright or platform policies.

## Architecture

prompt2midi is a local-first desktop production system. The plugin is only the DAW-facing client; the local backend owns job orchestration; Python owns audio intelligence and generation helpers; optional model services improve output quality without becoming required for the core workflow.

```mermaid
flowchart TD
    Producer["Producer in Ableton Live"] --> Plugin["JUCE plugin UI<br/>file/prompt input, progress, result display"]
    Plugin -->|POST /analyze| Node["Local Node backend<br/>127.0.0.1:47321"]
    Node --> Jobs["Job store + progress events<br/>queued/running/succeeded/failed"]
    Node --> Decode["Input validation + FFmpeg MP3 decode<br/>WAV passed to Python"]
    Decode --> Python["Python analysis package<br/>analysis/analyze.py"]
    Python --> Core["Core facts<br/>BPM, key, loudness, energy, spectral features"]
    Python --> Deep["Optional deeper analysis<br/>chords, drums, structure, stems, transcription"]
    Python --> Compose["Original composition package<br/>bass/drums/chords/melody/full_loop MIDI"]
    Python --> Arrange["Arrangement Lock / full-track proxy<br/>maps, reports, guide MIDI, proxy audio"]
    Compose --> Exports["Local exports<br/>MIDI, summary.json, prompt.txt"]
    Arrange --> Exports
    Python --> Node
    Node --> Prompt["Prompt layer<br/>deterministic local prompt + optional Gemini SUNO prompt"]
    Node --> Result["Aggregated result JSON<br/>analysis, warnings, assets, prompts, paths"]
    Result --> Plugin
    Plugin --> DAW["Producer actions<br/>audition, copy prompt, import MIDI, package for SUNO"]

    ACE["Optional local ACE-Step API<br/>127.0.0.1:8001"] -. audio candidates .-> Arrange
    Models["Optional local engines<br/>Basic Pitch, Demucs, CLAP, MusicGen, All-In-One Docker"] -. evidence .-> Deep
    Gemini["Optional cloud Gemini<br/>GEMINI_API_KEY"] -. SUNO prompt .-> Prompt
```

```text
Ableton / JUCE plugin
|
| POST /analyze
v
Local Node backend
|
| validates input, decodes MP3, creates job, calls Python
v
Python analysis engine
|
| returns structured JSON, MIDI paths, composition package, optional audio sample
v
Node aggregation
|
| deterministic producer prompt, optional Gemini Suno prompt
v
JUCE result display
```

### Runtime Flow

1. The producer selects a WAV/MP3 reference and/or enters a direction in the plugin.
2. The JUCE client posts the request to the localhost Node backend and leaves the audio thread as a pure pass-through.
3. Node validates local paths, decodes MP3 to WAV when needed, creates a job, and publishes progress events.
4. Python analyzes the WAV, writes structured JSON, MIDI evidence, composition assets, and optional arrangement/proxy artifacts.
5. Node aggregates the Python result with producer-facing prompt text and optional Gemini Suno text.
6. The plugin polls status/result and displays confidence-aware output paths, warnings, and copy/export actions.

### Component Responsibilities

| Layer | Files | Responsibility |
|---|---|---|
| JUCE plugin | `Source/PluginEditor.*`, `Source/PluginProcessor.*`, `Source/LocalApiClient.h` | UI only: choose/drop reference, send local job, poll status, show results. Audio processing stays pass-through. |
| Node backend | `backend/server.js`, `backend/lib/*` | Local API, job state, input validation, MP3 decode, Python invocation, prompt aggregation, error normalization. |
| Python analysis | `analysis/analyze.py`, `analysis/core/*`, `analysis/detectors/*` | Extract structured musical facts from WAV audio. Optional libraries improve results, but fallback paths keep the core running. |
| MIDI/transcription | `analysis/midi/*` | Write MIDI, run Basic Pitch, run Demucs, expose provenance and limitations for every MIDI asset. |
| Composition | `analysis/composition/*` | Generate a new original loop package from analysis hints. This is the main product output. |
| Prompting | `backend/lib/promptGenerator.js`, `backend/lib/geminiPromptGenerator.js` | Turn structured facts into producer-facing copy and Suno prompts. Gemini is optional. |
| Audio generation | `analysis/generation/*` | Optional local sample generation using ACE-Step first, then AudioCraft/MusicGen fallback paths. |
| Tooling scripts | `scripts/pipelines/*`, `scripts/setup/*`, `scripts/packaging/*`, `scripts/services/*`, `scripts/dev/*` | Local CLI runners, setup commands, package builders, service launchers, and developer refresh tools. |
| Requirements/docs | `requirements/*`, `docs/pipelines/*`, `docs/backend/*`, `docs/qa/*` | Optional engine dependency pins and topic-grouped operational docs. |

## End-to-End Flow

### 1. User Input

The plugin accepts:

- A local `.wav` / `.wave` / `.mp3` reference file.
- A text direction.
- A text direction alone (prompt-only mode) when no audio file is supplied.

The plugin posts JSON to the local backend:

```json
{
  "audioPath": "/absolute/path/to/reference.mp3",
  "prompt": "same groove, change the bass notes a little, replace the main stab"
}
```

### 2. Node Job Orchestration

`backend/server.js` exposes:

- `GET /health`
- `POST /analyze`
- `GET /status?id=`
- `GET /result?id=`

Node creates a job immediately so the plugin remains responsive. It validates absolute audio paths, rejects unsupported formats, enforces a size limit, and decodes MP3 input through `ffmpeg` into `tmp/jobs//decoded-input.wav`.

Node then starts `python -m analysis.analyze` as a child process and records pipeline events so the UI can show progress.
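The validation and decode steps described above can be sketched as follows. The real checks live in the Node backend; this is only a Python illustration of the same rules. The function names and the `MAX_BYTES` value are hypothetical, since the actual size limit is not stated here. The decode helper only builds an `ffmpeg` argument list and does not run it.

```python
from pathlib import Path
from typing import List

# Formats listed under "User Input".
ALLOWED_EXTENSIONS = {".wav", ".wave", ".mp3"}
# Hypothetical limit for illustration; the real value is not documented here.
MAX_BYTES = 200 * 1024 * 1024


def validate_audio_path(audio_path: str, max_bytes: int = MAX_BYTES) -> Path:
    """Return the path if it passes every check, otherwise raise ValueError."""
    path = Path(audio_path)
    if not path.is_absolute():
        raise ValueError("audioPath must be absolute")
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise ValueError("audioPath does not point to a file")
    if path.stat().st_size > max_bytes:
        raise ValueError("file exceeds the size limit")
    return path


def ffmpeg_decode_command(src: Path, job_dir: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build (but do not run) an ffmpeg command that decodes src to WAV."""
    # -y overwrites an existing output file, -i names the input file.
    return [ffmpeg, "-y", "-i", str(src), str(job_dir / "decoded-input.wav")]
```

Passing the arguments as a list rather than a shell string keeps paths with spaces or quotes from being interpreted by a shell.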

### 3. Python Feature Analysis

The Python engine reads PCM WAV and returns structured JSON. The dependency-free base path extracts:

- duration, sample rate, channel count
- energy curve
- loudness
- zero-crossing rate and peak amplitude
- approximate BPM
- approximate key
- warnings when confidence is low
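As a rough illustration of what a dependency-free base path can look like, the sketch below reads a 16-bit PCM WAV with only the standard library and computes a few of the listed facts. It is not the project's analyzer: the function names and the returned keys are assumptions, it only reads the first channel, and it omits BPM and key estimation.

```python
import io
import math
import struct
import wave


def analyze_pcm16_wav(source) -> dict:
    """Extract basic facts from a 16-bit PCM WAV (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        frame_count = wav.getnframes()
        raw = wav.readframes(frame_count)

    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    mono = samples[::channels]  # first channel only, to keep the sketch short
    if not mono:
        raise ValueError("empty WAV")

    peak = max(abs(s) for s in mono) / 32768.0
    rms = math.sqrt(sum(s * s for s in mono) / len(mono)) / 32768.0
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(mono, mono[1:]) if (a < 0) != (b < 0))
    return {
        "duration_seconds": frame_count / sample_rate,
        "sample_rate": sample_rate,
        "channels": channels,
        "peak_amplitude": peak,
        "rms": rms,
        "zero_crossing_rate": crossings / max(len(mono) - 1, 1),
    }


def sine_wav_bytes(freq_hz=440.0, seconds=1.0, sample_rate=8000, amplitude=0.5):
    """Build an in-memory mono test tone so the example needs no audio file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
            for i in range(int(seconds * sample_rate))
        )
        wav.writeframes(frames)
    buf.seek(0)
    return buf
```

Calling `analyze_pcm16_wav(sine_wav_bytes())` gives a quick sanity check: a half-scale sine should report a peak close to 0.5 and an RMS close to 0.35.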

Optional `librosa`/`scipy` paths improve:

- BPM estimation
- key estimation
- chord progression detection
- arrangement/section analysis
- groove descriptors

Optional CLAP genre detection uses `laion/larger_clap_music` through `transformers` when available.
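One way to keep optional libraries from turning into hard failures is to import them lazily and record a warning when they are missing. The helper below is a minimal sketch of that pattern, not the project's code: `load_optional` is a hypothetical name, and treating the value `1` as "disabled" is an assumption based on how the environment flags are shown in this README.

```python
import importlib
import os


def load_optional(module_name, disable_flag, warnings, env=None):
    """Return the imported module, or None plus a warning when unavailable."""
    env = os.environ if env is None else env
    # Assumption: a flag set to "1" switches the optional path off.
    if env.get(disable_flag) == "1":
        warnings.append(f"{module_name} disabled by {disable_flag}; using the base path")
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.append(f"{module_name} not installed; using the base path")
        return None
```

A caller could then write `librosa = load_optional("librosa", "PROMPT2MIDI_DISABLE_LIBROSA", warnings)` and fall back to the base estimator whenever the result is `None`.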

### 4. Stem and MIDI Evidence

Every MIDI file is labeled by source and confidence. This area is intentionally conservative: stem splitting and MIDI mapping exist, but they are not yet production-grade. They are useful for evidence, sketching, and direction, but the next feature work should improve source separation, note assignment, timing cleanup, and DAW-ready mapping.

| Asset | How it is made | Meaning |
|---|---|---|
| `reference-sketch.mid` | Deterministic pattern from estimated BPM/key | Generated sketch, not transcription. |
| `model-transcription.mid` | Basic Pitch on full mix | Model transcription candidate; needs ear correction. |
| `source-bass-transcription.mid` | Demucs bass stem + Basic Pitch | Stem-aware bass candidate; still may contain bleed. |
| `source-drum-groove.mid` | Demucs drums stem + onset detection | Quantized drum groove estimate. |
| `model-bass-transcription.mid` | Pitch-filtered Basic Pitch notes from full mix | Fallback bass candidate, not source-separated. |
| `bass-transcription.mid` | Monophonic low-frequency tracking | Legacy heuristic fallback. |

Recommended evidence exports are copied to:

```text
tmp/jobs//exports/
```

The code intentionally distinguishes generated MIDI from transcription evidence. This matters because a generated sketch must not be presented as a transcription, and only the stem-based paths should be described as source-aware.

Current limitations:

- Demucs-style stem splitting can bleed bass, drums, vocals, and harmonic material into each other.
- Full-mix model transcription often produces extra notes and wrong instrument ownership.
- Bass, drum, chord, and melody mappings still need stronger source-aware cleanup before they should be considered arrangement-ready.
- All extracted MIDI should be auditioned and edited in Ableton before being used as final material.

Next work:

- Improve stem-aware bass, drum, chord, and melody extraction.
- Improve mapping from analysis evidence into separate DAW tracks.
- Tighten quantization, note filtering, register selection, and confidence labels.
- Complete JUCE integration for real AU/VST plugin workflows, including more polished import/export behavior.

### 5. Reference Transformation

`analysis/reference/reference_groove.py` fingerprints the reference for:

- kick accents
- bass accents
- hat/percussion accents
- swing
- low-end weight
- bass note tendencies
- club energy

`analysis/reference/reference_transform.py` converts the user's direction into controls such as:

- preserve groove similarity
- keep bass rhythm but vary notes
- replace a stab/timbre role
- keep kick and hat feel while using new samples

This is the bridge between "I like this song" and "make a new production with similar traits."

### 6. Original Composition Package

`analysis/composition/composition.py` generates the main product output:

```text
tmp/jobs//exports/
  midi/
    bass.mid
    drums.mid
    chords.mid
    melody.mid
    full_loop.mid
  summary.json
  prompt.txt
```

The generator is deterministic in structure but randomized in musical choices. It uses BPM, key, detected chords, drum pattern evidence, genre/style hints, and user direction to choose one of several composition modes:

- house
- techno
- synth wave
- hip hop
- ambient

`full_loop.mid` is MIDI format type 1 so a DAW can import separate tracks.
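For readers unfamiliar with the format, the sketch below builds a tiny format-1 Standard MIDI File by hand: a header chunk, a tempo track, and one track chunk per part. It is an illustration of the file layout only, not the project's writer, and every function name here is hypothetical.

```python
import struct


def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def track_chunk(events: bytes) -> bytes:
    """Wrap raw events in an MTrk chunk and append the End of Track meta event."""
    body = events + vlq(0) + b"\xFF\x2F\x00"
    return b"MTrk" + struct.pack(">I", len(body)) + body


def note(pitch: int, ticks: int, channel: int = 0, velocity: int = 96) -> bytes:
    """One note: Note On at delta 0, then Note Off after `ticks`."""
    on = vlq(0) + bytes([0x90 | channel, pitch, velocity])
    off = vlq(ticks) + bytes([0x80 | channel, pitch, 0])
    return on + off


def type1_file(bpm: float, tracks: list, ppq: int = 480) -> bytes:
    """Build a format-1 file: one tempo track followed by one chunk per part."""
    tempo = round(60_000_000 / bpm)  # microseconds per quarter note
    tempo_track = track_chunk(vlq(0) + b"\xFF\x51\x03" + tempo.to_bytes(3, "big"))
    # Header fields: chunk length 6, format 1, track count, ticks per quarter.
    header = b"MThd" + struct.pack(">IHHH", 6, 1, len(tracks) + 1, ppq)
    return header + tempo_track + b"".join(track_chunk(t) for t in tracks)
```

For example, `type1_file(120, [note(36, 480) * 4, note(42, 240, channel=9) * 8])` yields three track chunks (tempo, bass, drums). Channel index 9 is the General MIDI percussion channel.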

### 7. Suno Prompt Package

There are two prompt paths:

1. Python stub prompt from `analysis/composition/composition.py`, always local.
2. Optional Gemini prompt from `backend/lib/geminiPromptGenerator.js` when `GEMINI_API_KEY` is present.

The Gemini path uses `gemini-2.0-flash` by default and writes a single Suno paragraph from structured analysis and composition data. If Gemini is disabled, missing, times out, or fails, the job still succeeds with the local stub prompt.

The prompt contract ends with a protective instruction:

```text
Instrumental, no vocals. Inspired by the reference groove and production style, not a cover and not a copy.
```
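The fallback behaviour can be sketched as a small wrapper that prefers the optional cloud prompt but always returns something usable. The actual logic lives in the Node backend; this Python version is illustrative only, `choose_prompt` and its parameters are hypothetical, and appending the protective sentence when it is missing is an assumption about how the contract might be enforced.

```python
PROTECTIVE_SUFFIX = (
    "Instrumental, no vocals. Inspired by the reference groove and "
    "production style, not a cover and not a copy."
)


def choose_prompt(stub_prompt, cloud_generator=None, warnings=None):
    """Prefer the optional cloud prompt, but never let its failure fail the job."""
    warnings = [] if warnings is None else warnings
    prompt = stub_prompt
    if cloud_generator is not None:
        try:
            candidate = cloud_generator()
            if candidate and candidate.strip():
                prompt = candidate.strip()
            else:
                warnings.append("cloud prompt was empty; using local stub")
        except Exception as exc:  # timeouts, auth errors, network failures
            warnings.append(f"cloud prompt failed ({exc}); using local stub")
    # Assumption: the protective sentence is appended if it is not already last.
    if not prompt.rstrip().endswith(PROTECTIVE_SUFFIX):
        prompt = f"{prompt.rstrip()} {PROTECTIVE_SUFFIX}"
    return prompt, warnings
```

Catching every exception is deliberate here: the cloud path is optional, so any failure becomes a warning while the local stub keeps the job successful.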

### 8. Optional Local Audio Sample

`analysis/generation/audio_generation.py` can prepare a 30-second local sample before the user uploads anything to Suno.

Provider order:

1. ACE-Step local API, enabled by `PROMPT2MIDI_ENABLE_ACE_STEP=1`.
2. AudioCraft MusicGen, enabled by `PROMPT2MIDI_ENABLE_AUDIOCRAFT=1`.
3. Transformers MusicGen Melody, enabled by `PROMPT2MIDI_ENABLE_MUSICGEN=1`.
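The provider order above amounts to "try each enabled engine in turn and stop at the first success". A minimal sketch of that selection, assuming each flag is enabled by the value `1` and using hypothetical provider names and runner callables:

```python
# Order mirrors the provider list; the short names are hypothetical labels.
PROVIDER_FLAGS = [
    ("ace-step", "PROMPT2MIDI_ENABLE_ACE_STEP"),
    ("audiocraft-musicgen", "PROMPT2MIDI_ENABLE_AUDIOCRAFT"),
    ("transformers-musicgen-melody", "PROMPT2MIDI_ENABLE_MUSICGEN"),
]


def enabled_providers(env):
    """Return provider names whose flag is set to "1", in priority order."""
    return [name for name, flag in PROVIDER_FLAGS if env.get(flag) == "1"]


def generate_sample(env, runners, warnings):
    """Try each enabled provider in order; return (name, result) or (None, None)."""
    for name in enabled_providers(env):
        runner = runners.get(name)
        if runner is None:
            warnings.append(f"{name} enabled but no runner available")
            continue
        try:
            return name, runner()
        except Exception as exc:  # a failed engine falls through to the next one
            warnings.append(f"{name} failed: {exc}")
    return None, None
```

Returning `(None, None)` instead of raising keeps sample generation optional: the job can still succeed with MIDI and prompt output when no engine is available.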

ACE-Step uses a local API at `127.0.0.1:8001` by default, with model settings along these lines:

- `acestep-v15-turbo`
- `acestep-5Hz-lm-0.6B`
- MLX backend by default on macOS

The sample is scored for duration, loudness, pulse consistency, clipping, harshness, and reference groove similarity. The score helps pick the best candidate, but listening is still required.

## Models and Engines

| Engine | Required? | Purpose | Setup |
|---|---:|---|---|
| Python stdlib WAV analyzer | Yes | Base BPM/key/energy/loudness/spectral analysis | Built in |
| `ffmpeg` | For MP3 | Decode MP3 to WAV and export reference sections | Install separately or set `PROMPT2MIDI_FFMPEG` |
| `librosa` / `scipy` | Optional | Better BPM/key, chords, structure, drums, groove | Python environment |
| CLAP `laion/larger_clap_music` | Optional | Zero-shot genre tags | Python ML deps |
| Basic Pitch | Optional | Model MIDI transcription | `npm run setup:transcription` |
| Demucs `htdemucs` | Optional | Bass/drum/other/vocal stems | `npm run setup:stems` |
| Gemini `gemini-2.0-flash` | Optional cloud | Higher-quality Suno prompt | `GEMINI_API_KEY=...` |
| ACE-Step 1.5 | Optional local service | Reference-guided audio samples | `npm run setup:ace-step`, then `npm run ace-step:start` |
| AudioCraft MusicGen | Optional local | Fallback sample generation | `npm run setup:musicgen` |
| Transformers MusicGen Melody | Optional local | Legacy fallback sample generation | Set `PROMPT2MIDI_ENABLE_MUSICGEN=1` with deps installed |

## Legal and Platform-Safety Position

prompt2midi is built around reference-inspired transformation, not cloning.

It should:

- Analyze traits instead of copying a recording.
- Generate new MIDI instead of exporting copyrighted melodies as final output.
- Use neutral production language instead of artist-name prompting.
- Avoid vocals, lyrics, artist likeness, copied hooks, and exact bass/melody sequences.
- Keep every extracted/transcribed artifact labeled as evidence, not guaranteed clearance.

It should not:

- Promise that any output is free of legal issues.
- Claim to bypass Suno copyright filters.
- Recreate a protected song, master recording, vocal likeness, lyric, or signature hook.
- Tell users that a generated sample is automatically safe to upload commercially.

Use references you own, created, licensed, or are otherwise allowed to analyze. Treat the generated Suno prompt and local sample as a safer creative starting point, not legal advice.

## Contributing

This is an open-source project and contributions are welcome. Useful areas include:

- stronger stem separation and source-aware MIDI mapping
- better AU/VST/JUCE host integration
- Ableton, Logic, and other DAW workflow testing
- prompt packaging for Suno and other music tools
- audio-analysis fixtures and regression tests
- documentation, examples, setup scripts, and UX polish

Branch from `develop`, keep changes local-first, and label extracted MIDI honestly when confidence is limited.

## Running Locally

Install Node dependencies:

```bash
npm install
```

Start the backend:

```bash
npm start
```

Run the developer loop with backend logs:

```bash
npm run dev:refresh
```

Install optional engines:

```bash
npm run setup:transcription
npm run setup:stems
npm run setup:ace-step
npm run setup:musicgen
```

Start ACE-Step when using local sample generation:

```bash
npm run ace-step:start
```

Useful environment flags:

```bash
PROMPT2MIDI_DISABLE_MODEL=1
PROMPT2MIDI_DISABLE_STEMS=1
PROMPT2MIDI_DISABLE_SUNO=1
PROMPT2MIDI_DISABLE_LIBROSA=1
PROMPT2MIDI_DISABLE_GENRE=1
PROMPT2MIDI_DISABLE_CHORDS=1
PROMPT2MIDI_DISABLE_STRUCTURE=1
PROMPT2MIDI_DISABLE_DRUMS=1
PROMPT2MIDI_ENABLE_ACE_STEP=1
PROMPT2MIDI_ENABLE_AUDIOCRAFT=1
PROMPT2MIDI_ENABLE_MUSICGEN=1
GEMINI_API_KEY=...
```

## API Example

```bash
curl -s -X POST http://127.0.0.1:47321/analyze \
  -H 'Content-Type: application/json' \
  -d '{"audioPath":"/absolute/path/to/reference.wav","prompt":"keep the groove, vary the bass notes, replace the stab sound"}'
```

Poll status:

```bash
curl -s 'http://127.0.0.1:47321/status?id='
```

Fetch result:

```bash
curl -s 'http://127.0.0.1:47321/result?id='
```
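The same submit, poll, and fetch sequence can be scripted with the Python standard library. This is a sketch under stated assumptions: the response field names `id` and `status` are guesses, since the response schema is not shown in this README, and the helper names are hypothetical. The terminal states come from the job states named in the architecture diagram.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:47321"
TERMINAL_STATES = {"succeeded", "failed"}


def build_analyze_request(audio_path, prompt):
    """Build the POST /analyze request without sending it."""
    body = json.dumps({"audioPath": audio_path, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def job_url(endpoint, job_id):
    """Build a /status or /result URL with a safely encoded job id."""
    return f"{BASE_URL}/{endpoint}?" + urllib.parse.urlencode({"id": job_id})


def fetch_json(request_or_url, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return json.load(response)


def run_job(audio_path, prompt, poll_seconds=1.0, max_polls=600):
    """Submit a job, poll until it reaches a terminal state, return the result."""
    job = fetch_json(build_analyze_request(audio_path, prompt))
    job_id = job["id"]  # assumed field name
    for _ in range(max_polls):
        status = fetch_json(job_url("status", job_id))
        if status.get("status") in TERMINAL_STATES:  # assumed field name
            break
        time.sleep(poll_seconds)
    return fetch_json(job_url("result", job_id))
```

With the backend running, `run_job("/absolute/path/to/reference.wav", "keep the groove, vary the bass notes")` would return the aggregated result JSON, provided the assumed field names match the real responses.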

## Testing

Run Python tests:

```bash
python3 -m unittest analysis.tests.test_feature_extraction
python3 -m unittest analysis.tests.test_composition
```

Run Node tests:

```bash
npm test
```

Compile-check Python:

```bash
python3 -m compileall analysis
```

## Important Invariants

- Keep the core workflow local-first.
- Do not run long jobs, subprocesses, network calls, or file-heavy analysis in JUCE `processBlock`.
- Node owns orchestration and aggregation.
- Python returns structured analysis JSON and file paths.
- Generated MIDI is product output; extracted MIDI is evidence.
- Optional engines must degrade to warnings, not hard job failure.
- Producer-facing copy must describe confidence and limitations honestly.