https://github.com/akspa0/the-machine

Audio processing tool focused on metadata and context generation via off-the-shelf components. Leverages several local ML/AI tools to accomplish transcription, context clues, and audio processing tasks. Designed with extensibility in mind.
https://github.com/akspa0/the-machine

audio-context audio-processing audio-processing-with-python cuda ffmpeg lmstudio parakeet pyannote-audio pytorch speech-processing stt whisper

Last synced: 4 months ago
JSON representation

Host: GitHub
URL: https://github.com/akspa0/the-machine
Owner: akspa0
License: mit
Created: 2025-05-29T21:49:07.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-06-11T04:51:41.000Z (4 months ago)
Last Synced: 2025-06-11T05:27:53.954Z (4 months ago)
Topics: audio-context, audio-processing, audio-processing-with-python, cuda, ffmpeg, lmstudio, parakeet, pyannote-audio, pytorch, speech-processing, stt, whisper
Language: Python
Homepage: https://immoralhole.com
Size: 1.06 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# The-Machine 🧠🔊

Dedicated to the memory of Carlito Cross [Madhouse Live](https://madhouselive.com)
---

**A context-driven, privacy-first, modular pipeline for understanding, transforming, and building on top of audio recordings.**

---

## 🚀 What is The-Machine?

The-Machine is a powerful, extensible toolkit for:
- Adding rich context to audio recordings (calls, music, podcasts, etc.)
- Preparing audio and metadata for dataset use, research, and creative projects
- Building new tools and workflows on top of audio context and transcriptions
- Enabling privacy-first, traceable, and reproducible audio processing

**Why?**
> Audio is more than just sound — it is context, story, and data. The-Machine helps you unlock, organize, and use that context for anything from dataset curation to creative AI workflows.

---

## ✨ Features

- 🎙️ **Audio Ingestion & PII Removal**: Ingests audio, removes PII from filenames, and anonymizes all logs/outputs.
- 🗂️ **Context-Driven Processing**: Every file is tracked, indexed, and processed with full lineage and manifesting.
- 🧩 **Extension System**: Modular, plug-and-play extensions for everything—transcription, CLAP annotation, LLM tasks, remixing, show creation, and more.
- 🦾 **LLM Integration**: Local LLM support (LM Studio, etc.) for titles, summaries, image prompts, and more—fully privacy-safe.
- 🗣️ **Speaker Diarization & Transcription**: Segments audio by speaker, transcribes with Parakeet/Whisper, and aligns with context.
- 🥁 **CLAP Annotation & Segmentation**: Detects events (e.g., ringing, hang-up) and segments calls using CLAP.
- 🎚️ **Normalization & Remixing**: Loudness normalization, true peak, and creative remixing for dataset or show use.
- 🖼️ **Image/Video Generation**: Extensions for SDXL/ComfyUI image and video generation from transcripts and personas.
- 📜 **Manifest & Traceability**: Every output is tracked in a manifest—no lost context, ever.
- 🔒 **Privacy-First**: No PII in logs, outputs, or manifests. All processing is anonymized by design.
- 🧠 **Memory Bank**: Project context, progress, and system patterns are tracked for robust, extension-driven development.
- 🛠️ **Workflow-Driven**: All logic and configuration is defined in JSON workflows—easy to extend, modify, and share.
- 🏗️ **Ready for Dataset Prep**: Designed to help you build, clean, and annotate audio datasets for ML/AI.
- 🔄 **Resume & Robustness**: Pipeline can resume from any stage, with full error recovery and validation.
- 🧬 **Designed for Extensibility**: Build your own extensions to add new context, analysis, or creative outputs.
- Persona builder audio samples are now lossless, using numpy+soundfile to concatenate original .wav files (not _16k.wav), with no resampling or pydub, guaranteeing high fidelity for all persona samples.
- System prompt for persona generation now instructs the LLM to be concise, allow for absurdity, and keep responses below 300 tokens.
- All LLM chunking/continuation logic is removed; only direct responses are used for persona and all LLM tasks.
- Logging and debug output is robust and clear for all pipeline and extension stages.

---

## 🧩 Extension System

All new features are implemented as modular **extensions** in the `extensions/` folder. Extensions can:
- Run after the main pipeline or independently
- Use all context, transcripts, and outputs
- Add new analysis, creative outputs, or integrations

**See [`extensions/README.md`](./extensions/README.md) for a full catalog and authoring guide.**

---

## 🛠️ Example Usage

### Ingest and Process Audio
```sh
python pipeline_orchestrator.py input_audio/
```

### Run an Extension (e.g., Persona Builder)
```sh
python extensions/character_persona_builder.py outputs/run-YYYYMMDD-HHMMSS --llm-config workflows/llm_tasks.json
```

### Generate Avatars/Images
```sh
python extensions/avatar/sdxl_avatar_generator.py \
--persona-manifest outputs/run-YYYYMMDD-HHMMSS/characters/persona_manifest.json \
--output-root outputs/run-YYYYMMSS
```

### Use the LLM Utilities (chunking, summarization, etc.)
```sh
python extensions/llm_utils.py --help
```

---

## 📚 Project Structure

- `extensions/` — All modular extensions (see README inside)
- `workflows/` — JSON configs for pipeline, CLAP, LLM, etc.
- `memory-bank/` — Project context, progress, and system patterns
- `outputs/` — All run outputs (timestamped folders)
- `specification/` — System and node documentation

---

## 🧠 How to Build Your Own Extensions

1. Copy `extension_base.py` and inherit from `ExtensionBase`.
2. Use context, transcripts, and outputs from any run folder.
3. Add your logic—analysis, creative output, new integrations, etc.
4. Log only anonymized, PII-free information.
5. Document your extension and add it to the catalog!

See [`extensions/README.md`](./extensions/README.md) for more.

---

## 🌟 Vision & Future

- **Context Everywhere:** Audio is just the start — The-Machine is designed to add, use, and build on context for any data.
- **Multimodal Workflows:** Future extensions will support image→text→audio pipelines, creative AI, and dataset generation in all directions.
- **Reverse Pipelines:** Imagine describing an image with a local LLM, then generating audio or music from that description—The-Machine will make it possible.
- **Open, Extensible, and Privacy-First:** Built for researchers, creators, and anyone who wants to understand and use audio context.

---

## 📝 Documentation & Resources

- [Extension Catalog & Guide](./extensions/README.md)
- [Workflow Configs](./workflows/README.md)
- [Memory Bank & Project Context](./memory-bank/README.md)
- [System Specifications](./specification/README.md)

---

## 🤝 Contributing

- Contributions, new extensions, and feedback are welcome!
- Please see the extension authoring guide and open an issue or PR.

---

**Built for context, privacy, and creativity.**

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/akspa0/the-machine

Awesome Lists containing this project

README