https://github.com/aydrian/broll-organizer

Last synced: about 2 months ago

Host: GitHub
URL: https://github.com/aydrian/broll-organizer
Owner: aydrian
License: mit
Created: 2026-02-16T01:20:05.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-04-05T17:20:58.000Z (2 months ago)
Last Synced: 2026-04-05T17:25:14.238Z (2 months ago)
Language: Python
Size: 245 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🎬 B-Roll Organizer

AI-powered b-roll organizer using **Fireworks AI** for vision analysis (with Ollama for local embeddings and whisper.cpp for transcription).

## Overview

`broll-organizer` is an AI-powered video cataloging tool designed to organize and search large collections of b-roll footage stored on external drives. It uses Large Language Models (LLMs) to automatically analyze, tag, and describe video clips, making them searchable through natural language—including **spoken content** via audio transcription.

The project has two main interfaces:
1. A **Command-Line Interface (CLI)** for initializing the catalog, processing videos, and performing searches.
2. A **Web Interface (Flask)** for visually browsing the catalog, searching for clips, and viewing video details.

An **OpenClaw Agent API** is also available for programmatic access by AI assistants.

## Key Technologies

* **Backend:** Python 3.12+ with **asyncio** for parallel processing
* **CLI:** `click`
* **Web Framework:** `flask`
* **Video Processing:** `ffmpeg` (subprocess)
* **Image Processing:** `pillow`
* **Database:** SQLite with FTS5 (keyword search) + `sqlite-vec` (semantic search)
* **AI/ML:**
    * **Vision Analysis:** [Fireworks AI](https://fireworks.ai/) `kimi-k2p5-turbo` (multimodal) - scene descriptions and tags
    * **Embeddings:** [Ollama](https://ollama.com/) `nomic-embed-text` (local, default) or Fireworks - semantic search
    * **Transcription:** [whisper.cpp](https://github.com/ggerganov/whisper.cpp) (local) - audio-to-text for spoken content search
* **Folder-based location:** Since devices like Osmo Pocket 3 don't encode GPS, location is inferred from folder names.
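
As a rough illustration of folder-based location, the sketch below walks a clip's parent folders from the innermost outward and returns the first name that is not a generic camera folder. The function name `infer_location` and the `GENERIC_FOLDERS` set are hypothetical and are not taken from this repo; the project's actual rules may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical list of folder names that carry no location information.
GENERIC_FOLDERS = {"dcim", "100media", "clips", "footage", "video", "videos"}


def infer_location(video_path: str, drive_root: str) -> Optional[str]:
    """Return the nearest non-generic parent folder name, or None."""
    relative = Path(video_path).relative_to(drive_root)
    # parts[:-1] drops the file name; reversed() starts at the closest folder.
    for folder in reversed(relative.parts[:-1]):
        if folder.lower() not in GENERIC_FOLDERS:
            # Make names like "Iceland_Trip" readable.
            return folder.replace("_", " ").strip()
    return None
```

With this sketch, a clip stored at `Iceland_Trip/DCIM/clip001.mp4` would be labeled "Iceland Trip", while a clip sitting directly in the drive root gets no location.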

## AI Provider Setup

The application uses a **hybrid approach** to maximize speed while preserving API quotas:

| Task | Default Provider | Reason |
|------|-----------------|--------|
| Vision Analysis | **Fireworks AI** | Kimi K2.5 Turbo - high-quality scene analysis (core value) |
| Embeddings | **Ollama (local)** | nomic-embed-text - runs fast locally, high volume |
| Transcription | **whisper.cpp (local)** | Runs entirely offline - no API usage |

### Required Setup

#### 1. Fireworks AI (Vision Analysis)

Set your Fireworks API key for vision analysis. You can either:

**A. Use a `.env` file (recommended):**
```bash
cp .env.example .env
# Then edit .env so that it contains your key:
#   FIREWORKS_API_KEY=your-api-key-here
```

**B. Set environment variable:**
```bash
export FIREWORKS_API_KEY="your-api-key-here"
```

#### 2. Ollama (Embeddings - Local, Required)

Embeddings default to local Ollama to preserve Fireworks quota for vision tasks:

1. **Install Ollama:** Download via [ollama.com](https://ollama.com/)
2. **Pull the embedding model:**
```bash
ollama pull nomic-embed-text # Required for semantic search
```
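
To check that the local embedding model responds, a request along the following lines can be sent to Ollama's REST API. This is a minimal sketch that assumes Ollama is listening on its default address (`http://localhost:11434`) and uses its `/api/embed` endpoint; the helper names are hypothetical and this is not necessarily how `broll-organizer` calls Ollama internally.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build (but do not send) a POST request asking Ollama to embed `text`."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list:
    """Send the request and return the first embedding vector."""
    with urllib.request.urlopen(build_embed_request(text), timeout=30) as resp:
        body = json.load(resp)
    # /api/embed returns one vector per input under the "embeddings" key.
    return body["embeddings"][0]
```

Calling `embed("a waterfall at sunset")` should return a list of floats if the model was pulled successfully; a connection error usually means the Ollama server is not running.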

#### 3. whisper.cpp (Transcription - Optional)

For audio transcription (enables search of spoken content):

**macOS (Apple Silicon with Metal GPU):**
```bash
# Install via Homebrew
brew install whisper.cpp

# Download model (small is fast with Metal GPU)
whisper-cli --model small --download-model
```

**Linux (x86_64 or ARM64 like Raspberry Pi):**
```bash
# Build from source
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make

# Download appropriate model
# For Raspberry Pi 5 (ARM64): use 'tiny' (39M parameters, ~75 MiB download)
./models/download-ggml-model.sh tiny

# For desktop x86_64: use 'base' (74M parameters, ~142 MiB download)
./models/download-ggml-model.sh base
```

The application auto-detects your platform and selects the optimal model:
- **macOS Metal (M1/M2/M3):** `small` model (~2-4x real-time speed)
- **Linux x86_64:** `base` model (~0.5-1x real-time speed)
- **Linux ARM64 (Raspberry Pi 5):** `tiny` model (~0.1-0.3x real-time speed)
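
The selection rules above could be expressed roughly as follows. This is a sketch, not the project's code: the function name `select_whisper_model` is hypothetical, and falling back to `base` on any platform not listed above is an assumption.

```python
import os
import platform
from typing import Optional


def select_whisper_model(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Pick a whisper.cpp model name from the platform, honoring an override."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if override is None:
        override = os.environ.get("BROLL_WHISPER_MODEL", "auto")
    if override and override != "auto":
        return override  # an explicit choice always wins
    if system == "Darwin" and machine == "arm64":
        return "small"  # Apple Silicon with Metal
    if system == "Linux" and machine in ("aarch64", "arm64"):
        return "tiny"  # e.g. Raspberry Pi 5
    return "base"  # Linux x86_64; also used here as an assumed fallback
```

Passing `system` and `machine` explicitly makes the rules easy to test without running on each platform.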

## Installation

This project is managed with [`uv`](https://github.com/astral-sh/uv).

```bash
# Install dependencies
uv sync

# Run the app
uv run broll --help
```

## Getting Started

The application is designed to be run directly on a host machine with access to an external drive containing video files.

### 1. Initialize the Catalog
Create the database and folder structure on the target drive.
```bash
uv run broll init /path/to/your/external-drive
```

### 2. Process Videos
Scan the drive to find new videos, extract metadata, analyze them with the LLM, and add them to the catalog. Videos are processed **in parallel**, five at a time by default.
```bash
uv run broll process /path/to/your/external-drive
```
* Use `--scan-only` to quickly catalog files by metadata without running the slower LLM analysis.
* Use `--force` to re-process all videos, even if they are already in the catalog.
* Use `--concurrency N` to adjust parallel processing (default: 5).
* Use `--no-transcribe` to skip audio transcription for faster processing, even when whisper.cpp is installed.
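
For readers curious how a concurrency limit like `--concurrency N` is commonly implemented with `asyncio`, here is a minimal sketch using a semaphore. It is illustrative only: `process_video` is a placeholder, and the README does not say that the project uses this exact pattern.

```python
import asyncio


async def process_video(path: str) -> str:
    """Placeholder for the real per-video work (metadata, vision, transcription)."""
    await asyncio.sleep(0.01)  # stands in for I/O-bound API and subprocess calls
    return f"processed {path}"


async def process_all(paths: list, concurrency: int = 5) -> list:
    """Process every path, with at most `concurrency` videos in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(path: str) -> str:
        async with semaphore:  # waits here while `concurrency` tasks are running
            return await process_video(path)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))
```

A higher limit mainly helps when the work is dominated by waiting on API calls; it does not speed up CPU-bound steps such as local transcription.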

## Usage

### CLI

* **Process Videos:** Scan, analyze, and catalog videos with **parallel processing** and optional **audio transcription**.
```bash
# Process with default settings (concurrency=5, transcribe enabled)
uv run broll process /path/to/drive

# Adjust parallel processing
uv run broll process /path/to/drive --concurrency 10

# Skip transcription for faster processing
uv run broll process /path/to/drive --no-transcribe
```

* **Transcribe (Retroactive):** Add audio transcription to previously processed videos.
```bash
# Transcribe all videos missing transcripts
uv run broll transcribe /path/to/drive

# Transcribe specific video by ID
uv run broll transcribe /path/to/drive --video-id 123

# Force re-transcribe all videos
uv run broll transcribe /path/to/drive --force
```
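
Transcribing a clip with whisper.cpp generally takes two steps: extracting 16 kHz mono WAV audio with `ffmpeg`, then running `whisper-cli` on that file. The sketch below builds both commands. The helper names and file paths are hypothetical, and the commands the project actually runs may use different options.

```python
import subprocess
from pathlib import Path


def build_audio_extract_cmd(video: str, wav: str) -> list:
    """ffmpeg command: drop video, resample to 16 kHz mono 16-bit PCM."""
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]


def build_whisper_cmd(wav: str, model_path: str, whisper_cli: str = "whisper-cli") -> list:
    """whisper-cli command that writes a .txt transcript next to the WAV file."""
    out_base = str(Path(wav).with_suffix(""))  # -of takes a path without extension
    return [whisper_cli, "-m", model_path, "-f", wav, "-otxt", "-of", out_base]


def transcribe(video: str, wav: str, model_path: str) -> str:
    """Run both steps and return the transcript text."""
    subprocess.run(build_audio_extract_cmd(video, wav), check=True)
    subprocess.run(build_whisper_cmd(wav, model_path), check=True)
    return Path(wav).with_suffix(".txt").read_text(encoding="utf-8")
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command line before anything is run.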

* **Search:** Find videos using natural language (including **spoken content** from transcripts).
```bash
uv run broll search "a slow-motion shot of a waterfall at sunset" --drive /path/to/drive

# Search for spoken content
uv run broll search "interview about climate change" --drive /path/to/drive
```
You can specify different search modes: `hybrid` (default), `keyword`, or `semantic`.
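
One common way to combine keyword and semantic results into a single hybrid ranking is reciprocal rank fusion (RRF). The sketch below shows that technique under the assumption that each search returns an ordered list of video IDs. It is not confirmed to be the method `broll-organizer` uses, and the constant `k=60` is simply the value commonly used for RRF.

```python
def reciprocal_rank_fusion(keyword_ids: list, semantic_ids: list, k: int = 60) -> list:
    """Merge two ranked ID lists; items ranked high in either list score well."""
    scores = {}
    for ranking in (keyword_ids, semantic_ids):
        for rank, video_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[video_id] = scores.get(video_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda vid: scores[vid], reverse=True)


# Video 1 and video 3 appear in both lists, so they rise to the top.
print(reciprocal_rank_fusion([1, 2, 3], [3, 1, 4]))  # [1, 3, 2, 4]
```

RRF needs only rank positions, so it avoids having to normalize FTS5 relevance scores against vector distances.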

* **Statistics:** View stats about the catalog.
```bash
uv run broll stats /path/to/drive
```

* **OpenClaw Agent Mode:** Launch the agent API for AI assistant integration.
```bash
uv run broll agent /path/to/drive
```
The agent API will be available at `http://127.0.0.1:5556`.

### Web UI

* Launch the web interface to browse and search the catalog visually.
```bash
uv run broll web /path/to/drive
```
The web UI will be available at `http://127.0.0.1:5555` by default.

## OpenClaw Skill

This repo includes an OpenClaw skill for easy AI assistant integration:

```bash
# Install the skill via npx
npx openclaw skills add aydrian/broll-organizer/skills/broll-catalog
```

Once installed, OpenClaw agents can query your B-roll catalog directly:
- Search by keyword, location, mood
- Get video details and thumbnails
- Find clips for content creation

See [skills/broll-catalog/SKILL.md](skills/broll-catalog/SKILL.md) for details.

## Configuration

Environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `FIREWORKS_API_KEY` | Your Fireworks API key (required for vision analysis) | (none) |
| `VISION_PROVIDER` | Provider for vision analysis: `fireworks` or `ollama` | `fireworks` |
| `EMBEDDING_PROVIDER` | Provider for embeddings: `fireworks` or `ollama` | `ollama` (local) |
| `BROLL_WHISPER_MODEL` | Override whisper.cpp model (tiny, base, small, medium) | `auto` (platform-optimized) |
| `WHISPER_CPP_PATH` | Path to whisper-cli executable | (auto-detected) |
| `BROLL_TRANSCRIPTION_ENABLED` | Enable/disable transcription | `true` |

### Legacy Options

| Variable | Description | Default |
|----------|-------------|---------|
| `AI_PROVIDER` | Legacy fallback for all tasks | `fireworks` |
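
The sketch below shows one plausible way these variables could be resolved. The precedence is an assumption: a task-specific variable first, then `AI_PROVIDER` if it is explicitly set, then the documented default. The function name and the accepted spellings of "true" are likewise hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_settings(env: Optional[Mapping] = None) -> dict:
    """Resolve provider settings from environment variables (assumed precedence)."""
    env = os.environ if env is None else env
    legacy = env.get("AI_PROVIDER")  # only used when explicitly set
    settings = {
        "vision": env.get("VISION_PROVIDER") or legacy or "fireworks",
        "embedding": env.get("EMBEDDING_PROVIDER") or legacy or "ollama",
        "transcription": env.get("BROLL_TRANSCRIPTION_ENABLED", "true")
        .strip().lower() in ("1", "true", "yes"),
    }
    # Fireworks tasks need an API key; fail early with a clear message.
    uses_fireworks = "fireworks" in (settings["vision"], settings["embedding"])
    if uses_fireworks and not env.get("FIREWORKS_API_KEY"):
        raise RuntimeError("FIREWORKS_API_KEY is required when using Fireworks")
    return settings
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to test with plain dictionaries.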

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
