{"id":48908722,"url":"https://github.com/aojdevstudio/transcript-library","last_synced_at":"2026-04-16T22:03:57.950Z","repository":{"id":339918258,"uuid":"1163118835","full_name":"AojdevStudio/transcript-library","owner":"AojdevStudio","description":"Browse-first knowledge library for YouTube playlist transcripts and curated insights. Built with Next.js 16, React 19, and Tailwind CSS 4.","archived":false,"fork":false,"pushed_at":"2026-04-10T13:35:35.000Z","size":19450,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-10T15:25:44.949Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AojdevStudio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-02-21T05:48:48.000Z","updated_at":"2026-04-10T13:35:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AojdevStudio/transcript-library","commit_stats":null,"previous_names":["jarvis-aojdevstuio/transcript-library","aojdevstudio/transcript-library"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/AojdevStudio/transcript-library","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AojdevStudio%2Ftranscript-library","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AojdevStudio%2Ftranscript-libra
ry/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AojdevStudio%2Ftranscript-library/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AojdevStudio%2Ftranscript-library/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AojdevStudio","download_url":"https://codeload.github.com/AojdevStudio/transcript-library/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AojdevStudio%2Ftranscript-library/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31905896,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"ssl_error","status_checked_at":"2026-04-16T18:21:47.142Z","response_time":69,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-16T22:03:40.678Z","updated_at":"2026-04-16T22:03:57.915Z","avatar_url":"https://github.com/AojdevStudio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Transcript Library\n\n### **Watch the source. Read the analysis. 
Keep the signal.**\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Next.js](https://img.shields.io/badge/Next.js-16-black)](https://nextjs.org)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/AojdevStudio/transcript-library/pulls)\n\n_A private reading room for a small group of friends who take YouTube seriously._\n\n[**Library**](#quick-start) · [**Knowledge Base**](#how-it-works) · [**Analysis Runtime**](#how-it-works)\n\n\u003c/div\u003e\n\n---\n\n## The Problem With Shared Playlists\n\nYou drop a YouTube video in the group chat. Three friends say they'll watch it. One actually does, a week later, alone, and forgets what they wanted to say. The other two never get around to it.\n\nThe video had real signal. A framework you could apply. A story worth discussing. But the knowledge dissolved — into separate browser sessions, half-watched tabs, and messages that got buried.\n\n- The insight lived in your head, not somewhere shareable\n- There was no way to read the transcript without leaving the video\n- Analysis you'd want to reference later didn't exist\n- You watched it once and moved on\n\n**Sound familiar?**\n\n\u003e _\"I'll send you the timestamp.\" — said before forgetting the timestamp, the video, and what it was about._\n\n---\n\n## The Insight\n\nEveryone in the group is curious. Nobody has unlimited time. You need a way to extract signal from a video without treating it like a solo research project.\n\n\u003cdiv align=\"center\"\u003e\n\n### **Watch the video inside the app.**\n\n### **Let the analysis run in the background.**\n\n\u003c/div\u003e\n\nThe transcript is already there. The AI tooling already exists. 
The only missing piece was a workspace that wired it together — for a specific group of people who already trust each other's taste in content.\n\n\u003cdiv align=\"center\"\u003e\n\n## **A reading room for your shared playlist.**\n\n\u003c/div\u003e\n\n---\n\n## What This Is\n\nTranscript Library is a private internal tool for a small group of friends built around a shared YouTube playlist.\n\n| Layer         | What It Does                                                                   |\n| :------------ | :----------------------------------------------------------------------------- |\n| **Catalog**   | Refreshes a local SQLite catalog from the transcript repo for all browse reads |\n| **Player**    | Embeds the YouTube video in-app — no tab switching                             |\n| **Analysis**  | Runs AI synthesis headlessly via `claude` CLI or `codex` CLI                   |\n| **Knowledge** | Stores markdown notes alongside video insights for long-term reference         |\n\nThis is not a SaaS product. 
It is a proof of concept for a trusted group that already has access to Claude and ChatGPT tooling.\n\n---\n\n## See It In Action\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eThe workspace: player + analysis on one page\u003c/b\u003e\u003c/summary\u003e\n\n```\nLibrary \u003e Channel \u003e Video Title\n\n[  YouTube player — full width, no chrome  ]\n\nAnalysis\n──────────────────────────────────────────\nSummary    Key Takeaways    Action Items\n\nFull report ↓ (rendered inline, no disclosure)\n\nTranscript\n──────────────────────────────────────────\nPart 1  ·  2,400 words         Open ↗\nPart 2  ·  1,800 words         Open ↗\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eThe pipeline: how a video becomes an insight\u003c/b\u003e\u003c/summary\u003e\n\n```\nShared YouTube Playlist(s)\n        ↓\nGitHub Action (every 4h) — yt-dlp + Python pipeline\n        ↓\npipeline/youtube-transcripts/ (committed to repo)\n        ↓\nCoolify auto-deploy (Docker Compose)\n        ↓\ndocker-entrypoint.sh rebuilds catalog if transcripts changed\n        ↓\nPOST /api/analyze?videoId=...\n        ↓\nclaude CLI or codex CLI (headless, local)\n        ↓\ndata/insights/\u003cvideoId\u003e/analysis.md\n```\n\n\u003c/details\u003e\n\n---\n\n## What You Get\n\n| Feature                   | How It Works                                     | Why It Matters                                                     |\n| :------------------------ | :----------------------------------------------- | :----------------------------------------------------------------- |\n| **Embedded player**       | YouTube iframe, no redirect                      | Watch and read without splitting attention                         |\n| **Headless analysis**     | claude-cli or codex-cli via provider abstraction | Run from any machine, swap providers without touching UI           |\n| **Insight artifacts**     | Canonical `analysis.md` + run metadata per video | Stable 
lookup by `videoId`, human-readable alongside machine paths |\n| **Live status**           | SSE stream during analysis run                   | Know when it's done without refreshing                             |\n| **Knowledge base**        | Markdown folders alongside video insights        | Essays and notes in the same editorial workspace                   |\n| **Breadcrumb navigation** | Library → Channel → Video                        | Always know where you are, always one click back                   |\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- Node.js 20.9+ (the minimum that Next.js 16 supports) / [Bun](https://bun.sh)\n- Transcripts are embedded in `pipeline/` — no external repo needed\n- `claude` CLI or `codex` CLI (for running analysis)\n\n### Install\n\n```bash\ngit clone https://github.com/AojdevStudio/transcript-library\ncd transcript-library\nbun install\ncp .env.example .env.local\n```\n\n### Configure\n\n```bash\n# Optional — local dev override only (transcripts are embedded in pipeline/ by default)\n# PLAYLIST_TRANSCRIPTS_REPO=/absolute/path/to/playlist-transcripts\n\n# Optional\nANALYSIS_PROVIDER=claude-cli\nINSIGHTS_BASE_DIR=/srv/transcript-library/insights   # hosted deploys\nCATALOG_DB_PATH=/srv/transcript-library/catalog/catalog.db\n\n# Hosted deployment (set these when deploying, not for local dev)\nHOSTED=true                          # enables preflight validation + hosted guard\nCLOUDFLARE_ACCESS_AUD=\u003ccf-access-aud\u003e # required — trusts browser identity from Cloudflare Access\nPRIVATE_API_TOKEN=\u003cstrong-random\u003e    # machine token for supported automation entrypoints\nSYNC_TOKEN=\u003cwebhook-secret\u003e          # recommended — authenticates /api/sync-hook callers\n```\n\n\u003e **Local dev needs zero hosted config.** Leave `HOSTED` unset and all API routes\n\u003e work without authentication. 
The server logs warnings for missing vars but never\n\u003e blocks startup.\n\u003e\n\u003e **Hosted access model:** `library.aojdevstudio.me` is the friend-facing Cloudflare Access\n\u003e hostname. Approved friends use browser access there with Cloudflare-managed identity.\n\u003e Do not ship `PRIVATE_API_TOKEN` to the browser or assume bearer-only access is supported on\n\u003e that hostname. Machine access stays on explicit automation paths such as `/api/sync-hook`,\n\u003e same-host cron/systemd jobs, or a dedicated automation/deploy hostname.\n\n### Run\n\n```bash\njust start\n# → http://localhost:3939\n```\n\n---\n\n## How It Works\n\n![Transcript Library Architecture](./docs/architecture-diagram.png)\n\n### Artifact Layout\n\nEach analysis lives under a stable `videoId` path. Local development defaults to\n`data/insights`, while the canonical hosted path is `/srv/transcript-library/insights` via\n`INSIGHTS_BASE_DIR`.\n\n```\ndata/insights/\u003cvideoId\u003e/\n  analysis.json            ← authoritative structured artifact\n  analysis.md              ← human-readable report derived from JSON\n  \u003cslugified-title\u003e.md     ← human-readable copy\n  video-metadata.json      ← channel, topic, published date\n  run.json                 ← provider, model, timing\n  worker-stdout.txt        ← live log during run\n  worker-stderr.txt        ← errors\n  status.json              ← idle | running | complete | failed\n\ndata/insights/.migration-status.json\n  remainingLegacyCount     ← machine-checkable migration window status\n```\n\nLegacy markdown-only artifacts are supported only during the one-time migration window. Operators\ncan check migration completion with `node scripts/migrate-legacy-insights-to-json.ts --check` and\ncomplete the upgrade by rerunning the script without `--check`.\n\n### Catalog Refresh Contract\n\nBrowse reads are SQLite-only after Phase 2. 
The app keeps the live catalog at\n`data/catalog/catalog.db` by default and writes the latest import report to\n`data/catalog/last-import-validation.json` unless `CATALOG_DB_PATH` points somewhere else.\n\n```bash\nnpx tsx scripts/rebuild-catalog.ts\nnpx tsx scripts/rebuild-catalog.ts --check\n```\n\n- `npx tsx scripts/rebuild-catalog.ts` rebuilds a temp SQLite snapshot, validates it, and atomically\n  swaps it into place only when the import passes.\n- `npx tsx scripts/rebuild-catalog.ts --check` runs the same validation gate without replacing the live\n  DB, while still updating `last-import-validation.json` for operator review.\n- A failed validation leaves the last known-good `catalog.db` in place. The app does not fall back\n  to `videos.csv` at runtime anymore.\n- `POST /api/sync-hook` is retired — it returns 410. Catalog rebuild on deploy is handled by\n  `docker-entrypoint.sh`, which detects transcript changes and triggers a rebuild automatically.\n  `scripts/daily-operational-sweep.ts` uses the same refresh authority before reading browse\n  metadata, so unattended automation and the app use the same catalog authority.\n\n### Provider Abstraction\n\nAnalysis runs through a thin provider boundary. 
Swap `ANALYSIS_PROVIDER` to switch between `claude-cli` and `codex-cli` — no UI changes, no redeployment.\n\n```bash\n# In .env.local\nANALYSIS_PROVIDER=claude-cli    # default\nANALYSIS_PROVIDER=codex-cli     # alternative\n```\n\n### Runtime Observability Contract\n\nPhase 3 keeps the operator story simple and durable:\n\n- `run.json` is the latest durable run record for a `videoId`, including provider, model, lifecycle, and timing.\n- `status.json` is the compatibility artifact that mirrors the current lifecycle for quick reads and older surfaces.\n- `worker-stdout.txt` and `worker-stderr.txt` remain the raw evidence trail when a run needs deeper inspection.\n- `reconciliation.json` records whether the latest durable run and the expected artifacts still agree, including mismatch reasons and rerun-ready guidance.\n- `GET /api/insight` is the status-first snapshot used by the video workspace. It returns lifecycle, stage, retry guidance, reconciliation details, recent log lines, and the current artifact bundle without making operators read raw files first.\n- `GET /api/insight/stream` reuses a shared per-video snapshot cache so concurrent viewers consume the same live status payload instead of polling disk independently. The workspace prioritizes stage, retry guidance, and `recentLogs`; full raw logs stay secondary.\n\nWhen `reconciliation.json` reports a mismatch, the app treats the latest run as retry-needed instead of quietly presenting it as normal success. The intended operator recovery path is a clean rerun, not manual file repair.\n\n### Core API Routes\n\n```\nPOST /api/analyze?videoId=...         Start headless analysis\nGET  /api/analyze/status?videoId=...  Poll run status\nGET  /api/insight?videoId=...         Fetch completed insight\nGET  /api/insight/stream?videoId=...  SSE stream during run\nGET  /api/raw?path=...                
Serve raw transcript chunks\n```\n\n---\n\n## Commands\n\n```bash\njust start              # Dev server\njust prod-start         # Production\njust build              # Next.js build\njust lint               # ESLint\njust typecheck          # tsc --noEmit\njust daily-sweep        # Unattended daily sweep: refresh-only ingest + safe repair, no analysis launch\njust backfill-insights  # Explicit analysis workflow for existing videos\nnpx tsx scripts/rebuild-catalog.ts --check  # Validate catalog parity without cutover\nnpx tsx scripts/benchmark-hosted-scale.ts --check  # Scale validation (1000-video benchmark)\n```\n\n### Unattended daily sweep\n\nSchedule this command for unattended operation:\n\n```bash\njust daily-sweep\n# or: node --import tsx scripts/daily-operational-sweep.ts\n```\n\nThe daily sweep is the unattended default. It refreshes source state, republishes browse state, runs\nonly the conservative historical repair pass, and writes a durable operator record to\n`data/runtime/daily-operational-sweep/latest.json` by default (or the sibling `runtime/`\ndirectory next to `INSIGHTS_BASE_DIR` on hosted installs). Each run also writes an immutable archive\nrecord under `data/runtime/daily-operational-sweep/archive/\u003csweepId\u003e.json`.\n\nWhen the sweep reports `manualFollowUpVideoIds`, those are rerun-only videos: the sweep left them\nvisible for manual follow-up instead of fabricating `run.json` or starting analysis work. Analysis\nremains on-demand or explicit.\n\n---\n\n## The Story\n\nThis started as a frustration. Our group watches a lot of YouTube — not casually, but deliberately. We share links and say \"this one is worth your time.\" But saying it and actually watching it together are different things.\n\nTranscript data for 243 videos across 91 channels was already being pulled — that pipeline is now merged into this repo under `pipeline/`, with a GitHub Action syncing every 4 hours and committing the results. The AI tooling already existed. 
What didn't exist was a workspace that made the signal accessible without a separate workflow for every person in the group.\n\nSo this became a reading room. You pick a video, the player loads inline, the analysis runs in the background, and the transcript is there if you want the exact words. The knowledge base holds notes alongside the video insights. Everything is organized by the same `videoId` key, so nothing ever gets lost.\n\nIt's private, it's opinionated, and it's built for exactly one use case: a small group of friends who take ideas seriously.\n\n\u003cdiv align=\"center\"\u003e\n\n### The video is the source. The analysis is the shortcut. The discussion is the point.\n\n\u003c/div\u003e\n\n---\n\n## Docs\n\n- [System overview](./docs/architecture/system-overview.md)\n- [Analysis runtime](./docs/architecture/analysis-runtime.md)\n- [Worker topology](./docs/architecture/worker-topology.md)\n- [Artifact schema](./docs/architecture/artifact-schema.md)\n- [Provider runbook](./docs/operations/provider-runbook.md)\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Built for the group. Kept private. Worth sharing the idea.**\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faojdevstudio%2Ftranscript-library","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faojdevstudio%2Ftranscript-library","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faojdevstudio%2Ftranscript-library/lists"}