{"id":34571393,"url":"https://github.com/devdogukan/youtube-transcript-extarctor","last_synced_at":"2026-05-29T04:31:25.154Z","repository":{"id":328389294,"uuid":"1115373573","full_name":"devdogukan/youtube-transcript-extarctor","owner":"devdogukan","description":null,"archived":false,"fork":false,"pushed_at":"2025-12-12T22:15:19.000Z","size":26,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-14T10:06:57.929Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devdogukan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-12T19:02:51.000Z","updated_at":"2025-12-12T22:15:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/devdogukan/youtube-transcript-extarctor","commit_stats":null,"previous_names":["devdogukan/youtube-transcript-extarctor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/devdogukan/youtube-transcript-extarctor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdogukan%2Fyoutube-transcript-extarctor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdogukan%2Fyoutube-transcript-extarctor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdogukan%2Fyoutube-transcript-extarctor/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdogukan%2Fyoutube-transcript-extarctor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devdogukan","download_url":"https://codeload.github.com/devdogukan/youtube-transcript-extarctor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdogukan%2Fyoutube-transcript-extarctor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33637485,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-24T09:36:48.865Z","updated_at":"2026-05-29T04:31:25.117Z","avatar_url":"https://github.com/devdogukan.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YouTube Transcript Extractor\n\nA simple, plug-and-play Node.js module that extracts clean transcripts from YouTube videos. 
No UI, no website - just a reusable backend module.\n\n## Features\n\n- ✅ Extracts transcripts from YouTube videos (manual and auto-generated captions)\n- ✅ Returns transcript segments with text, duration, offset, and language\n- ✅ Multiple output formats: JSON, Text, Markdown, SRT\n- ✅ Automatic HTML entity decoding (converts `\u0026amp;#39;` to `'`, etc.)\n- ✅ CLI support for command-line usage\n- ✅ Written in TypeScript with full type definitions\n- ✅ Works without YouTube login\n- ✅ Minimal dependencies (TypeScript for development, Node.js built-in fetch)\n- ✅ Error handling for videos without captions\n\n## Installation\n\nThis module requires Node.js 18.0.0 or higher (for built-in fetch support).\n\n### Building the Project\n\nFirst, install dependencies and build the TypeScript code:\n\n```bash\nnpm install\nnpm run build\n```\n\nThis will compile TypeScript files from `src/` to `dist/`.\n\n### As a Module\n\nAfter building, import from the compiled output:\n\n```javascript\nimport { getTranscriptFromUrl } from './youtube-transcript-extractor/dist/index.js';\n```\n\nOr if using TypeScript in your project, you can import directly from source:\n\n```typescript\nimport { getTranscriptFromUrl } from './youtube-transcript-extractor/src/index.js';\n```\n\n### As a CLI Tool (Global Installation)\n\n```bash\nnpm install -g .\n```\n\nAfter installation, you can use the `youtube-transcript` command globally.\n\n## Quick Start\n\n### CLI Usage\n\n```bash\n# JSON format (default, outputs to console)\nnode dist/cli.js https://www.youtube.com/watch?v=dQw4w9WgXcQ\n\n# Text format\nnode dist/cli.js https://www.youtube.com/watch?v=dQw4w9WgXcQ --format text\n\n# Markdown format with timestamps, save to file\nnode dist/cli.js https://www.youtube.com/watch?v=dQw4w9WgXcQ --format markdown --output transcript.md --timestamps\n\n# SRT format, save to file\nnode dist/cli.js https://www.youtube.com/watch?v=dQw4w9WgXcQ --format srt --output transcript.srt\n\n# Show help\nnode 
dist/cli.js --help\n```\n\nIf installed globally:\n\n```bash\nyoutube-transcript https://www.youtube.com/watch?v=dQw4w9WgXcQ --format text\n```\n\n### Module Usage\n\n**JavaScript:**\n\n```javascript\nimport { getTranscriptFromUrl } from './dist/index.js';\n\n// JSON format (default)\nconst result = await getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');\n\nif (typeof result === 'object' \u0026\u0026 !Array.isArray(result) \u0026\u0026 'error' in result) {\n  console.log('Error:', result.error);\n  console.log('Video ID:', result.videoId);\n} else if (Array.isArray(result)) {\n  // result is an array of transcript segments\n  console.log(`Found ${result.length} segments`);\n  result.forEach((segment, index) =\u003e {\n    console.log(`Segment ${index + 1}:`, segment.text);\n    console.log(`  Duration: ${segment.duration}s, Offset: ${segment.offset}s`);\n  });\n}\n\n// Text format\nconst text = await getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {\n  format: 'text',\n  formatOptions: { sentencesPerParagraph: 3 }\n});\n\n// Markdown format with timestamps, save to file\nawait getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {\n  format: 'markdown',\n  formatOptions: { includeTimestamps: true },\n  outputFile: './transcript.md'\n});\n```\n\n**TypeScript:**\n\n```typescript\nimport { getTranscriptFromUrl, OutputFormat } from './src/index.js';\nimport type { TranscriptSegment } from './src/types.js';\n\n// JSON format (default)\nconst result = await getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');\n\nif (typeof result === 'object' \u0026\u0026 !Array.isArray(result) \u0026\u0026 'error' in result) {\n  console.log('Error:', result.error);\n  console.log('Video ID:', result.videoId);\n} else if (Array.isArray(result)) {\n  // TypeScript knows result is TranscriptSegment[]\n  const segments: TranscriptSegment[] = result;\n  console.log(`Found ${segments.length} segments`);\n  
segments.forEach((segment, index) =\u003e {\n    console.log(`Segment ${index + 1}:`, segment.text);\n    console.log(`  Duration: ${segment.duration}s, Offset: ${segment.offset}s`);\n  });\n}\n\n// Text format with enum\nconst text = await getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {\n  format: OutputFormat.TEXT,\n  formatOptions: { sentencesPerParagraph: 3 }\n});\n\n// Markdown format with timestamps, save to file\nawait getTranscriptFromUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {\n  format: OutputFormat.MARKDOWN,\n  formatOptions: { includeTimestamps: true },\n  outputFile: './transcript.md'\n});\n```\n\n## API Reference\n\n### `getTranscriptFromUrl(url, options)`\n\nExtracts transcript from a YouTube video URL or video ID.\n\n**TypeScript Signature:**\n```typescript\nfunction getTranscriptFromUrl(\n  url: string,\n  options?: TranscriptOptions\n): Promise\u003cTranscriptResponse\u003e\n```\n\n**Parameters:**\n- `url` (string): YouTube video URL (e.g., `https://www.youtube.com/watch?v=VIDEO_ID`) or video ID\n- `options` (TranscriptOptions, optional): Configuration options\n  - `format` (OutputFormat | string): Output format - `OutputFormat.JSON`, `OutputFormat.TEXT`, `OutputFormat.MARKDOWN`, or `OutputFormat.SRT` (default: `OutputFormat.JSON`)\n  - `outputFile` (string, optional): File path to save output\n  - `formatOptions` (FormatOptions, optional): Format-specific options\n    - `sentencesPerParagraph` (number): For `'text'` format - number of sentences per paragraph (default: 3)\n    - `includeTimestamps` (boolean): For `'markdown'` format - include timestamps (default: false)\n\n**Returns:**\n- `Promise\u003cTranscriptResponse\u003e`: \n  - `TranscriptSegment[]` - Array of transcript segments (JSON format)\n  - `string` - Formatted string (text, markdown, or srt formats)\n  - `{ success: true; file: string; format: string }` - Success object (when `outputFile` is specified)\n  - `{ error: string; videoId: string }` - Error 
object on failure\n\n**Type Definitions:**\n```typescript\n// Available enums\nenum OutputFormat {\n  JSON = 'json',\n  TEXT = 'text',\n  MARKDOWN = 'markdown',\n  SRT = 'srt'\n}\n\n// Transcript segment structure\ninterface TranscriptSegment {\n  text: string;\n  duration: number;\n  offset: number;\n  lang: string;\n}\n\n// Options interface\ninterface TranscriptOptions {\n  format?: OutputFormat | string;\n  outputFile?: string;\n  formatOptions?: FormatOptions;\n}\n\ninterface FormatOptions {\n  sentencesPerParagraph?: number;\n  includeTimestamps?: boolean;\n}\n```\n\n**Success Response (Array of segments):**\n```javascript\n[\n  {\n    text: '♪ Never gonna give you up ♪',\n    duration: 1.96,\n    offset: 161.4,\n    lang: 'en'\n  },\n  {\n    text: '♪ Never gonna let you down ♪',\n    duration: 2.2,\n    offset: 163.44,\n    lang: 'en'\n  },\n  // ... more segments\n]\n```\n\n**Error Response (Object):**\n```javascript\n{\n  error: \"No transcript available\",\n  videoId: \"dQw4w9WgXcQ\"\n}\n```\n\n## Integration Examples\n\n### Next.js API Route\n\n```typescript\n// pages/api/transcript.ts (Pages Router API route)\nimport { getTranscriptFromUrl } from '../../../transcript-engine/dist/index.js';\nimport type { NextApiRequest, NextApiResponse } from 'next';\n\nexport default async function handler(req: NextApiRequest, res: NextApiResponse) {\n  const { url } = req.query;\n\n  if (!url || typeof url !== 'string') {\n    return res.status(400).json({ error: 'URL parameter is required' });\n  }\n\n  try {\n    const result = await getTranscriptFromUrl(url);\n    \n    if (typeof result === 'object' \u0026\u0026 !Array.isArray(result) \u0026\u0026 'error' in result) {\n      return res.status(404).json(result);\n    }\n    \n    return res.status(200).json(result);\n  } catch (error) {\n    const errorMessage = error instanceof Error ? 
error.message : 'Unknown error';\n    return res.status(500).json({ error: errorMessage });\n  }\n}\n```\n\n### Express Server\n\n```typescript\n// server.ts\nimport express from 'express';\nimport { getTranscriptFromUrl } from './transcript-engine/dist/index.js';\n\nconst app = express();\n\napp.get('/api/transcript', async (req, res) =\u003e {\n  const { url } = req.query;\n\n  if (!url || typeof url !== 'string') {\n    return res.status(400).json({ error: 'URL parameter is required' });\n  }\n\n  try {\n    const result = await getTranscriptFromUrl(url);\n    \n    if (typeof result === 'object' \u0026\u0026 !Array.isArray(result) \u0026\u0026 'error' in result) {\n      return res.status(404).json(result);\n    }\n    \n    return res.status(200).json(result);\n  } catch (error) {\n    const errorMessage = error instanceof Error ? error.message : 'Unknown error';\n    return res.status(500).json({ error: errorMessage });\n  }\n});\n\napp.listen(3000, () =\u003e {\n  console.log('Server running on port 3000');\n});\n```\n\n### Standalone Node Module\n\n```typescript\n// my-script.ts\nimport { getTranscriptFromUrl } from './transcript-engine/dist/index.js';\nimport type { TranscriptSegment } from './transcript-engine/src/types.js';\n\nasync function main(): Promise\u003cvoid\u003e {\n  const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';\n  const result = await getTranscriptFromUrl(videoUrl);\n\n  if (typeof result === 'object' \u0026\u0026 !Array.isArray(result) \u0026\u0026 'error' in result) {\n    console.error('Failed to get transcript:', result.error);\n    return;\n  }\n\n  if (Array.isArray(result)) {\n    // TypeScript knows result is TranscriptSegment[]\n    const segments: TranscriptSegment[] = result;\n    console.log(`Found ${segments.length} transcript segments`);\n    \n    // Combine all text\n    const fullText = segments.map(segment =\u003e segment.text).join(' ');\n    console.log('\\n--- Full Transcript ---');\n    
console.log(fullText);\n    \n    // Or work with individual segments\n    segments.forEach((segment, index) =\u003e {\n      console.log(`\\n[${segment.offset}s] ${segment.text}`);\n    });\n  }\n}\n\nmain();\n```\n\n## Module Structure\n\n```\nyoutube-transcript-extractor/\n├── src/                      # TypeScript source files\n│   ├── index.ts              # Main entry point (module export)\n│   ├── cli.ts                # CLI interface\n│   ├── constants.ts          # Constants and regex patterns\n│   ├── types.ts              # TypeScript type definitions and enums\n│   ├── formatters/\n│   │   └── index.ts          # Format converters (text, markdown, srt, json)\n│   ├── services/\n│   │   ├── fetchTranscript.ts   # Fetches transcript XML from YouTube\n│   │   └── parseTranscript.ts   # Parses XML into segment objects\n│   └── utils/\n│       ├── videoUtils.ts     # Video ID extraction and HTTP utilities\n│       ├── fileWriter.ts     # File writing utilities\n│       └── htmlEntities.ts   # HTML entity decoding utilities\n├── dist/                     # Compiled JavaScript (generated after build)\n├── tsconfig.json             # TypeScript configuration\n├── package.json\n└── README.md\n```\n\n## Output Formats\n\n- **JSON** (default): Returns an array of segment objects with `text`, `duration`, `offset`, and `lang` properties\n- **Text**: Plain text format with paragraphs (configurable sentences per paragraph)\n- **Markdown**: Markdown formatted text with optional timestamps\n- **SRT**: Standard SRT subtitle format for video players\n\n## How It Works\n\n1. **URL Parsing**: Extracts video ID from YouTube URL\n2. **API Key Extraction**: Fetches YouTube watch page and extracts Innertube API key\n3. **Caption Discovery**: Uses YouTube Innertube API to discover available caption tracks\n4. **Transcript Fetching**: Downloads transcript XML from YouTube's caption endpoint\n5. 
**Parsing**: Parses XML into segment objects with text, duration, offset, and language\n6. **HTML Entity Decoding**: Automatically decodes HTML entities (e.g., `\u0026amp;#39;` → `'`, `\u0026amp;quot;` → `\"`) to ensure clean, readable text\n7. **Formatting**: Converts segments to requested format (JSON, text, markdown, or SRT)\n\n## Error Handling\n\nThe module handles various error scenarios:\n\n- **Video Unavailable**: Video doesn't exist or has been removed\n- **No Transcript Available**: Video doesn't have captions\n- **Transcripts Disabled**: Captions are disabled for the video\n- **Too Many Requests**: Rate limiting from YouTube\n\nAll errors return a JSON response with `error` and `videoId` fields.\n\n## Requirements\n\n- Node.js \u003e= 18.0.0 (for built-in fetch API)\n- TypeScript \u003e= 5.3.3 (for development)\n- @types/node \u003e= 20.10.6 (for TypeScript types)\n\n## Development\n\nTo work with the source code:\n\n```bash\n# Install dependencies\nnpm install\n\n# Build TypeScript to JavaScript\nnpm run build\n\n# Watch mode for development\nnpm run dev\n```\n\nThe compiled JavaScript files will be output to the `dist/` directory.\n\n## License\n\nMIT\n\n## Notes\n\n- This module uses YouTube's public caption endpoints - no authentication required\n- Supports both manual and auto-generated captions\n- Works with any public YouTube video that has captions available\n- The module is designed to be simple, modular, and easy to integrate\n- Written in TypeScript with full type safety and IntelliSense support\n- All types and enums are exported from `src/types.ts` for use in TypeScript projects\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevdogukan%2Fyoutube-transcript-extarctor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevdogukan%2Fyoutube-transcript-extarctor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevdogukan%2Fyoutube-transcript-extarctor/lists"}