{"id":14979590,"url":"https://github.com/trldvix/youtube-transcript-api","last_synced_at":"2025-06-17T08:07:54.151Z","repository":{"id":239487330,"uuid":"799649623","full_name":"trldvix/youtube-transcript-api","owner":"trldvix","description":"Java library which allows you to retrieve subtitles/transcripts for a single YouTube video or for an entire playlists or channels.","archived":false,"fork":false,"pushed_at":"2025-04-25T10:18:40.000Z","size":1059,"stargazers_count":23,"open_issues_count":1,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-17T08:07:52.627Z","etag":null,"topics":["asr","captions","java","subtitle","subtitles","transcript","transcripts","translating-transcripts","youtube","youtube-api","youtube-asr","youtube-captions","youtube-subtitle","youtube-subtitles","youtube-transcript","youtube-transcript-api","youtube-transcripts","youtube-video"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trldvix.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-12T19:04:12.000Z","updated_at":"2025-06-10T17:16:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"6e018afb-facc-4a7c-a012-8401c78761f3","html_url":"https://github.com/trldvix/youtube-transcript-api","commit_stats":{"total_commits":48,"total_committers":4,"mean_commits":12.0,"dds":0.6875,"last_synced_commit":"97599b3453b9955dd4444e63422f7fdcb1f6c116"},"previous_names":["thoroldvix/youtube-transcript-api"],"tags_count":9,"template":false,"template_f
ull_name":null,"purl":"pkg:github/trldvix/youtube-transcript-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trldvix%2Fyoutube-transcript-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trldvix%2Fyoutube-transcript-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trldvix%2Fyoutube-transcript-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trldvix%2Fyoutube-transcript-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trldvix","download_url":"https://codeload.github.com/trldvix/youtube-transcript-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trldvix%2Fyoutube-transcript-api/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260318683,"owners_count":22991121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","captions","java","subtitle","subtitles","transcript","transcripts","translating-transcripts","youtube","youtube-api","youtube-asr","youtube-captions","youtube-subtitle","youtube-subtitles","youtube-transcript","youtube-transcript-api","youtube-transcripts","youtube-video"],"created_at":"2024-09-24T14:00:22.036Z","updated_at":"2025-06-17T08:07:53.711Z","avatar_url":"https://github.com/trldvix.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📝 YouTube Transcript API\n\n![Java 
CI](https://github.com/thoroldvix/youtube-transcript-api/actions/workflows/ci.yml/badge.svg)\n[![Maven Central](https://img.shields.io/maven-central/v/io.github.thoroldvix/youtube-transcript-api)](https://search.maven.org/artifact/io.github.thoroldvix/youtube-transcript-api)\n[![Javadoc](https://img.shields.io/badge/JavaDoc-Online-green)](https://thoroldvix.github.io/youtube-transcript-api/javadoc/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n## ⚠️WARNING ⚠️\n\n### This library uses an undocumented YouTube API, so it may stop working at any time. Use at your own risk.\n\n\u003e **Note:** If you want to use this library on the Android platform, refer to\n\u003e [Android compatibility](#-android-compatibility).\n\n## 📖 Introduction\n\nA Java library that allows you to retrieve subtitles/transcripts for a YouTube video.\nIt supports manual and automatically generated subtitles and bulk transcript retrieval for all videos in a playlist or\non a channel, and it does not use a headless browser for scraping.\nInspired by the [Python library](https://github.com/jdepoix/youtube-transcript-api).\n\n## ☑️ Features\n\n✅ Manual transcripts retrieval\n\n✅ Automatically generated transcripts retrieval\n\n✅ Bulk transcript retrieval for all videos in the playlist or channel\n\n✅ Transcript translation\n\n✅ Transcript formatting\n\n✅ Easy-to-use API\n\n✅ Supports Java 11 and above\n\n## 🛠️ Installation\n\n### Maven\n\n```xml\n\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.thoroldvix\u003c/groupId\u003e\n    \u003cartifactId\u003eyoutube-transcript-api\u003c/artifactId\u003e\n    \u003cversion\u003e0.3.6\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### Gradle\n\n```groovy\nimplementation 'io.github.thoroldvix:youtube-transcript-api:0.3.6'\n```\n\n### Gradle (kts)\n\n```kotlin\nimplementation(\"io.github.thoroldvix:youtube-transcript-api:0.3.6\")\n```\n\n## 🔰 Getting Started\n\nTo start using 
YouTube Transcript API, you need to create an instance of `YoutubeTranscriptApi` by\ncalling the `createDefault`\nmethod of `TranscriptApiFactory`. Then you can call `listTranscripts` to get a list of all available transcripts for a\nvideo:\n\n```java\n// Create a new default YoutubeTranscriptApi instance\nYoutubeTranscriptApi youtubeTranscriptApi = TranscriptApiFactory.createDefault();\n\n// Retrieve all available transcripts for a given video\nTranscriptList transcriptList = youtubeTranscriptApi.listTranscripts(\"videoId\");\n```\n\n`TranscriptList` is an iterable which contains all available transcripts for a video and provides methods\nfor [finding specific transcripts](#find-transcripts) by language or by type (manual or automatically generated).\n\n```java\nTranscriptList transcriptList = youtubeTranscriptApi.listTranscripts(\"videoId\");\n\n// Iterate over transcript list\nfor (Transcript transcript : transcriptList) {\n    System.out.println(transcript);\n}\n\n// Find transcript in specific language\nTranscript transcript = transcriptList.findTranscript(\"en\");\n\n// Find manually created transcript\nTranscript manuallyCreatedTranscript = transcriptList.findManualTranscript(\"en\");\n\n// Find automatically generated transcript\nTranscript automaticallyGeneratedTranscript = transcriptList.findGeneratedTranscript(\"en\");\n```\n\nA `Transcript` object contains [transcript metadata](#transcript-metadata) and provides methods for translating the\ntranscript to another language\nand fetching the actual content of the transcript.\n\n```java\nTranscript transcript = transcriptList.findTranscript(\"en\");\n\n// Translate transcript to another language\nTranscript translatedTranscript = transcript.translate(\"de\");\n\n// Retrieve transcript content\nTranscriptContent transcriptContent = transcript.fetch();\n```\n\n`TranscriptContent` contains actual transcript content, storing it as a list of `Fragment`.\nEach `Fragment` contains 'text', 'start' and 
'duration'\nattributes. If you try to print the `TranscriptContent`, you will get output that looks like this:\n\n```text\ncontent=[{text='Text',start=0.0,dur=1.54},{text='Another text',start=1.54,dur=4.16}]\n```\n\n\u003e **Note:** If you want to get transcript content in a different format, refer\n\u003e to [Use Formatters](#use-formatters).\n\nYou can also use `getTranscript`:\n\n```java\nTranscriptContent transcriptContent = youtubeTranscriptApi.getTranscript(\"videoId\", \"en\");\n```\n\nThis is equivalent to:\n\n```java\nTranscriptContent transcriptContent = youtubeTranscriptApi.listTranscripts(\"videoId\")\n        .findTranscript(\"en\")\n        .fetch();\n```\n\nGiven that English is the most common language, you can omit the language code, and it will default to English:\n\n```java\n// Retrieve transcript content in English\nTranscriptContent transcriptContent = youtubeTranscriptApi.listTranscripts(\"videoId\")\n        // No language code defaults to English\n        .findTranscript()\n        .fetch();\n// Or\nTranscriptContent transcriptContent = youtubeTranscriptApi.getTranscript(\"videoId\");\n```\n\nFor bulk transcript retrieval see [Bulk Transcript Retrieval](#bulk-transcript-retrieval).\n\n## 🤖 Android compatibility\nBy default, this library uses the Java 11 HttpClient for making YouTube requests, so that it depends on a minimal number\nof 3rd party libraries. 
Since the Android SDK doesn't include the Java 11 HttpClient, you will have to implement\nyour own `YoutubeClient` for it to work.\n\nYou can check how to do it in [YoutubeClient Customization and Proxy](#youtubeclient-customization-and-proxy).\n\n## 🔧 Detailed Usage\n\n### Use fallback language\n\nIf the desired language is not available, you can pass additional languages that\nwill be used as fallbacks instead of getting an exception.\n\nFor example:\n\n```java\nTranscriptContent transcriptContent = youtubeTranscriptApi.listTranscripts(\"videoId\")\n        .findTranscript(\"de\", \"en\")\n        .fetch();\n\n// Or\nTranscriptContent transcriptContent = youtubeTranscriptApi.getTranscript(\"videoId\", \"de\", \"en\");\n```\n\nIt will first look for a transcript in German, and if it doesn't find one, it will then look for one in English, and so\non.\n\n### Find transcripts\n\nBy default, `findTranscript` will always pick manually created transcripts first and then automatically generated ones.\nIf you want to get only automatically generated or only manually created transcripts, you can use `findManualTranscript`\nor `findGeneratedTranscript`.\n\n```java\n// Retrieve manually created transcript\nTranscript manuallyCreatedTranscript = transcriptList.findManualTranscript(\"en\");\n\n// Retrieve automatically generated transcript\nTranscript automaticallyGeneratedTranscript = transcriptList.findGeneratedTranscript(\"en\");\n```\n\n`findGeneratedTranscript` and `findManualTranscript` both\nsupport [fallback languages](#use-fallback-language).\n\n### Transcript metadata\n\nThe `Transcript` object provides several methods for retrieving transcript metadata:\n\n```java\nString videoId = transcript.getVideoId();\n\nString language = transcript.getLanguage();\n\nString languageCode = transcript.getLanguageCode();\n\n// API URL used to fetch transcript content\nString apiUrl = transcript.getApiUrl();\n\n// Whether it has been manually created or automatically generated by 
YouTube\nboolean isGenerated = transcript.isGenerated();\n\n// Whether this transcript can be translated or not\nboolean isTranslatable = transcript.isTranslatable();\n\n// Set of language codes which represent available translation languages\nSet\u003cString\u003e translationLanguages = transcript.getTranslationLanguages();\n```\n\n### Use Formatters\n\nBy default, printing `TranscriptContent` produces the following string representation:\n\n```text\ncontent=[{text='Text',start=0.0,dur=1.54},{text='Another text',start=1.54,dur=4.16}]\n```\n\nSince this default format may not be suitable for all scenarios, you can implement the `TranscriptFormatter` interface\nto customize the formatting of the content.\n\n```java\n// Create a new custom formatter\nTranscriptFormatter transcriptFormatter = new MyCustomFormatter();\n\n// Format transcript content\nString formattedContent = transcriptFormatter.format(transcriptContent);\n```\n\nThe library offers several built-in formatters:\n\n- `JSONFormatter` - Formats content as JSON\n- `JSONPrettyFormatter` - Formats content as pretty-printed JSON\n- `TextFormatter` - Formats content as plain text without timestamps\n- `WebVTTFormatter` - Formats content as [WebVTT](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API)\n- `SRTFormatter` - Formats content as [SRT](https://www.3playmedia.com/blog/create-srt-file/)\n\nThese formatters can be accessed from the `TranscriptFormatters` class:\n\n```java\n// Get json formatter\nTranscriptFormatter jsonFormatter = TranscriptFormatters.jsonFormatter();\n\nString formattedContent = jsonFormatter.format(transcriptContent);\n```\n\n### YoutubeClient Customization and Proxy\n\nBy default, `YoutubeTranscriptApi` uses the Java 11 HttpClient for making requests to YouTube. If you want to use a\ndifferent client or a proxy,\nyou can create your own YouTube client by implementing the `YoutubeClient` interface.\n\nHere is an example implementation using OkHttp:\n\n```java\npublic class 
OkHttpYoutubeClient implements YoutubeClient {\n\n    private final OkHttpClient client;\n\n    public OkHttpYoutubeClient() {\n        this.client = new OkHttpClient();\n    }\n\n    @Override\n    public String get(String url, Map\u003cString, String\u003e headers) throws TranscriptRetrievalException {\n        Request request = new Request.Builder()\n                .headers(Headers.of(headers))\n                .url(url)\n                .build();\n\n        return sendGetRequest(request);\n    }\n\n    @Override\n    public String get(YtApiV3Endpoint endpoint, Map\u003cString, String\u003e params) throws TranscriptRetrievalException {\n        Request request = new Request.Builder()\n                .url(endpoint.url(params))\n                .build();\n\n        return sendGetRequest(request);\n    }\n\n    private String sendGetRequest(Request request) throws TranscriptRetrievalException {\n        try (Response response = client.newCall(request).execute()) {\n            if (response.isSuccessful()) {\n                ResponseBody body = response.body();\n                if (body == null) {\n                    throw new TranscriptRetrievalException(\"Response body is null\");\n                }\n                return body.string();\n            }\n        } catch (IOException e) {\n            throw new TranscriptRetrievalException(\"Failed to retrieve data from YouTube\", e);\n        }\n        throw new TranscriptRetrievalException(\"Failed to retrieve data from YouTube\");\n    }\n}\n```\nAfter implementing your custom `YoutubeClient`, pass it to the `createWithClient` method of `TranscriptApiFactory`.\n\n```java\nYoutubeClient okHttpClient = new OkHttpYoutubeClient();\nYoutubeTranscriptApi youtubeTranscriptApi = TranscriptApiFactory.createWithClient(okHttpClient);\n```\n\n### Cookies\n\nSome videos may be age-restricted, requiring authentication to access the transcript.\nTo achieve this, obtain access to the desired video in a browser and 
download the cookies in Netscape format, storing\nthem as a TXT file.\nYou can use extensions\nlike [Get cookies.txt LOCALLY](https://chromewebstore.google.com/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc)\nfor Chrome or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) for Firefox to do this.\n`YoutubeTranscriptApi` provides `listTranscriptsWithCookies` and `getTranscriptWithCookies`, which accept a path to the\ncookies.txt file.\n\n```java\n// Retrieve transcript list\nTranscriptList transcriptList = youtubeTranscriptApi.listTranscriptsWithCookies(\"videoId\", \"path/to/cookies.txt\");\n\n// Get transcript content\nTranscriptContent transcriptContent = youtubeTranscriptApi.getTranscriptWithCookies(\"videoId\", \"path/to/cookies.txt\", \"en\");\n```\n\n### Bulk Transcript Retrieval\n\n`YoutubeTranscriptApi` provides several methods for bulk transcript retrieval.\n\nPlaylist and channel information is retrieved from\nthe [YouTube V3 API](https://developers.google.com/youtube/v3/docs/),\nso you will need to provide an API key for all methods.\n\nAll methods take a `TranscriptRequest` object as a parameter,\nwhich contains the following fields:\n\n- `apiKey` - YouTube API key.\n- `stopOnError` (optional, defaults to `true`) - Whether to stop on the first error or continue. 
If true, the method will\n  fail fast by throwing an error if one of the transcripts could not be retrieved,\n  otherwise it will ignore failed transcripts.\n\n- `cookies` (optional) - Path to [cookies.txt](#cookies) file.\n\nAll methods return a map which contains the video ID as a key and the corresponding result as a value.\n\n```java\n// Create a new default YoutubeTranscriptApi instance\nYoutubeTranscriptApi youtubeTranscriptApi = TranscriptApiFactory.createDefault();\n\n// Create request object\nTranscriptRequest request = new TranscriptRequest(\"apiKey\");\n\n// Retrieve all available transcripts for a given playlist\nMap\u003cString, TranscriptList\u003e playlistTranscriptLists = youtubeTranscriptApi.listTranscriptsForPlaylist(\"playlistId\", request);\n\n// Retrieve all available transcripts for a given channel\nMap\u003cString, TranscriptList\u003e channelTranscriptLists = youtubeTranscriptApi.listTranscriptsForChannel(\"channelName\", request);\n```\n\nSame as with the `getTranscript` method, you can also fetch transcript content directly\nusing [fallback languages](#use-fallback-language) if needed.\n\n```java\n// Create request object\nTranscriptRequest request = new TranscriptRequest(\"apiKey\");\n\n// Retrieve transcript content for all videos in a playlist\nMap\u003cString, TranscriptContent\u003e playlistTranscripts = youtubeTranscriptApi.getTranscriptsForPlaylist(\"playlistId\", request);\n\n// Retrieve transcript content for all videos in a channel\nMap\u003cString, TranscriptContent\u003e channelTranscripts = youtubeTranscriptApi.getTranscriptsForChannel(\"channelName\", request, \"en\", \"de\");\n```\n\n\u003e **Note:** If you want to get transcript content in a different format, refer\n\u003e to [Use Formatters](#use-formatters).\n\n## 🤓 How it works\n\nWithin each YouTube video page, there exists JSON data containing all the transcript information, including an\nundocumented API URL embedded within its HTML. 
This JSON looks like this:\n\n```json\n{\n  \"captions\": {\n    \"playerCaptionsTracklistRenderer\": {\n      \"captionTracks\": [\n        {\n          \"baseUrl\": \"https://www.youtube.com/api/timedtext?v=dQw4w9WgXcQ\u0026asr_langs=de,en,es,fr,it,ja,ko,nl,pt,ru\u0026caps=asr\u0026xorp=true\u0026hl=de\u0026ip=0.0.0.0\u0026ipbits=0\u0026expire=1570645639\u0026sparams=ip,ipbits,expire,v,asr_langs,caps,xorp\u0026signature=5939E534881E9A14C14BCEDF370DE7A4E5FD4BE0.01ABE3BA9B2BCDEC6C51D6A9D9F898460495F0F2\u0026key=yt8\u0026lang=de\",\n          \"name\": {\n            \"simpleText\": \"Deutsch\"\n          },\n          \"vssId\": \".de\",\n          \"languageCode\": \"de\",\n          \"isTranslatable\": true\n        },\n        {\n          \"baseUrl\": \"https://www.youtube.com/api/timedtext?v=dQw4w9WgXcQ\u0026asr_langs=de,en,es,fr,it,ja,ko,nl,pt,ru\u0026caps=asr\u0026xorp=true\u0026hl=de\u0026ip=0.0.0.0\u0026ipbits=0\u0026expire=1570645639\u0026sparams=ip,ipbits,expire,v,asr_langs,caps,xorp\u0026signature=5939E534881E9A14C14BCEDF370DE7A4E5FD4BE0.01ABE3BA9B2BCDEC6C51D6A9D9F898460495F0F2\u0026key=yt8\u0026lang=en\",\n          \"name\": {\n            \"simpleText\": \"Englisch\"\n          },\n          \"vssId\": \".en\",\n          \"languageCode\": \"en\",\n          \"kind\": \"asr\",\n          \"isTranslatable\": true\n        }\n      ],\n      \"translationLanguages\": [\n        {\n          \"languageCode\": \"af\",\n          \"languageName\": {\n            \"simpleText\": \"Afrikaans\"\n          }\n        }\n      ]\n    }\n  }\n}\n```\n\nThis library works by making a single GET request to the YouTube page of the specified video, extracting the JSON data\nfrom the HTML, and parsing it to obtain a list of all available transcripts. To fetch the transcript content, it then\nsends a GET request to the API URL extracted from the JSON. 
The YouTube API returns the transcript content in XML\nformat, like this:\n\n```xml\n\u003c?xml version=\"1.0\" encoding=\"utf-8\" ?\u003e\n\u003ctranscript\u003e\n    \u003ctext start=\"0\" dur=\"1.54\"\u003eSome text\u003c/text\u003e\n    \u003ctext start=\"1.54\" dur=\"4.16\"\u003eSome additional text\u003c/text\u003e\n\u003c/transcript\u003e\n```\n\n## 📖 License\n\nThis library is licensed under the MIT License. See\nthe [LICENSE](https://github.com/dignifiedquire/youtube-transcript-api/blob/master/LICENSE) file for more information.\n\n\n       \n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrldvix%2Fyoutube-transcript-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrldvix%2Fyoutube-transcript-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrldvix%2Fyoutube-transcript-api/lists"}