{"id":48315363,"url":"https://github.com/stevencrader/transcriptator","last_synced_at":"2026-04-05T00:30:14.376Z","repository":{"id":143227673,"uuid":"611566301","full_name":"stevencrader/transcriptator","owner":"stevencrader","description":"Library for converting the various transcript file formats to a common format.","archived":false,"fork":false,"pushed_at":"2024-01-28T05:48:27.000Z","size":939,"stargazers_count":13,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-10-18T19:19:58.310Z","etag":null,"topics":["podcasting","transcripts"],"latest_commit_sha":null,"homepage":"https://transcriptator.com/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevencrader.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"Contributing.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-03-09T04:38:30.000Z","updated_at":"2025-05-22T04:02:02.000Z","dependencies_parsed_at":"2023-10-15T07:40:21.244Z","dependency_job_id":null,"html_url":"https://github.com/stevencrader/transcriptator","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/stevencrader/transcriptator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevencrader%2Ftranscriptator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevencrader%2Ftranscriptator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevencrader%2Ftranscriptator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevencrader%2Ftranscriptator/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevencrader","download_url":"https://codeload.github.com/stevencrader/transcriptator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevencrader%2Ftranscriptator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31420016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"ssl_error","status_checked_at":"2026-04-05T00:25:05.923Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["podcasting","transcripts"],"created_at":"2026-04-05T00:30:13.507Z","updated_at":"2026-04-05T00:30:14.298Z","avatar_url":"https://github.com/stevencrader.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Transcriptator\n\n\u003cdiv align=\"center\"\u003e\n\n[![GitHub forks](https://img.shields.io/github/forks/stevencrader/transcriptator.svg?style=social\u0026label=Fork\u0026maxAge=2592000)](https://github.com/stevencrader/transcriptator/network/)\n[![GitHub 
stars](https://img.shields.io/github/stars/stevencrader/transcriptator.svg?style=social\u0026label=Star\u0026maxAge=2592000)](https://github.com/stevencrader/transcriptator/stargazers/)\n\u003cbr\u003e\n\n[![npm](https://img.shields.io/npm/v/transcriptator)](https://www.npmjs.com/package/transcriptator)\n[![npm](https://img.shields.io/npm/v/transcriptator?label=yarn)](https://yarnpkg.com/package?name=transcriptator)\n[![install size](https://packagephobia.com/badge?p=transcriptator)](https://packagephobia.com/result?p=transcriptator)\n![License](https://img.shields.io/badge/License-MIT-blue.svg)\n[![Number of Contributors](https://img.shields.io/github/contributors/stevencrader/transcriptator?style=flat\u0026label=Contributors)](https://github.com/stevencrader/transcriptator/graphs/contributors)\n\u003cbr/\u003e\n\n[![Issues opened](https://img.shields.io/github/issues/stevencrader/transcriptator?label=Issues)](https://github.com/stevencrader/transcriptator)\n[![PRs open](https://img.shields.io/github/issues-pr/stevencrader/transcriptator?label=Pull%20Requests)](https://github.com/stevencrader/transcriptator/pulls)\n[![PRs closed](https://img.shields.io/github/issues-pr-closed/stevencrader/transcriptator?label=Pull%20Requests)](https://github.com/stevencrader/transcriptator/pulls?q=is%3Apr+is%3Aclosed)\n[![codecov](https://codecov.io/gh/stevencrader/transcriptator/branch/master/graph/badge.svg?token=KZMGXY8LIH)](https://codecov.io/gh/stevencrader/transcriptator)\n\u003cbr/\u003e\n\n\u003c/div\u003e\n\nLibrary for converting the various transcript file formats to a common format.\n\nOriginally designed to help users of the [Podcast Namespace](https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md#transcript) `podcast:transcript` tag.\n\n## Installation\n\nThis is a Node.js module available through npm or yarn.\n\n### Using npm:\n\n```bash\nnpm install transcriptator\n```\n\n### Using yarn:\n\n```bash\nyarn add transcriptator\n```\n\n### Using 
CDN:\n\n[transcriptator jsDelivr CDN](https://www.jsdelivr.com/package/npm/transcriptator)\n\n## Usage\n\nThere are three primary methods and two types. See the jsdoc for additional information.\n\nThe `convertFile` function accepts the transcript file data and parses it into an array of `Segment`.\nIf `transcriptFormat` is not defined, `determineFormat` is used to attempt to identify the format.\n\n    convertFile(data: string, transcriptFormat: TranscriptFormat = undefined): Array\u003cSegment\u003e\n\nThe `determineFormat` function accepts the transcript file data and attempts to identify the `TranscriptFormat`.\n\n    determineFormat(data: string): TranscriptFormat\n\nThe `TranscriptFormat` enum defines the allowable transcript types supported by Transcriptator.\n\nThe `Segment` type defines the segment/cue of the transcript.\n\n### Custom timestamp formatter\n\nTo change the way the `startTime` and `endTime` are formatted in `startTimeFormatted` and `endTimeFormatted`,\nregister a custom formatter to be used instead.\n\nThe formatter function shall accept a single argument as a number and return the value formatted as a string.\n\n```javascript\nimport { TimestampFormatter } from \"transcriptator\"\n\nfunction customFormatter(timestamp) {\n    return timestamp.toString()\n}\n\nTimestampFormatter.registerCustomFormatter(customFormatter)\n```\n\n### Options for segments\n\nAdditional options are available for combining or formatting two or more segments.\n\nTo change the options, use the `Options.setOptions` function.\n\nThe options only need to be specified once and will be used when parsing any transcript data.\n\nTo restore options to their default value, call `Options.restoreDefaultSettings`.\n\nThe `IOptions` interface used by `Options` defines options for combining and formatting parsed segments.\n\n-   `combineEqualTimes`: boolean\n    -   Combine segments if the `Segment.startTime`, `Segment.endTime`, and `Segment.speaker` match between the current and 
prior segments\n    -   Can be used with `combineSegments`. The `combineEqualTimes` rule is applied first.\n    -   Can be used with `speakerChange`. The `speakerChange` rule is applied last.\n    -   Cannot be used with `combineSpeaker`\n    -   Default: false\n-   `combineEqualTimesSeparator`: string\n    -   Character to use when `combineEqualTimes` is true.\n    -   Default: `\\n`\n-   `combineSegments`: boolean\n    -   Combine segments where speaker is the same and concatenated `body` fits in the `combineSegmentsLength`\n    -   Can be used with `combineEqualTimes`. The `combineSegments` rule is applied first.\n    -   Can be used with `speakerChange`. The `speakerChange` rule is applied last.\n    -   Cannot be used with `combineSpeaker`\n    -   Default: false\n-   `combineSegmentsLength`: number\n    -   Max length of body text to use when `combineSegments` is true\n    -   Default: See `DEFAULT_COMBINE_SEGMENTS_LENGTH`\n-   `combineSpeaker`: boolean\n    -   Combine consecutive segments from the same speaker.\n    -   Note: If this is enabled, `combineEqualTimes` and `combineSegments` will not be applied.\n    -   Warning: if the transcript does not contain speaker information, resulting segment will contain entire transcript text.\n    -   Default: false\n-   `speakerChange`: boolean\n    -   Only include `Segment.speaker` when speaker changes\n    -   May be used in combination with `combineSpeaker`, `combineEqualTimes`, or `combineSegments`\n    -   Default: false\n\n```javascript\nimport { Options } from \"transcriptator\"\n\nOptions.setOptions({\n    combineSegments: true,\n    combineSegmentsLength: 32,\n})\n```\n\n## Supported File Formats\n\n### SRT\n\nTranscripts which follow the SRT/SubRip format\n\n```text\n1\n00:00:00,780 --\u003e 00:00:06,210\nAdam Curry: podcasting 2.0 March\n4 2023 Episode 124 on D flat\n\n2\n00:00:06,210 --\u003e 00:00:12,990\nformable hello everybody welcome\nto a delayed board meeting of\n\n```\n\nThe timestamp may 
contain the hour and minutes but is not required. The millisecond may be separated with either a comma or decimal.\n\nAttempts to find the speaker's name from the beginning of the first line of each segment.\n\nReferences:\n\n-   https://en.wikipedia.org/wiki/SubRip\n\n### HTML\n\nHTML data in format below are considered to be transcripts.\n\nThe elements `cite`, `time`, and `p` are used to define a segment.\nThe `cite` element is not required. The order is also not required.\n\nThe elements may either be a child of the document directly or a direct child of the `html` or `body` element.\n\nElements do not need to be on separate lines.\n\n**Example 1**\n\n```html\n\u003chtml\u003e\n    \u003cbody\u003e\n        \u003ccite\u003eAlban:\u003c/cite\u003e\n        \u003ctime\u003e0:00\u003c/time\u003e\n        \u003cp\u003e\n            It is so stinking nice to like, show up and record this show. And Travis has already put together an\n            outline. Kevin's got suggestions, I throw my thoughts into the mix. And then Travis goes and does all the\n            work from there, too. It's out into the wild. And I don't see anything. That's an absolute joy for at least\n            two thirds of the team. Yeah, I mean, exactly.\n        \u003c/p\u003e\n        \u003ccite\u003eKevin:\u003c/cite\u003e\n        \u003ctime\u003e0:30\u003c/time\u003e\n        \u003cp\u003e\n            You guys remember, like two months ago, when you were like, We're going all in on video Buzzcast. I was\n            like, that's, I mean, I will agree and commit and disagree, disagree and commit, I'll do something. But I\n            don't want to do this.\n        \u003c/p\u003e\n    \u003c/body\u003e\n\u003c/html\u003e\n```\n\n**Example 2**\n\n```html\n\u003cp\u003e\n    It is so stinking nice to like, show up and record this show. And Travis has already put together an outline.\n    Kevin's got suggestions, I throw my thoughts into the mix. 
And then Travis goes and does all the work from there,\n    too. It's out into the wild. And I don't see anything. That's an absolute joy for at least two thirds of the team.\n    Yeah, I mean, exactly.\n\u003c/p\u003e\n\u003ctime\u003e0:00\u003c/time\u003e\n\u003cp\u003e\n    You guys remember, like two months ago, when you were like, We're going all in on video Buzzcast. I was like,\n    that's, I mean, I will agree and commit and disagree, disagree and commit, I'll do something. But I don't want to do\n    this.\n\u003c/p\u003e\n\u003ctime\u003e0:30\u003c/time\u003e\n```\n\n### JSON\n\nJSON data in one of the formats below are considered to be transcripts.\n\nIn both formats, the data does not need to be in pretty print format.\n\n**Format 1**\n\n```json\n{\n    \"version\": \"1.0.0\",\n    \"segments\": [\n        {\n            \"speaker\": \"Alban\",\n            \"startTime\": 0.0,\n            \"endTime\": 4.8,\n            \"body\": \"It is so stinking nice to\"\n        },\n        {\n            \"speaker\": \"Alban\",\n            \"startTime\": 0.0,\n            \"endTime\": 4.8,\n            \"body\": \"like, show up and record this\"\n        }\n    ]\n}\n```\n\nThere must be a `segments` list of objects containing `speaker`, `startTime`, `endTime`, and `body`.\n\nThe `startTime` and `endTime` are assumed to be in seconds.\n\n**Format 2**\n\n```json\n[\n    {\n        \"start\": 1,\n        \"end\": 5000,\n        \"text\": \"Subtitles: @marlonrock1986 (^^V^^)\"\n    },\n    {\n        \"start\": 25801,\n        \"end\": 28700,\n        \"text\": \"It's another hot, sunny day today\\nhere in Southern California.\"\n    }\n]\n```\n\nThe top level element must be a list of objects containing `start`, `end`, and `text`.\n\nThe `start` and `end` are assumed to be in milliseconds.\n\nAttempts to find the speaker's name from the beginning of the `text` value.\n\n### WebVTT\n\nTranscripts which follow the WebVTT/VTT format\n\n```\nWEBVTT\n\n1\n00:00:00.001 
--\u003e 00:00:05.000\nSubtitles: @marlonrock1986 (^^V^^)\n\n2\n00:00:25.801 --\u003e 00:00:28.700\nIt's another hot, sunny day today\nhere in Southern California.\n\n```\n\nThe index number is optional:\n\n```\nWEBVTT\n\n00:00:00.000 --\u003e 00:00:11.840\n Buenas, bienvenidas de vuelta a KDE Express. Esta vez para no perder el ritmo volvemos a la\n\n00:00:11.840 --\u003e 00:00:16.800\n versión movilidad que no tenemos a los compañeros disponibles y hoy quería haceros un especial\n```\n\nThe timestamp may contain the hour and minutes but is not required. The millisecond may be separated with either a comma or decimal.\n\nAttempts to find the speaker's name from the beginning of the first line of each segment.\n\nReferences:\n\n-   https://www.w3.org/TR/webvtt1/\n-   https://en.wikipedia.org/wiki/WebVTT\n\n## Test Transcripts\n\nTranscripts used for testing are excerpts from the following shows.\n\n-   [Podcasting 2.0](https://podcastindex.org/podcast/920666)\n    -   podcasting_20_episode_124.srt (from Episode 124)\n-   [Buzzcast](https://buzzcast.buzzsprout.com/231452/9092843)\n    -   buzzcast.html\n    -   buzzcast.srt\n    -   buzzcast.json\n-   [How to Start a Podcast](https://feeds.buzzsprout.com/1/2562823/)\n    -   how_to_start_a_podcast.json\n    -   how_to_start_a_podcast.html\n-   [Podnews Daily (2024-01-25)](https://podnews.net/update/nz-podcast-summit-2024)\n    -   podnews_daily_2024-01-25.vtt\n-   [Podnews Weekly Review (2023-03-17)](https://feeds.buzzsprout.com/1538779/12458004/)\n    -   podnews_weekly_review_2023-03-17.html\n-   [Podnews Weekly Review (2023-05-05)](https://feeds.buzzsprout.com/1538779/12782529/)\n    -   podnews_weekly_review_2023-05-05.json\n-   [Podnews Weekly Review (2024-01-19)](https://feeds.buzzsprout.com/1538779/14338472/)\n    -   podnews_weekly_review_2024-01-19.vtt\n-   [subtitle.js](https://github.com/gsantiago/subtitle.js)\n    -   LaLaLand.vtt\n    -   LaLaLand.json\n-   [KDE 
Express](https://kdeexpress.gitlab.io/posts/kdeexpress/16-kde-express/)\n    -   kde_express-16_kde_en_telegram.vtt\n\n## Contributing\n\nPlease see the [Contribution Guide](Contributing.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevencrader%2Ftranscriptator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevencrader%2Ftranscriptator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevencrader%2Ftranscriptator/lists"}