{"id":16833508,"url":"https://github.com/ole/transcribe","last_synced_at":"2025-04-11T04:33:20.773Z","repository":{"id":77527795,"uuid":"168970356","full_name":"ole/transcribe","owner":"ole","description":"A Swift parser for output files from automated transcription services. An experiment inspired by the Swift Community Podcast.","archived":false,"fork":false,"pushed_at":"2021-05-03T10:55:13.000Z","size":611,"stargazers_count":13,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-25T02:40:19.321Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ole.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-03T17:10:34.000Z","updated_at":"2021-05-03T10:55:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"98b034f3-9b61-43ff-8710-c5cb318c4aee","html_url":"https://github.com/ole/transcribe","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ole%2Ftranscribe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ole%2Ftranscribe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ole%2Ftranscribe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ole%2Ftranscribe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ole","download_url":"https://codeload.github.co
m/ole/transcribe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248345202,"owners_count":21088231,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T11:54:29.229Z","updated_at":"2025-04-11T04:33:20.749Z","avatar_url":"https://github.com/ole.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Transcribe\n\nA Swift parser for output files from automated transcription services.\n\nCreated by [Ole Begemann](https://oleb.net), January 2019.\n\n## Status\n\nVery unstable and incomplete. I wrote this as an experiment to transcribe [episode 1 of the Swift Community Podcast](https://github.com/SwiftCommunityPodcast/podcast/issues/15).\n\n## Supported File Formats\n\nThe only supported input file format is the JSON produced by the [Amazon Transcribe API](https://aws.amazon.com/transcribe/). It should be possible to add support for other formats (such as [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text/)) and come up with a universal data structure that understands multiple input formats.\n\n## Requirements\n\nSwift 4.2. 
I only tested on macOS 10.14, but it should also run on other platforms supported by Swift.\n\n## Dependencies\n\nNone.\n\n## Components\n\nThe package consists of two targets:\n\n- `Transcribe`: the library that implements the parsing of transcription files and conversion to other formats.\n- `TranscribeCLI`: a command line tool that exposes some of the `Transcribe` functionality on the command line.\n\n## Usage\n\nIf you want to create a new transcription, you must first create and run a transcription job with the [Amazon Transcribe API](https://aws.amazon.com/transcribe/). This step is not part of this tool. When the transcription job completes, Amazon Transcribe will provide you with a JSON file with the transcription results. This file can be used as the input for this tool.\n\n### In Code\n\nTo use the library in a SwiftPM package, add this to your `Package.swift`:\n\n```swift\nlet package = Package(\n    ...\n    dependencies: [\n        .package(url: \"https://github.com/ole/transcribe\", .branch(\"main\")),\n    ],\n    targets: [\n        .target(name: \"YOUR_TARGET\", dependencies: [\"Transcribe\"]),\n    ]\n)\n```\n\nImport the module with `import Transcribe`.\n\nSample code:\n\n```swift\nimport Foundation\nimport Transcribe\n\nlet inputFile = URL(fileURLWithPath: \"input.json\") // Change path to your input file\nvar transcript = try AmazonTranscribe.Transcript(file: inputFile)\n\n// Print some statistics\nprint(\"Number of speakers:\", transcript.speakers.count)\nprint(\"Speaker labels:\", transcript.speakers.map { $0.speakerLabel })\nprint(\"Number of segments:\", transcript.segments.count)\nif let speechBegan = transcript.segments.first?.time.lowerBound,\n    let speechEnded = transcript.segments.last?.time.upperBound\n{\n    let formatter = DateComponentsFormatter()\n    formatter.allowedUnits = [.hour, .minute, .second]\n    formatter.unitsStyle = .positional\n    formatter.zeroFormattingBehavior = .pad\n    print(\"Speaking began at:\", formatter.string(from: 
speechBegan.seconds) ?? \"(unable to format timecode)\")\n    print(\"Speaking ended at:\", formatter.string(from: speechEnded.seconds) ?? \"(unable to format timecode)\")\n}\n\n// Set/change speaker names\ntranscript[speaker: \"spk_0\"]?.name = \"Alice\"\ntranscript[speaker: \"spk_1\"]?.name = \"Bob\"\n\n// Save as Markdown\nlet markdown = transcript.makeMarkdown()\nlet markdownFile = URL(fileURLWithPath: \"output.md\") // Change path to your output file\ntry Data(markdown.utf8).write(to: markdownFile)\n\n// Save as WebVTT\nlet webvtt = transcript.makeWebVTT()\nlet webVTTFile = URL(fileURLWithPath: \"output.vtt\") // Change path to your output file\ntry Data(webvtt.utf8).write(to: webVTTFile)\n```\n\n### On the Command Line\n\nThe command line tool takes an input file and converts it to Markdown and WebVTT.\n\nUsage:\n\n```sh\nswift run -c release TranscribeCLI --json transcript.json\n```\n\nThis will print a [WebVTT](https://en.wikipedia.org/wiki/WebVTT)-formatted transcript to standard output.\n\nRun the command line tool with `--help` for the documentation of all options:\n\n```sh\nswift run TranscribeCLI --help\n```\n\n## Overview of the Main Data Structures\n\n`AmazonTranscribe.Transcript` is the base data structure for a transcript. It contains a list of _segments_ and a list of _speakers_:\n\n```swift\nstruct AmazonTranscribe.Transcript {\n    public var segments: [Segment]\n    public var speakers: [Speaker]\n}\n```\n\nA _speaker_ has a label (which identifies the speaker in the transcript) and a name (which can be used when formatting a transcript for output, e.g. to Markdown).\n\n```swift\nstruct AmazonTranscribe.Speaker {\n    /// The speaker label in the original `RawTranscript`\n    public var speakerLabel: String\n    /// The speaker's name as it should appear in the formatted output.\n    public var name: String\n}\n```\n\nA _segment_ is a segment of spoken text, e.g. a sentence or paragraph. 
A segment has a time range (when it was spoken), a speaker (who spoke), and a list of _fragments_:\n\n```swift\n/// A list of consecutive fragments by the same speaker\nstruct AmazonTranscribe.Segment {\n    var time: Range\u003cTimecode\u003e\n    var speakerLabel: String\n    var fragments: [Fragment]\n}\n```\n\nA _fragment_ is a single unit of speech, like a single word or a punctuation character. The Amazon Transcribe API collects timecodes at this granularity.\n\n```swift\n/// A fragment of transcribed speech. Could be a word or punctuation.\nstruct AmazonTranscribe.Fragment {\n    var kind: Kind\n    var speakerLabel: String\n\n    enum Kind {\n        case pronunciation(Pronunciation)\n        case punctuation(String)\n    }\n\n    struct Pronunciation {\n        var time: Range\u003cTimecode\u003e\n        var content: String\n    }\n}\n```\n\nThere is also a `struct AmazonTranscribe.RawTranscript` type, which is a 1-to-1 mapping between the Amazon Transcribe JSON format and Swift data types. When you call `AmazonTranscribe.Transcript.init(file:)`, we parse the JSON into a `RawTranscript` value and then transform that into the `Transcript` data structures, which are easier to work with. Users of the library shouldn't need to deal with `RawTranscript` directly.\n\n## License\n\n[MIT](LICENSE.txt).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fole%2Ftranscribe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fole%2Ftranscribe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fole%2Ftranscribe/lists"}