https://github.com/ole/transcribe
A Swift parser for output files from automated transcription services. An experiment inspired by the Swift Community Podcast.
https://github.com/ole/transcribe
Last synced: about 1 year ago
JSON representation
A Swift parser for output files from automated transcription services. An experiment inspired by the Swift Community Podcast.
- Host: GitHub
- URL: https://github.com/ole/transcribe
- Owner: ole
- License: mit
- Created: 2019-02-03T17:10:34.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2021-05-03T10:55:13.000Z (about 5 years ago)
- Last Synced: 2025-03-25T02:40:19.321Z (about 1 year ago)
- Language: Swift
- Homepage:
- Size: 597 KB
- Stars: 13
- Watchers: 4
- Forks: 4
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Transcribe
A Swift parser for output files from automated transcription services.
Created by [Ole Begemann](https://oleb.net), January 2019.
## Status
Very unstable and incomplete. I wrote this as an experiment to transcribe [episode 1 of the Swift Community Podcast](https://github.com/SwiftCommunityPodcast/podcast/issues/15).
## Supported File Formats
The only supported input file format is the JSON produced by the [Amazon Transcribe API](https://aws.amazon.com/transcribe/). It should be possible to add support for other formats (such as [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text/)) and come up with a universal data structure that understands multiple input formats.
## Requirements
Swift 4.2. I only tested on macOS 10.14, but it should also run on other platforms supported by Swift.
## Dependencies
None.
## Components
The package consists of two targets:
- `Transcribe`: the library that implements the parsing of transcription files and conversion to other formats.
- `TranscribeCLI`: a command line tool that exposes some of the `Transcribe` functionality on the command line.
## Usage
If you want to create a new transcription, you must first create and run a transcription job on [Amazon Transcribe API](https://aws.amazon.com/transcribe/). This step is not part of this tool. When the transcription job completes, Amazon Transcribe will provide you with a JSON file with the transcription results. This file can be used as the input for this tool.
### In Code
To use the library in a SwiftPM package, add this to your `Package.swift`:
```swift
let package = Package(
...
dependencies: [
.package(url: "https://github.com/ole/transcribe", .branch("master")),
],
targets: [
.target(name: "YOUR_TARGET", dependencies: ["Transcribe"]),
]
)
```
Import the module with `import Transcribe`.
Sample code:
```swift
import Transcribe
let inputFile = URL(fileURLWithPath: "input.json") // Change path to your input file
var transcript = try AmazonTranscribe.Transcript(file: inputFile)
// Print some statistics
print("Number of speakers:", transcript.speakers.count)
print("Speaker labels:", transcript.speakers.map { $0.speakerLabel })
print("Number of segments:", transcript.segments.count)
if let speechBegan = transcript.segments.first?.time.lowerBound,
let speechEnded = transcript.segments.last?.time.upperBound
{
let formatter = DateComponentsFormatter()
formatter.allowedUnits = [.hour, .minute, .second]
formatter.unitsStyle = .positional
formatter.zeroFormattingBehavior = .pad
print("Speaking began at:", formatter.string(from: speechBegan.seconds) ?? "(unable to format timecode)")
print("Speaking ended at:", formatter.string(from: speechEnded.seconds) ?? "(unable to format timecode)")
}
// Set/change speaker names
transcript[speaker: "spk_0"]?.name = "Alice"
transcript[speaker: "spk_1"]?.name = "Bob"
// Save as Markdown
let markdown = transcript.makeMarkdown()
let outputFile = URL(fileURLWithPath: "output.md") // Change path to your output file
try Data(markdown.utf8).write(to: outputFile)
// Save as WebVTT
let webvtt = transcript.makeWebVTT()
let outputFile = URL(fileURLWithPath: "output.vtt") // Change path to your output file
try Data(webvtt.utf8).write(to: outputFile)
```
### On the Command Line
The command line tool takes an input file and converts it to Markdown and WebVTT.
Usage:
```sh
swift run -c release TranscribeCLI --json transcript.json
```
This will print a [WebVTT](https://en.wikipedia.org/wiki/WebVTT)-formatted transcript to standard out.
Run the command line tool with `--help` for the documentation of all options:
```sh
swift run TranscribeCLI --help
```
## Overview of the Main Data Structures
The base data structure for a transcript. It contains a list of _segments_ and a list of _speakers_:
```swift
struct AmazonTranscribe.Transcript {
public var segments: [Segment]
public var speakers: [Speaker]
}
```
A _speaker_ has a label (which identifies the speaker in the transcript) and a name (which can be used when formatting a transcript for output, e.g. to Markdown).
```swift
struct AmazonTranscribe.Speaker {
/// The speaker label in the original `RawTranscript`
public var speakerLabel: String
/// The speaker's name as it should appear in the formatted output.
public var name: String
}
```
A _segment_ is a segment of spoken text, e.g. a sentence or paragraph. A segment has a time range (when it was spoken), a speaker (who spoke), and a list of _fragments_:
```swift
/// A list of consecutive fragments by the same speaker
struct AmazonTranscribe.Segment {
var time: Range
var speakerLabel: String
var fragments: [Fragment]
}
```
A _fragment_ is a single unit of speech, like a single word or a punctuation character. The Amazon Transcribe API collects timecodes on this granularity.
```swift
/// A fragment of transcribed speech. Could be a word or punctuation.
struct AmazonTranscribe.Fragment {
var kind: Kind
var speakerLabel: String
enum Kind {
case pronunciation(Pronunciation)
case punctuation(String)
}
struct Pronunciation {
var time: Range
var content: String
}
}
```
There is also a `struct AmazonTranscribe.RawTranscript` type, which is a 1-to-1 mapping between the Amazon Transcribe JSON format and Swift data types. When you call `AmazonTranscribe.Transcript.init(file:)`, we parse the JSON into a `RawTranscript` value and then transform that into the `Transcript` data structures, which are easier to work with. Users of the library shouldn't need to deal with `RawTranscript` directly.
## License
[MIT](LICENSE.txt).