https://github.com/gaelic-ghost/textforspeech

Text normalization and conditioning for speech-safe Swift workflows.
accessibility swift swift-package text-normalization
Host: GitHub
URL: https://github.com/gaelic-ghost/textforspeech
Owner: gaelic-ghost
License: apache-2.0
Created: 2026-04-05T14:44:34.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-04-11T18:58:44.000Z (about 2 months ago)
Last Synced: 2026-04-11T20:34:55.565Z (about 2 months ago)
Topics: accessibility, swift, swift-package, text-normalization
Language: Swift
Size: 431 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

# TextForSpeech

A Swift package for turning code-heavy, path-heavy, and markdown-heavy developer text into speech-safe text before it reaches a speech model.

## Table of Contents

- [Overview](#overview)

- [Quick Start](#quick-start)

- [Usage](#usage)

- [Development](#development)

- [Repo Structure](#repo-structure)

- [Release Notes](#release-notes)

- [License](#license)

## Overview

### Status

`TextForSpeech` is available now as the shared normalization package used by `SpeakSwiftly`.

### What This Project Is

`TextForSpeech` owns the text-conditioning step that prepares developer-heavy text before speech generation. It ships one semantic built-in core plus selectable built-in styles, then layers persisted custom profiles on top so callers can tune pronunciation without reimplementing the core normalization behavior.

The package currently has three main responsibilities:

- normalize mixed text such as markdown, logs, CLI output, and prose with embedded code or identifiers

- normalize whole-source input through an explicit source lane

- persist and edit named custom profiles while keeping the built-in base layer always on

### Motivation

Speech models do poorly with raw developer text such as file paths, identifiers, markdown links, inline code, repeated separators, repeated-letter runs, currency and measurement forms like `$9.39` or `42 km`, and terse scalar or math-heavy tokens like `f32`, `cosF32`, or `WorkerRuntime.swift:42`. `TextForSpeech` centralizes those cleanup rules so the same behavior can be reused across callers instead of being reimplemented in app code or worker code.

## Quick Start

Add `TextForSpeech` as a Swift Package Manager dependency, import `TextForSpeech`, then call the namespace-first normalization API:

```swift
import TextForSpeech

let normalized = try await TextForSpeech.Normalize.text("stderr: WorkerRuntime.swift:42")
```

Add the package from its GitHub repository:

```swift
dependencies: [
    .package(url: "https://github.com/gaelic-ghost/TextForSpeech.git", from: "0.19.0"),
],
targets: [
    .executableTarget(
        name: "ExampleApp",
        dependencies: [
            .product(name: "TextForSpeech", package: "TextForSpeech"),
        ]
    ),
]
```

## Usage

Normalize mixed text directly when you want the default built-in `.balanced` style. Path context and request metadata are optional and travel together on `RequestContext`:

```swift
import TextForSpeech

let normalized = try await TextForSpeech.Normalize.text(
    "stderr: /workspace/SpeakSwiftly/Sources/SpeakSwiftly/WorkerRuntime.swift",
    requestContext: TextForSpeech.RequestContext(
        source: "codex",
        app: "SpeakSwiftly",
        project: "TextForSpeech",
        cwd: "/workspace/SpeakSwiftly",
        repoRoot: "/workspace/SpeakSwiftly"
    )
)
```

The mixed-text path detects the likely outer text format before running normalization. Callers do not provide a text-format hint.

If you want a different shipped listening mode, pass `style:`:

```swift
import TextForSpeech

// `sourceText` is a caller-provided String that holds the whole source input.
let normalized = try await TextForSpeech.Normalize.source(
    sourceText,
    as: .swift,
    style: .compact
)
```

The shipped styles differ in how they speak common coding-agent token shapes:

- `.compact` assumes more visual context and says less. It drops the broad line-based spoken-code expansion, keeps common shapes terse, and keeps `::` silent. For example, `foo()` -> `foo`, `#123` -> `123`, and `--help` -> `help`.

- `.balanced` is the default general-purpose mode. It keeps spoken-code expansion for code-like lines, keeps `::` silent, and speaks common references more explicitly. For example, `foo()` -> `foo function`, `#123` -> `issue 123`, `--help` -> `double tack help`, `WorkerRuntime.swift:42` -> `Worker Runtime dot swift at line 42`, and `WorkerRuntime.swift:42:7` -> `Worker Runtime dot swift line 42 column 7`.

- `.explicit` is the audio-first mode. It keeps the same line-based spoken-code expansion as `.balanced`, but uses more narrated phrasing for common coding-agent shapes and says `::` as `double colon`. For example, `foo()` -> `foo function call`, `#123` -> `issue number 123`, and `--help` -> `long flag help`.
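
The per-style phrasing in the list above can be pictured as a small lookup keyed on token shape and style. The following is a minimal standalone sketch, not the package's implementation: `SketchStyle`, `isAllDigits`, and `speakToken` are hypothetical names invented for illustration, and only the three token shapes whose spoken forms appear in the list are covered.

```swift
// Hypothetical sketch: `SketchStyle` and `speakToken` are illustrative names,
// not part of the TextForSpeech API.
enum SketchStyle {
    case compact, balanced, explicit
}

/// Returns true when the text is non-empty and made only of ASCII digits.
func isAllDigits(_ text: String) -> Bool {
    !text.isEmpty && text.allSatisfy { ("0"..."9").contains($0) }
}

/// Speaks three token shapes using the phrasing listed for each style.
/// Returns nil for any shape this sketch does not recognize.
func speakToken(_ token: String, style: SketchStyle) -> String? {
    // Function-call shape, such as `foo()`.
    if token.hasSuffix("()") {
        let name = String(token.dropLast(2))
        guard !name.isEmpty else { return nil }
        switch style {
        case .compact: return name
        case .balanced: return "\(name) function"
        case .explicit: return "\(name) function call"
        }
    }
    // Long-flag shape, such as `--help`.
    if token.hasPrefix("--") {
        let name = String(token.dropFirst(2))
        guard !name.isEmpty else { return nil }
        switch style {
        case .compact: return name
        case .balanced: return "double tack \(name)"
        case .explicit: return "long flag \(name)"
        }
    }
    // Issue-reference shape, such as `#123`.
    if token.hasPrefix("#") {
        let digits = String(token.dropFirst())
        guard isAllDigits(digits) else { return nil }
        switch style {
        case .compact: return digits
        case .balanced: return "issue \(digits)"
        case .explicit: return "issue number \(digits)"
        }
    }
    return nil
}
```

Returning `nil` for unrecognized shapes keeps the sketch honest about its coverage; a real normalizer would fall through to other rules instead.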

The built-in speech layer also expands common numeric scalar shorthands, currency amounts, and measurement suffixes, so tokens such as `f32` become `float thirty two`, `$9.39` becomes `nine dollars and thirty-nine cents`, `42 km` becomes `forty-two kilometers`, `64Gbps` becomes `sixty four gigabits per second`, and combinations such as `cosF32` become `cosine float thirty two`.
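
To make the currency expansion concrete, here is a minimal standalone sketch under stated assumptions. It handles only amounts shaped like `$D.CC` with a dollar part from 0 to 99, and the names `NumberWords` and `speakDollarAmount` are hypothetical; the package's own rules are likely broader and may differ in detail.

```swift
// Hypothetical sketch, not the package's implementation.
enum NumberWords {
    static let ones = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen",
    ]
    static let tens = [
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    ]

    /// Spells 0...99 in words, hyphenating compounds such as "thirty-nine".
    static func spell(_ value: Int) -> String? {
        guard (0...99).contains(value) else { return nil }
        if value < 20 { return ones[value] }
        let tensWord = tens[value / 10]
        let remainder = value % 10
        return remainder == 0 ? tensWord : "\(tensWord)-\(ones[remainder])"
    }
}

/// Speaks a token shaped like `$D.CC`; returns nil for anything else.
func speakDollarAmount(_ token: String) -> String? {
    guard token.first == "$" else { return nil }
    let parts = token.dropFirst().split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 2, parts[1].count == 2,
          parts.allSatisfy({ part in part.allSatisfy { ("0"..."9").contains($0) } }),
          let dollars = Int(parts[0]), let cents = Int(parts[1]),
          let dollarWords = NumberWords.spell(dollars),
          let centWords = NumberWords.spell(cents)
    else { return nil }
    // Use singular units for exactly one dollar or one cent.
    let dollarUnit = dollars == 1 ? "dollar" : "dollars"
    let centUnit = cents == 1 ? "cent" : "cents"
    return "\(dollarWords) \(dollarUnit) and \(centWords) \(centUnit)"
}
```

With this sketch, `speakDollarAmount("$9.39")` yields the same spoken form quoted above for that token.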

The semantic core also ships extension aliases for especially speech-hostile file types. That includes Xcode-heavy forms such as `.xcodeproj`, `.pbxproj`, `.xcworkspace`, `.xcconfig`, `.xcscheme`, `.xctestplan`, `.xcresult`, `.xcassets`, `.xcstrings`, `.xcprivacy`, and `.dSYM`, plus mixed-stack formats such as `.mdx`, `.tsx`, `.jsx`, `.jsonc`, `.ipynb`, `.wasm`, `.sqlite`, and `.db`.

For repeated file paths in the same utterance, the text path compacts repeated anchors before the built-in path-speaking pass. File-path separators collapse to spacing rather than spoken words, and later repeated mentions can collapse to shorter phrases such as `same directory, Worker Runtime dot swift` or `same path` instead of repeating the full spoken prefix.
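
The repeated-path behavior can be sketched roughly as follows. This is an illustration under assumptions, not the package's algorithm: the helper names are hypothetical, the first mention is spoken by simply joining camel-case-split components with spaces, and later mentions are matched only on an exact path or an exact parent directory.

```swift
// Hypothetical sketch, not the package's implementation.

/// Splits a camel-case word such as "WorkerRuntime" into "Worker Runtime".
func splitCamelCase(_ word: String) -> String {
    var result = ""
    var previousWasLowercase = false
    for character in word {
        if character.isUppercase && previousWasLowercase {
            result.append(" ")
        }
        result.append(character)
        previousWasLowercase = character.isLowercase
    }
    return result
}

/// Speaks a file name such as "WorkerRuntime.swift" as "Worker Runtime dot swift".
func speakFileName(_ name: String) -> String {
    name.split(separator: ".")
        .map { splitCamelCase(String($0)) }
        .joined(separator: " dot ")
}

/// Speaks paths in order, shortening mentions that repeat an earlier path or directory.
func speakPaths(_ paths: [String]) -> [String] {
    var seenPaths = Set<String>()
    var seenDirectories = Set<String>()
    var spoken: [String] = []
    for path in paths {
        let components = path.split(separator: "/").map(String.init)
        guard let fileName = components.last else {
            spoken.append(path)
            continue
        }
        let directoryParts = Array(components.dropLast())
        let directory = directoryParts.joined(separator: "/")
        if seenPaths.contains(path) {
            spoken.append("same path")
        } else if seenDirectories.contains(directory) {
            spoken.append("same directory, \(speakFileName(fileName))")
        } else {
            // First mention: separators collapse to spaces rather than spoken words.
            let words = directoryParts.map(splitCamelCase) + [speakFileName(fileName)]
            spoken.append(words.joined(separator: " "))
        }
        seenPaths.insert(path)
        seenDirectories.insert(directory)
    }
    return spoken
}
```

Tracking full paths and parent directories in separate sets is what lets the sketch tell a full repeat apart from a new file in an already-spoken directory.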

Configurable URL, markdown-link, and path handling is planned. The current defaults are deterministic and always on; future work will review those behaviors through the existing built-in styles rather than adding a separate normalization policy type. Path context now lives on `RequestContext`; the previous `InputContext` type has been removed. Caller-provided text-format and nested-source hints have been removed in favor of detection and generic embedded-code fallback. Codex hook payload cleanup will be reviewed with real examples before deciding whether it belongs in this package or downstream.

Use the source path when the whole input is a source file or editor buffer and the caller already knows the language:

```swift
import TextForSpeech

let normalized = try await TextForSpeech.Normalize.source(
    """
    struct WorkerRuntime {
        let sampleRate: Int
    }
    """,
    as: .swift
)
```

The source path is explicit today but still generic. It normalizes whole-source input more consistently than the mixed-text path, but SwiftSyntax-backed Swift-specific structure is still future roadmap work rather than current behavior.

### Summary-Aware Requests

Normalization is deterministic by default. The normalization entrypoints are async so the same ergonomic call can stay local with `summarize: false` or opt into a model summary with `summarize: true`:

```swift
import TextForSpeech

// `longDeveloperUpdate` is a caller-provided String that holds the text to summarize.
let normalized = try await TextForSpeech.Normalize.text(
    longDeveloperUpdate,
    summarizationProvider: .openAIResponses,
    summarize: true
)
```

The summarization provider is explicit because each backend option has a different operating surface:

- `.openAIResponses` calls the OpenAI Responses API and reads `OPENAI_API_KEY` from the process environment.

- `.codexExec` runs the local Codex CLI through `codex exec`.

- `.foundationModels` uses Apple's on-device Foundation Models framework when the framework and operating system support it.

- `.test` returns the input unchanged so tests can exercise summary-aware normalization without calling a live provider.

The `summarize` argument defaults to `false`, so deterministic callers do not need a separate convenience method. `TextForSpeech.SummarizationProvider` selects the backend used when `summarize` is `true`.

### Runtime Profiles

Use `TextForSpeech.Runtime` when you need an observable owner for stored custom profiles, one active custom profile id, one selected built-in style, one selected summarization provider, and JSON-backed persistence configured through a small enum:

```swift
import TextForSpeech

let runtime = try TextForSpeech.Runtime(
    builtInStyle: .balanced,
    persistence: .default
)

try runtime.style.setActive(to: .compact)

let logs = try runtime.profiles.create(name: "Logs")
try runtime.profiles.addReplacement(
    TextForSpeech.Replacement("stderr", with: "standard error", id: "stderr-rule"),
    toProfile: logs.id
)
try runtime.profiles.setActive(id: logs.id)

let normalized = try await runtime.normalize.text("stderr and stdout")

try runtime.summarizationProvider.set(.openAIResponses)
let summarized = try await runtime.normalize.text(
    longDeveloperUpdate,
    summarize: true
)
```

The runtime model is intentionally explicit:

- `TextForSpeech.Profile.semanticCore` is the always-on semantic built-in layer.

- `TextForSpeech.Profile.builtInStyle(_:)` returns one shipped style preset.

- `TextForSpeech.Profile.builtInBase(style:)` composes `semanticCore + style preset`.

- `TextForSpeech.Profile.base` is the default `.balanced` built-in base for convenience.

- `TextForSpeech.Profile.default` is the empty default custom profile value.

- `runtime.style.getActive()` returns the currently selected shipped style preset.

- `runtime.style.list()` returns the available built-in style presets with short summaries.

- `runtime.summarizationProvider.get()` returns the provider used by async summary-aware normalization requests.

- `runtime.summarizationProvider.list()` returns the available summarization providers with short summaries.

- `runtime.summarizationProvider.set(_:)` persists the selected summarization provider.

- `runtime.profiles.getActive()` returns the active custom profile's id, a summary, and its replacements.

- `runtime.profiles.getEffective()` returns the active custom profile as merged with the currently selected built-in style.

- `runtime.profiles.get(id:)` reads one stored custom profile summary and its replacements by id.

- `runtime.profiles.create(name:)` creates one stored custom profile and returns its generated id to the caller.

- `runtime.normalize.text(...)` and `runtime.normalize.source(...)` apply `builtInBase(style: style.getActive()) + active custom` without exposing the merged profile value. `summarize` defaults to `false`.

- `try await runtime.normalize.text(..., summarize: true)` and `try await runtime.normalize.source(..., summarize: true)` use the active summarization provider before returning normalized speech-safe text.
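
The layering described above, `semanticCore + style preset + active custom`, can be pictured as ordered rule lists that are concatenated and applied in sequence. The following is a toy standalone model, not the package's types: `SketchReplacement`, `SketchProfile`, and `effectiveProfile` are hypothetical names, plain substring replacement stands in for the real rule engine, and the order in which layers apply is an assumption.

```swift
import Foundation

// Hypothetical sketch: type names and rule contents are illustrative,
// not the TextForSpeech API or its shipped rules.
struct SketchReplacement {
    let text: String
    let spoken: String
}

struct SketchProfile {
    var replacements: [SketchReplacement]

    /// Composes two layers; rules from `other` are appended after this layer's rules.
    func merged(with other: SketchProfile) -> SketchProfile {
        SketchProfile(replacements: replacements + other.replacements)
    }

    /// Applies every rule in order as a plain substring replacement.
    func apply(to input: String) -> String {
        replacements.reduce(input) { partial, rule in
            partial.replacingOccurrences(of: rule.text, with: rule.spoken)
        }
    }
}

/// Builds the effective profile as core, then style preset, then custom rules.
/// The ordering here is an assumption made for this sketch.
func effectiveProfile(
    core: SketchProfile,
    style: SketchProfile,
    custom: SketchProfile
) -> SketchProfile {
    core.merged(with: style).merged(with: custom)
}
```

In this model, a custom profile holding a single `stderr` to `standard error` rule only adds to the built-in layers and never replaces them, which mirrors the "built-in base layer always on" behavior.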

Persistence defaults to `.default`. `TextForSpeech.Runtime()` writes to Application Support automatically, namespaced by the host bundle identifier when one is available and falling back to `TextForSpeech` when it is not. Debug builds place the package store under `TextForSpeech-Debug`, including the fallback namespace, so local debug runs do not touch the production package store. Callers that need an explicit location can pass `.file(url)`. The selected built-in style and selected summarization provider are persisted alongside the active custom profile id and stored custom profiles.

## Development

### Setup

`TextForSpeech` is a Swift Package Manager library product targeting iOS 17, macOS 14, and Swift 6 language mode.

No generated project setup is required for ordinary local development. Work from the repository root with SwiftPM.

### Workflow

Use the standard Swift package workflow for code and tests:

```bash
swift build
swift test
```

The repository also uses repo-owned maintainer scripts for validation, shared sync work, and releases:

```bash
sh scripts/repo-maintenance/validate-all.sh
sh scripts/repo-maintenance/sync-shared.sh
sh scripts/repo-maintenance/release.sh --mode standard --version vX.Y.Z
```

For repository workflow expectations, architecture boundaries, and doc-sync rules, see [CONTRIBUTING.md](CONTRIBUTING.md), [ROADMAP.md](ROADMAP.md), and the maintainer notes under [docs/maintainers](docs/maintainers).

### Validation

The baseline verification path for this repository is:

```bash
swift build
swift test
sh scripts/repo-maintenance/validate-all.sh
```

The repository also includes checked-in SwiftFormat and SwiftLint configuration:

```bash
swiftformat --lint --config .swiftformat .
swiftlint lint --config .swiftlint.yml
```

Run those formatter and lint commands when style-tooling changes are in scope or when a change touches enough Swift code that a formatting pass is useful.

## Repo Structure

```text
.
├── Package.swift
├── Sources/TextForSpeech/
│   ├── API/
│   ├── Models/
│   ├── Normalization/
│   └── Runtime/
├── Tests/TextForSpeechTests/
│   ├── Models/
│   ├── Normalization/
│   └── Runtime/
├── docs/
│   ├── maintainers/
│   └── releases/
└── scripts/repo-maintenance/
```

`Sources/TextForSpeech` is organized by responsibility:

- `API/` contains public namespace-first entrypoints such as `Normalize`.

- `Models/` contains core value types such as `Profile`, `Replacement`, `RequestContext`, and `SummarizationProvider`, plus the built-in profile composition surface and semantic-role fragments under `Models/BuiltInProfiles/`.

- `Normalization/` contains the text path, source path, structural markdown parsing, replacement-rule engine, speech helpers, format detection, and summary execution support.

- `Runtime/` contains runtime ownership; the grouped profile, style, summary, and persistence handles; persisted state; and runtime-facing errors.

The current source split keeps structural normalization logic separate from durable lexical policy:

- structural work such as markdown parsing, code-span extraction, and format detection stays in code

- durable lexical policy such as built-in aliases, extension aliases, identifier speaking, path speaking, URL speaking, repeated-letter-run handling, and style-specific speaking policy lives in the built-in profile layers

Tests live under `Tests/TextForSpeechTests` and are grouped by role, with focused normalization files for path and identifier behavior, markdown and URL behavior, and broader end-to-end flows.

## Release Notes

Release notes live under [docs/releases](docs/releases). Each release note should stay factual, scoped to the tagged change, and explicit about behavior or API shifts.

Use the repo-owned release command for standard release work:

```bash
sh scripts/repo-maintenance/release.sh --mode standard --version vX.Y.Z
```

## License

This project is licensed under the Apache License 2.0. See [LICENSE](LICENSE) for the full text.