https://github.com/fluidinference/text-processing-rs

Rust port of NVIDIA NeMo Text Processing for Inverse Text Normalization

Host: GitHub
URL: https://github.com/fluidinference/text-processing-rs
Owner: FluidInference
Created: 2026-02-13T01:34:35.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-04-26T18:00:08.000Z (about 1 month ago)
Last Synced: 2026-04-26T18:22:43.495Z (about 1 month ago)
Language: Rust
Size: 435 KB
Stars: 29
Watchers: 2
Forks: 7
Open Issues: 3
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md


# text-processing-rs

A Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) supporting both **Inverse Text Normalization (ITN)** and **Text Normalization (TN)**.

## What it does

### ITN: Spoken → Written

Converts spoken-form ASR output to written form:

| Input | Output |
|-------|--------|
| two hundred thirty two | 232 |
| five dollars and fifty cents | $5.50 |
| january fifth twenty twenty five | January 5, 2025 |
| quarter past two pm | 02:15 p.m. |
| one point five billion dollars | $1.5 billion |
| seventy two degrees fahrenheit | 72 °F |
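
To give a feel for the spoken → written direction, here is a minimal, self-contained sketch of how a cardinal such as "two hundred thirty two" can be folded into an integer. It is a toy illustration only: `words_to_number` and `small_number` are hypothetical names, not part of this crate's API, and the crate's actual grammars may work quite differently. The sketch does no overflow or grammar validation.

```rust
/// Maps a single number word to its value (0..=19 and the tens 20..=90).
/// Hypothetical helper for illustration only.
fn small_number(word: &str) -> Option<u64> {
    const ONES: [&str; 20] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen",
    ];
    const TENS: [&str; 8] = [
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    ];
    if let Some(i) = ONES.iter().position(|w| *w == word) {
        return Some(i as u64);
    }
    // "twenty" sits at index 0, so index i corresponds to (i + 2) * 10.
    TENS.iter().position(|w| *w == word).map(|i| (i as u64 + 2) * 10)
}

/// Toy cardinal parser: returns None if any word is not a number word.
fn words_to_number(input: &str) -> Option<u64> {
    let mut total: u64 = 0; // finished groups (thousands, millions, billions)
    let mut current: u64 = 0; // group currently being built
    let mut seen_any = false;

    for word in input.split_whitespace() {
        let w = word.to_lowercase();
        if let Some(v) = small_number(&w) {
            current += v;
        } else {
            match w.as_str() {
                // A bare "hundred" or "thousand" is read as one hundred / one thousand.
                "hundred" => current = current.max(1) * 100,
                "thousand" => {
                    total += current.max(1) * 1_000;
                    current = 0;
                }
                "million" => {
                    total += current.max(1) * 1_000_000;
                    current = 0;
                }
                "billion" => {
                    total += current.max(1) * 1_000_000_000;
                    current = 0;
                }
                _ => return None,
            }
        }
        seen_any = true;
    }

    if seen_any { Some(total + current) } else { None }
}
```

With this sketch, `words_to_number("two hundred thirty two")` returns `Some(232)`, and non-number input such as `"apples"` returns `None`.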

### TN: Written → Spoken

Converts written-form text to spoken form (useful for TTS preprocessing):

| Input | Output |
|-------|--------|
| 123 | one hundred twenty three |
| $5.50 | five dollars and fifty cents |
| January 5, 2025 | january fifth twenty twenty five |
| 2:30 PM | two thirty p m |
| 1st | first |
| 200 km/h | two hundred kilometers per hour |
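
The written → spoken direction can be sketched in the same spirit. The following toy function verbalizes an unsigned integer in the hyphen-free, "and"-free style shown in the table (`123` → `one hundred twenty three`). The names `number_to_words` and `below_thousand` are hypothetical and exist only for this illustration; they are not the crate's API or its actual implementation.

```rust
const ONES_WORDS: [&str; 20] = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
];
const TENS_WORDS: [&str; 10] = [
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
];

/// Appends the words for a value in 0..=999 (nothing is appended for 0).
fn below_thousand(n: u32, out: &mut Vec<&'static str>) {
    let (hundreds, rest) = (n / 100, n % 100);
    if hundreds > 0 {
        out.push(ONES_WORDS[hundreds as usize]);
        out.push("hundred");
    }
    if rest >= 20 {
        out.push(TENS_WORDS[(rest / 10) as usize]);
        if rest % 10 > 0 {
            out.push(ONES_WORDS[(rest % 10) as usize]);
        }
    } else if rest > 0 {
        out.push(ONES_WORDS[rest as usize]);
    }
}

/// Toy cardinal verbalizer for any u32 value.
fn number_to_words(n: u32) -> String {
    if n == 0 {
        return "zero".to_string();
    }
    let scales = [(1_000_000_000u32, "billion"), (1_000_000, "million"), (1_000, "thousand")];
    let mut out: Vec<&'static str> = Vec::new();
    let mut remaining = n;
    for (value, name) in scales {
        let count = remaining / value;
        if count > 0 {
            // For u32 input, count is always below 1000 here.
            below_thousand(count, &mut out);
            out.push(name);
            remaining %= value;
        }
    }
    below_thousand(remaining, &mut out);
    out.join(" ")
}
```

Restricting the input to `u32` keeps every scale group below 1000, so the lookup tables are never indexed out of range.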

## Usage

### Rust

```rust
use text_processing_rs::{normalize, tn_normalize};

// ITN: spoken → written
let result = normalize("two hundred");
assert_eq!(result, "200");

let result = normalize("five dollars and fifty cents");
assert_eq!(result, "$5.50");

// TN: written → spoken
let result = tn_normalize("$5.50");
assert_eq!(result, "five dollars and fifty cents");

let result = tn_normalize("123");
assert_eq!(result, "one hundred twenty three");
```

### JavaScript (WASM)

Build wasm artifacts:

```bash
npm run wasm:build:node
npm run wasm:build:web
```

Node usage:

```javascript
import * as wasm from "./pkg-node/text_processing_rs.js";

console.log(wasm.normalize("two hundred")); // "200"
console.log(wasm.tnNormalize("$5.50")); // "five dollars and fifty cents"

wasm.addRule("gee pee tee", "GPT");
console.log(wasm.normalize("gee pee tee")); // "GPT"
```

The generated npm package name is `@fluidinference/text-processing-rs`.

Web project usage (Vite / Next.js / webpack):

```bash
npm install @fluidinference/text-processing-rs
```

```javascript
import init, * as wasm from "@fluidinference/text-processing-rs";

async function run() {
  // Loads and initializes the .wasm module (required once at startup)
  await init();

  const itn = wasm.normalize("two hundred");
  const tn = wasm.tnNormalize("$5.50");
  console.log(itn); // "200"
  console.log(tn); // "five dollars and fifty cents"

  wasm.addRule("gee pee tee", "GPT");
  console.log(wasm.normalize("gee pee tee")); // "GPT"
}

run();
```

If your framework supports top-level `await`, you can initialize at module load time:

```javascript
import init, * as wasm from "@fluidinference/text-processing-rs";

await init();
```

In Rust, sentence-level normalization finds and converts the normalizable spans within a larger sentence, leaving the surrounding words unchanged:

```rust
use text_processing_rs::{normalize_sentence, tn_normalize_sentence};

// ITN sentence mode
let result = normalize_sentence("I have twenty one apples");
assert_eq!(result, "I have 21 apples");

// TN sentence mode
let result = tn_normalize_sentence("I paid $5 for 23 items");
assert_eq!(result, "I paid five dollars for twenty three items");
```
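
One way to picture span scanning is a greedy sliding window over whitespace-separated tokens: at each position, try the widest window first and fall back to narrower ones, copying the token through unchanged if nothing matches. The sketch below is a standalone illustration under that assumption. `normalize_sentence_sketch`, its `max_window` parameter, and the `normalize_span` callback are hypothetical and do not describe the crate's internals.

```rust
/// Greedy longest-match span scanner (illustrative sketch).
///
/// `normalize_span` is a hypothetical callback that returns `Some(written)`
/// only when the entire span is normalizable.
fn normalize_sentence_sketch<F>(sentence: &str, max_window: usize, normalize_span: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    let tokens: Vec<&str> = sentence.split_whitespace().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;

    while i < tokens.len() {
        let longest = max_window.min(tokens.len() - i);
        let mut matched = false;

        // Widest window first, so "twenty one" wins over "twenty".
        for len in (1..=longest).rev() {
            let span = tokens[i..i + len].join(" ");
            if let Some(written) = normalize_span(&span) {
                out.push(written);
                i += len; // skip past the consumed span
                matched = true;
                break;
            }
        }

        if !matched {
            // No window starting here matched: keep the token as-is.
            out.push(tokens[i].to_string());
            i += 1;
        }
    }

    out.join(" ")
}
```

Trying the widest window first is a design choice of this sketch: it prevents a prefix such as "twenty" from being converted before the longer span "twenty one" has been considered. Note that joining on single spaces discards the original whitespace, which a production implementation would likely want to preserve.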

### Swift

```swift
import NemoTextProcessing

// ITN: spoken → written
let result = NemoTextProcessing.normalize("two hundred")
// "200"

// TN: written → spoken
let spoken = NemoTextProcessing.tnNormalize("$5.50")
// "five dollars and fifty cents"

// Sentence modes
let itn = NemoTextProcessing.normalizeSentence("I have twenty one apples")
// "I have 21 apples"

let tn = NemoTextProcessing.tnNormalizeSentence("I paid $5 for 23 items")
// "I paid five dollars for twenty three items"
```

### CLI

```bash
# ITN
nemo-itn two hundred thirty two        # → 232
nemo-itn -s "I have twenty one apples" # → I have 21 apples

# TN
nemo-tn 123                            # → one hundred twenty three
nemo-tn '$5.50'                        # → five dollars and fifty cents
nemo-tn -s 'I paid $5 for 23 items'    # → I paid five dollars for twenty three items

# Pipe from stdin
echo "2:30 PM" | nemo-tn               # → two thirty p m
```

## Compatibility

### ITN (Spoken → Written)

**98.6% compatible** with the NeMo Text Processing test suite (1200 of 1217 tests passing).

| Category | Status |
|----------|--------|
| Cardinal numbers | 100% |
| Ordinal numbers | 100% |
| Decimal numbers | 100% |
| Money | 100% |
| Measurements | 100% |
| Dates | 100% |
| Time | 97% |
| Electronic (email/URL) | 96% |
| Telephone/IP | 96% |
| Whitelist terms | 100% |

### TN (Written → Spoken)

| Category | Examples |
|----------|----------|
| Cardinal numbers | `123` → `one hundred twenty three` |
| Ordinal numbers | `1st` → `first`, `21st` → `twenty first` |
| Decimal numbers | `3.14` → `three point one four` |
| Money | `$5.50` → `five dollars and fifty cents` |
| Measurements | `200 km/h` → `two hundred kilometers per hour` |
| Dates | `January 5, 2025` → `january fifth twenty twenty five` |
| Time | `2:30 PM` → `two thirty p m` |
| Electronic (email/URL) | `test@gmail.com` → `t e s t at g m a i l dot c o m` |
| Telephone | `123-456-7890` → `one two three, four five six, seven eight nine zero` |
| Whitelist terms | `Dr.` → `doctor`, `Mr.` → `mister` |

## Features

- **ITN** (Inverse Text Normalization): spoken → written form for ASR post-processing
- **TN** (Text Normalization): written → spoken form for TTS preprocessing
- Cardinal and ordinal number conversion (both directions)
- Decimal numbers with scale words (million, billion)
- Currency formatting (USD, GBP, EUR, JPY, and more)
- Measurements including temperature (°C, °F, K) and data rates (gbps)
- Date parsing (multiple formats) and decade verbalization (1980s → nineteen eighties)
- Time parsing with AM/PM, 24-hour format, and timezone preservation
- Email and URL normalization
- Phone numbers, IP addresses, SSN
- Case preservation for proper nouns and abbreviations
- Sentence-level normalization with sliding window span matching
- Custom rules for domain-specific terms
- C FFI for integration with Swift, Python, and other languages
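
As a rough mental model for the custom rules feature, a rule table can be consulted before any built-in conversion runs. The sketch below is a standalone illustration and not the crate's implementation: `RuleTable`, its methods, the case-insensitive lookup, and the rules-take-priority ordering are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical user-defined rule table: spoken form → written form.
struct RuleTable {
    rules: HashMap<String, String>,
}

impl RuleTable {
    fn new() -> Self {
        RuleTable { rules: HashMap::new() }
    }

    /// Registers a rule; keys are stored trimmed and lowercased
    /// (an assumption of this sketch).
    fn add_rule(&mut self, spoken: &str, written: &str) {
        self.rules
            .insert(spoken.trim().to_lowercase(), written.to_string());
    }

    /// Checks custom rules first, then defers to `fallback`, which stands in
    /// for whatever built-in normalization would otherwise apply.
    fn normalize(&self, input: &str, fallback: impl Fn(&str) -> String) -> String {
        let key = input.trim().to_lowercase();
        match self.rules.get(&key) {
            Some(written) => written.clone(),
            None => fallback(input),
        }
    }
}
```

After `table.add_rule("gee pee tee", "GPT")`, calling `table.normalize("gee pee tee", |s| s.to_string())` yields `"GPT"`, while unmatched input goes through the fallback untouched.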

## Building

### Rust

```bash
cargo build
cargo test
```

### WASM + JavaScript

```bash
# Build + smoke test (Node) + build browser artifact
npm run wasm:ci

# Create a tarball from the browser package
npm run wasm:pack

# Publish browser package to npm (requires npm auth)
npm run wasm:publish
```

### CLI Tools

```bash
# Build the Rust library (release, with FFI)
cargo build --release --target aarch64-apple-darwin --features ffi

# Build Swift CLI tools
cd swift-test && swift build
```

Binaries are at `swift-test/.build/debug/nemo-itn` and `swift-test/.build/debug/nemo-tn`.

### Swift (XCFramework)

```bash
# Install Rust targets
rustup target add aarch64-apple-darwin x86_64-apple-darwin
rustup target add aarch64-apple-ios aarch64-apple-ios-sim

# Build XCFramework
./build-xcframework.sh
```

Output:

- `output/NemoTextProcessing.xcframework`: add to your Xcode project
- `output/NemoTextProcessing.swift`: Swift wrapper

## License

Apache 2.0

## Acknowledgments

This project is a Rust implementation based on the inverse text normalization grammars from [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing). All credit for the original algorithms and test cases goes to the NVIDIA NeMo team.
