{"id":46365345,"url":"https://github.com/fluidinference/text-processing-rs","last_synced_at":"2026-04-26T21:00:54.662Z","repository":{"id":339036500,"uuid":"1156730545","full_name":"FluidInference/text-processing-rs","owner":"FluidInference","description":"Rust port of NVIDIA NeMo Text Processing for Inverse Text Normalization","archived":false,"fork":false,"pushed_at":"2026-04-26T18:00:08.000Z","size":445,"stargazers_count":29,"open_issues_count":3,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-26T18:22:43.495Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FluidInference.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-02-13T01:34:35.000Z","updated_at":"2026-04-26T16:22:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"808b89e5-f98c-4f32-9828-9230ec94836c","html_url":"https://github.com/FluidInference/text-processing-rs","commit_stats":null,"previous_names":["fluidinference/text-processing-rs"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/FluidInference/text-processing-rs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FluidInference%2Ftext-processing-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FluidInference%2Ftext-processing-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FluidInference%2Ftext-processing-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FluidInference%2Ftext-processing-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FluidInference","download_url":"https://codeload.github.com/FluidInference/text-processing-rs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FluidInference%2Ftext-processing-rs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32312505,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T19:15:34.056Z","status":"ssl_error","status_checked_at":"2026-04-26T19:15:15.467Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-05T02:14:30.395Z","updated_at":"2026-04-26T21:00:54.647Z","avatar_url":"https://github.com/FluidInference.png","language":"Rust","readme":"# text-processing-rs\n\nA Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) supporting both **Inverse Text Normalization (ITN)** and **Text Normalization (TN)**.\n\n## What it does\n\n### ITN: Spoken → Written\n\nConverts spoken-form ASR output to written form:\n\n| Input | Output |\n|-------|--------|\n| two hundred thirty two | 232 |\n| five dollars and fifty cents | $5.50 |\n| january fifth twenty twenty five | January 5, 2025 |\n| quarter past two pm | 02:15 p.m. |\n| one point five billion dollars | $1.5 billion |\n| seventy two degrees fahrenheit | 72 °F |\n\n### TN: Written → Spoken\n\nConverts written-form text to spoken form (useful for TTS preprocessing):\n\n| Input | Output |\n|-------|--------|\n| 123 | one hundred twenty three |\n| $5.50 | five dollars and fifty cents |\n| January 5, 2025 | january fifth twenty twenty five |\n| 2:30 PM | two thirty p m |\n| 1st | first |\n| 200 km/h | two hundred kilometers per hour |\n\n## Usage\n\n### Rust\n\n```rust\nuse text_processing_rs::{normalize, tn_normalize};\n\n// ITN: spoken → written\nlet result = normalize(\"two hundred\");\nassert_eq!(result, \"200\");\n\nlet result = normalize(\"five dollars and fifty cents\");\nassert_eq!(result, \"$5.50\");\n\n// TN: written → spoken\nlet result = tn_normalize(\"$5.50\");\nassert_eq!(result, \"five dollars and fifty cents\");\n\nlet result = tn_normalize(\"123\");\nassert_eq!(result, \"one hundred twenty three\");\n```\n\n### JavaScript (WASM)\n\nBuild wasm artifacts:\n\n```bash\nnpm run wasm:build:node\nnpm run wasm:build:web\n```\n\nNode usage:\n\n```javascript\nimport * as wasm from \"./pkg-node/text_processing_rs.js\";\n\nconsole.log(wasm.normalize(\"two hundred\")); // \"200\"\nconsole.log(wasm.tnNormalize(\"$5.50\")); // \"five dollars and fifty cents\"\n\nwasm.addRule(\"gee pee tee\", \"GPT\");\nconsole.log(wasm.normalize(\"gee pee tee\")); // \"GPT\"\n```\n\nThe generated npm package name is `@fluidinference/text-processing-rs`.\n\nWeb project usage (Vite / Next.js / webpack):\n\n```bash\nnpm install @fluidinference/text-processing-rs\n```\n\n```javascript\nimport init, * as wasm from \"@fluidinference/text-processing-rs\";\n\nasync function run() {\n  // Loads and initializes the .wasm module (required once at startup)\n  await init();\n\n  const itn = wasm.normalize(\"two hundred\");\n  const tn = wasm.tnNormalize(\"$5.50\");\n\n  console.log(itn); // \"200\"\n  console.log(tn); // \"five dollars and fifty cents\"\n\n  wasm.addRule(\"gee pee tee\", \"GPT\");\n  console.log(wasm.normalize(\"gee pee tee\")); // \"GPT\"\n}\n\nrun();\n```\n\nIf your framework supports top-level `await`, you can initialize at module load time:\n\n```javascript\nimport init, * as wasm from \"@fluidinference/text-processing-rs\";\nawait init();\n```\n\nSentence-level normalization scans for normalizable spans within a larger sentence:\n\n```rust\nuse text_processing_rs::{normalize_sentence, tn_normalize_sentence};\n\n// ITN sentence mode\nlet result = normalize_sentence(\"I have twenty one apples\");\nassert_eq!(result, \"I have 21 apples\");\n\n// TN sentence mode\nlet result = tn_normalize_sentence(\"I paid $5 for 23 items\");\nassert_eq!(result, \"I paid five dollars for twenty three items\");\n```\n\n### Swift\n\n```swift\nimport NemoTextProcessing\n\n// ITN: spoken → written\nlet result = NemoTextProcessing.normalize(\"two hundred\")\n// \"200\"\n\n// TN: written → spoken\nlet spoken = NemoTextProcessing.tnNormalize(\"$5.50\")\n// \"five dollars and fifty cents\"\n\n// Sentence modes\nlet itn = NemoTextProcessing.normalizeSentence(\"I have twenty one apples\")\n// \"I have 21 apples\"\n\nlet tn = NemoTextProcessing.tnNormalizeSentence(\"I paid $5 for 23 items\")\n// \"I paid five dollars for twenty three items\"\n```\n\n### CLI\n\n```bash\n# ITN\nnemo-itn two hundred thirty two        # → 232\nnemo-itn -s \"I have twenty one apples\" # → I have 21 apples\n\n# TN\nnemo-tn 123                            # → one hundred twenty three\nnemo-tn '$5.50'                        # → five dollars and fifty cents\nnemo-tn -s 'I paid $5 for 23 items'    # → I paid five dollars for twenty three items\n\n# Pipe from stdin\necho \"2:30 PM\" | nemo-tn               # → two thirty p m\n```\n\n## Compatibility\n\n### ITN (Spoken → Written)\n\n**98.6% compatible** with NeMo text processing test suite (1200/1217 tests passing).\n\n| Category | Status |\n|----------|--------|\n| Cardinal numbers | 100% |\n| Ordinal numbers | 100% |\n| Decimal numbers | 100% |\n| Money | 100% |\n| Measurements | 100% |\n| Dates | 100% |\n| Time | 97% |\n| Electronic (email/URL) | 96% |\n| Telephone/IP | 96% |\n| Whitelist terms | 100% |\n\n### TN (Written → Spoken)\n\n| Category | Examples |\n|----------|----------|\n| Cardinal numbers | `123` → `one hundred twenty three` |\n| Ordinal numbers | `1st` → `first`, `21st` → `twenty first` |\n| Decimal numbers | `3.14` → `three point one four` |\n| Money | `$5.50` → `five dollars and fifty cents` |\n| Measurements | `200 km/h` → `two hundred kilometers per hour` |\n| Dates | `January 5, 2025` → `january fifth twenty twenty five` |\n| Time | `2:30 PM` → `two thirty p m` |\n| Electronic (email/URL) | `test@gmail.com` → `t e s t at g m a i l dot c o m` |\n| Telephone | `123-456-7890` → `one two three, four five six, seven eight nine zero` |\n| Whitelist terms | `Dr.` → `doctor`, `Mr.` → `mister` |\n\n## Features\n\n- **ITN** (Inverse Text Normalization): spoken → written form for ASR post-processing\n- **TN** (Text Normalization): written → spoken form for TTS preprocessing\n- Cardinal and ordinal number conversion (both directions)\n- Decimal numbers with scale words (million, billion)\n- Currency formatting (USD, GBP, EUR, JPY, and more)\n- Measurements including temperature (°C, °F, K) and data rates (gbps)\n- Date parsing (multiple formats) and decade verbalization (1980s → nineteen eighties)\n- Time parsing with AM/PM, 24-hour format, and timezone preservation\n- Email and URL normalization\n- Phone numbers, IP addresses, SSN\n- Case preservation for proper nouns and abbreviations\n- Sentence-level normalization with sliding window span matching\n- Custom rules for domain-specific terms\n- C FFI for integration with Swift, Python, and other languages\n\n## Building\n\n### Rust\n\n```bash\ncargo build\ncargo test\n```\n\n### WASM + JavaScript\n\n```bash\n# Build + smoke test (Node) + build browser artifact\nnpm run wasm:ci\n\n# Create a tarball from the browser package\nnpm run wasm:pack\n\n# Publish browser package to npm (requires npm auth)\nnpm run wasm:publish\n```\n\n### CLI Tools\n\n```bash\n# Build the Rust library (release, with FFI)\ncargo build --release --target aarch64-apple-darwin --features ffi\n\n# Build Swift CLI tools\ncd swift-test \u0026\u0026 swift build\n```\n\nBinaries are at `swift-test/.build/debug/nemo-itn` and `swift-test/.build/debug/nemo-tn`.\n\n### Swift (XCFramework)\n\n```bash\n# Install Rust targets\nrustup target add aarch64-apple-darwin x86_64-apple-darwin\nrustup target add aarch64-apple-ios aarch64-apple-ios-sim\n\n# Build XCFramework\n./build-xcframework.sh\n```\n\nOutput:\n- `output/NemoTextProcessing.xcframework` - Add to Xcode project\n- `output/NemoTextProcessing.swift` - Swift wrapper\n\n## License\n\nApache 2.0\n\n## Acknowledgments\n\nThis project is a Rust implementation based on the inverse text normalization grammars from [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing). All credit for the original algorithms and test cases goes to the NVIDIA NeMo team.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffluidinference%2Ftext-processing-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffluidinference%2Ftext-processing-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffluidinference%2Ftext-processing-rs/lists"}