{"id":36990609,"url":"https://github.com/wendylabsinc/tensorrt-swift","last_synced_at":"2026-02-01T20:14:29.832Z","repository":{"id":329095112,"uuid":"1117118779","full_name":"wendylabsinc/tensorrt-swift","owner":"wendylabsinc","description":"TensorRT Swift 6.2 Bindings for Linux","archived":false,"fork":false,"pushed_at":"2026-01-01T20:37:25.000Z","size":356,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-14T00:05:40.906Z","etag":null,"topics":["cuda","nvidia","swift","tensor","tensorrt"],"latest_commit_sha":null,"homepage":"https://wendy.sh/docs/","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wendylabsinc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-15T21:27:25.000Z","updated_at":"2026-01-12T00:43:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/wendylabsinc/tensorrt-swift","commit_stats":null,"previous_names":["wendylabsinc/tensorrt-swift"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/wendylabsinc/tensorrt-swift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wendylabsinc%2Ftensorrt-swift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wendylabsinc%2Ftensorrt-swift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wendylabsinc%2Ftensorrt-swift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wendylabsinc%2Ftensorrt-swift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wendylabsinc","download_url":"https://codeload.github.com/wendylabsinc/tensorrt-swift/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wendylabsinc%2Ftensorrt-swift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28988635,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T18:17:03.387Z","status":"ssl_error","status_checked_at":"2026-02-01T18:16:57.287Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","nvidia","swift","tensor","tensorrt"],"created_at":"2026-01-13T23:38:21.440Z","updated_at":"2026-02-01T20:14:29.808Z","avatar_url":"https://github.com/wendylabsinc.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TensorRT Swift (Linux)\n\n[![CI](https://github.com/wendylabsinc/tensorrt-swift/actions/workflows/ci.yml/badge.svg)](https://github.com/wendylabsinc/tensorrt-swift/actions/workflows/ci.yml)\n![Swift 6.2+](https://img.shields.io/badge/Swift-6.2%2B-F05138?logo=swift\u0026logoColor=white)\n![Linux](https://img.shields.io/badge/Platform-Linux-FCC624?logo=linux\u0026logoColor=black)\n![TensorRT](https://img.shields.io/badge/TensorRT-10.x-76B900?logo=nvidia\u0026logoColor=white)\n![CUDA](https://img.shields.io/badge/CUDA-12.6-76B900?logo=nvidia\u0026logoColor=white)\n\nSwift Package that provides Swift-first APIs for working with NVIDIA TensorRT on Linux, with a separate TensorRTLLM product for LLM-specific extensions.\n\n\u003e **Note**: The `TensorRT` product wraps the **TensorRT** inference engine. The `TensorRTLLM` product is a thin extension layer today; full TensorRT-LLM integration (in-flight batching, KV-cache management, tensor parallelism) is planned for future releases.\n\nThis repository is **work in progress** and **subject to breaking changes** while the low-level foundations are being established.\n\nSwift 6.2 features are used aggressively where feasible:\n- `InlineArray` to keep common small metadata (like shapes/strides) allocation-free\n- `Span` / `MutableSpan` / `Data.bytes` for safer, more composable views over contiguous memory\n- Actor-based `ExecutionContext` for thread-safe inference\n\n## System Requirements\n\n### Required Libraries\n\nThe package links against the following system libraries at **build time** and **runtime**:\n\n| Library | Package | Purpose |\n|---------|---------|---------|\n| `libnvinfer.so` | TensorRT | Core inference engine |\n| `libnvinfer_plugin.so` | TensorRT | Built-in plugins |\n| `libnvonnxparser.so` | TensorRT | ONNX model import |\n| `libcuda.so` | CUDA Driver | GPU access |\n\n### Installation\n\n#### Option 1: NVIDIA Container (Recommended)\n\nUse the official TensorRT container which includes all dependencies:\n\n```bash\ndocker run --gpus all -it nvcr.io/nvidia/tensorrt:24.08-py3\n```\n\n#### Option 1b: Jetson Container (Orin Nano, AGX Thor)\n\nJetson uses aarch64 containers and must match the host JetPack/L4T release. See\n`docs/jetson-container.md` for a full recipe.\n\n#### Option 2: System Installation (Ubuntu/Debian)\n\n```bash\n# 1. Install CUDA 12.6\nwget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get install -y cuda-toolkit-12-6\n\n# 2. Install TensorRT 10.x\nsudo apt-get install -y libnvinfer10 libnvinfer-plugin10 libnvonnxparser10 libnvinfer-dev\n\n# 3. Add CUDA to your path\nexport PATH=/usr/local/cuda-12.6/bin:$PATH\nexport LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH\n```\n\n#### Option 3: From NVIDIA Developer Downloads\n\n1. Download [CUDA Toolkit 12.6](https://developer.nvidia.com/cuda-downloads)\n2. Download [TensorRT 10.x](https://developer.nvidia.com/tensorrt) (requires NVIDIA Developer account)\n3. Follow NVIDIA's installation guides\n\n### Verifying Installation\n\n```bash\n# Check CUDA\nnvcc --version\n\n# Check TensorRT\ndpkg -l | grep nvinfer\n# or\nls /usr/lib/x86_64-linux-gnu/libnvinfer*\n```\n\n### Swift Installation\n\nInstall Swift 6.2+ via [Swiftly](https://swiftlang.github.io/swiftly/):\n\n```bash\ncurl -L https://swiftlang.github.io/swiftly/swiftly-install.sh | bash\nswiftly install 6.2\n```\n\n### Development Workflow (macOS/Windows)\n\nYou can write code on macOS or Windows, but **building and running must happen on Linux** with\nTensorRT/CUDA libraries available. The recommended workflow is:\n\n- Develop locally on macOS/Windows.\n- Build/test inside a Linux container (Option 1 / 1b) or on a Linux host.\n\nCross-compiling from macOS/Windows to Linux is possible but fragile and not recommended.\n\n## What Works Today\n\n### Core APIs\n\n| API | Description |\n|-----|-------------|\n| `TensorRTRuntime.buildEngine(onnxURL:options:)` | Build TensorRT engine from ONNX |\n| `TensorRTRuntime.deserializeEngine(from:)` | Load serialized engine plan |\n| `Engine.save(to:)` / `Engine.load(from:)` | Persist/load engines to disk |\n| `ExecutionContext.enqueue(_:)` | Execute inference (host buffers) |\n| `ExecutionContext.enqueueDevice(...)` | Execute with device pointers |\n| `ExecutionContext.warmup(iterations:)` | Warmup for stable latency |\n\n### GPU \u0026 Device APIs\n\n| API | Description |\n|-----|-------------|\n| `TensorRTSystem.cudaDeviceCount()` | Number of available GPUs |\n| `TensorRTSystem.deviceProperties(device:)` | GPU name, compute capability, memory |\n| `TensorRTSystem.memoryInfo(device:)` | Free/total GPU memory |\n| `TensorRTSystem.CUDAStream` | RAII stream wrapper |\n| `TensorRTSystem.CUDAEvent` | RAII event wrapper |\n\n### Dynamic Shapes \u0026 Profiles\n\n| API | Description |\n|-----|-------------|\n| `ExecutionContext.reshape(bindings:)` | Set input shapes at runtime |\n| `ExecutionContext.setOptimizationProfile(named:)` | Switch optimization profiles |\n| `OptimizationProfile` | Define min/opt/max shapes |\n\n### LLM Extensions (TensorRTLLM)\n\n| API | Description |\n|-----|-------------|\n| `ExecutionContext.stream(...)` | Streaming inference (AsyncSequence) |\n| `StreamingConfiguration` | Configure token-by-token generation |\n| `StreamingInferenceStep` | Per-step metadata and outputs |\n\n### Swift-y Conveniences\n\n```swift\n// TensorShape with array literal\nlet shape: TensorShape = [1, 3, 224, 224]\nprint(shape)        // \"TensorShape[1, 3, 224, 224]\"\nprint(shape[0])     // 1\n\n// Engine persistence\ntry engine.save(to: URL(fileURLWithPath: \"model.engine\"))\nlet loaded = try Engine.load(from: URL(fileURLWithPath: \"model.engine\"))\n\n// Query GPU before loading\nlet mem = try TensorRTSystem.memoryInfo()\nprint(\"Free GPU memory: \\(mem.free / 1_000_000_000) GB\")\n```\n\n## Quick Start\n\n### Add the package to your `Package.swift`\n\n```swift\n// swift-tools-version: 6.2\nimport PackageDescription\n\nlet package = Package(\n    name: \"MyApp\",\n    dependencies: [\n        .package(url: \"https://github.com/wendylabsinc/tensorrt-swift\", from: \"0.0.1\"),\n    ],\n    targets: [\n        .executableTarget(\n            name: \"MyApp\",\n            dependencies: [\n                .product(name: \"TensorRT\", package: \"tensorrt-swift\"),\n            ]\n        ),\n    ]\n)\n```\n\nTo use the LLM extension module for streaming inference and other LLM utilities:\n\n```swift\n.product(name: \"TensorRTLLM\", package: \"tensorrt-swift\")\n```\n\n### Query GPU and TensorRT version\n\n```swift\nimport TensorRT\n// Check TensorRT version\nlet version = try TensorRTRuntimeProbe.inferRuntimeVersion()\nprint(\"TensorRT version: \\(version)\")\n\n// Check GPU\nlet props = try TensorRTSystem.deviceProperties()\nprint(\"GPU: \\(props.name)\")\nprint(\"Compute Capability: \\(props.computeCapability)\")\nprint(\"Memory: \\(props.totalMemory / 1_000_000_000) GB\")\n\nlet mem = try TensorRTSystem.memoryInfo()\nprint(\"Free: \\(mem.free / 1_000_000_000) GB / \\(mem.total / 1_000_000_000) GB\")\n```\n\n### Build an engine from ONNX and run inference\n\n```swift\nimport TensorRT\nlet runtime = TensorRTRuntime()\nlet engine = try runtime.buildEngine(\n    onnxURL: URL(fileURLWithPath: \"model.onnx\"),\n    options: EngineBuildOptions(\n        precision: [.fp32],\n        workspaceSizeBytes: 1 \u003c\u003c 28\n    )\n)\n\n// Save for later use (avoid rebuild)\ntry engine.save(to: URL(fileURLWithPath: \"model.engine\"))\n\nlet ctx = try engine.makeExecutionContext()\n\n// Warmup for stable latency\nlet warmup = try await ctx.warmup(iterations: 10)\nprint(\"Warmup avg: \\(warmup.average ?? .zero)\")\n\n// Run inference\nlet inputDesc = engine.description.inputs[0].descriptor\nlet input: [Float] = (0..\u003cinputDesc.shape.elementCount).map(Float.init)\nlet inputBytes = input.withUnsafeBufferPointer { Data(buffer: $0) }\n\nlet batch = InferenceBatch(inputs: [\n    inputDesc.name: TensorValue(descriptor: inputDesc, storage: .host(inputBytes))\n])\nlet result = try await ctx.enqueue(batch)\n```\n\n### Streaming inference (for LLMs)\n\n```swift\nimport TensorRTLLM\nlet stream = context.stream(\n    initialBatch: promptBatch,\n    configuration: StreamingConfiguration(maxSteps: 100)\n) { previousResult in\n    // Transform previous output into next input (e.g., append generated token)\n    return makeNextBatch(from: previousResult)\n}\n\nfor try await step in stream {\n    print(\"Step \\(step.stepIndex), final: \\(step.isFinal)\")\n    // Process each step as it arrives\n    if step.isFinal { break }\n}\n```\n\n### Dynamic shapes with optimization profiles\n\n```swift\nimport TensorRT\nlet profile = OptimizationProfile(\n    name: \"batch_range\",\n    axes: [:],\n    bindingRanges: [\n        \"input\": .init(\n            min: TensorShape([1, 512]),\n            optimal: TensorShape([8, 512]),\n            max: TensorShape([32, 512])\n        ),\n    ]\n)\n\nlet engine = try TensorRTRuntime().buildEngine(\n    onnxURL: URL(fileURLWithPath: \"dynamic.onnx\"),\n    options: EngineBuildOptions(precision: [.fp32], profiles: [profile])\n)\n\nlet ctx = try engine.makeExecutionContext()\ntry await ctx.reshape(bindings: [\"input\": TensorShape([16, 512])])\nlet result = try await ctx.enqueue(batch)\n```\n\n## Examples\n\nThe package includes 17 examples organized by difficulty level. Run any example with `./scripts/swiftw run \u003cExampleName\u003e`.\nThe wrapper keeps build artifacts in `/tmp` by default; override with `SWIFT_BUILD_PATH` if needed.\n\n### Beginner Examples\n\n| Example | Description | Command |\n|---------|-------------|---------|\n| **HelloTensorRT** | Minimal \"hello world\" - probe version, build identity engine, run inference | `./scripts/swiftw run HelloTensorRT` |\n| **ONNXInference** | Load ONNX model, build engine, run inference with throughput measurement | `./scripts/swiftw run ONNXInference` |\n| **BatchProcessing** | Process multiple batches, latency statistics (p50/p95/p99) | `./scripts/swiftw run BatchProcessing` |\n\n### Intermediate Examples\n\n| Example | Description | Command |\n|---------|-------------|---------|\n| **DynamicBatching** | Dynamic shapes for variable batch sizes at runtime | `./scripts/swiftw run DynamicBatching` |\n| **MultiProfile** | Multiple optimization profiles for different workloads | `./scripts/swiftw run MultiProfile` |\n| **AsyncInference** | Non-blocking inference with CUDA streams and events | `./scripts/swiftw run AsyncInference` |\n| **ImageClassifier** | End-to-end pipeline: preprocess → inference → postprocess | `./scripts/swiftw run ImageClassifier` |\n| **DeviceMemoryPipeline** | Keep tensors on GPU, avoid H2D/D2H transfers | `./scripts/swiftw run DeviceMemoryPipeline` |\n\n### LLM Examples (TensorRTLLM)\n\nLLM examples live under `ExamplesLLM/`.\n\n| Example | Description | Command |\n|---------|-------------|---------|\n| **StreamingLLM** | Token-by-token generation with KV-cache pattern | `./scripts/swiftw run StreamingLLM` |\n\n### Advanced Examples\n\n| Example | Description | Command |\n|---------|-------------|---------|\n| **MultiGPU** | Distribute inference across multiple GPUs | `./scripts/swiftw run MultiGPU` |\n| **CUDAEventPipelining** | Overlap compute with data transfer using events | `./scripts/swiftw run CUDAEventPipelining` |\n| **BenchmarkSuite** | Comprehensive throughput/latency measurement | `./scripts/swiftw run BenchmarkSuite` |\n| **FP16Quantization** | Compare FP32 vs FP16 precision and performance | `./scripts/swiftw run FP16Quantization` |\n\n### Real-World Examples\n\n| Example | Description | Command |\n|---------|-------------|---------|\n| **TextEmbedding** | Sentence transformer for semantic search | `./scripts/swiftw run TextEmbedding` |\n| **ObjectDetection** | YOLO-style detection with NMS postprocessing | `./scripts/swiftw run ObjectDetection` |\n| **WhisperTranscription** | Audio transcription pipeline (encoder pattern) | `./scripts/swiftw run WhisperTranscription` |\n| **VisionTransformer** | ViT image classification with patch embeddings | `./scripts/swiftw run VisionTransformer` |\n\n### Example Output: BenchmarkSuite\n\n```\n=== TensorRT Benchmark Suite ===\n\n┌──────────┬────────────┬────────────┬────────────┬────────────┐\n│ Elements │ Throughput │ p50        │ p95        │ p99        │\n├──────────┼────────────┼────────────┼────────────┼────────────┤\n│ 64       │ 91.0K      │ 10.4 µs    │ 12.5 µs    │ 22.8 µs    │\n│ 1024     │ 75.5K      │ 11.5 µs    │ 22.1 µs    │ 23.1 µs    │\n│ 16384    │ 31.3K      │ 31.8 µs    │ 33.2 µs    │ 37.1 µs    │\n└──────────┴────────────┴────────────┴────────────┴────────────┘\n```\n\n## Tests\n\nRun:\n\n```bash\n./scripts/swiftw test\n```\n\nThis wrapper keeps build artifacts in `/tmp` by default to avoid `.build` permission issues. Override with\n`SWIFT_BUILD_PATH=/your/path ./scripts/swiftw test` if needed.\n\nThe test suite includes end-to-end GPU tests that build engines (TensorRT builder and `nvonnxparser`),\ndeserialize them, and run inference (host buffers, device pointers, external streams, and CUDA events).\n\n## Troubleshooting\n\n### `libnvinfer.so: cannot open shared object file`\n\nTensorRT libraries are not in your library path. Add them:\n\n```bash\nexport LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH\n# or wherever TensorRT is installed\n```\n\n### `CUDA driver version is insufficient`\n\nYour NVIDIA driver is too old for CUDA 12.6. Update your driver:\n\n```bash\nsudo apt-get install nvidia-driver-550  # or newer\n```\n\n### Swift can't find CUDA headers\n\nEnsure CUDA is installed and the include path is correct:\n\n```bash\nls /usr/local/cuda/include/cuda.h\n# If not found, create symlink or adjust Package.swift\n```\n\n## License\n\nSee [LICENSE.txt](LICENSE.txt).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwendylabsinc%2Ftensorrt-swift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwendylabsinc%2Ftensorrt-swift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwendylabsinc%2Ftensorrt-swift/lists"}