{"id":50549900,"url":"https://github.com/rifkybujana/sam3.c","last_synced_at":"2026-06-04T02:30:30.868Z","repository":{"id":350920548,"uuid":"1200440096","full_name":"rifkybujana/sam3.c","owner":"rifkybujana","description":"Efficient SAM3 (Segment Anything Model 3) inference from scratch in pure C — Metal GPU + multithreaded CPU, no Python dependencies","archived":false,"fork":false,"pushed_at":"2026-05-10T19:32:05.000Z","size":13683,"stargazers_count":9,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-22T23:28:40.377Z","etag":null,"topics":["apple-silicon","c","computer-vision","from-scratch","ggml","image-segmentation","inference","machine-learning","metal","pure-c","sam3","segment-anything"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rifkybujana.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-03T12:17:17.000Z","updated_at":"2026-05-18T06:07:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rifkybujana/sam3.c","commit_stats":null,"previous_names":["rifkybujana/sam3.c"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rifkybujana/sam3.c","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rifkybujana%2Fsam3.c","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/rifkybujana%2Fsam3.c/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rifkybujana%2Fsam3.c/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rifkybujana%2Fsam3.c/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rifkybujana","download_url":"https://codeload.github.com/rifkybujana/sam3.c/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rifkybujana%2Fsam3.c/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33887124,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","c","computer-vision","from-scratch","ggml","image-segmentation","inference","machine-learning","metal","pure-c","sam3","segment-anything"],"created_at":"2026-06-04T02:30:30.065Z","updated_at":"2026-06-04T02:30:30.859Z","avatar_url":"https://github.com/rifkybujana.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sam3.c — Efficient SAM3 Inference From Scratch in Pure C\n\nA lightweight, dependency-free C11 implementation of [Segment Anything Model 3 (SAM3)](https://github.com/facebookresearch/sam3) built from scratch for efficient inference on Apple Silicon and x86 
CPUs.\n\nInspired by [ggml](https://github.com/ggerganov/ggml) and [llama.cpp](https://github.com/ggerganov/llama.cpp), sam3.c implements the full SAM3 pipeline — image encoder, prompt encoder, mask decoder — in ~57K lines of portable C with zero Python dependencies.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/sam3_segmentation.png\" alt=\"sam3.c multi-object segmentation — four masks with distinct colors on a street scene\" width=\"540\"/\u003e\n  \u003cbr\u003e\n  \u003cem\u003eFour-mask segmentation output from sam3.c — each object highlighted in a distinct color\u003c/em\u003e\n\u003c/p\u003e\n\n## Why sam3.c?\n\n| | sam3.c | Official SAM3 (Python) |\n|---|---|---|\n| **Language** | C11, no dependencies | Python + PyTorch |\n| **Binary size** | Single static binary | GB-scale environment |\n| **GPU support** | Apple Metal (native) | CUDA |\n| **Precision** | FP32, FP16, BF16 | FP32 |\n| **Memory** | Arena allocator, mmap weights | PyTorch allocator |\n| **Startup** | Instant (mmap) | Seconds (model load) |\n\n## Features\n\n- **Built from scratch in pure C** — no PyTorch, no ONNX, no wrappers. Every tensor op, every layer, written by hand.\n- **Metal GPU backend** — hardware-accelerated inference on Apple Silicon (M1/M2/M3/M4).\n- **Multithreaded CPU backend** — optimized SIMD kernels with thread pool for x86 and ARM.\n- **FP16 and BF16 support** — run inference in half precision for lower memory and faster compute.\n- **Custom `.sam3` weight format** — mmap-friendly binary format with O(1) tensor lookup via hash table.\n- **Full SAM3 pipeline** — image encoder (Hiera, EfficientViT, TinyViT), prompt encoder (points, boxes, masks), mask decoder, text encoder, and tokenizer.\n- **Video object tracking** — memory-based frame-by-frame propagation with point, box, and mask prompts. 
Supports MPEG video files and frame directories.\n- **Multiple backbones** — Hiera (full accuracy), EfficientViT-B1 (lightweight, 512px), and TinyViT-21M (128x128 masks at 1008px input).\n- **Unified CLI** — single `sam3_cli` binary with `segment`, `track`, `convert`, and `info` subcommands. Supports stdin/stdout piping, JSON output, and multi-mask color overlays.\n- **48 unit tests** — comprehensive test suite covering numerical operators, memory management, and end-to-end inference.\n- **Built-in profiling** — latency tracing subsystem to identify bottlenecks.\n\n## Supported Models\n\n| Backbone | Input Size | Mask Resolution | Parameters | Encode (ms) | Segment (ms) | Use Case |\n|---|---|---|---|---:|---:|---|\n| **Hiera** | 1008x1008 | 288x288 | 1.6B | 2336 | 1224 | Full accuracy |\n| **TinyViT-21M** | 1008x1008 | 128x128 | 0.8B | 487 | 363 | Balanced quality/speed |\n| **EfficientViT-B1** | 512x512 | 64x64 | 0.8B | 70 | 177 | Fastest, interactive |\n\nTimings on Apple M4 (10-core GPU, Metal backend, Release build). Encode = `sam3_set_image`, Segment = `sam3_segment` with a box prompt. See [BENCHMARK.md](BENCHMARK.md) for full results.\n\nAll backbones share the same prompt encoder, mask decoder, and text encoder. The backbone is selected automatically based on the checkpoint.\n\n## Quick Start\n\n### Build\n\n```bash\ngit clone https://github.com/rifkybujana/sam3.c.git\ncd sam3.c\nmkdir build \u0026\u0026 cd build\ncmake .. -DCMAKE_BUILD_TYPE=Release\nmake -j$(nproc)\n```\n\n\u003e **First build note:** the build fetches and statically compiles\n\u003e FFmpeg, openh264, and libvpx into `build/external/`. 
Expect ~10-15\n\u003e minutes on first configure; subsequent incremental builds are fast.\n\u003e The resulting binary has no runtime dependency on system ffmpeg.\n\n### Convert Weights\n\nDownload a SAM3 checkpoint in SafeTensors format, then convert to the optimized `.sam3` format:\n\n```bash\n# Hiera (default backbone)\n./sam3_cli convert -i models/sam3.safetensors -o models/sam3.sam3\n\n# TinyViT or EfficientViT (specify backbone)\n./sam3_cli convert -i models/tinyvit.safetensors -o models/tinyvit.sam3 --backbone tinyvit\n./sam3_cli convert -i models/evit.safetensors -o models/evit.sam3 --backbone efficientvit\n```\n\n### SAM 3.1\n\nSAM 3.1 ships as a PyTorch `.pt` checkpoint only\n(`sam3.1_multiplex.pt`, ~3.3 GB). Convert in two steps:\n\n```bash\n# 1. Normalize into .safetensors (unwraps {\"model\": ...}, remaps\n#    sam3_model.* -\u003e detector.* and sam2_predictor.* -\u003e tracker.*)\npython tools/pt_to_safetensors.py \\\n    models/sam3.1_multiplex.pt \\\n    models/sam3.1_multiplex.safetensors\n\n# 2. Convert to .sam3 with the SAM 3.1 variant flag\n./sam3_cli convert \\\n    -i models/sam3.1_multiplex.safetensors \\\n    -o models/sam3.1.sam3 \\\n    --variant sam3.1\n\n# 3. Use it\n./sam3_cli segment -m models/sam3.1.sam3 -i img.jpg -t \"cat\"\n```\n\nOnly the image-detector path is wired in this release. 
The SAM 3.1\nmultiplex tracker and joint multi-object video pass are planned for\nfollow-up work — SAM 3 continues to handle all video tracking today.\n\n### Run Inference\n\n```bash\n# Point prompt (foreground point at x=500, y=375)\n./sam3_cli segment -m models/sam3.sam3 -i photo.jpg -p 500,375,1 --overlay\n\n# Text prompt\n./sam3_cli segment -m models/sam3.sam3 -i photo.jpg -t \"person\" --overlay\n\n# Box prompt\n./sam3_cli segment -m models/sam3.sam3 -i photo.jpg -b 100,100,400,400 --all\n```\n\n### Video tracking\n\nTrack an object across frames of a video:\n\n```bash\n./sam3_cli track --model models/sam3.sam3 --video clip.mp4 \\\n    --point 504,504,1 --frame 0 --output out/\n```\n\nOutput: `out/frame_NNNNN.png` binary mask per frame.\n\n### Inspect a Model\n\n```bash\n./sam3_cli info models/sam3.sam3\n```\n\n### Run Tests\n\n```bash\nctest --output-on-failure\n```\n\n## Language bindings\n\nsam3.c ships bindings for multiple languages under `bindings/`:\n\n- **Python** — `bindings/python/`, CFFI-based. Requires Python ≥ 3.9.\n- **Rust** — `bindings/rust/`. Cargo workspace with `sam3-sys` (FFI) and\n  `sam3` (safe API: owned `Ctx`, typed prompt enum, RAII result cleanup,\n  `SegmentResult::nms` matching the CLI's post-processing). See\n  `bindings/rust/README.md`.\n\n\u003e **Python cache-API gap:** the [caching API](#caching) is exposed in the\n\u003e Rust binding but not yet in Python; see\n\u003e `docs/superpowers/plans/2026-04-22-cache-api-bindings.md`.\n\n### Python\n\n```bash\npip install -e bindings/python\n```\n\n`setup.py` runs CMake under the hood: it configures the project with\n`-DSAM3_SHARED=ON`, builds `libsam3.{dylib,so}` into `build-python/`,\nand copies the shared library into the installed package. **No manual\n`cmake` step, no `DYLD_LIBRARY_PATH`/`LD_LIBRARY_PATH` setup** — the\npackage ships its own libsam3 next to `sam3/_lib.py` and loads it by\nrelative path. 
First install takes ~10-15 minutes because CMake also\ncompiles FFmpeg from source; reinstalls are fast.\n\nRequirements:\n\n- Python ≥ 3.9\n- CMake ≥ 3.20 and a C11 toolchain on `$PATH`\n- `cffi\u003e=1.15`, `numpy\u003e=1.21` (installed automatically)\n- On macOS, Xcode command-line tools; on Linux, a recent GCC/Clang\n\nMinimal usage:\n\n```python\nimport sam3\n\nwith sam3.Model(\"models/sam3.sam3\") as model:\n    model.set_image(\"photo.jpg\")\n    result = model.segment(text=\"person\")\n    print(result.masks.shape, result.iou_scores[:3])\n```\n\nRun the test suite:\n\n```bash\npip install -e 'bindings/python[dev]'\npytest bindings/python/tests -v\n```\n\nTroubleshooting — if `import sam3` raises\n`OSError: libsam3 not found at .../sam3/libsam3.dylib`, the package\nwas installed without the bundled shared library (usually a stale\ninstall from before `setup.py` was updated). Force a rebuild with:\n\n```bash\npip install --force-reinstall --no-deps -e bindings/python\n```\n\n### Rust\n\nBuild `libsam3` shared once, then build/test the crate against it:\n\n```bash\n# 1. Build libsam3.{dylib,so}\ncmake -S . -B build -DSAM3_SHARED=ON -DCMAKE_BUILD_TYPE=Release\ncmake --build build --parallel\n\n# 2. Build + test the Rust workspace (point the loader at build/)\ncd bindings/rust\nDYLD_LIBRARY_PATH=../../build cargo test --release   # macOS\nLD_LIBRARY_PATH=../../build  cargo test --release   # Linux\n```\n\nInstall `libsam3` system-wide (`cmake --install build`) to skip the env\nvar step; the loader will then find it on the default search path. 
See\n`bindings/rust/README.md` for the three supported runtime resolution\nworkflows (env var, system install, rpath-shipped).\n\nMinimal Rust usage:\n\n```rust\nuse sam3::{Ctx, Prompt};\n\nlet mut ctx = Ctx::new()?;\nctx.load_model(\"models/efficient.sam3\")?;      // auto-loads co-located BPE\nctx.set_image_file(\"photo.jpg\")?;\nctx.set_text(\"person\")?;\n\nlet raw = ctx.segment(\u0026[Prompt::Text(\"person\")])?;\nlet hits = raw.nms(0.5, 0.5, 0.0)?;            // 200 candidates → ~N detections\nprintln!(\"found {} objects, top score {:.3}\",\n         hits.n_masks(), hits.iou_scores()[0]);\n```\n\n## Architecture\n\n```\nsam3.c\n├── include/sam3/        Public API headers\n├── src/\n│   ├── core/            Tensor ops, arena allocator, compute graph, weight loader\n│   ├── backend/\n│   │   ├── cpu/         Multithreaded CPU kernels (SIMD-optimized)\n│   │   └── metal/       Apple Metal GPU backend\n│   ├── model/           SAM3 layers\n│   │   ├── image_encoder   Vision transformer (Hiera, EfficientViT, TinyViT)\n│   │   ├── prompt_encoder  Point, box, and mask prompts\n│   │   ├── mask_decoder    Lightweight mask prediction head\n│   │   ├── text_encoder    Text prompt encoding\n│   │   └── tokenizer       BPE tokenizer\n│   └── util/            Logging, error codes\n├── tools/               Unified CLI (sam3_cli: segment, convert, info)\n└── tests/               48 test files\n```\n\n## Performance\n\nOn an Apple M4 with the Metal backend, EfficientViT delivers end-to-end\nimage-to-mask in ~250 ms (4 FPS), making interactive point-and-click\nsegmentation practical. Once the image is encoded, each additional prompt on\nthe same image resolves in under 200 ms.\n\nHiera-Large trades speed for accuracy at 3.6 s per image with 5184 patches and\n32 transformer blocks. 
Multi-prompt workflows amortize the 2.3 s encode cost.\n\nThe Metal backend achieves 90.8% of theoretical F32 peak (3086 / 3400 GFLOPS)\non matmul microbenchmarks and up to 149x speedup over CPU for F16 workloads.\n\nFull kernel-level and pipeline benchmark results are in\n[BENCHMARK.md](BENCHMARK.md).\n\n## Caching\n\nsam3.c caches encoded image and text features so repeated prompts on the\nsame inputs resolve in microseconds instead of re-running the encoders.\nThe caches are enabled by default with tunable slot counts, an LRU RAM\nbudget, and optional disk spill.\n\n**Defaults** — override via `sam3_cache_opts` + `sam3_init_ex()`:\n\n| Tunable | Default | Purpose |\n|---|---|---|\n| `n_image_slots` | 8 | Max cached image entries |\n| `n_text_slots` | 16 | Max cached text entries |\n| `image_mem_budget_bytes` | 1 GiB (~4 hot slots at 256 MiB) | RAM ceiling; excess slots spill to disk |\n| `image_spill_dir` | auto-created `/tmp` dir | Where spilled bundles live |\n\n### Pre-warm\n\nPopulate the cache while the app is idle so the user's first prompt only\npays segment latency:\n\n```c\nsam3_precache_image_file(ctx, \"photo.jpg\");   /* runs image encoder now */\nsam3_precache_text(ctx, \"person\");            /* runs text encoder now */\n\n/* Later: these hit the cache and return in microseconds */\nsam3_set_image_file(ctx, \"photo.jpg\");\nsam3_set_text(ctx, \"person\");\n```\n\n### Persist across runs\n\nSerialize an encoded entry to a `.sam3cache` file and reload it on the\nnext startup. 
Files are model-signature-gated — loading into a different\nmodel is rejected with `SAM3_EMODEL`:\n\n```c\nsam3_cache_save_image(ctx, pixels, w, h, \"photo.sam3cache\");\n/* Next run, after sam3_load_model(): */\nsam3_cache_load_image(ctx, \"photo.sam3cache\");\nsam3_set_image(ctx, pixels, w, h);            /* cache hit */\n```\n\n### Inspect and flush\n\n```c\nstruct sam3_cache_stats s;\nsam3_cache_stats(ctx, \u0026s);\n/* s.image_hits, s.image_misses, s.image_evictions, and text_* */\n\nsam3_cache_clear(ctx, SAM3_CACHE_IMAGE | SAM3_CACHE_TEXT);\n```\n\n### Video frame cache\n\nVideo tracking has its own two-tier cache for per-frame backbone\nfeatures, tuned via `sam3_video_start_opts`:\n\n- `frame_cache_backend_budget` — resident RAM (default 4 GiB)\n- `frame_cache_spill_budget` — disk spill (default 16 GiB; `SIZE_MAX` disables spill)\n\nSee `include/sam3/sam3.h` for the full cache API.\n\n## Weight Format\n\nModel weights use the `.sam3` binary format — a compact, mmap-friendly layout designed for instant loading:\n\n- 48-byte header + 176-byte tensor descriptors + page-aligned data blob\n- FNV-1a hash table for O(1) tensor lookup by name\n- Supports FP32, FP16, BF16, I32, I8, and Q8_0 (block-quantized int8)\n- Converted from SafeTensors via `sam3_cli convert`\n\nSee [docs/weight-format.md](docs/weight-format.md) for the full specification.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frifkybujana%2Fsam3.c","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frifkybujana%2Fsam3.c","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frifkybujana%2Fsam3.c/lists"}