# Synapse

<!-- status:root-positioning:start -->
Edge-native inference stack for local ML across native and browser targets.

- Native builds use Rust orchestration with Zig SIMD kernels and optional Metal acceleration.
- Browser builds use a pure-Rust WASM runtime for portability and client-side demos.
- Public benchmark rows are measured on Apple Silicon and synced from status/benchmark_matrix.json.
<!-- status:root-positioning:end -->

## Benchmarks

<!-- status:root-benchmark:start -->
| Family | Configuration | Prompt | Prefill (tok/s) | Decode (tok/s) | Notes |
|--------|---------------|--------|-----------------|----------------|-------|
| Qwen3 | f32 CPU | hello | 11 | 7.3 | Runtime backend=cpu_simd; prompt=hello |
| Qwen3 | INT8 CPU | hello | 23 | 27.3 | Runtime backend=cpu_simd; prompt=hello |
| LLaMA 3.2 | f32 CPU | hello | 1 | 2.1 | Runtime backend=cpu_simd; prompt=hello |
| LLaMA 3.2 | INT8 CPU | hello | 8 | 9.7 | Runtime backend=cpu_simd; prompt=hello |
| Reference | llama.cpp Q4_K_M | reference_only | 5518 | 173 | Reference only, not a parity claim |
<!-- status:root-benchmark:end -->

> Measured end-to-end on Apple Silicon. Full matrix in [`synapse/status/benchmark_matrix.md`](synapse/status/benchmark_matrix.md).

## Deployment Targets

<!-- status:root-profiles:start -->
| Runtime Profile | Support | Targets | Backends | Quantization |
|-----------------|---------|---------|----------|--------------|
| Native Performance | Stable | aarch64-apple-darwin, x86_64-unknown-linux-gnu | cpu_simd, metal | f32, f16, int8, q4_0, q4_k, q6_k, q8_0 |
| ARM Compact | Beta | aarch64-unknown-linux-musl, aarch64-unknown-linux-gnu | cpu_simd | f32, int8, q4_0, q4_k |
| WASM Portable | Stable | wasm32-unknown-unknown | pure_rust_wasm | f32 |
<!-- status:root-profiles:end -->

<!-- status:root-artifacts:start -->
| Artifact | Current | Budget | Status |
|----------|---------|--------|--------|
| WASM core | ~519 KB | ~160 KB | over |
| WASM JS wrapper | ~43 KB | ~32 KB | over |
<!-- status:root-artifacts:end -->

## Supported Models

| Model Family | Type | Status |
|--------------|------|--------|
| **Qwen3** | LLM (GQA) | Validated: benchmarked, logits verified |
| **LLaMA 3.2** | LLM (GQA) | Validated: benchmarked locally |
| **Mistral 7B** | LLM (Sliding Window) | Config ready; synthetic tests passing |
| **Phi-3** | LLM (GQA) | In progress |
| **Gemma** | LLM (MHA, GeGLU) | Config ready; synthetic tests passing |
| **ViT** | Vision | Validated |
| **CLIP** | Vision+Text | Supported |
| **DINOv2** | Vision | Supported |
| **LEWM** | World Model | Validated: runs on all 3 targets |
| **Mamba** | SSM | Validated: 130M/370M, INT8+Q4, browser WASM |
| **RWKV-7** | SSM | Validated: 0.1B/0.4B, value residuals, pre-LayerNorm |

Adding a new model requires only a config JSON and a weight mapper. The engine itself does not change.

## Component Registry

Every architectural element is a pluggable trait with config-driven instantiation:

| Component | Variants |
|-----------|----------|
| Attention | GQA, MHA, MQA, SlidingWindow |
| Normalization | RMSNorm, LayerNorm |
| FFN | SwiGLU, GELU, GeGLU |
| Position | RoPE, Learned, Sinusoidal |
| Quantization | f32, f16, INT8, Q4_0, Q4_K, Q6_K, Q8_0 |
| Weights | safetensors, GGUF |

## Quick Start

```bash
# Build (Zig kernels are rebuilt automatically)
cd synapse && cargo build --release

# Download a model
huggingface-cli download Qwen/Qwen3-0.6B --local-dir /tmp/qwen3-0.6b

# Chat
cargo run --example qwen3_chat --release -- --model-dir /tmp/qwen3-0.6b

# Chat with INT8 quantization
cargo run --example qwen3_chat --release -- --model-dir /tmp/qwen3-0.6b --quantize

# With Metal GPU (macOS)
cargo run --example qwen3_chat --release --features metal -- --model-dir /tmp/qwen3-0.6b --quantize

# Demo mode (random weights, no downloads)
cargo run --example qwen3_chat --release -- --demo

# Build for browser (wasm-pack takes the crate directory as a path argument)
wasm-pack build synapse-wasm --release

# Build for ESP32
cargo build -p synapse-esp32
```

## World Models (LEWM)

Latent Emergent World Model: a ViT encoder plus a DiT predictor for latent state prediction.

| Operation | Latency (Apple Silicon) |
|-----------|------------------------|
| Encode (224×224 image -> 192-dim latent) | 26.9 ms |
| Predict (single step) | 12.8 ms |
| Rollout (50 steps) | 609 ms |

- **Browser**: 69 MB checkpoint, interactive trajectory rollouts (`synapse/web/index.html`)
- **ESP32-P4**: phone camera -> WiFi HTTP -> LEWM inference -> JSON response
- **Quantization**: INT8 (~4x smaller), Q4 (~6.4x compression, ~7 MB weights)

### Compression Results (First Known JEPA Quantization)

| Config | Size | Quality (cosine similarity, cos@20) |
|--------|------|-------------------------------------|
| f32 baseline | 52.1 MB | 1.000 |
| INT8 predictor | 21.4 MB | 0.9998 |
| Q4 predictor | 17.4 MB | 0.998 |
| Full Q4 (encoder + predictor) | 9.4 MB | 0.93 |

To our knowledge, no prior published work covers JEPA quantization, so these appear to be first-of-kind results.

**Browser demos**: [Main hub](synapse/web/) · [Compression benchmark](synapse/web/lewm-compress-demo/) · [SSM chat](synapse/web/ssm-demo/)

## Roadmap

| Goal | Status |
|------|--------|
| Sub-8 MB LEWM at cos > 0.95 | Current best: 9.4 MB at cos 0.93. Next: structured pruning, mixed Q4/Q8, Hadamard rotation |
| ESP32-P4 hardware deployment | Code ready (25 tests passing); awaiting hardware for a video demo |
| WASM pre-quantized binaries | Load ~10 MB of Q4 weights directly instead of the 69 MB f32 download |
| npm package for WASM widget | Package synapse-wasm as an embeddable `<script>` module |

## Why Synapse?

| Capability | Synapse | Alternatives |
|-----------|---------|-------------|
| JEPA/LEWM quantization | Q4: 9.4 MB, cos 0.93 (first published, to our knowledge) | None known |
| WASM binary | 491 KB (133 KB brotli) | Candle: 2-5 MB |
| SSM inference | Mamba + RWKV-7 via Zig SIMD | Candle: Mamba v1 only |
| Edge deployment | ESP32-P4 ready | TFLite Micro (no world models) |
| Model surgery | Wanda + channel + layer pruning | None known in compiled languages |

## Architecture

```
synapse/
├── crates/
│   ├── synapse-inference/    # Models, generation, quantization, chat templates
│   │   ├── model/            # CausalLM, DecoderLayer, ModelBuilder
│   │   ├── generation/       # Pipeline, sampler, speculative decoding
│   │   ├── weight_loading/   # safetensors + GGUF, per-model weight mappers
│   │   ├── tokenizer/        # BPE tokenizer (HuggingFace format)
│   │   ├── kv_cache/         # Pre-allocated KV cache
│   │   ├── quantization/     # INT8 per-channel quantization
│   │   ├── metal/            # Metal GPU backend (13 shaders, zero-roundtrip forward)
│   │   ├── lewm/             # World model (ViT encoder + DiT predictor)
│   │   └── diffusion/        # Diffusion pipeline (scaffolding)
│   ├── synapse-core/         # FFI wrappers for Zig tensor ops
│   ├── synapse-sys/          # Raw C bindings (auto-rebuild via build.rs)
│   ├── synapse-nn/           # Neural network modules
│   ├── synapse-autograd/     # Tape-based autodiff
│   ├── synapse-optim/        # SGD, Adam, RMSProp + schedulers
│   ├── synapse-data/         # DataLoader, Dataset, Sampler
│   ├── synapse-graph/        # Graph IR + optimization passes
│   └── synapse-train/        # Training loop + callbacks
├── synapse-wasm/             # Browser WASM runtime (pure Rust, zero FFI)
├── synapse-esp32/            # ESP32-P4 edge target (WiFi HTTP server)
├── zig/src/ops/              # SIMD kernels: matmul, qmatmul, attention, RoPE, RMSNorm
├── configs/                  # Model configs (Qwen3, LLaMA, Mistral, Phi-3, Gemma)
├── scripts/                  # Benchmark suite + logit verification
└── web/                      # Browser LEWM demo
```

## Testing

```bash
cargo test -p synapse-inference --lib      # 332 unit tests
cargo test --test multi_model_validation   # 17 multi-architecture tests
cargo test --release                       # Full suite including benchmarks
```

## Development History

| Phase | What |
|-------|------|
| 1 | Zig SIMD tensor engine, Rust autograd, training framework |
| 2 | Transformer stack, attention kernels, RoPE |
| 3 | Inference engine, component registry, INT8 quantization, Qwen3 |
| 4 | SIMD kernel wiring, KV cache, Metal GPU shaders |
| 5 | Multi-model support (LLaMA, Mistral, Phi-3, Gemma), GGUF loading, Q4 quantization |
| 6 | LEWM world models, WASM runtime, ESP32 target, speculative decoding |
| 7 | SSM inference (Mamba, RWKV-7), model surgery/pruning, LEWM Q4 compression, WASM demos |

## Built With

- **Rust**: inference engine, autograd, training framework
- **Zig**: SIMD kernels (ARM NEON + AVX2), C ABI FFI
- **Metal Shading Language**: GPU compute shaders for Apple Silicon
- **Swarm development**: built using [attoswarm](https://github.com/attocode) parallel agent orchestration

## License

MIT
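The Component Registry section says every architectural element is a pluggable trait with config-driven instantiation. The sketch below shows that pattern for the normalization slot. It is not Synapse's actual code. The `Norm` trait, the `RmsNorm`/`LayerNorm` structs, the `build_norm` function and the `"rms_norm"`/`"layer_norm"` config strings are all hypothetical. Learned scale and bias parameters are omitted for brevity.

```rust
/// Hypothetical pluggable normalization component (illustration only).
pub trait Norm {
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

pub struct RmsNorm {
    eps: f32,
}

pub struct LayerNorm {
    eps: f32,
}

impl Norm for RmsNorm {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        // Divide by the root-mean-square; RMSNorm does not subtract the mean.
        let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
        let inv = 1.0 / (mean_sq + self.eps).sqrt();
        x.iter().map(|v| v * inv).collect()
    }
}

impl Norm for LayerNorm {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        // Center on the mean, then divide by the standard deviation.
        let n = x.len() as f32;
        let mean = x.iter().sum::<f32>() / n;
        let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
        let inv = 1.0 / (var + self.eps).sqrt();
        x.iter().map(|v| (v - mean) * inv).collect()
    }
}

/// Pick an implementation from the string a model config would carry.
pub fn build_norm(kind: &str, eps: f32) -> Result<Box<dyn Norm>, String> {
    match kind {
        "rms_norm" => Ok(Box::new(RmsNorm { eps })),
        "layer_norm" => Ok(Box::new(LayerNorm { eps })),
        other => Err(format!("unknown norm type: {other}")),
    }
}
```

Because the variant is chosen once at model construction, the forward pass pays only one dynamic dispatch per call. A closed `enum` with a `match` would avoid even that, at the cost of editing the enum for every new variant.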
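The Architecture tree describes `quantization/` as INT8 per-channel quantization, and the LEWM notes quote INT8 as roughly 4x smaller than f32. That ratio comes from 1 byte per weight instead of 4, plus one small scale per channel. The sketch below shows a generic symmetric per-row scheme. Synapse's exact choices (symmetric vs. asymmetric, rounding mode, scale layout) are not documented here, so treat these as assumptions. `QuantizedMatrix`, `quantize_per_channel` and `dequantize` are hypothetical names.

```rust
/// Row-major INT8 weights with one f32 scale per row (output channel).
pub struct QuantizedMatrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<i8>,
    pub scales: Vec<f32>,
}

/// Symmetric per-channel quantization: each row's largest |w| maps to 127.
pub fn quantize_per_channel(weights: &[f32], rows: usize, cols: usize) -> QuantizedMatrix {
    assert!(cols > 0, "cols must be non-zero");
    assert_eq!(weights.len(), rows * cols, "shape mismatch");
    let mut data = Vec::with_capacity(weights.len());
    let mut scales = Vec::with_capacity(rows);
    for row in weights.chunks_exact(cols) {
        let max_abs = row.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
        // An all-zero row gets scale 1.0 so the division below is safe.
        let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
        scales.push(scale);
        for &w in row {
            data.push((w / scale).round().clamp(-127.0, 127.0) as i8);
        }
    }
    QuantizedMatrix { rows, cols, data, scales }
}

/// Reconstruct approximate f32 weights as q * scale for each row.
pub fn dequantize(q: &QuantizedMatrix) -> Vec<f32> {
    q.data
        .chunks_exact(q.cols)
        .zip(&q.scales)
        .flat_map(|(row, &s)| row.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

A separate scale per output channel keeps one outlier row from crushing the resolution of every other row. The rounding error per weight stays within half of that row's scale.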
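The LEWM compression table scores quality as cosine similarity (cos@20), but the README does not spell out the protocol. One plausible reading, which is an assumption, is this: compare the latent predicted by a quantized model with the f32 baseline's latent after 20 rollout steps. The comparison itself is plain cosine similarity, sketched here with a hypothetical `cosine_similarity` helper.

```rust
/// Cosine similarity of two equal-length vectors, e.g. a quantized model's
/// predicted latent vs. the f32 baseline's latent. Returns 0.0 if either is all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```

A score of 1.000 means the latents point in the same direction. Cosine similarity ignores vector length, so it measures directional drift rather than scale error.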