{"id":49899062,"url":"https://github.com/cactus-compute/needle","last_synced_at":"2026-06-08T03:06:10.760Z","repository":{"id":357445717,"uuid":"1165361576","full_name":"cactus-compute/needle","owner":"cactus-compute","description":"26m function call model that runs on incredibly small devices","archived":false,"fork":false,"pushed_at":"2026-05-12T19:18:53.000Z","size":3019,"stargazers_count":160,"open_issues_count":9,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-12T21:17:33.449Z","etag":null,"topics":["cactus","gemini","gemma","llm","on-device-ai"],"latest_commit_sha":null,"homepage":"https://cactuscompute.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cactus-compute.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-24T04:50:47.000Z","updated_at":"2026-05-12T21:17:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cactus-compute/needle","commit_stats":null,"previous_names":["cactus-compute/needle"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/cactus-compute/needle","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cactus-compute%2Fneedle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cactus-compute%2Fneedle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cactus-compute%2Fneedle/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cactus-compute%2Fneedle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cactus-compute","download_url":"https://codeload.github.com/cactus-compute/needle/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cactus-compute%2Fneedle/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34046009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cactus","gemini","gemma","llm","on-device-ai"],"created_at":"2026-05-16T02:00:23.677Z","updated_at":"2026-06-08T03:06:10.754Z","avatar_url":"https://github.com/cactus-compute.png","language":"Python","funding_links":[],"categories":["Python","\u003cimg src=\"./assets/cpu.svg\" width=\"16\" height=\"16\" style=\"vertical-align: middle;\"\u003e Backends"],"sub_categories":[],"readme":"# Needle\n\n\u003cimg src=\"assets/banner.png\" alt=\"Logo\" style=\"border-radius: 30px; width: 100%;\"\u003e\n\nWe distilled Gemini 3.1 into a 26m parameter \"[Simple Attention Network](docs/simple_attention_networks.md)\" that you can even finetune locally on your Mac/PC.\nIn production, Needle runs on [Cactus](https://github.com/cactus-compute/cactus) at 6000 toks/sec prefill 
and 1200 decode speed. \nWeights are fully open on [Cactus-Compute/needle](https://huggingface.co/Cactus-Compute/needle), as well as the dataset generation. \n\n```\nd=512, 8H/4KV, BPE=8192\n                                  ┌──────────────┐\n                                  │  Tool Call   │\n                                  └──────┬───────┘\n                                        ┌┴──────────┐\n                                        │  Softmax  │\n                                        └─────┬─────┘\n                                        ┌─────┴─────┐\n                                        │ Linear (T)│  ← tied\n                                        └─────┬─────┘\n                                        ┌─────┴─────┐\n                                        │ ZCRMSNorm │\n                                        └─────┬─────┘\n                                     ┌────────┴────────┐\n                                     │ Decoder x 8     │\n                                     │┌───────────────┐│\n                                     ││ ZCRMSNorm     ││\n                                     ││ Masked Self   ││\n                                     ││ Attn + RoPE   ││\n                                     ││ Gated Residual││\n                                     │├───────────────┤│\n  ┌──────────────┐                   ││ ZCRMSNorm     ││\n  │ Encoder x 12 │──────────────────────▶Cross Attn   ││\n  │              │                   ││ Gated Residual││\n  │ ┌──────────┐ │                   │└───────────────┘│\n  │ │ZCRMSNorm │ │                   └────────┬────────┘\n  │ │Self Attn │ │                      ┌─────┴─────┐\n  │ │ GQA+RoPE │ │                      │ Embedding │  ← shared\n  │ │Gated Res │ │                      └─────┬─────┘\n  │ │          │ │                    ┌───────┴───────-┐\n  │ │ (no FFN) │ │                    │[EOS]\u003ctool_call\u003e│\n  │ └──────────┘ │                    │ + answer       │\n  │              │                   
 └───────────────-┘\n  └──────┬───────┘\n         │\n    ┌────┴──────┐\n    │ Embedding │\n    └────┬──────┘\n         │\n    ┌────┴──────┐\n    │   Text    │\n    │  query    │\n    └───────────┘\n```\n\n- Pretrained on 16 TPU v6e for 200B tokens (27 hrs). \n- Post-trained on 2B tokens of a single-shot function-call dataset (45 mins). \n\nNeedle is an experimental run for Simple Attention Networks, geared at redefining tiny AI for consumer devices (phones, watches, glasses...).\nSo while it beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI,\nthose models have more scope/capacity and excel in conversational settings. Also, small models can be finicky. \nPlease use the UI in the next section to test on your own tools, and finetune accordingly, at the click of a button. \n\n## Quickstart\n\n```bash\ngit clone https://github.com/cactus-compute/needle.git\ncd needle \u0026\u0026 source ./setup\nneedle playground\n```\n\nOpens a web UI at http://127.0.0.1:7860 where you can test and finetune on your own tools. 
Weights are auto-downloaded.\n\n## Usage (Python)\n\n```python\nfrom needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer\n\nparams, config = load_checkpoint(\"checkpoints/needle.pkl\")\nmodel = SimpleAttentionNetwork(config)\ntokenizer = get_tokenizer()\n\nresult = generate(\n    model, params, tokenizer,\n    query=\"What's the weather in San Francisco?\",\n    tools='[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}}]',\n    stream=False,\n)\nprint(result)\n# [{\"name\":\"get_weather\",\"arguments\":{\"location\":\"San Francisco\"}}]\n```\n\n## Finetuning\n\n```bash\n# Playground (generates data via Gemini, trains, evaluates, bundles result)\nneedle playground\n\n# CLI (auto-downloads weights if not local)\nneedle finetune data.jsonl\n```\n\n### Data format\n\nEach line in the JSONL file has three fields: `query`, `tools`, and `answers`.\n\n**Tool schema:**\n```json\n{\n  \"name\": \"get_weather\",\n  \"description\": \"Get current weather for a city.\",\n  \"parameters\": {\n    \"location\": { \"type\": \"string\", \"description\": \"City name.\", \"required\": true }\n  }\n}\n```\n\n**Answer schema:**\n```json\n{ \"name\": \"get_weather\", \"arguments\": { \"location\": \"Paris\" } }\n```\n\n**Full JSONL example** (each line is one training example, `tools` and `answers` are JSON-encoded strings):\n```jsonl\n{\"query\": \"What's the weather in Paris?\", \"tools\": \"[{\\\"name\\\":\\\"get_weather\\\",\\\"description\\\":\\\"Get current weather for a city.\\\",\\\"parameters\\\":{\\\"location\\\":{\\\"type\\\":\\\"string\\\",\\\"description\\\":\\\"City name.\\\",\\\"required\\\":true}}}]\", \"answers\": \"[{\\\"name\\\":\\\"get_weather\\\",\\\"arguments\\\":{\\\"location\\\":\\\"Paris\\\"}}]\"}\n{\"query\": \"Turn off the lights\", \"tools\": \"[{\\\"name\\\":\\\"get_weather\\\",\\\"description\\\":\\\"Get 
current weather for a city.\\\",\\\"parameters\\\":{\\\"location\\\":{\\\"type\\\":\\\"string\\\",\\\"description\\\":\\\"City name.\\\",\\\"required\\\":true}}},{\\\"name\\\":\\\"toggle_lights\\\",\\\"description\\\":\\\"Toggle smart lights on or off.\\\",\\\"parameters\\\":{\\\"state\\\":{\\\"type\\\":\\\"string\\\",\\\"description\\\":\\\"on or off.\\\",\\\"required\\\":true}}}]\", \"answers\": \"[{\\\"name\\\":\\\"toggle_lights\\\",\\\"arguments\\\":{\\\"state\\\":\\\"off\\\"}}]\"}\n```\n\nProvide at least **120 examples per tool** (100 train / 10 val / 10 test). Fewer examples will overfit — you'll see perfect training metrics but the model won't generalize. Vary query phrasing and include examples with multiple tools available.\n\n### Using a finetuned model\n\nFinetuning saves the best checkpoint as `checkpoints/needle_finetuned_\u003cid\u003e_best.pkl`:\n\n```bash\nneedle run --checkpoint checkpoints/needle_finetuned_*_best.pkl \\\n  --query \"What's the weather?\" --tools '[{\"name\":\"get_weather\",\"description\":\"Get current weather for a city.\",\"parameters\":{\"location\":{\"type\":\"string\",\"description\":\"City name.\",\"required\":true}}}]'\n```\n\n```python\nparams, config = load_checkpoint(\"checkpoints/needle_finetuned_\u003cid\u003e_best.pkl\")\nmodel = SimpleAttentionNetwork(config)\nresult = generate(model, params, get_tokenizer(), query=\"...\", tools='[...]', stream=False)\n```\n\n## CLI\n\n```\nneedle playground                  Test and finetune via web UI\nneedle finetune \u003cdata.jsonl\u003e       Finetune on your own data\nneedle run --query \"...\" --tools   Single inference\nneedle train                       Full training run\nneedle pretrain                    Pretrain on PleIAs/SYNTH\nneedle eval --checkpoint \u003cpath\u003e    Evaluate a checkpoint\nneedle tokenize                    Tokenize dataset\nneedle generate-data               Synthesize training data via Gemini\nneedle tpu \u003caction\u003e                TPU 
management (see docs/tpu.md)\n```\n\n```\n@misc{ndubuaku2026needle,\n  title={Needle},\n  author={Henry Ndubuaku and Jakub Mroz and Karen Mosoyan and Roman Shemet and Parkirat Sandhu and Satyajit Kumar and Noah Cylich and Justin H. Lee},\n  year={2026},\n  url={https://github.com/cactus-compute/needle}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcactus-compute%2Fneedle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcactus-compute%2Fneedle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcactus-compute%2Fneedle/lists"}