{"id":51137668,"url":"https://github.com/yurvon-screamo/smos","last_synced_at":"2026-06-26T20:00:27.173Z","repository":{"id":366609817,"uuid":"1276372861","full_name":"yurvon-screamo/smos","owner":"yurvon-screamo","description":"Semantic Memory OS — universal OpenAI-compatible memory proxy for AI agents. Native Rust, ort+ONNX NLI, SurrealDB embedded.","archived":false,"fork":false,"pushed_at":"2026-06-25T17:45:15.000Z","size":1072,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-25T19:32:32.494Z","etag":null,"topics":["agent-memory","ai-agents","ai-memory","ai-proxy","claude-code","coding-agent","cursor","llm-memory","local-first","long-term-memory","mem0-alternative","memory","openai-api","openai-compatible","persistent-context","proxy","rust","self-hosted","semantic-memory","surrealdb"],"latest_commit_sha":null,"homepage":"https://semantic-memory-os.vercel.app","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yurvon-screamo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-21T22:40:07.000Z","updated_at":"2026-06-25T18:39:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/yurvon-screamo/smos","commit_stats":null,"previous_names":["yurvon-screamo/smos"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yurvon-screamo/smos","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yurvon-screamo%2Fsmos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yurvon-screamo%2Fsmos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yurvon-screamo%2Fsmos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yurvon-screamo%2Fsmos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yurvon-screamo","download_url":"https://codeload.github.com/yurvon-screamo/smos/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yurvon-screamo%2Fsmos/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34831250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-memory","ai-agents","ai-memory","ai-proxy","claude-code","coding-agent","cursor","llm-memory","local-first","long-term-memory","mem0-alternative","memory","openai-api","openai-compatible","persistent-context","proxy","rust","self-hosted","semantic-memory","surrealdb"],"created_at":"2026-06-25T19:30:42.921Z","updated_at":"2026-06-26T20:00:27.126Z","avatar_url":"https://github.com/yurvon-screamo.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# SMOS — Semantic Memory Operating System\n\n**An OpenAI-compatible memory proxy that gives any AI coding agent persistent long-term memory — without code changes, without an MCP server, without a framework.**\n\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Rust](https://img.shields.io/badge/rust-1.96%20edition%202024-orange.svg)](https://www.rust-lang.org)\n[![crates.io](https://img.shields.io/crates/v/smos.svg)](https://crates.io/crates/smos)\n[![npm](https://img.shields.io/npm/v/@yurvon_screamo/smos.svg)](https://www.npmjs.com/package/@yurvon_screamo/smos)\n[![Release](https://github.com/yurvon-screamo/smos/actions/workflows/release.yml/badge.svg)](https://github.com/yurvon-screamo/smos/actions/workflows/release.yml)\n\n\u003c/div\u003e\n\n## Quick start\n\n```bash\nnpm install -g @yurvon_screamo/smos   # or: cargo binstall smos\nsmos init                              # one-time: downloads ~4 GB of local models\nsmos serve                             # starts on http://localhost:8888\n```\n\nPoint Cursor (or Claude Code, opencode, Cline, Aider, Continue.dev) at\n`http://localhost:8888/v1` and use `bob` as the model name. That assistant\nnow remembers across sessions.\n\n**One prerequisite:** [`llama-server`](https://github.com/ggerganov/llama.cpp)\non your `PATH`. SMOS uses it to run three tiny models locally — extraction,\nembeddings, reranking. The largest is 4B parameters. These run on a laptop\nCPU with integrated graphics — no GPU, no API keys, no cloud bills, no data\nleaving your machine. Prefer cloud providers instead? SMOS supports that\ntoo — see [Configure](#configure).\n\n---\n\nOpen a new chat in Cursor and your assistant starts from scratch. Switch to\nClaude Code or opencode and you re-explain why the cache TTL is 10 seconds,\nnot 60 — your architecture, your conventions, every decision you already\nmade. The model is stateless. The tool is replaceable. The memory should\nnot be.\n\nSMOS fixes this. It is a transparent proxy that sits between your AI client\nand the upstream LLM. Every response is mined for facts automatically — the\nagent does nothing, the agent forgets nothing. Point any OpenAI-compatible\nclient at SMOS and your assistant remembers across sessions, across tools,\nacross model swaps. Works with local llama.cpp, OpenAI, OpenRouter, vLLM —\nany OpenAI-compatible upstream. Run fully local for privacy, or point it\nat your existing cloud provider.\n\n---\n\n## How it works\n\n```\nClient ──▶ SMOS ──▶ upstream LLM (GPT-4o, Claude, local, …)\n              │\n              ├── 1. ENRICH    inject relevant facts into the request\n              ├── 2. FORWARD   stream response back at full LLM speed\n              ├── 3. EXTRACT   mine the response for facts (after delivery)\n              └── 4. FINALIZE  DeBERTa NLI resolves merges and conflicts\n                                (after delivery)\n```\n\nSteps 3 and 4 run **off the request path** — the client receives the\nresponse as soon as the upstream LLM finishes. Extraction and consolidation\nnever add latency. If any step fails, the system degrades gracefully: the\nrequest forwards unenriched, facts stay pending for the next cycle, HTTP\nkeeps serving.\n\nFor the full pipeline, memory lifecycle, and NLI internals, see\n[`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).\n\n---\n\n## Why SMOS\n\n- **Memory is part of the API, not a tool.** Every response is mined for\n  facts automatically. The agent cannot forget to save, because the agent\n  is not involved in saving. Extraction runs off the request path — zero\n  added latency.\n- **No external database.** Embedded SurrealDB (RocksDB + HNSW vector\n  index). No Postgres, no Neo4j, no Qdrant, no Docker. One binary, one\n  directory.\n- **Contradictions are detected, not overwritten.** A DeBERTa-v3 NLI model\n  evaluates each merge candidate. Both sides of a contradiction are\n  preserved and surfaced to the LLM — not silently overwritten. The\n  theoretical basis: [\"The Price of Meaning\"](https://arxiv.org/abs/2603.27116)\n  (2026) proves vector-only retrieval degrades through semantic\n  interference; external verification is necessary.\n- **Multi-persona isolation.** Bob for Rust, Alice for ML, Charlie for\n  DevOps — each a separate memory namespace. One SMOS instance, N isolated\n  assistants.\n- **Runs on any laptop.** Three tiny local models (4 GB total) handle\n  extraction, embeddings, and reranking on CPU. Tested on a laptop with\n  integrated Intel graphics — no GPU, no API keys, no cloud bills. Your\n  conversations never leave your machine.\n\n---\n\n## Persons: name your assistant\n\nEvery AI client sends a `model` field in the request. SMOS uses that field\nas a **person name** — and each person is a memory namespace, a routing\ntarget, and an optional persona.\n\nWhen Cursor sends `{\"model\": \"bob\", ...}`, SMOS:\n\n1. Uses `\"bob\"` as the memory isolation key\n2. Rewrites `model` to the upstream model declared for Bob\n3. Routes the request to Bob's provider\n4. Injects Bob's persona as a system message\n5. Enriches the request with facts from Bob's memory namespace\n\nCreate **Alice** for ML engineering and **Charlie** for DevOps — each with\nits own memory, provider, and persona. Alice never mixes your Rust types\nwith your Python pipelines. Swap GPT-4o for a local model and Bob stays\nBob — identity lives at the OS layer, not in a chat log you rebuild by\nhand.\n\nSee [Configure → Agents (persons)](#agents-persons) for the TOML.\n\n---\n\n## What you need\n\n- **~5 GB disk** for local models (one-time download: 4 GB GGUF + 643 MB\n  DeBERTa NLI).\n- **`llama-server`** on your `PATH` — get it from\n  [llama.cpp releases](https://github.com/ggerganov/llama.cpp/releases)\n  or build from source. Runs on any modern laptop; GPU is optional, not\n  required.\n- **Any OpenAI-compatible AI client** — Cursor, Claude Code, opencode,\n  Cline, Continue.dev, Aider, Windsurf, or plain `curl`.\n\nNo Postgres. No Neo4j. No Docker. No cloud account. No API key (unless you\nchoose to use a cloud provider as your upstream).\n\n---\n\n## Install\n\n### Prebuilt binary (recommended)\n\n```bash\ncargo binstall smos\n```\n\nThe same binary runs on CPU and (when one is detected) on the host GPU.\nSMOS probes the hardware at startup, downloads the matching ONNX Runtime\nshared library into `~/.smos/models/ort/` on first use (~5–300 MB\ndepending on the device), and falls back to CPU if no GPU is available.\nNo feature flags, no per-vendor rebuild.\n\nDetected devices:\n\n- **Windows** — CUDA (NVIDIA only), DirectML (Intel Arc, AMD, NVIDIA via\n  DirectX 12), CPU fallback.\n- **Linux** — CUDA (NVIDIA), CPU fallback.\n- **macOS** — Metal / CoreML on Apple Silicon, CPU fallback.\n\nOverride the probe by setting `[nli_backend].device = \"cpu\" | \"directml\"\n| \"cuda\" | \"metal\"` in `~/.smos/config.toml`.\n\n### From source\n\n```bash\ncargo install smos\n```\n\n### npm\n\n```bash\nnpm install -g @yurvon_screamo/smos\n```\n\nVerify:\n\n```bash\nsmos --version\n```\n\n---\n\n## Setup\n\n### Step 1 — Get llama-server\n\nSMOS uses [llama.cpp](https://github.com/ggerganov/llama.cpp) to run three\ntiny models locally — a 4B extraction LLM, an embedding model, and a\nreranker. These are small enough to run on a laptop CPU with integrated\ngraphics. No GPU required.\n\n**Quickest path:**\n\n- Download a prebuilt binary from\n  [llama.cpp releases](https://github.com/ggerganov/llama.cpp/releases)\n  (look for `llama-server` in the assets for your platform).\n- Or build from source:\n  `git clone https://github.com/ggerganov/llama.cpp \u0026\u0026 cd llama.cpp \u0026\u0026 cmake -B build \u0026\u0026 cmake --build build --config Release`\n- Ensure `llama-server` is on your `PATH` (`llama-server --help` should\n  work from any directory).\n\nThe GGUF model weights for the three roles are downloaded automatically in\nthe next step — you do not need to fetch them by hand.\n\n\u003e **Prefer cloud?** Skip llama-server entirely. Set `[llama_cpp].auto_launch\n\u003e = false` in `~/.smos/config.toml` and point `[llm_extraction]`,\n\u003e `[embedding]`, and `[reranker]` at any OpenAI-compatible cloud provider.\n\u003e See [Configure](#configure).\n\n### Step 2 — Initialize\n\n```bash\nsmos init\n```\n\nThis single command:\n\n- Creates `~/.smos/` with a default `config.toml`, working directories\n  (`db/`, `models/`, `persons/`, `logs/`, `reports/`), and a stub persona at\n  `persons/bob.md`.\n- Checks for `llama-server` on `PATH`.\n- Downloads the GGUF models (~4 GB total) into `~/.smos/models/`:\n  - `nemotron-3-nano-4b.gguf` — extraction + chat LLM.\n  - `jina-embeddings-v5.gguf` — embedding model.\n  - `qwen3-reranker.gguf` — cross-encoder reranker.\n- Probes `/health` on the three configured ports (28081 embedding, 28082\n  extraction, 28181 reranker).\n- Initializes the database (SurrealDB migrations).\n- Reports what is ready and what still needs attention.\n\nAlready-downloaded models are skipped, so re-running `smos init` only retries\nthe failed ones. Fix any `✗` items shown, then run `smos init` again to verify.\nFor a deeper audit (NLI cache, stats, a Markdown report), run `smos doctor`.\n\n### Step 3 — Start\n\n```bash\nsmos serve\n```\n\nWith `auto_launch = true` (the default), SMOS spawns the three `llama-server`\nprocesses itself on first start — an already-running server on the same port is\nreused. The first start also downloads the DeBERTa NLI model (~643 MB) into\n`~/.smos/models/`; subsequent starts are instant.\n\nVerify it works:\n\n```bash\ncurl http://localhost:8888/health\n# → {\"status\":\"ok\",\"version\":\"0.1.7\"}\n```\n\n### Step 4 — Install as a service (optional)\n\n```bash\nsmos service install      # auto-starts at boot\nsmos service start        # start now\nsmos service status       # current state\nsmos service stop         # stop\nsmos service uninstall    # remove\n```\n\nRegistered as systemd (Linux), launchd (macOS), or a Windows Service.\n\n---\n\n## Configure\n\nAll configuration lives in `~/.smos/config.toml`. `smos init` creates it with\nsafe defaults; edit the file by hand from there. Any section omitted falls back\nto the built-in default.\n\n### Inspect current configuration\n\n```bash\nsmos config show          # full resolved config as TOML (defaults merged in)\nsmos config providers     # list providers: name → URL\nsmos config persons       # list agents: name → provider / model\n```\n\nThese commands are read-only. To change configuration, edit the TOML.\n\n### Providers\n\nA **provider** is one upstream OpenAI-compatible endpoint (`llama-server`,\nOpenRouter, OpenAI, vLLM…). One entry per upstream; there is no round-robin or\nfailover — routing is per-agent.\n\n```toml\n[[providers]]\nname = \"llama-local\"\nurl = \"http://localhost:28082/v1/chat/completions\"\napi_key_env = \"\"                       # env var name; empty = no auth header\n\n# Cloud example — uncomment and set OPENROUTER_API_KEY in the environment\n# [[providers]]\n# name = \"openrouter\"\n# url = \"https://openrouter.ai/api/v1/chat/completions\"\n# api_key_env = \"OPENROUTER_API_KEY\"\n```\n\n### Agents (persons)\n\nA **person** bundles a memory namespace, a routing target, and an optional\npersona. When a client sends `{\"model\": \"bob\", ...}`, SMOS uses `\"bob\"` as the\nmemory isolation key, rewrites `model` to the upstream model, and routes to the\ndeclared provider.\n\n```toml\n[persons.bob]\nprovider = \"llama-local\"               # must match a [[providers]].name\nmodel = \"nemotron-3-nano-4b\"           # upstream model id\npersona = \"~/.smos/persons/bob.md\"     # optional; ~ expands to user home\n\n# [persons.alice]\n# provider = \"openrouter\"\n# model = \"z-ai/glm-5.2\"\n# persona = \"~/.smos/persons/alice.md\"\n```\n\nA model name that is not a configured person returns HTTP 400 — every request\nmust name a real `[persons.*]` entry.\n\n### Persona files\n\n`~/.smos/persons/bob.md` is plain markdown, injected once per conversation as a\n`system` message:\n\n```markdown\nYou are Bob, a Rust systems programming assistant.\nFocus on memory safety and performance.\nBe concise. Prefer code over long explanations.\nReply in English.\n```\n\n### Git memory sync (optional)\n\nDual-write every extracted fact to a local git repo as markdown files — backup,\nversioning, and re-hydration onto another machine. Empty `repo_url` disables\nsync.\n\n```toml\n[git]\nrepo_url = \"git@github.com:user/smos-memory.git\"\nbranch = \"main\"\nauto_push = true\nlocal_path = \"~/.smos/git/memory\"\ndisable_gpg_sign = true\n```\n\nOn a second machine, re-hydrate the facts with `smos import-git \u003curl\u003e`. Provider\nAPI keys are read from the env var named in `api_key_env`, so secrets never land\nin TOML.\n\n### Advanced: llama.cpp auto-launch\n\nBy default, `smos serve` spawns the three `llama-server` processes itself and\nreuses any server already bound to the configured port. Override the binary,\nports, model paths, or extra CLI args here; flip `auto_launch = false` if you\nlaunch `llama-server` yourself or use a remote / cloud provider.\n\n```toml\n[llama_cpp]\nbinary = \"llama-server\"\nauto_launch = true\n# Unload models from VRAM after this many seconds idle (5 min default).\n# Set to 0 to disable. Only appended when llama-server supports the flag.\nidle_timeout_seconds = 300\n\n[llama_cpp.embedding]\nmodel_path = \"~/.smos/models/jina-embeddings-v5.gguf\"\nport = 28081\nextra_args = [\"--ctx-size\", \"2048\", \"--embeddings\"]\n\n[llama_cpp.reranker]\nmodel_path = \"~/.smos/models/qwen3-reranker.gguf\"\nport = 28181\nextra_args = [\"--ctx-size\", \"8192\"]\n\n[llama_cpp.extraction]\nmodel_path = \"~/.smos/models/nemotron-3-nano-4b.gguf\"\nport = 28082\nextra_args = [\"--ctx-size\", \"4096\"]\n```\n\n### Full configuration reference\n\nSee [`smos.toml`](smos.toml) for the canonical, fully-commented example.\n\n| Section | Purpose |\n|---|---|\n| `[[providers]]` | OpenAI-compatible chat-completion endpoints. One per upstream. |\n| `[persons.\u003cname\u003e]` | Person = memory key + provider + upstream model + optional persona. |\n| `[git]` | Git-backed memory sync (`repo_url`, `branch`, `auto_push`). |\n| `[llama_cpp]` | Auto-launch config for `llama-server` processes (ports, model paths). |\n| `[llm_extraction]` | Fact-extraction LLM (model, temperature, seed, timeout). |\n| `[embedding]` | Vector embedding model (model, dimensions, timeout). |\n| `[reranker]` | Cross-encoder reranker URL (`/v1/rerank`). |\n| `[retrieval]` | top-K initial/final, `min_topic_chars`, `min_confidence`. |\n| `[merge]` | Cosine threshold for merge candidate selection. |\n| `[confidence]` | Base + multi-source/no-contradiction bonuses, accept/pending cut. |\n| `[nli]` | Verdict thresholds (contradiction/entailment). |\n| `[nli_backend]` | Native ONNX model id + cache directory + device selection. |\n| `[extraction]` | Semantic dedup cosine threshold. |\n| `[heat]` | Decay rate, min threshold (boosts recently-active facts). |\n| `[session]` | Timeout, pending overflow, watcher scan interval. |\n| `[audit]` | Optional dreaming agent (schedule, model, mutation caps). |\n| `[surreal]` | Embedded RocksDB path + namespace/database. |\n| `[server]` | Bind host/port, shutdown grace, log format. |\n\n---\n\n## Connect your AI client\n\nAny client that speaks the OpenAI Chat Completions API works — Cursor,\nClaude Code, opencode, Cline, Continue.dev, Aider, Windsurf, and anything\nelse that lets you set a custom base URL. Point it at SMOS and use the\n**person name** as the model.\n\n### opencode\n\n```bash\nexport OPENAI_BASE_URL=http://localhost:8888/v1\nexport OPENAI_API_KEY=smos\nopencode --model bob\n```\n\n### Cursor\n\nSettings → Models → OpenAI API Base URL: `http://localhost:8888/v1`\nModel name: `bob`\n\n### curl\n\n```bash\ncurl http://localhost:8888/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"bob\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}]}'\n```\n\nFor other OpenAI-compatible clients, the pattern is the same: set the\nbase URL to `http://localhost:8888/v1`, set any API key (SMOS does not\nvalidate it by default), and use the person name as the model.\n\n---\n\n## Commands\n\n| Command | Description |\n|---|---|\n| `smos init` | One-command setup: bootstrap `~/.smos`, download GGUF models, probe `llama-server`, run DB migrations. Idempotent. |\n| `smos serve` | Start the HTTP proxy (auto-launches `llama-server` processes). |\n| `smos doctor` | Validate environment + show SurrealDB stats. |\n| `smos doctor --stats` | Quick memory stats (no model round-trips). |\n| `smos doctor --report \u003cpath\u003e` | Generate a Markdown health report. |\n| `smos doctor --skip-llama` | Skip the `llama-server` + reranker probes. |\n| `smos config show` | Print the full resolved configuration as TOML. |\n| `smos config providers` | List configured providers (name → URL). |\n| `smos config persons` | List configured agents (name → provider / model). |\n| `smos import --from-file \u003cf\u003e` | Import an opencode transcript JSON into memory. |\n| `smos import --list` | List discoverable opencode sessions. |\n| `smos import-dir \u003cpath\u003e` | Bulk import from a directory (`*.md`, `*.txt`, `*.json`, …). |\n| `smos import-git \u003curl\u003e` | Re-hydrate facts from a git-synced memory repo. |\n| `smos import raw \"\u003ctext\u003e\"` | Extract facts from arbitrary free-form text. |\n| `smos import raw --stdin` | Same, reading the text body from stdin. |\n| `smos finalize \u003csession\u003e` | Manually trigger memory consolidation for one session. |\n| `smos audit` | Run the dreaming agent once (memory cleanup / merges / pruning). |\n| `smos service install` | Install SMOS as a system service (auto-starts at boot). |\n\nGlobal flag: `--config \u003cpath\u003e` to point at a non-default config file.\n\n---\n\n## Known limitations\n\nHonest scope, not marketing hedging:\n\n- **643 MB DeBERTa-v3 ONNX download on first start.** Subsequent starts\n  are instant. The model is cached under `~/.smos/models/`.\n- **`llama-server` on `PATH` for local inference.** SMOS auto-launches\n  the three `llama-server` processes (extraction, embedding, reranker)\n  when `auto_launch = true`. The models are tiny (4 GB total) and run on\n  CPU. To use cloud providers instead, set `auto_launch = false` and\n  point the extraction / embedding / reranker URLs at your provider.\n- **Extraction model is English-optimized.** Nemotron-3-Nano-4B is\n  multilingual, but accuracy is highest on English. The DeBERTa NLI model\n  is English-only.\n- **Single-process SurrealDB lock.** One SMOS instance per database path.\n  No built-in horizontal scaling. Multi-machine sync via the git backend.\n- **Not benchmarked on LOCOMO.** The NLI contradiction detection is the\n  architectural choice, not a benchmark number.\n\n---\n\n## Inspiration\n\nSMOS builds on academic research in AI agent memory:\n\n- **[MemoryOS: Memory OS of AI Agent](https://arxiv.org/abs/2506.06326)**\n  (Kang et al., 2025, EMNLP 2025 Oral) — hierarchical memory management\n  for AI agents. SMOS adopts a similar lifecycle\n  (`pending → accepted → conflict-flagged`) driven by natural-language\n  inference rather than hand-tuned heuristics.\n- **[The Price of Meaning: Why Every Semantic Memory System Forgets](https://arxiv.org/abs/2603.27116)**\n  (Ray Barman et al., 2026) — interference is fundamental in semantic\n  memory: every store that decides what to keep also decides what to\n  lose, and pure vector retrieval is mathematically proven to degrade.\n  SMOS sidesteps this by preserving both sides of a contradiction and\n  flagging them, instead of picking a winner — and by layering DeBERTa\n  NLI on top of cosine retrieval as the external verification the paper\n  calls necessary.\n\n---\n\n## License\n\nMIT — see [`LICENSE`](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyurvon-screamo%2Fsmos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyurvon-screamo%2Fsmos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyurvon-screamo%2Fsmos/lists"}