{"id":50454168,"url":"https://github.com/mizcausevic-dev/embedding-drift-graph","last_synced_at":"2026-06-01T01:05:42.838Z","repository":{"id":357459196,"uuid":"1236289371","full_name":"mizcausevic-dev/embedding-drift-graph","owner":"mizcausevic-dev","description":"Track how entity embeddings drift across encoder model versions. SQLite store + Strawberry GraphQL API. Cosine drift events computed automatically on every record. Reference impl for RAG/eval pipelines re-encoding their corpus.","archived":false,"fork":false,"pushed_at":"2026-05-12T21:39:30.000Z","size":53,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-12T22:30:07.307Z","etag":null,"topics":["ai-governance","drift-detection","embeddings","graphql","llm","numpy","python","rag","sqlite","strawberry-graphql","vector-search"],"latest_commit_sha":null,"homepage":"https://github.com/mizcausevic-dev/embedding-drift-graph","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mizcausevic-dev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-12T05:47:18.000Z","updated_at":"2026-05-12T21:39:34.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mizcausevic-dev/embedding-drift-graph","commit_stats":null,"previous_names":["mizcausevic-dev/embedding-drift-graph"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mizcausevic-dev/embedding-drift-graph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fembedding-drift-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fembedding-drift-graph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fembedding-drift-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fembedding-drift-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mizcausevic-dev","download_url":"https://codeload.github.com/mizcausevic-dev/embedding-drift-graph/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fembedding-drift-graph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33755379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-governance","drift-detection","embeddings","graphql","llm","numpy","python","rag","sqlite","strawberry-graphql","vector-search"],"created_at":"2026-06-01T01:05:42.758Z","updated_at":"2026-06-01T01:05:42.832Z","avatar_url":"https://github.com/mizcausevic-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# embedding-drift-graph\n\nTrack how entity embeddings drift across model/encoder versions. Every embedding you record is automatically compared to every prior embedding for the same entity, with the cosine distance materialized as a drift event you can query through a GraphQL API.\n\nUse it when:\n- You re-encoded your knowledge base after an encoder upgrade and need to see which entities moved the most\n- Your RAG pipeline switched models and quality regressed; you want to know which concepts the new model \"thinks differently\" about\n- You're benchmarking encoders and want a real metric for \"semantic stability per entity\"\n\n## Install\n\n```bash\npip install -e .[dev]\n```\n\n## Quickstart\n\n```bash\nembedding-drift seed       # build an in-memory store with 3 entities × 3 model versions\nembedding-drift query      # run a sample GraphQL query against the seeded store\nembedding-drift schema     # print the GraphQL SDL\n```\n\n`embedding-drift query` output (deterministic, seed 42):\n\n```json\n{\n  \"stats\": { \"entityCount\": 3, \"embeddingCount\": 9, \"driftEventCount\": 9 },\n  \"drift\": [\n    {\n      \"entityId\": \"entity_a\", \"fromModel\": \"encoder-v1\", \"toModel\": \"encoder-v3\",\n      \"cosineDistance\": 0.2134830375679505\n    },\n    {\n      \"entityId\": \"entity_a\", \"fromModel\": \"encoder-v2\", \"toModel\": \"encoder-v3\",\n      \"cosineDistance\": 0.18909304125940885\n    },\n    {\n      \"entityId\": \"entity_c\", \"fromModel\": \"encoder-v2\", \"toModel\": \"encoder-v3\",\n      \"cosineDistance\": 0.12445277619347739\n    }\n    ...\n  ]\n}\n```\n\nNote that `entity_a` shows substantially more drift between `encoder-v2` and `encoder-v3` than `entity_b` or `entity_c` — the seed simulates a \"concept rename\" for entity_a during the v3 transition, which is exactly the kind of regression you want detected.\n\n## Library usage\n\n```python\nfrom embedding_drift import Store\n\nstore = Store(\"drift.db\")          # or omit path for in-memory\nstore.upsert_entity(\"concept_x\", \"Concept X\")\n\nemb_v1, drifts = store.record_embedding(\"concept_x\", \"encoder-v1\", [0.12, -0.04, ...])\n# drifts is empty (no prior versions)\n\nemb_v2, drifts = store.record_embedding(\"concept_x\", \"encoder-v2\", [0.10, -0.03, ...])\n# drifts has one DriftEvent comparing v1 -\u003e v2\n\n# Query all drift events above a threshold\nfor d in store.drift_events(min_distance=0.10):\n    print(d.entity_id, d.from_model, \"-\u003e\", d.to_model, \"cos_dist\", d.cosine_distance)\n```\n\n## GraphQL\n\nThe package exposes a [Strawberry GraphQL](https://strawberry.rocks) schema. Embed it in any ASGI server:\n\n```python\nfrom strawberry.asgi import GraphQL\nfrom embedding_drift import schema, Store\nfrom embedding_drift.schema import set_active_store\n\nset_active_store(Store(\"drift.db\"))\napp = GraphQL(schema)\n```\n\nThen run with `uvicorn yourmodule:app --reload` and POST queries to `/`.\n\n### Available queries\n\n```graphql\n{\n  stats { entityCount embeddingCount driftEventCount }\n  entities { id name createdAt }\n  embeddings(entityId: \"concept_x\") { modelVersion vector recordedAt }\n  drift(entityId: \"concept_x\", minDistance: 0.10) {\n    fromModel toModel cosineDistance\n  }\n}\n```\n\n### Available mutations\n\n```graphql\nmutation {\n  upsertEntity(id: \"concept_x\", name: \"Concept X\") { id name }\n  recordEmbedding(\n    entityId: \"concept_x\",\n    modelVersion: \"encoder-v2\",\n    vector: [0.10, -0.03]\n  ) { fromModel toModel cosineDistance }\n}\n```\n\n## Why SQLite (and not pgvector)?\n\nThis is a reference implementation that runs anywhere Python runs, with no extra infrastructure. The drift math is pure numpy. For production scale you can swap the storage backend without touching the GraphQL surface — `Store` is a thin layer over four SQL tables you'd recognize in any RDBMS.\n\n## Data model\n\n| Table | Purpose |\n|---|---|\n| `entities` | Canonical entities (id + display name + created_at) |\n| `embeddings` | One row per `(entity_id, model_version)`; vector stored as JSON |\n| `drift_events` | Computed on insert: cosine distance between every pair of model versions per entity |\n\n## Development\n\n```bash\npip install -e .[dev]\npytest -v\npython -m embedding_drift seed\npython -m embedding_drift query\npython -m embedding_drift schema   # prints SDL\n```\n\n## Dependencies\n\n- [numpy](https://numpy.org/) ≥ 1.26 — vector math\n- [strawberry-graphql](https://strawberry.rocks/) ≥ 0.220 — schema definition\n- Python `sqlite3` (stdlib) — storage\n\n## License\n\nAGPL-3.0.\n\n---\n\n**Connect:** [LinkedIn](https://www.linkedin.com/in/mirzacausevic/) · [Kinetic Gain](https://kineticgain.com) · [Medium](https://medium.com/@mizcausevic/) · [Skills](https://mizcausevic.com/skills/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fembedding-drift-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmizcausevic-dev%2Fembedding-drift-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fembedding-drift-graph/lists"}