{"id":42294723,"url":"https://github.com/johnsonlee/graphite","last_synced_at":"2026-04-11T22:53:07.168Z","repository":{"id":334512965,"uuid":"1141070800","full_name":"johnsonlee/graphite","owner":"johnsonlee","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-04T12:59:58.000Z","size":425,"stargazers_count":5,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-04T13:12:20.782Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johnsonlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-24T07:28:51.000Z","updated_at":"2026-02-01T11:24:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/johnsonlee/graphite","commit_stats":null,"previous_names":["johnsonlee/graphite"],"tags_count":16,"template":false,"template_full_name":"johnsonlee/kotlin-library-template","purl":"pkg:github/johnsonlee/graphite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsonlee%2Fgraphite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsonlee%2Fgraphite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsonlee%2Fgraphite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsonlee%2Fgraphite/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johnsonlee","download_url":"https://codeload.github.com/johnsonlee/graphite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsonlee%2Fgraphite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31433044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T08:13:15.228Z","status":"ssl_error","status_checked_at":"2026-04-05T08:13:11.839Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-27T09:55:10.582Z","updated_at":"2026-04-11T22:53:07.160Z","avatar_url":"https://github.com/johnsonlee.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Graphite\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)\n\n**Structured codebase context for LLMs.** Graphite turns JVM bytecode into a queryable program graph — so AI agents can understand your codebase without reading every file.\n\n## The Problem\n\nLLMs working with code face a fundamental constraint: **context windows are finite, but codebases are not.**\n\nDumping source files into a prompt is wasteful. Most tokens describe boilerplate, imports, and formatting — not the relationships that matter. 
An LLM trying to understand \"what calls this method?\" or \"what constants flow into this API?\" must read hundreds of files to answer questions that a graph can answer in milliseconds.\n\n## The Solution\n\nGraphite builds a **program graph** from compiled bytecode: nodes are program elements (methods, fields, constants, call sites), and edges are the relationships between them (dataflow, calls, type hierarchy). LLMs query the graph instead of reading source code.\n\n**Before Graphite:** Feed 500 source files (~2M tokens) to an LLM to find A/B test IDs.\n\n**With Graphite:** Query `graph.callSites(pattern)` → get 23 constants in 12 tokens.\n\n### What the Graph Captures\n\n| Relationship | Example | LLM Use Case |\n|-------------|---------|---------------|\n| **Dataflow** | `x = 42; foo(x)` → constant 42 flows to `foo` | Track config values, feature flags, API keys |\n| **Call graph** | `UserService.save()` calls `Repository.insert()` | Understand execution paths without reading source |\n| **Type hierarchy** | `AdminUser extends User implements Auditable` | Resolve polymorphism, find implementations |\n| **Annotations** | `@GetMapping(\"/api/users\")` on `listUsers()` | Discover endpoints, serialization rules, DI config |\n| **Lambda/method ref** | `items.stream().map(User::getName)` | Trace functional pipelines |\n| **Resources** | `config/application.yml` inside a fat JAR | Cross-reference code with config files |\n\n### Token Efficiency\n\n| Task | Raw Source | Graphite Query | Token Reduction |\n|------|-----------|----------------|-----------------|\n| Find all A/B test IDs | ~500 files, 2M tokens | `callSites` + `backwardSlice` → 23 results | **99.99%** |\n| Map REST endpoints | ~200 controllers, 800K tokens | `memberAnnotations` scan → structured list | **99.9%** |\n| Find dead code | Entire codebase, 5M tokens | `branchScopes` + `callSites` → dead paths | **99.99%** |\n| Resolve type hierarchy | ~100 files per type chain | `supertypes` / `subtypes` → direct answer | **99%** |\n\nGraphite uses 
**Cypher**, a widely adopted declarative graph query language, for querying. The Cypher engine lives in the `graphite-cypher` module and is built on an ANTLR-based openCypher parser.\n\n## Why Not Tree-sitter?\n\nTools like [GitNexus](https://github.com/nicobailon/gitnexus), Aider, and many other LLM code assistants use [Tree-sitter](https://tree-sitter.github.io/) for codebase understanding. Tree-sitter parses syntax: it sees **text structure**, not **program semantics**.\n\n| Capability | Tree-sitter | Graphite |\n|-----------|-------------|----------|\n| \"What type is this variable?\" | No: sees `var x = foo()` but can't resolve `foo`'s return type | Yes: full type resolution from bytecode |\n| \"What values flow into this parameter?\" | No: can't cross method boundaries | Yes: inter-procedural backward slice |\n| \"Does this interface have implementations?\" | Heuristic grep for class names | Yes: complete type hierarchy from class metadata |\n| \"What does this lambda actually call?\" | No: the target is bound by `invokedynamic`, which exists only in bytecode | Yes: MethodHandle extraction from bootstrap arguments |\n| \"Is this field used via reflection/DI?\" | No: annotation semantics are opaque | Yes: annotation values are queryable data |\n| \"What's the real type of `Object` fields?\" | No: requires dataflow across methods | Yes: cross-method field assignment tracking |\n| Controller inheritance | No: can't resolve inherited annotations | Yes: walks the type hierarchy for endpoint discovery |\n\n**The fundamental issue:** Tree-sitter operates on **syntax** (one file at a time, no type resolution, no cross-file dataflow). Graphite operates on **semantics** (compiled bytecode with full type information, inter-procedural analysis, resolved generics).\n\nFor LLMs, this difference is critical. A syntax tree tells you what code *looks like*. 
A program graph tells you what code *does*.\n\n## Quick Start\n\n```bash\n# Install via Homebrew\nbrew tap johnsonlee/tap\nbrew install graphite graphite-explore\n\n# Build a graph from your JAR\ngraphite build app.jar -o /data/app-graph --include com.example\n\n# Query with Cypher\ngraphite query /data/app-graph \\\n  \"MATCH (c:IntConstant)-[:DATAFLOW*]-\u003e(cs:CallSiteNode)\n   WHERE cs.callee_class =~ 'com.example.*'\n   RETURN c.value, cs.callee_name\"\n\n# JSON output (for LLM consumption)\ngraphite query --format json /data/app-graph \\\n  \"MATCH (n:CallSiteNode) RETURN n.callee_name LIMIT 10\"\n\n# Launch the web UI\ngraphite-explore /data/app-graph --port 8080\n```\n\n## Kotlin API\n\n### Build \u0026 Query\n\n```kotlin\nimport java.nio.file.Path\n\n// Build graph from bytecode\nval graph = JavaProjectLoader(LoaderConfig(\n    includePackages = listOf(\"com.example\")\n)).load(Path.of(\"/path/to/app.jar\"))\n\n// Cypher query\nval result = graph.query(\"\"\"\n    MATCH (c:IntConstant)-[:DATAFLOW*]-\u003e(cs:CallSiteNode)\n    WHERE cs.callee_class =~ 'com.example.*'\n    RETURN c.value, cs.callee_name\n\"\"\")\nresult.rows.forEach { row -\u003e\n    println(\"${row[\"c.value\"]} -\u003e ${row[\"cs.callee_name\"]}\")\n}\n\n// Programmatic query DSL\nval results = Graphite.from(graph).query {\n    findArgumentConstants {\n        method {\n            declaringClass = \"com.example.ab.AbClient\"\n            name = \"getOption\"\n        }\n        argumentIndex = 0\n    }\n}\n\n// Annotations, dataflow analysis\nval annotations = graph.memberAnnotations(\"com.example.User\", \"name\")\n// nodeId: ID of the node to slice backward from (for example, a call site found by a query)\nval slice = DataFlowAnalysis(graph).backwardSlice(nodeId)\nslice.constants()  // all constant values that reach this node\n```\n\n### Persist \u0026 Load\n\n```kotlin\nimport java.nio.file.Path\n\n// Save to disk (WebGraph compressed format)\nGraphStore.save(graph, Path.of(\"/data/app-graph\"))\n\n// Load (strategy chosen automatically by graph size):\n//   \u003c 1M nodes → eager (all in heap, fastest queries)\n//   \u003e= 1M nodes → 
mmap (nodes off heap, 75% less memory)\nval dir = Path.of(\"/data/app-graph\")\nval loaded = GraphStore.load(dir)\n\n// Or force a specific strategy\nval eager = GraphStore.load(dir, GraphStore.LoadMode.EAGER)    // always in-heap\nval mapped = GraphStore.load(dir, GraphStore.LoadMode.MAPPED)  // always mmap\n```\n\n### Access Resources\n\n```kotlin\ngraph.resources.list(\"**/*.yml\").forEach { entry -\u003e\n    println(entry.path)  // e.g., \"config/application.yml\"\n}\n```\n\n## Architecture\n\n```\ngraphite/\n├── graphite-core/          # Graph interface, nodes, edges, analysis\n├── graphite-cypher/        # Cypher query engine (ANTLR parser + executor)\n├── graphite-sootup/        # SootUp bytecode → graph builder\n├── graphite-webgraph/      # WebGraph disk persistence (BVGraph + LAW tools)\n├── graphite-query/         # CLI: build, query, Cypher\n└── graphite-explore/       # CLI: web visualization\n```\n\n### Storage Format\n\nGraphs are persisted using the [WebGraph](https://webgraph.di.unimi.it/) ecosystem:\n\n| Data | Format |\n|------|--------|\n| Adjacency | BVGraph (2-4 bits/edge) |\n| Edge labels | Byte array in BVGraph order |\n| Strings | FrontCodedStringList (prefix compression) |\n| Node data | Compact binary with string table indices |\n| Metadata | Compact binary with string table indices |\n\n## Analysis Capabilities\n\n| Capability | Description |\n|-----------|-------------|\n| Constant tracking | Direct, local variable, field, cross-class, enum |\n| Auto-boxing | Transparent handling of `Integer.valueOf()` boxing |\n| Lambda / method ref | `invokedynamic` → actual target resolution |\n| Functional dispatch | Callbacks, return values, fields, varargs, conditionals |\n| Controller inheritance | Endpoint discovery follows class hierarchy |\n| Generic type analysis | `ApiResponse\u003cPageData\u003cUser\u003e\u003e` nested structure |\n| Branch reachability | Dead code via condition constant analysis |\n| Annotations | Generic `memberAnnotations()` for any framework |\n| 
Cypher queries | `graph.query(\"MATCH ...\")` with the full openCypher read grammar |\n| Resource access | Files inside JAR/WAR/fat JAR (nested JARs) |\n\n## Extension Mechanism\n\nGraph building is pluggable via the `GraphiteExtension` SPI, discovered through Java's `ServiceLoader`:\n\n```kotlin\nclass MyExtension : GraphiteExtension {\n    override fun visit(sootClass: SootClass, context: GraphiteContext) {\n        // Extract domain-specific metadata during graph building.\n        // className, memberName, annotationFqn and values are placeholders\n        // that your extension derives from sootClass.\n        context.addMemberAnnotation(className, memberName, annotationFqn, values)\n    }\n}\n```\n\nRegister the extension by listing its fully qualified class name in `META-INF/services/io.johnsonlee.graphite.sootup.GraphiteExtension`.\n\n## Installation\n\nArtifacts are published to GitHub Packages, which requires authentication even for public packages. Provide a GitHub username and a personal access token with the `read:packages` scope through the `gpr.user` / `gpr.key` Gradle properties or the `GITHUB_ACTOR` / `GITHUB_TOKEN` environment variables:\n\n```kotlin\nrepositories {\n    maven {\n        url = uri(\"https://maven.pkg.github.com/johnsonlee/graphite\")\n        credentials {\n            username = project.findProperty(\"gpr.user\") as String? ?: System.getenv(\"GITHUB_ACTOR\")\n            password = project.findProperty(\"gpr.key\") as String? ?: System.getenv(\"GITHUB_TOKEN\")\n        }\n    }\n}\n\ndependencies {\n    implementation(\"io.johnsonlee.graphite:graphite-core:0.1.0-rc.5\")\n    implementation(\"io.johnsonlee.graphite:graphite-sootup:0.1.0-rc.5\")\n    // Optional: Cypher query support (graph.query(\"MATCH ...\"))\n    implementation(\"io.johnsonlee.graphite:graphite-cypher:0.1.0-rc.5\")\n    // Optional: disk persistence (WebGraph format)\n    implementation(\"io.johnsonlee.graphite:graphite-webgraph:0.1.0-rc.5\")\n}\n```\n\n## License\n\n```\nCopyright 2026 Johnson Lee\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    
http://www.apache.org/licenses/LICENSE-2.0\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsonlee%2Fgraphite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnsonlee%2Fgraphite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsonlee%2Fgraphite/lists"}