{"id":50960498,"url":"https://github.com/vlsi/pgjdbc-codec-api-review","last_synced_at":"2026-06-18T13:02:28.014Z","repository":{"id":364293315,"uuid":"1267252940","full_name":"vlsi/pgjdbc-codec-api-review","owner":"vlsi","description":"Comparing LLM architecture reviews of the pgjdbc Codec API through claims, evidence, and adjudication (RU + EN)","archived":false,"fork":false,"pushed_at":"2026-06-12T12:21:40.000Z","size":182,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-12T13:25:47.839Z","etag":null,"topics":["anthropic","claude","code-review","gpt","jdbc","llm","llm-evaluation","openai","pgjdbc","postgresql","prompt-engineering"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vlsi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-12T11:15:47.000Z","updated_at":"2026-06-12T12:21:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vlsi/pgjdbc-codec-api-review","commit_stats":null,"previous_names":["vlsi/pgjdbc-codec-api-review"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vlsi/pgjdbc-codec-api-review","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlsi%2Fpgjdbc-codec-api-review","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlsi%2Fpgjdbc-codec-api-review/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlsi%2Fpgjdbc-codec-api-review/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlsi%2Fpgjdbc-codec-api-review/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vlsi","download_url":"https://codeload.github.com/vlsi/pgjdbc-codec-api-review/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlsi%2Fpgjdbc-codec-api-review/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34491239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","claude","code-review","gpt","jdbc","llm","llm-evaluation","openai","pgjdbc","postgresql","prompt-engineering"],"created_at":"2026-06-18T13:02:26.697Z","updated_at":"2026-06-18T13:02:27.994Z","avatar_url":"https://github.com/vlsi.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Comparing LLM architecture reviews of the pgjdbc Codec API\n\nThis repository is an experiment in comparing architecture reviews that several LLM agents produced for the public Codec API in pgjdbc.\n\nThe interesting part is the comparison process itself, not just what the models concluded:\n\n* how the original engineering task was framed for an architecture review;\n* what problems the different models found;\n* which findings agreed, diverged, or turned out to be unsupported;\n* how the raw engineering brief was focused into a design-review prompt;\n* how to turn several long LLM answers into a checkable matrix of claims;\n* what a final adjudication pass looks like when two independent comparisons nearly agree.\n\nEvery model request — the primary reviews, the comparisons, and the final comparison — ran at maximum reasoning effort. The models were Fable 5, GPT 5.5, and Opus 4.8.\n\nThe reviewed code is the state of the `vlsi/pgjdbc` fork pinned by the tag [`codec-api-review-2026-06-12`](https://github.com/vlsi/pgjdbc/tree/codec-api-review-2026-06-12) (commit `4b2df19`). The work grew out of pgjdbc PR [#3062](https://github.com/pgjdbc/pgjdbc/pull/3062).\n\nРусская версия: [`ru/README.md`](ru/README.md).\n\n## What's here\n\nThis is the English version. This README and the files under `en/` are translated from the Russian originals in [`ru/`](ru/README.md), which are the source. The files follow the pipeline stages, from the original task to the final comparison.\n\n### The original task\n\n* [`en/1-review-prompt-creation/design-review-prompt.md`](en/1-review-prompt-creation/design-review-prompt.md) — the prompt for the first architecture review of the Codec API: arrays, structs, user-defined types, standalone encode/decode, JDBC adapters, registry, metadata, performance, and migration away from `ArrayEncoding` / `ArrayDecoding`.\n* [`en/1-review-prompt-creation/initial-task.md`](en/1-review-prompt-creation/initial-task.md) — the original task statement, before refinement.\n* [`en/1-review-prompt-creation/refinement-dialogue.md`](en/1-review-prompt-creation/refinement-dialogue.md) — a condensed transcript of the refinement that turned the original statement into the final prompt.\n\nThe original statement already carried plenty of technical context, but it had not yet pinned down the important forks: code review or design review, whether the Codec API is a public SPI, which PostgreSQL types are in scope, and whether to design for a standalone encode/decode API.\n\nWhat helped wasn't an LLM 'improving the prompt'. It was the iteration that surfaced the hidden goals of the work. After the refinement, the prompt was no longer a request to look at `Int4ArrayLeafCodec`; it had become an architecture review of a public codec system for every PostgreSQL type.\n\n### The primary architecture reviews\n\n* [`en/2-review-execution/fable5.md`](en/2-review-execution/fable5.md) — Fable 5's review.\n* [`en/2-review-execution/gpt55.md`](en/2-review-execution/gpt55.md) — GPT 5.5's review.\n* [`en/2-review-execution/opus48.md`](en/2-review-execution/opus48.md) — Opus 4.8's review.\n\nAll three answer the same prompt. Read them as independent attempts to find the architectural risks in one codebase.\n\n### The comparison procedure\n\n* [`en/3-comparison/comparison-prompt.md`](en/3-comparison/comparison-prompt.md) — the prompt for comparing the primary reviews.\n\nThis prompt sets the procedure: break each answer into atomic claims, check the substantive ones against the code, separate facts from opinions, flag hallucinations, build a matrix of agreement, and propose a practical next-step plan.\n\n### Results of the comparison\n\n* [`en/3-comparison/gpt55.md`](en/3-comparison/gpt55.md) — the comparison by GPT 5.5.\n* [`en/3-comparison/opus48.md`](en/3-comparison/opus48.md) — the comparison by Opus 4.8.\n\nBoth compare the same primary reviews, independently. That is useful in itself: you can see how stable the comparison procedure turns out to be.\n\n### Comparing the comparisons\n\n* [`en/4-adjudication/adjudication-prompt.md`](en/4-adjudication/adjudication-prompt.md) — the prompt for the final adjudication pass.\n* [`en/4-adjudication/gpt55.md`](en/4-adjudication/gpt55.md) — the final comparison of the two comparisons, by GPT 5.5.\n* [`en/4-adjudication/opus48.md`](en/4-adjudication/opus48.md) — the final comparison of the two comparisons, by Opus 4.8.\n\nThe two adjudication results nearly matched. That is a good sign: the substantive claims and the practical conclusions held up when the model doing the comparison changed.\n\n## How to read this\n\nFor the result in a hurry:\n\n1. Start with [`en/4-adjudication/gpt55.md`](en/4-adjudication/gpt55.md) or [`en/4-adjudication/opus48.md`](en/4-adjudication/opus48.md).\n2. Open [`en/3-comparison/gpt55.md`](en/3-comparison/gpt55.md) and [`en/3-comparison/opus48.md`](en/3-comparison/opus48.md) to see how the final claims were reached.\n3. Go back to the primary reviews if you want to know which model first spotted a given problem.\n4. Open [`en/3-comparison/comparison-prompt.md`](en/3-comparison/comparison-prompt.md) for the comparison method itself.\n5. Open [`en/1-review-prompt-creation/design-review-prompt.md`](en/1-review-prompt-creation/design-review-prompt.md) for the full engineering context.\n\nIf you care about the methodology rather than pgjdbc:\n\n1. Read [`en/1-review-prompt-creation/initial-task.md`](en/1-review-prompt-creation/initial-task.md) for the original statement.\n2. Read [`en/1-review-prompt-creation/refinement-dialogue.md`](en/1-review-prompt-creation/refinement-dialogue.md) to see which goals were clarified before the reviews ran.\n3. Read the final [`en/1-review-prompt-creation/design-review-prompt.md`](en/1-review-prompt-creation/design-review-prompt.md).\n4. Read one primary review.\n5. Read the comparison prompt.\n6. Compare the two comparison results.\n7. Read the final adjudication prompt and one final result.\n\n## Method\n\nThe experiment runs in several stages.\n\n0. First, the raw engineering statement is refined into a design-review prompt. The point of this step is to surface the hidden decisions, not to reword the text: the type of review, the boundaries of the public API, the type scope, standalone encode/decode, and what counts as a useful result.\n1. Several models then run an architecture review of the same code, independently.\n2. Other models do not redo the review; they compare the results, extracting claims, checking them against the code, and labelling each with a status.\n3. Finally, the two comparisons are themselves compared, to see where the adjudication results already agree.\n\nThe key idea is to distrust a confident statement that comes without evidence.\n\nEvery substantive claim lands in one of these statuses:\n\n* `confirmed` — backed by the code or the spec;\n* `partially confirmed` — broadly right, but stated more widely than the facts support;\n* `unclear` — not enough data;\n* `false / hallucinated` — contradicted by the code or the spec;\n* `design trade-off` — not a bug, but a choice between reasonable options;\n* `opinion` — a recommendation with no hard criterion.\n\nThis process helps separate:\n\n* real architectural risks;\n* debatable design trade-offs;\n* unsupported claims;\n* hallucinations;\n* useful but non-urgent recommendations.\n\n## What the experiment showed\n\nThe most stable conclusions:\n\n* the public Codec SPI still leaks pgjdbc's internal types;\n* the registry and lookup rules need a more explicit model of type identity, override, and fallback;\n* the array path is not yet a single codec-based hot path;\n* range and multirange metadata need their own model, not heuristics via `typelem`;\n* the JDBC compatibility gaps matter as much as the internal codec architecture;\n* the primitive fast path has to be designed explicitly, or the general container model becomes too boxing-heavy;\n* some of the models' claims turned out to be design trade-offs rather than bugs.\n\nThe most useful output was not a single 'best' answer, but the overlap of the independent results plus a list of the divergences that could be checked against the code.\n\n## How to reproduce the process\n\n1. Give several models [`en/1-review-prompt-creation/design-review-prompt.md`](en/1-review-prompt-creation/design-review-prompt.md) and access to the same code.\n2. Save their answers as separate Markdown files.\n3. Give another model [`en/3-comparison/comparison-prompt.md`](en/3-comparison/comparison-prompt.md), the original prompt, and all the primary answers.\n4. Repeat step 3 with a different model.\n5. Give a third model [`en/4-adjudication/adjudication-prompt.md`](en/4-adjudication/adjudication-prompt.md) and the two comparison results.\n6. Check the short list of unresolved and high-severity claims by hand.\n\nFor a new project, you only need to swap in the original architecture-review prompt and the primary model answers. The comparison procedure barely depends on pgjdbc.\n\nTo reproduce the prompt preparation as well as the comparison pipeline, start from the raw task statement and pin down the answers to a few questions separately:\n\n* which type of review you need;\n* what counts as the public API;\n* which entities are in scope;\n* which performance, correctness, and usability goals matter;\n* what results count as useful once the review is done.\n\n## Disclaimer\n\nThis is a research artefact, not an official pgjdbc document.\n\nLLM answers can contain mistakes. Check the important claims against the code, the tests, the JDBC documentation, and PostgreSQL's behaviour.\n\nThe experiment's value isn't that one model is 'right'. It's that independent answers can be reduced to a checkable form: claims, evidence, status, unresolved questions, and next steps.\n\n## License\n\nLicensed under [CC BY 4.0](LICENSE). Share and adapt with attribution, including commercially.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvlsi%2Fpgjdbc-codec-api-review","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvlsi%2Fpgjdbc-codec-api-review","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvlsi%2Fpgjdbc-codec-api-review/lists"}