{"id":18361354,"url":"https://github.com/greghendershott/pdb","last_synced_at":"2025-07-17T02:43:13.956Z","repository":{"id":138586942,"uuid":"350777826","full_name":"greghendershott/pdb","owner":"greghendershott","description":"Multi-file check-syntax database","archived":false,"fork":false,"pushed_at":"2024-08-20T14:27:54.000Z","size":596,"stargazers_count":14,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-04T21:54:43.294Z","etag":null,"topics":["racket"],"latest_commit_sha":null,"homepage":"","language":"Racket","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greghendershott.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-23T16:14:47.000Z","updated_at":"2025-01-17T19:39:26.000Z","dependencies_parsed_at":"2024-11-05T22:35:18.510Z","dependency_job_id":"4575916d-55eb-40e3-951b-ffdf3181beb5","html_url":"https://github.com/greghendershott/pdb","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/greghendershott/pdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greghendershott%2Fpdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greghendershott%2Fpdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greghendershott%2Fpdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greghendershott%2Fpdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greghende
rshott","download_url":"https://codeload.github.com/greghendershott/pdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greghendershott%2Fpdb/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265560994,"owners_count":23788291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["racket"],"created_at":"2024-11-05T22:33:31.058Z","updated_at":"2025-07-17T02:43:13.931Z","avatar_url":"https://github.com/greghendershott.png","language":"Racket","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is WIP exploring the idea of storing, for multiple source files,\nthe result of running drracket/check-syntax, plus some more analysis.\n\nThe main motivation is to support **multi-file** flavors of things\nlike \"find references\" and \"rename\".\n\nThe intent is this could enhance Racket Mode, as well as DrRacket\nand other tools.\n\n# Database\n\nFor each analyzed source file:\n\n1. Fully expand, accumulating some information even if expansion\n   fails (as used by e.g. Typed Racket):\n\n  - direct calls to `error-display-handler`\n  - `online-check-syntax` logger messages\n\n2. 
Run [check-syntax], recording the values from various\n[`syncheck-annotations\u003c%\u003e`] methods.\n\nAfter accumulating information in various fields of a struct, finally\nthe struct is serialized, compressed, and stored in a sqlite table.\n\n[check-syntax]: https://docs.racket-lang.org/drracket-tools/Accessing_Check_Syntax_Programmatically.html\n[`syncheck-annotations\u003c%\u003e`]: https://docs.racket-lang.org/drracket-tools/Accessing_Check_Syntax_Programmatically.html#%28def._%28%28lib._drracket%2Fcheck-syntax..rkt%29._syncheck-annotations~3c~25~3e%29%29\n\nWe extend the check-syntax analysis in various ways:\n\n- In addition to `syncheck:add-definition-target`, which identifies\n  definitions, we identify and record _exports_ from fully-expanded\n  `#%provide` forms.\n  \n- In addition to `syncheck:add-arrow/name-dup/pxpy`, which identifies\n  lexical and import arrows, we identify and record some other flavors\n  of arrows:\n  \n  - import-rename-arrows, as from `rename-in` etc.\n  - export-rename-arrows, as from `rename-out`, etc.\n\n  Also we enhance the check-syntax import-arrows to store the \"from\"\n  and \"nominal-from\" information from [identifier-binding]. Following\n  the nominal-from values to the exports in other files, and vice\n  versa, is how we can identify rename-sites across multiple files.\n\n[identifier-binding]:https://docs.racket-lang.org/reference/stxcmp.html#%28def._%28%28quote._~23~25kernel%29._identifier-binding%29%29\n\n- We build a map from positions to submodule name paths (where `()`\n  means no submodule, i.e. 
the outermost, file module) as well as\n  whether the module sees its parent's bindings (as with `module+`).\n  This map supports various functionality:\n\n  - A tool can implement a \"run/enter submodule at current position\"\n    command without assuming (as does \"classic\" Racket Mode)\n    s-expression surface syntax to discover the submodule name path.\n\n  - A tool can know from which module(s) to itemize imported symbols\n    for completion candidates (as described in the next bullet point).\n\n- For each module, we record each `#%require` in a normalized format\n  (module path, whether it is the module language, any prefix, and any\n  exceptions). Later this can be \"cashed in\" for a list of the\n  imported symbols, to be used by a tool like a source code editor for\n  completion candidates. In some cases we can obtain the export\n  symbols from our own database; as a fallback we use\n  `module-\u003eexports`.\n\n## You want to jump where, in what size steps?\n\nIn Racket a definition can be exported and imported an arbitrary\nnumber of times before it is used -- and can be renamed at each such\nstep.\n\nIn general, the definition graph elides that and expresses \"big,\ndirect jumps\" among files. That is wonderful when you want to, e.g.,\n\"visit/find/jump to definition\" in another file.\n\nBy contrast, the \"name introduction and use\" graph cares about the\nchain of exports and imports, and considers steps where a rename\noccurs. A motivation is to support multi-file rename commands. For\nthat to work, every occurrence of the \"same\" name must be known,\nincluding uses in `provide` and `require` forms, and considering\nclauses like `rename-out`, `prefix-in`, `rename-in`, `prefix-out`, and\nso on.\n\nFor example, if the user wants `foo` to be renamed `bar`, then sites like\n`(provide foo)` must be changed. 
Furthermore, sites like `(provide\n(rename-out [foo xxx]))` are inflection points where the graph ends.\nIf some other file does `(require (rename-in mod [xxx foo]))`, _that_\n\"foo\" is not the same and should not be in the same set of sites to be\nrenamed as the \"foo\" in the exporting file.\n\n## use-\u003edef vs. def-\u003euses\n\nFor either type of graph, it is simple to proceed from a use to its\nsource. When the source is in some other file, we know _which_ other\nfile: The `identifier-binding` \"from\" or \"nominal-from\" information\nalways says in which other file to look. If that file isn't yet in the\ndatabase (or is outdated), we analyze it, and so on transitively.\nFurthermore it is a 1:1 relation; even when there are multiple steps\n(such as hopping through a contract wrapper to the wrapped\ndefinition), each step is 1:1.\n\nOn the other hand, proceeding from a definition to its uses is a\n1:many relation, transitively (each of the many uses may in turn have\nmany uses). Furthermore we can't discover absolutely all uses --\nunless absolutely all using files have already been analyzed. There\nexists only a set of _known_ uses, which is limited by the set of\nalready-analyzed files.\n\nThis is another motivation to save analysis results for multiple files\nin a database. One or more directory trees, each for some project the\nuser cares about, can be analyzed proactively. (Thereafter a digest\nmismatch can trigger an automatic re-analysis of a changed file.) This\nenables discovering all uses, at least within the scope of those\nprojects.\n\n# Disposition\n\n## Racket Mode\n\nStatus quo, Racket Mode's back end runs check-syntax and returns to\nthe front end `racket-xp-mode` the full results for each file. The\nentire Emacs buffer is re-propertized. 
For example, mouse-overs become\n`help-echo` text properties.\n\nHow exactly would Racket Mode's back end use this `pdb` project?\n\n### Roadmap step 1: Still all results at once\n\nInitially, Racket Mode's back end could use this pdb project the same\nway: Get the full analysis results, and re-propertize the entire\nbuffer.\n\nThat alone is no improvement. But we could add new Racket Mode\ncommands that query the db, such as multi-file xref-find-references or\nrenaming.\n\nFurthermore, I think we could eliminate the back end's cache of fully\nexpanded syntax. For example, find-definition no longer needs to walk\nfully-expanded syntax looking for a site. We already did that, for all\ndefinitions, and saved the results; now it's just a db query.\n\n(I'm not sure about find-signature: Maybe we could add a pass to walk\npre-expanded surface syntax, finding all signatures, as the status quo\nback end does one by one.)\n\n---\n\n**Status**: Done as an initial sanity check, then discarded. I\nmodified `racket-xp-mode` and the Racket Mode back end to use pdb when\navailable, and used the same propertize-all-buffer approach. 
It\nperformed about the same as before; having multi-file rename was nice.\nAlthough that's still in the commit history, I wanted to move on past\nthat to the next step.\n\n### Step 2: Query results JIT for spans\n\nA bigger change: The front end would query just for various spans of\nthe buffer, as needed.\n\nThis would improve how we handle larger files like\n[class-internal.rkt], not to mention enormous files like the [example\nprovided by samth].\n\n[example provided by samth]: https://github.com/greghendershott/racket-mode/issues/522\n[class-internal.rkt]: https://github.com/racket/racket/blob/master/racket/collects/racket/private/class-internal.rkt\n\nStatus quo, Emacs doesn't block while the analysis is underway, but\nafter it completes, for a sufficiently large buffer and analysis\nresults, it takes a very long time to marshal the results and to\nre-propertize the entire buffer; Emacs can noticeably freeze.\n\nAdmittedly, doing limited, JIT queries doesn't magically transform\ndrracket/check-syntax itself into a \"streaming\" or incremental approach.\nThe _entire_ analysis would still need to complete (still taking about\n10 seconds for [class-internal.rkt], and 60 for the [example provided\nby samth]!) before _any_ new results were available. However, the\nresults could be retrieved in vastly smaller batches. IOW there would\nstill be a large delay until any new results were available, but no\nupdate freezes.\n\n---\n\n**Status:** Done. Still dog-fooding. I quickly realized that modifying\n`racket-xp-mode` to work in both the \"classic\" and new ways was going\nto be messy. Instead I made a fresh `racket-pdb-mode`. This works by\ndoing a query to the db whenever point (Emacs jargon, a.k.a. the\ncaret) moves. The back end and pdb return values only pertaining to\npoint and the currently visible span (the window-start through\nwindow-end positions, in Emacs jargon). 
I'm still dog-fooding this,\nlooking for problems or mis-features.\n\n## Other tools\n\nOf course this could become a package to be used in various other\nways.\n\nWe could offer any of:\n\n- A CLI (e.g. a new `raco` tool).\n\n- A stable API for Racket programs.\n\n- An equivalent API via HTTP.\n\nOne issue here is that some tools might prefer or need line:column\ncoordinates instead of positions. [Effectively drracket/check-syntax\nand our own analysis use `syntax-position` and `syntax-span`, ignoring\n`syntax-line` and `syntax-column`.] Either we could try to store\nline:col-denominated spans, also, in the db when we analyze (at some\ncost in space). Or we could just synthesize these as/when needed by\nsuch an API, by running through the file using `port-count-lines!` (at\nsome cost in time).\n\n# Known limitations and to-do\n\n- The `#%provide` clauses `all-defined`, `all-defined-except`,\n  `prefix-all-defined`, and `prefix-all-defined-except` are not yet\n  supported by our analysis that finds exports. (Note that `provide`\n  clauses like `all-defined-out` do not actually expand into these,\n  and _are_ supported. So this limitation isn't as big as it seems.\n  But if some handwritten code or other macro expansion uses these\n  specific `#%provide` clauses, the exports won't be identified.)\n\n- The `rename-sites` command currently returns a hash-table value with\n  all results. For renames involving a huge number of files and sites,\n  a for-each flavor might be preferable.\n\n# Tire kicking\n\nIf you want to kick the tires on this in its current state, I\nrecommend looking at the tests in `example.rkt`, as called from the\n`tests` submodule.\n\nAs the functions work in terms of 1-based positions, just like Racket\n`syntax-position` and Emacs buffer positions, it's annoying to keep\ntyping \u003ckbd\u003eC-x =\u003c/kbd\u003e to see the position at point while in the\nexample files. 
You might find it handy to add something like the\nfollowing to your Emacs `mode-line-position` definition:\n\n```elisp\n(:propertize (:eval (format \"%s\" (point)))\n             face (:slant italic))\n```\n\nAlso remember that \u003ckbd\u003eM-g c\u003c/kbd\u003e will let you jump to a position.\n\n---\n\nYou probably want to avoid, however, the `very-many-files-example`\nsubmodule -- unless you want to wait hours for very many files to be\nanalyzed:\n\n```racket\n  (require pdb\n           pkg/lib) ; pkg/lib provides get-pkgs-dir\n  (for ([d (in-list (list* (get-pkgs-dir 'installation)\n                           (get-pkgs-dir 'user)\n                           (current-library-collection-paths)))])\n    (when (directory-exists? d)\n      (time (add-directory d #:import-depth 32767))))\n  (require (submod pdb/store maintenance))\n  (displayln (db-stats))\n```\n\nOn my system -- with the non-minimal Racket distribution installed,\nand about a dozen other packages:\n\n```\n--------------------------------------------------------------------------\nAnalysis data for 8124 source files: 183.5 MiB.\n\n596394 nominal imports of 149866 exports: 3.2 MiB.\n7667 interned paths: 0.6 MiB.\n\nTotal: 187.2 MiB.\nDoes not include space for integer key columns or indexes.\n\n/home/greg/.racket/pdb/pdb-main.sqlite: 219.4 MiB.\nActual space on disk may be much larger due to deleted items: see VACUUM.\n-------------------------------------------------------------------------\n```\n\nAlso, if you use Emacs, you _could_ try the new `pdb` branch from the\n`racket-mode` repo. In this case you will probably want to change your\n`racket-mode-hook` to use `racket-pdb-mode` instead of\n`racket-xp-mode`. Be aware that sometimes you'll need to `git pull`\nfrom both this `pdb` repo and the `pdb` branch on the\n`racket-mode` repo -- in other words sometimes I'll make a breaking\nchange that requires you to pull from both repos. 
At this stage things\nare still evolving, sometimes drastically, so unfortunately it's not\nyet worth preserving backward compatibility.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreghendershott%2Fpdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreghendershott%2Fpdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreghendershott%2Fpdb/lists"}