{"id":25201344,"url":"https://github.com/haxscramper/hcparse","last_synced_at":"2025-05-12T13:21:00.137Z","repository":{"id":43104443,"uuid":"289942812","full_name":"haxscramper/hcparse","owner":"haxscramper","description":"High-level nim bindings for parsing C/C++ code","archived":false,"fork":false,"pushed_at":"2022-09-22T14:47:51.000Z","size":2328,"stargazers_count":37,"open_issues_count":23,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-31T22:33:03.323Z","etag":null,"topics":["cpp","libclang"],"latest_commit_sha":null,"homepage":"https://haxscramper.github.io/hcparse-doc/src/hcparse/libclang.html","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haxscramper.png","metadata":{"files":{"readme":"readme.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-24T13:57:19.000Z","updated_at":"2023-08-14T07:31:35.000Z","dependencies_parsed_at":"2023-01-18T14:30:26.495Z","dependency_job_id":null,"html_url":"https://github.com/haxscramper/hcparse","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haxscramper%2Fhcparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haxscramper%2Fhcparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haxscramper%2Fhcparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haxscramper%2Fhcparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haxscramper","download_url":"https://codeload.github.com/haxscramper/hcparse/tar.gz/refs/he
ads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253745197,"owners_count":21957320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","libclang"],"created_at":"2025-02-10T04:37:27.981Z","updated_at":"2025-05-12T13:21:00.115Z","avatar_url":"https://github.com/haxscramper.png","language":"Nim","readme":"Note: this project's development is temporarily paused due to my work on\nthe https://github.com/nim-works/nimskull project. In the future I will\ncome back to it, because I still think tooling like this is necessary,\nbut there might be a delay in development for quite some time.\n\n------\n\nNote: work in progress - features and descriptions are largely accurate,\nbut large chunks of intended functionality are yet to be implemented. For\nthe current state of the development process, see [[https://github.com/haxscramper/hcparse/projects/2][alpha version project]]\n\nThis project provides two types of wrapper generators -\n\n1. Command-line application for rough translation of the C and C++ code to\n   nim, including actual code translation (actual library implementation in\n   addition to top-level declarations). Based on simple translation using\n   [[https://github.com/tree-sitter/tree-sitter][tree-sitter]] for parsing and [[https://www.boost.org/doc/libs/1_76_0/libs/wave/doc/preface.html][boost wave]] for macro expansion.\n2. Fully automatic for handling extremely large libraries (like Qt), where\n   any sort of manual editing is completely infeasible. 
Based on libclang\n   and has full understanding of the code, but requires more sophisticated\n   setup.\n\nIn addition to the predefined wrapping logic, an API for user-implemented tooling\nis provided.\n  - Supports generation of the ~.json~ files that contain all available\n    information about processed headers, which makes it possible to create\n    your own wrapper generation tooling (using any programming language that can\n    parse ~.json~, so this is not even a nim-only solution), or create\n    these files elsewhere.\n  # - [[https://github.com/haxscramper/hnimast][hnimast]] provides a macro for manually creating wrappers for a library.\n  #   It is placed in a separate package because hcparse itself is a\n  #   relatively heavy dependency (uses htsparse which contains a lot of\n  #   auto-generated code for C++ parsers). Type definitions for ~.json~\n  #   representation are also placed in hnimast for that reason -\n  #   ~hnimast/interop/wrap_store~.\n\n\n\n** Tree-sitter \u0026 boost wave\n\nCommand-line tool to either generate wrappers for C(++) code, or do full\nconversion of the project into nim. It is based on tree-sitter and boost wave and\ndoes not require complicated configuration to work. It focuses on the first 90%\nof the wrapper implementation - remaining parts can be tweaked manually\nwhen initial wrapper generation is done.\n\n** Libclang-based wrapper-generation\n\nThe libclang-based wrapper is not a finished command-line application like\nc2nim or nimterop, but rather a /framework/ for implementing custom wrapper\nscripts. It can be used as a one-off tool that you can tweak manually, but it\nis mainly designed to provide *fully automatic* wrapper generators for\ncases where it is not realistically possible to do it by hand. Re-wrap\nthe whole Qt library on each patch release? The whole Posix API? That's what this\nproject tries to give you. 
A sophisticated tool for tackling complex\nwrapping problems, with built-in support for documentation, the nep-1 style\nguide and a comprehensive collection of automatic code generation tools.\n\nIt is an open secret that C and C++ libraries lack consistent styling, code\npolicies and more. Sometimes exceptions are completely banned (or\nsimply inaccessible, as in C), and naming styles differ. Some libraries rely heavily\non templates or OOP-style C++. All of that forces Nim wrapper authors\nto spend more time in order to provide higher-level interfaces that take\nadvantage of the rich Nim features (~distinct~ types, exceptions, side\neffect tracking and ~enums~).\n\nHcparse provides a framework for addressing these problems in an automated way,\nusing user-provided or built-in tools, that allows you to\n\n- Convert 'out' arguments for C functions to nim ~tuple[]~ returns\n- Wrap 'raw' C procedures that return exit codes to raising ones\n- Declare callback-based override for C++ classes. No more need to inherit\n  from ~DelegatePainter~ just to override a single method - you can just set\n  a callback for it.\n- Naming fully compliant with the nep-1 style guide. No more awkward\n  ~XMapRaised~ that can be confused with type name or ~unordered_set~\n- Declare overloads for all constructors, including aggregate\n  initialization and 'placement new', which makes it possible to reuse Nim\n  memory management for C++ objects.\n- Convert 'macro enum groups' into full Nim enums (~#define PAPI_OK 0~,\n  ~#define PAPI_EINVAL -1~)\n- Detect and solve import cycles caused by forward declarations and badly\n  structured header dependencies.\n- Support for default template parameters\n- Partial support with varying degrees of control for complex C++ 'inner\n  typedefs'. Provide graceful fallback for some C++ templating features\n  that nim is unable to handle.\n- Extensive interoperability with [[https://github.com/haxscramper/haxdoc][haxdoc]] - adapt original documentation to\n  your wrappers. 
Users no longer have to dig through C++ docs in order\n  to make sense of which part of the wrapper they need.\n\n** Why have multiple different ways of wrapping libraries?\n\n# https://discord.com/channels/371759389889003530/371759389889003532/880807906335948840\n\nWhy is it necessary to have multiple different approaches to code wrapping?\nHaving a single entry point would make it much easier for new users,\nsimplify documentation and explanation and so on.\n\nThe main reason for providing two solutions is very simple - each has its own\ndownsides (for the end user), and it is not possible to create a tool where\nboth techniques are used, as they have a large number of mutually exclusive\nrequirements.\n\n- tree-sitter \u0026 boost wave ::\n  - advantages ::\n    - Does not require valid translation unit or even /valid code/ - it\n      [[https://tree-sitter.github.io/tree-sitter/#underlying-research][uses]] an LR parser with built-in support for error recovery, which means\n      I'm able to provide the /best possible/ solution in case of malformed\n      code. This is important, because most of the C code you can find is\n      actually not *valid C*, it becomes valid after you use the preprocessor.\n      But with tree-sitter it is not required.\n    - Can override behavior of the preprocessor - ~include~ statements in\n      code might be ignored for initial processing, making it possible to\n      provide a 1:1 mapping of the original source file.\n    - Can provide /some/ level of automatic code enhancement - fixing\n      identifiers, providing enum wrappers etc.\n    - Can be used for syntax-directed translation. It is not possible to\n      automatically map C or C++ code to Nim /in general/, but automating\n      manual code conversion is still helpful. 
Of course, generated code\n      requires a lot of manual correction (especially for cases that are\n      syntactically identical, but /semantically/ different), but it is\n      better than nothing.\n  - disadvantages ::\n    - Does not really understand C++ code. In cases like ~using namespace\n      std;~ followed by ~string getStr() {};~ there is no way to correctly\n      track /actually used types/ - doing so would require reimplementing\n      all of the C++ bookkeeping - ~using~ declarations, type aliases,\n      active namespaces and so on.\n  - extra ::\n    - Why not use clang preprocessor callbacks? TODO explain\n- libclang ::\n  - advantages ::\n    - Expands all macros itself, operates on a stable AST, so no code\n      modification is needed *at all*. This is especially important for\n      large libraries, where manual modification is out of the question.\n    - Has full understanding of the C++ code -\n      ~getTypeDeclaration().getSemanticParent()~, all bookkeeping, namespace\n      tracking, type aliases and so on.\n    - Can provide more powerful automatic code enhancement features, informed\n      by type declaration knowledge.\n  - disadvantages ::\n    - Requires fully valid translation unit to work with - all includes\n      must be resolved, all defines must be specified. Much harder to use\n      in libraries that use a non-standard build system (e.g. cmake that\n      executes codegen, merges together multiple files and compiles\n      everything at once)\n- manual, using macros ::\n  - advantages ::\n    - Implementation controlled by the end user - no intermediate code\n      generation steps (even though they are not embedded in the final\n      compilation process as they are in nimterop, they might be somewhat annoying\n      to deal with).\n    - Much simpler to provide convenience wrappers - no need to manage\n      multiple files or somehow annotate entries to differentiate between\n      generated and non-generated ones. 
You just write some DSL, and\n      immediately start adding convenience wrappers.\n  - disadvantages ::\n    - As with any manual wrapping - for large libraries it is not really\n      possible.\n    - It is not possible to put documentation comments on some of the\n      generated types - macros do not have full access to the comment\n      fields.\n\nAs you can see, each approach has its own strengths, but it is\nfundamentally impossible to merge two of them, since they have completely\nopposite requirements - one does not understand C++ code, and *does not\nneed to*, while for the second one it is absolutely mandatory. Manual wrapping\nwas added for the sake of completeness, since implementation reuses the\nsame IR.\n\n** Difference from existing projects and approaches\n\nNote: The main difference between other projects and hcparse is that they\n/already exist/, while hcparse is work-in-progress. For now, you can\nconsider this section as an answer to more practical questions - \"why\nreimplement the already existing tooling?\" and \"how is it going to be\ndifferent from the existing tools?\"\n\n- [[https://github.com/nim-lang/c2nim][c2nim]]\n  - reimplements own C and C++ parser as well as preprocessor, resulting in\n    an extremely fragile tool that usually requires a lot of manual tweaking and\n    hacks.\n  - By default does not try to generate nep1-compliant wrappers, requires\n    passing ~--nep1~ flag (which is not really difficult to do), but does not\n    track renames, simply squashing all identifiers into single style:\n    ~name~ and ~name_~ get converted into ~name~.\n  - Requires converting ~#define~ to ~#def~ for used macros, which is,\n    again, pretty annoying to do manually.\n- [[https://github.com/nimterop/nimterop][nimterop]]\n  - Runs when code is [[https://github.com/nimterop/nimterop#wrapping][compiled]], which makes it hard to inspect the generated\n    headers. 
Having generated ~.nim~ wrapper files also has several\n    important advantages, including\n    - You have source code that you can put documentation on\n    - No implicit magic and intermediate compile-time actions between your\n      call to wrappers and actual library code.\n    - Because there exists a dumb wrapper file that can be viewed we can\n      get a lot more creative with actually mapping library code to nim.\n      Make all identifiers nep1-conformant, generate wrappers that turn\n      error codes into exceptions and so on (see list for libclang wrapper\n      generator)\n    - No need to have a wrapper generator as a dependency for your library,\n      which means I don't have to test whether the /generator/ works on all\n      possible systems, I just have to make sure wrappers make sense.\n  - Does not reimplement the C++ parser, and instead uses tree-sitter (just\n    like hcparse), but invokes C compiler to do the macro expansion, which\n    merges all headers into a single file, and completely ignores any\n    ~#include~ declarations. Boost wave, on the other hand, [[https://www.boost.org/doc/libs/1_76_0/libs/wave/doc/class_reference_ctxpolicy.html#opened_include_file][allows]]\n    intercepting include directives, which makes it possible to provide more\n    compact wrappers that don't touch included parts from the external\n    libraries.\n- [[https://github.com/pmunch/futhark][futhark]]\n  - I haven't tried futhark yet, but at least it seems notably simpler\n    compared to nimterop, and it might be more than enough for someone\n    else.\n  - Uses the same approach for wrapper generation - everything is wrapped\n    when compiled. 
This is a major drawback (this applies to nimterop as\n    well) that does not allow properly performing project-wide analysis when\n    needed.\n\nNOTE: the project is still considered work-in-progress, but all the\nfeatures mentioned above have already been implemented at least in\nproof-of-concept quality.\n\n** Using hcparse as a library or writing own code generation tools\n\nnote: this section describes unstable functionality that might potentially\nbe changed in the future.\n\n[[./it_works.jpg]]\n\nhcparse is built on top of several C and C++ code processing tools,\nspecifically ~boost::wave~, ~libclang~ and ~tree-sitter~ C++ parser.\nConvenience wrappers for all of these libraries are provided as a part of\nhcparse library - a full wrapper for the libclang API and a *C* API for a large\nsection of boost wave (not constrained to the C++ backend!).\n\nIn addition to the wrappers for lower-level C analysis tools ~hcparse~ also\nprovides a parser for the doxygen XML format (to be able to automatically port\ndocumentation without losing important semantic information).\n\nInternal IR for the code is fully convertible to json (does not contain any\nlower-level details related to the libclang or tree-sitter processing), and\ncan theoretically be generated using other frontends. Code generation\nfacility can also be decoupled into separate tool that provides different\nfeatures, or even generates code for the different languages if needed\n(note that original implementation is fully focused on nim, and as of right\nnow there are no plans to make hcparse fully source *and* target-agnostic).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaxscramper%2Fhcparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaxscramper%2Fhcparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaxscramper%2Fhcparse/lists"}