{"id":13643860,"url":"https://github.com/mity/md4c","last_synced_at":"2025-05-15T00:13:02.301Z","repository":{"id":11511542,"uuid":"69898189","full_name":"mity/md4c","owner":"mity","description":"C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.","archived":false,"fork":false,"pushed_at":"2024-08-09T22:26:38.000Z","size":1362,"stargazers_count":982,"open_issues_count":46,"forks_count":163,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-21T08:11:42.197Z","etag":null,"topics":["c","commonmark","markdown","markdown-parser","mit-license","parser"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mity.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-03T18:23:22.000Z","updated_at":"2025-04-21T05:17:47.000Z","dependencies_parsed_at":"2024-06-18T21:20:44.453Z","dependency_job_id":"472dbc9f-768a-4562-89c6-51bfc3ea50c1","html_url":"https://github.com/mity/md4c","commit_stats":{"total_commits":669,"total_committers":31,"mean_commits":"21.580645161290324","dds":0.06278026905829592,"last_synced_commit":"481fbfbdf72daab2912380d62bb5f2187d438408"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fmd4c","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fmd4c/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fmd4c/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mity%2Fmd4c/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mity","download_url":"https://codeload.github.com/mity/md4c/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254249206,"owners_count":22039029,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","commonmark","markdown","markdown-parser","mit-license","parser"],"created_at":"2024-08-02T01:01:53.883Z","updated_at":"2025-05-15T00:12:57.283Z","avatar_url":"https://github.com/mity.png","language":"C","readme":"\n# MD4C Readme\n\n* Home: http://github.com/mity/md4c\n* Wiki: http://github.com/mity/md4c/wiki\n* Issue tracker: http://github.com/mity/md4c/issues\n\nMD4C stands for \"Markdown for C\" and that's exactly what this project is about.\n\n\n## What is Markdown\n\nIn short, Markdown is the markup language this `README.md` file is written in.\n\nThe following resources can explain more if you are unfamiliar with it:\n* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)\n* [CommonMark site](http://commonmark.org)\n\n\n## What is MD4C\n\nMD4C is Markdown parser implementation in C, with the following features:\n\n* **Compliance:** Generally, MD4C aims to be compliant to the latest version of\n  [CommonMark specification](http://spec.commonmark.org/). 
Currently, we are\n  fully compliant to CommonMark 0.31.\n\n* **Extensions:** MD4C supports some commonly requested and accepted extensions.\n  See below.\n\n* **Performance:** MD4C is [very fast](https://talk.commonmark.org/t/2520).\n\n* **Compactness:** MD4C parser is implemented in one source file and one header\n  file. There are no dependencies other than standard C library.\n\n* **Embedding:** MD4C parser is easy to reuse in other projects, its API is\n  very straightforward: There is actually just one function, `md_parse()`.\n\n* **Push model:** MD4C parses the complete document and calls few callback\n  functions provided by the application to inform it about a start/end of\n  every block, a start/end of every span, and with any textual contents.\n\n* **Portability:** MD4C builds and works on Windows and POSIX-compliant OSes.\n  (It should be simple to make it run also on most other platforms, at least as\n  long as the platform provides C standard library, including a heap memory\n  management.)\n\n* **Encoding:** MD4C by default expects UTF-8 encoding of the input document.\n  But it can be compiled to recognize ASCII-only control characters (i.e. to\n  disable all Unicode-specific code), or (on Windows) to expect UTF-16 (i.e.\n  what is on Windows commonly called just \"Unicode\"). See more details below.\n\n* **Permissive license:** MD4C is available under the [MIT license](LICENSE.md).\n\n\n## Using MD4C\n\n### Parsing Markdown\n\nIf you need just to parse a Markdown document, you need to include `md4c.h`\nand link against MD4C library (`-lmd4c`); or alternatively add `md4c.[hc]`\ndirectly to your code base as the parser is only implemented in the single C\nsource file.\n\nThe main provided function is `md_parse()`. 
It takes a text in the Markdown\nsyntax and a pointer to a structure which provides pointers to several callback\nfunctions.\n\nAs `md_parse()` processes the input, it calls the callbacks (when entering or\nleaving any Markdown block or span; and when outputting any textual content of\nthe document), allowing application to convert it into another format or render\nit onto the screen.\n\n\n### Converting to HTML\n\nIf you need to convert Markdown to HTML, include `md4c-html.h` and link against\nMD4C-HTML library (`-lmd4c-html`); or alternatively add the sources `md4c.[hc]`,\n`md4c-html.[hc]` and `entity.[hc]` into your code base.\n\nTo convert a Markdown input, call `md_html()` function. It takes the Markdown\ninput and calls the provided callback function. The callback is fed with\nchunks of the HTML output. Typical callback implementation just appends the\nchunks into a buffer or writes them to a file.\n\n\n## Markdown Extensions\n\nThe default behavior is to recognize only Markdown syntax defined by the\n[CommonMark specification](http://spec.commonmark.org/).\n\nHowever, with appropriate flags, the behavior can be tuned to enable some\nextensions:\n\n* With the flag `MD_FLAG_COLLAPSEWHITESPACE`, a non-trivial whitespace is\n  collapsed into a single space.\n\n* With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.\n\n* With the flag `MD_FLAG_TASKLISTS`, GitHub-style task lists are supported.\n\n* With the flag `MD_FLAG_STRIKETHROUGH`, strike-through spans are enabled\n  (text enclosed in tilde marks, e.g. `~foo bar~`).\n\n* With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS` permissive URL autolinks\n  (not enclosed in `\u003c` and `\u003e`) are supported.\n\n* With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, permissive e-mail\n  autolinks (not enclosed in `\u003c` and `\u003e`) are supported.\n\n* With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS` permissive WWW autolinks\n  without any scheme specified (e.g. `www.example.com`) are supported. 
MD4C\n  then assumes `http:` scheme.\n\n* With the flag `MD_FLAG_LATEXMATHSPANS` LaTeX math spans (`$...$`) and\n  LaTeX display math spans (`$$...$$`) are supported. (Note though that the\n  HTML renderer outputs them verbatim in a custom tag `\u003cx-equation\u003e`.)\n\n* With the flag `MD_FLAG_WIKILINKS`, wiki-style links (`[[link label]]` and\n  `[[target article|link label]]`) are supported. (Note that the HTML renderer\n  outputs them in a custom tag `\u003cx-wikilink\u003e`.)\n\n* With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline\n  instead of an ordinary emphasis or strong emphasis.\n\nFew features of CommonMark (those some people see as mis-features) may be\ndisabled with the following flags:\n\n* With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline\n  HTML or raw HTML blocks respectively are disabled.\n\n* With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are\n  disabled.\n\n\n## Input/Output Encoding\n\nThe CommonMark specification declares that any sequence of Unicode code points\nis a valid CommonMark document.\n\nBut, under a closer inspection, Unicode plays any role in few very specific\nsituations when parsing Markdown documents:\n\n1. For detection of word boundaries when processing emphasis and strong\n   emphasis, some classification of Unicode characters (whether it is\n   a whitespace or a punctuation) is needed.\n\n2. For (case-insensitive) matching of a link reference label with the\n   corresponding link reference definition, Unicode case folding is used.\n\n3. For translating HTML entities (e.g. `\u0026amp;`) and numeric character\n   references (e.g. `\u0026#35;` or `\u0026#xcab;`) into their Unicode equivalents.\n\n   However note MD4C leaves this translation on the renderer/application; as\n   the renderer is supposed to really know output encoding and whether it\n   really needs to perform this kind of translation. 
(For example, when the\n   renderer outputs HTML, it may leave the entities untranslated and defer the\n   work to a web browser.)\n\nMD4C relies on this property of the CommonMark and the implementation is, to\na large degree, encoding-agnostic. Most of MD4C code only assumes that the\nencoding of your choice is compatible with ASCII. I.e. that the codepoints\nbelow 128 have the same numeric values as ASCII.\n\nAny input MD4C does not understand is simply seen as part of the document text\nand sent to the renderer's callback functions unchanged.\n\nThe two situations (word boundary detection and link reference matching) where\nMD4C has to understand Unicode are handled as specified by the following\npreprocessor macros (as specified at the time MD4C is being built):\n\n* If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8 for the\n  word boundary detection and for the case-insensitive matching of link labels.\n\n  When none of these macros is explicitly used, this is the default behavior.\n\n* On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses\n  `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.\n  (UTF-16 is what Windows developers usually call just \"Unicode\" and what\n  Win32API generally works with.)\n\n  Note that because this macro affects also the types in `md4c.h`, you have\n  to define the macro both when building MD4C as well as when including\n  `md4c.h`.\n\n  Also note this is only supported in the parser (`md4c.[hc]`). 
The HTML\n  renderer does not support this and you will have to write your own custom\n  renderer to use this feature.\n\n* If preprocessor macro `MD4C_USE_ASCII` is defined, MD4C assumes nothing but\n  an ASCII input.\n\n  That effectively means that non-ASCII whitespace or punctuation characters\n  won't be recognized as such and that link reference matching will work in\n  a case-insensitive way only for ASCII letters (`[a-zA-Z]`).\n\n\n## Documentation\n\nThe API of the parser is quite well documented in the comments in the `md4c.h`.\nSimilarly, the markdown-to-html API is described in its header `md4c-html.h`.\n\nThere is also [project wiki](http://github.com/mity/md4c/wiki) which provides\nsome more comprehensive documentation. However note it is incomplete and some\ndetails may be somewhat outdated.\n\n\n## FAQ\n\n**Q: How does MD4C compare to other Markdown parsers?**\n\n**A:** Some other implementations combine Markdown parser and HTML generator\ninto a single entangled code hidden behind an interface which just allows the\nconversion from Markdown to HTML. They are often unusable if you want to\nprocess the input in any other way.\n\nSecond, most parsers (if not all of them; at least within the scope of C/C++\nlanguage) are full DOM-like parsers: They construct abstract syntax tree (AST)\nrepresentation of the whole Markdown document. That takes time and it leads to\nbigger memory footprint.\n\nBuilding AST is completely fine as long as you need it. If you don't, there is\na very high chance that using MD4C will be substantially faster and less hungry\nin terms of memory consumption.\n\nLast but not least, some Markdown parsers are implemented in a naive way. When\nfed with a [smartly crafted input pattern](test/pathological_tests.py), they\nmay exhibit quadratic (or even worse) parsing times. 
What MD4C can still parse\nin a fraction of a second may turn into long minutes or possibly hours with them.\nHence, when such a naive parser is used to process an input from an untrusted\nsource, the possibility of denial-of-service attacks becomes a real danger.\n\nA lot of our effort went into providing linear parsing times no matter what\nkind of crazy input MD4C parser is fed with. (If you encounter an input pattern\nwhich leads to super-linear parsing times, please do not hesitate and report it\nas a bug.)\n\n**Q: Does MD4C perform any input validation?**\n\n**A:** No. And we are proud of it. :-)\n\nCommonMark specification states that any sequence of Unicode characters is\na valid Markdown document. (In practice, this more or less always means UTF-8\nencoding.)\n\nIn other words, according to the specification, it does not matter whether some\nMarkdown syntax construction is in some way broken or not. If it's broken, it\nwon't be recognized and the parser should see it just as verbatim text.\n\nMD4C takes this a step further: It sees any sequence of bytes as a valid input,\nfollowing completely the GIGO philosophy (garbage in, garbage out). I.e. any\nill-formed UTF-8 byte sequence will propagate to the respective callback as\na part of the text.\n\nIf you need to validate that the input is, say, a well-formed UTF-8 document,\nyou have to do it on your own. 
The easiest way to do this is to simply\nvalidate the whole document before passing it to the MD4C parser.\n\n\n## License\n\nMD4C is covered by the MIT license, see the file `LICENSE.md`.\n\n\n## Links to Related Projects\n\nPorts and bindings to other languages:\n\n* [commonmark-d](https://github.com/AuburnSounds/commonmark-d):\n  Port of MD4C to D language.\n\n* [markdown-wasm](https://github.com/rsms/markdown-wasm):\n  Port of MD4C to WebAssembly.\n\n* [PyMD4C](https://github.com/dominickpastore/pymd4c):\n  Python bindings for MD4C\n\nSoftware using MD4C:\n\n* [imgui_md](https://github.com/mekhontsev/imgui_md):\n  Markdown renderer for [Dear ImGui](https://github.com/ocornut/imgui)\n\n* [MarkDown Monolith Assembler](https://github.com/1Hyena/mdma):\n  A command line tool for building browser-based books.\n\n* [QOwnNotes](https://www.qownnotes.org/):\n  A plain-text file notepad and todo-list manager with markdown support and\n  ownCloud / Nextcloud integration.\n\n* [Qt](https://www.qt.io/):\n  Cross-platform C++ GUI framework.\n\n* [Textosaurus](https://github.com/martinrotter/textosaurus):\n  Cross-platform text editor based on Qt and Scintilla.\n\n* [8th](https://8th-dev.com/):\n  Cross-platform concatenative programming language.\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmity%2Fmd4c","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmity%2Fmd4c","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmity%2Fmd4c/lists"}