{"id":13593639,"url":"https://github.com/parrt/bookish","last_synced_at":"2025-04-06T01:09:09.061Z","repository":{"id":28868068,"uuid":"118663296","full_name":"parrt/bookish","owner":"parrt","description":"A tool that translates augmented markdown into HTML or latex","archived":false,"fork":false,"pushed_at":"2022-06-19T17:26:26.000Z","size":1722,"stargazers_count":464,"open_issues_count":1,"forks_count":31,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-03-30T00:07:40.202Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/parrt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-23T20:09:41.000Z","updated_at":"2025-03-29T17:18:25.000Z","dependencies_parsed_at":"2022-07-24T20:32:26.445Z","dependency_job_id":null,"html_url":"https://github.com/parrt/bookish","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parrt%2Fbookish","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parrt%2Fbookish/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parrt%2Fbookish/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parrt%2Fbookish/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/parrt","download_url":"https://codeload.github.com/parrt/bookish/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247419860,"owners_count":20936012,
"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:22.539Z","updated_at":"2025-04-06T01:09:09.032Z","avatar_url":"https://github.com/parrt.png","language":"Java","readme":"# Bookish\n\nBookish is an xml-ish + some markdown format for books and articles that it can convert to HTML and latex. I used it to generate this article: [The Matrix Calculus You Need For Deep Learning](https://explained.ai/matrix-calculus/index.html). \n\nYou can use python directly in the doc like a notebook to compute and print stuff:\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cimg src=images/snapshot-print.png width=600\u003e\n\u003c/table\u003e\n\nand display data frames:\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cimg src=images/snapshot-df.png width=400\u003e\n\u003c/table\u003e\n\nand even show matplotlib graphs:\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cimg  src=images/snapshot-graph.png width=600\u003e\n\u003c/table\u003e\n\nAs see below, it also does some really fancy magic to convert full latex equations (or even latex chunks) to SVG images for display inline (tricky to get vertical alignment correct.)\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\n\u003cimg src=images/snapshot.png width=700\u003e\n\u003c/table\u003e\n\n## Meta-language\n\nBookish is mostly XML-like but uses markdown for the more common things like italics and code fonts. 
(Note that the xml tags do not always have an end tag or even the trailing `/`, as in `\u003c.../\u003e`.)\n\nBookish requires a root document that is kind of like a metadata file:\n\n```xml\n\u003cbook title=\"A simple book\" author=\"T. Parr\"\u003e\n\n\u003cinclude file=chap1.xml\u003e\n\u003cinclude file=chap2.xml\u003e\n```\n\nThen the chapter files look like:\n\n```\n\u003cchapter label=\"intro\" title=\"An intro\"\u003e\n\nSome text *foo* and `this` is code. Ref [summary], which is forward ref in\nanother file. Links are [cnn](http://www.cnn.com).\n```\n\n## Cheatsheet\n\nHere are the tags that contain attributes, not all of which are required:\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;book label=\"...\" author=\"...\" title=\"...\" version=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;chapter label=\"...\" author=\"...\" title=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;data dir=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;notebook-support file=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;include file=\"...\"\u003e\u003c/tt\u003e \n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;section label=\"...\" title=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;subsection label=\"...\" title=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;subsubsection label=\"...\" title=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;site label=\"...\" url=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;citation  label=\"...\" title=\"...\" author=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;chapquote quote=\"...\" 
author=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;sidequote label=\"...\" quote=\"...\" author=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;sidenote label=\"...\"\u003e ... \u0026lt;/sidenote\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;sidefig label=\"...\" caption=\"...\"\u003e ... \u0026lt;/sidefig\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;figure label=\"...\" caption=\"...\"\u003e ... \u0026lt;/figure\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;aside title=\"...\"\u003e ... \u0026lt;/aside\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;pyeval label=\"...\" output=\"...\" hide=\"...\"\u003e ... \u0026lt;/pyeval\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;pyfig label=\"...\" side=\"...\" hide=\"...\"\u003e ... \u0026lt;/pyfig\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;py label=\"...\"\u003e ... \u0026lt;/py\u003e\u003c/tt\u003e or \u003ctt\u003e\u0026lt;py\u003e...\u0026lt;/py\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;th width=\"...\"\u003e\u003c/tt\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e\u003ctt\u003e\u0026lt;img src=\"...\" width=\"...\"\u003e\u003c/tt\u003e\n\u003c/table\u003e\n\n## Origins in math-infested markup\n\n[Jeremy Howard](http://www.fast.ai/about/#jeremy) and [I](http://parrt.cs.usfca.edu/) wrote up a nice mathy latex document called ``*The Matrix Calculus You Need For Deep Learning*'' that has over 600 equations. 
We wanted to post it to the web in HTML or markdown but quickly ran into a problem trying to get equations rendered.\n\nIn the end we converted the source document to markdown and built a translator that generated [HTML](https://explained.ai/matrix-calculus/index.html) using SVG for equations and [PDF](https://arxiv.org/pdf/1802.01528) from native latex equations. It does a pretty good job with html as you can see:\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\n\u003cimg src=images/snapshot.png width=700\u003e\n\u003c/table\u003e\n\nAll of those equations, even the ones inline in the text paragraph, are `\u003cimg\u003e` references.\n\nHere is the [raw matrix-calculus.md](https://raw.githubusercontent.com/parrt/bookish/master/examples/matrix-calculus/matrix-calculus.md) that `bookish` processed to generate those documents.\n\n### What's so hard about rendering equations?\n\nIf you're doing markdown or HTML, people tend to use MathJax or its faster cousin Katex. MathJax is just too slow when you have 600 equations. Katex is much better but it (and MathJax) requires every `\u0026`, `_`, etc... be escaped as `\\\u0026`, `\\_` to avoid getting processed as markdown.  That's no problem because I built a translator that escaped everything for me. Then I found out that the JavaScript parser that extracted the latex equation strings was extremely finicky. I had to randomly insert spaces in my equations trying to get them recognized as equations.\n\nThere's another problem.  Is all of that JavaScript gonna work in epub formats? What about the Kindle? 
Because I'm hoping to write a book on machine learning, I'm leery of relying on full-blown JavaScript to render equations.\n\nI tried pandoc and a few other tools like multimarkdown but not everything came through correctly to the translated output and I got tired of chasing all of this down.\n\nAs the [ANTLR](http://www.antlr.org) guy, I ain't afeared of building a language translator and so, following my motto ``*Why program by hand in five days what you can spend five years of your life automating*'', I decided to simply solve this problem by building my own markdown translator.\n\n### How to typeset and display math via SVG\n\nIf you can't use JavaScript, you have to use images. If you have to use images, you want scalable graphics, which means SVG files. So, the translator must extract equations and replace them with `\u003cimg\u003e` tags referencing SVG files. That part is not too hard; take a look at [Tex2SVG](https://github.com/parrt/bookish/blob/master/src/us/parr/bookish/semantics/Tex2SVG.java) and you'll see that I'm just running three programs in sequence to process the equation into an SVG file: `xelatex` then `pdfcrop` then `pdf2svg`.\n\nThe really tricky bit is the vertical alignment of equations within a line of HTML text. Check out this sentence with embedded equations:\n\n\u003ctable\u003e\n\u003ctr\u003e\u003ctd\u003e\n\u003cimg src=\"images/snapshot2.png\" width=560\u003e\n\u003c/table\u003e\n\n(I had to take a snapshot and show that instead of giving raw HTML plus equations; github's markdown processor didn't handle it properly. haha.)\n\nWhat does it mean to properly align an equation's image? It's painful.  We need to convince latex to give us metrics on how far the typeset image drops below the baseline. (Latex calls this the *depth*.)  It took a while, but I figured out how to not only compute the depth below baseline but also how to get it back into this Java program via the latex log file. 
You can see how all of this is done here: [Translator.visitEqn()](https://github.com/parrt/bookish/blob/master/src/us/parr/bookish/translate/Translator.java#L302). Here is the latex incantation to extract height and depth of the rendered equation:\n\n```tex\n\\begin{document}\n\\thispagestyle{empty}\n\u003cbody\u003e\n\\setbox0=\\vbox{\u003cbody\u003e}\n\\typeout{// bookish metrics: \\the\\ht0, \\the\\dp0}\n\\end{document}\n```\n\nwhere `\u003cbody\u003e` is the hole where the equation goes.\n\nOh, and to get the font to look less anemic, you need to set the math fonts:\n\n```tex\n\\DeclareSymbolFont{operators}   {OT1}{ztmcm}{m}{n}\n\\DeclareSymbolFont{letters}     {OML}{ztmcm}{m}{it}\n\\DeclareSymbolFont{symbols}     {OMS}{ztmcm}{m}{n}\n\\DeclareSymbolFont{largesymbols}{OMX}{ztmcm}{m}{n}\n\\DeclareSymbolFont{bold}        {OT1}{ptm}{bx}{n}\n\\DeclareSymbolFont{italic}      {OT1}{ptm}{m}{it}\n```\n\nOne last little tidbit. Image file names are based upon the MD5 digest hash of the equation. There are two benefits: (1) repeated equations share the same file and (2) latex is slow, like 1 second per equation, but the hashed filename lets us cache all of the images and know when we must refresh an image because the equation changed.  \n\nIt's safe to stop reading here.  You can learn everything you need to know about doing this yourself from this description and the source code.  This repository is just getting started and is in progress so don't expect a tool you can use yourself, at least at the moment.\n\n## Implementation\n\nYou will also notice that I have built this program as if it were a programming language translator.  The strategy I use is to construct a model of the document from the parse tree using a visitor. 
Then I use a [fiendishly clever bit of code](https://github.com/parrt/bookish/blob/master/src/us/parr/bookish/translate/ModelConverter.java) to automatically convert that representation of the document into a tree of [string templates](http://www.stringtemplate.org).  Of course the set of templates you use determines what output you get.  Change the templates and you change the target language. For example here are the [HTML templates](https://github.com/parrt/bookish/blob/master/resources/templates/html-book.stg).\n","funding_links":[],"categories":["Java"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparrt%2Fbookish","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fparrt%2Fbookish","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparrt%2Fbookish/lists"}