[![Build Status](https://travis-ci.org/wenkokke/unlit.png?branch=master)](https://travis-ci.org/wenkokke/unlit)

``` haskell
{-# LANGUAGE OverloadedStrings #-}
module Unlit.Text (
  unlit, relit
  , Style, parseStyle
  , WhitespaceMode(..), parseWhitespaceMode
  , all, infer, latex, bird, jekyll, haskell, markdown, orgmode, asciidoc, tildefence, backtickfence
  , Lang, setLang
  , Error(..), showError
) where
```

``` haskell
import Data.Functor ((<$>))
import Data.Foldable (asum)
import Data.Bool (bool)
import Data.Maybe (fromMaybe, maybeToList)
import Data.Monoid ((<>))
import Prelude hiding (all, or, String, unlines, lines, drop)
import Data.Text (Text, stripStart, stripEnd, isPrefixOf, isSuffixOf, isInfixOf, unlines, lines, pack, drop, toLower)
```

What are literate programs?
===========================

There are several styles of literate programming.
Most commonly, these are LaTeX-style code environments, Bird tags and Markdown fenced code blocks, but `unlit` also understands Org mode, Jekyll and Asciidoc source blocks.

``` haskell
data Delimiter
  = LaTeX    BeginEnd
  | OrgMode  BeginEnd Lang
  | Bird
  | Jekyll   BeginEnd Lang
  | Markdown Fence Lang
  | Asciidoc BeginEnd Lang
  deriving (Eq, Show)
```

Some of these delimiters carry additional information.
For instance, LaTeX code blocks use distinct opening and closing tags, so we record which of the two a delimiter is.

``` haskell
data BeginEnd
  = Begin
  | End
  deriving (Eq, Show)
```

`isBegin` checks whether a delimiter can open a code block (every Markdown fence can). `setBegin` switches a delimiter between its opening and closing form, and leaves delimiters without such a distinction unchanged.

``` haskell
isBegin :: Delimiter -> Bool
isBegin (LaTeX    Begin  ) = True
isBegin (OrgMode  Begin _) = True
isBegin (Jekyll   Begin _) = True
isBegin (Asciidoc Begin _) = True
isBegin (Markdown _ _)     = True
isBegin  _                 = False
```

``` haskell
setBegin :: BeginEnd -> Delimiter -> Delimiter
setBegin beginEnd (LaTeX    _  )    = LaTeX    beginEnd
setBegin beginEnd (OrgMode  _ lang) = OrgMode  beginEnd lang
setBegin beginEnd (Jekyll   _ lang) = Jekyll   beginEnd lang
setBegin beginEnd (Asciidoc _ lang) = Asciidoc beginEnd lang
setBegin _         del              = del
```

Markdown-style fences, on the other hand, look the same whether they open or close a block. They come in two variants.

``` haskell
data Fence
  = Tilde
  | Backtick
  deriving (Eq, Show)
```

Furthermore, fences and most other delimiters may be annotated with all sorts of information.
The most prominent piece of information is the programming language, which we store as optional text.

``` haskell
type Lang = Maybe Text
```

A line matches a language if it mentions that language anywhere, ignoring case. If no language is given, every line matches.

``` haskell
containsLang :: Text -> Lang -> Bool
containsLang _ Nothing     = True
containsLang l (Just lang) = toLower lang `isInfixOf` toLower l
```

To emit these delimiters, we define the following function.

``` haskell
emitDelimiter :: Delimiter -> Text
emitDelimiter (LaTeX Begin)         = "\\begin{code}"
emitDelimiter (LaTeX End)           = "\\end{code}"
emitDelimiter (OrgMode Begin l)     = "#+BEGIN_SRC" <+> fromMaybe "" l
emitDelimiter (OrgMode End _)       = "#+END_SRC"
emitDelimiter  Bird                 = ">"
emitDelimiter (Jekyll Begin l)      = "{% highlight" <+> fromMaybe "" l <+> "%}"
emitDelimiter (Jekyll End   _)      = "{% endhighlight %}"
emitDelimiter (Asciidoc Begin l)    = "[source" <> maybe "" (", "<>) l <> "]\n----"
emitDelimiter (Asciidoc End   _)    = "----"
emitDelimiter (Markdown Tilde l)    = "~~~" <+> fromMaybe "" l
emitDelimiter (Markdown Backtick l) = "```" <+> fromMaybe "" l
```

Here `<+>` joins two pieces of text with a single space, unless one of them is empty.

``` haskell
infixr 5 <+>
(<+>) :: Text -> Text -> Text
"" <+> y  = y
x  <+> "" = x
x  <+> y  = x <> " " <> y
```

We also need a set of functions that recognise these delimiters. A recogniser inspects a single line and returns the delimiter on that line, if there is one.

``` haskell
type Recogniser = Text -> Maybe Delimiter
```

For instance, in LaTeX style, a code block is delimited by `\begin{code}` and `\end{code}` tags. Each tag must start its line, although leading whitespace is ignored.

``` haskell
isLaTeX :: Recogniser
isLaTeX l
  | "\\begin{code}" `isPrefixOf` stripStart l = Just $ LaTeX Begin
  | "\\end{code}"   `isPrefixOf` stripStart l = Just $ LaTeX End
  | otherwise = Nothing
```

Org mode blocks work the same way, using `#+BEGIN_SRC` and `#+END_SRC`. Only the opening line is checked for the requested language.

``` haskell
isOrgMode :: Lang -> Recogniser
isOrgMode lang l
  | "#+BEGIN_SRC" `isPrefixOf` stripStart l
    && l `containsLang` lang                = Just $ OrgMode Begin lang
  | "#+END_SRC"   `isPrefixOf` stripStart l = Just $ OrgMode End Nothing
  | otherwise = Nothing
```

In Bird style, every line in a code block must start with a Bird tag.
A tagged line is *either* a line consisting solely of the symbol `>`, or a line starting with `>` followed by at least one space.

``` haskell
isBird :: Recogniser
isBird l = bool Nothing (Just Bird) (l == ">" || "> " `isPrefixOf` l)
```

Because of this definition, stripping a Bird tag in `WsKeepIndent` mode (which `stripBird` uses) also removes the space that follows it. In `WsKeepAll` mode, the tag is replaced by a space instead, so that code keeps its original columns.

``` haskell
stripBird :: Text -> Text
stripBird = stripBird' WsKeepIndent
```

``` haskell
stripBird' :: WhitespaceMode -> Text -> Text
stripBird' WsKeepAll    l = " " <> drop 1 l
stripBird' WsKeepIndent l = drop 2 l
```

Next come Jekyll's Liquid `highlight` blocks.

``` haskell
isJekyll :: Lang -> Recogniser
isJekyll lang l
  | "{% highlight" `isPrefixOf` stripStart l
    && l `containsLang` lang
    && "%}" `isSuffixOf` stripEnd l     = Just $ Jekyll Begin lang
  | "{% endhighlight %}" `isPrefixOf` l = Just $ Jekyll End   lang
  | otherwise                           = Nothing
```

Markdown fenced code blocks have one peculiarity: they can be restricted to a particular language. A fence whose line mentions that language is tagged with it. Any other fence is tagged with `Nothing`, and only an untagged fence can close a block.

Below, we only check whether the given language occurs *anywhere* on the line. We don't bother parsing the entire line to see if it is well-formed Markdown.

``` haskell
isMarkdown :: Fence -> Text -> Lang -> Recogniser
isMarkdown fence fenceStr lang l
  | fenceStr `isPrefixOf` stripStart l =
    Just $ Markdown fence $ bool Nothing lang (l `containsLang` lang)
  | otherwise = Nothing
```

An opening Asciidoc fence takes two lines, `[source,lang]` and `----`.
Here we only check for the source line. The second line is consumed by `asciidocFence`.

``` haskell
isAsciidoc :: Lang -> Recogniser
isAsciidoc lang l
  | "[source" `isPrefixOf` l
    && l `containsLang` lang
    && "]" `isSuffixOf` stripEnd l = Just $ Asciidoc Begin lang
  | "----" `isPrefixOf` l          = Just $ Asciidoc End   lang
  | otherwise                      = Nothing
```

``` haskell
asciidocFence :: [(Int,Text)] -> Maybe [(Int,Text)]
asciidocFence ls | ((_,"----"):ls') <- ls = Just ls'
                 | otherwise              = Nothing
```

We also need a function that checks whether a given line matches *any* delimiter of a given style.

``` haskell
isDelimiter :: Style -> Recogniser
isDelimiter ds l = asum (map go ds)
  where
    go (LaTeX _)                = isLaTeX l
    go  Bird                    = isBird l
    go (Jekyll _ lang)          = isJekyll lang l
    go (Markdown Tilde lang)    = isMarkdown Tilde "~~~" lang l
    go (Markdown Backtick lang) = isMarkdown Backtick "```" lang l
    go (OrgMode _ lang)         = isOrgMode lang l
    go (Asciidoc _ lang)        = isAsciidoc lang l
```

For the styles that use separate opening and closing delimiters, we need a function that checks whether a pair matches.

``` haskell
match :: Delimiter -> Delimiter -> Bool
match (LaTeX Begin)      (LaTeX End)          = True
match (Jekyll Begin _)   (Jekyll End _)       = True
match (OrgMode Begin _)  (OrgMode End _)      = True
match (Asciidoc Begin _) (Asciidoc End _)     = True
match (Markdown f _)     (Markdown g Nothing) = f == g
match  _                  _                   = False
```

Bird tags are notably absent from `match`. A Bird block has no closing delimiter and simply ends at the first untagged line, so it is handled as a special case.

What do we want `unlit` to do?
==============================

The `unlit` program that we implement below reads a literate program from standard input, allowing one or more styles of code block, and writes only the code to standard output.

A source style is a list of delimiters. The available styles are as follows:

``` haskell
type Style = [Delimiter]
```

``` haskell
all, backtickfence, tildefence, bird, haskell, infer, jekyll, latex, markdown, orgmode, asciidoc :: Style
all           = latex <> markdown <> orgmode <> jekyll <> asciidoc
backtickfence = [Markdown Backtick Nothing]
tildefence    = [Markdown Tilde Nothing]
bird          = [Bird]
haskell       = latex <> bird
infer         = []
jekyll        = [Jekyll Begin Nothing, Jekyll End Nothing]
latex         = [LaTeX Begin, LaTeX End]
markdown      = bird <> tildefence <> backtickfence
orgmode       = [OrgMode Begin Nothing, OrgMode End Nothing]
asciidoc      = [Asciidoc Begin Nothing, Asciidoc End Nothing]
```

``` haskell
parseStyle :: Text -> Maybe Style
parseStyle s = case toLower s of
  "all"           -> Just all
  "backtickfence" -> Just backtickfence
  "bird"          -> Just bird
  "haskell"       -> Just haskell
  "infer"         -> Just infer
  "jekyll"        -> Just jekyll
  "latex"         -> Just latex
  "markdown"      -> Just markdown
  "orgmode"       -> Just orgmode
  "asciidoc"      -> Just asciidoc
  "tildefence"    -> Just tildefence
  _               -> Nothing
```

The language of a source style can be set with the following function.

``` haskell
setLang :: Lang -> Style -> Style
setLang = fmap . setLang'
```

``` haskell
setLang' :: Lang -> Delimiter -> Delimiter
setLang' lang (Markdown fence _)    = Markdown fence lang
setLang' lang (OrgMode beginEnd _)  = OrgMode beginEnd lang
setLang' lang (Jekyll beginEnd _)   = Jekyll beginEnd lang
setLang' lang (Asciidoc beginEnd _) = Asciidoc beginEnd lang  -- Asciidoc also carries a Lang
setLang' _     d                    = d
```

Additionally, when the source style is empty (as with `infer`), the program guesses the style from the first delimiter it encounters. It tries to be permissive: if it encounters a Bird tag, it infers general Markdown style, which also covers both kinds of fences.

``` haskell
inferred :: Maybe Delimiter -> Style
inferred  Nothing              = []
inferred (Just (LaTeX _))      = latex
inferred (Just (Jekyll _ _))   = jekyll
inferred (Just (OrgMode _ _))  = orgmode
inferred (Just (Asciidoc _ _)) = asciidoc
inferred (Just _)              = markdown
```

Lastly, `unlit` can operate in several different whitespace modes. For now, these are:

``` haskell
data WhitespaceMode
  = WsKeepIndent -- ^ drop non-code lines, but keep the indentation of code
  | WsKeepAll    -- ^ blank out non-code lines, keeping line and column positions
```

``` haskell
parseWhitespaceMode :: Text -> Maybe WhitespaceMode
parseWhitespaceMode s = case toLower s of
  "all"    -> Just WsKeepAll
  "indent" -> Just WsKeepIndent
  _        -> Nothing
```

We would like to combine the inferred style with the current style the way one combines `Maybe` values with the alternative operator `(<|>)`, treating the empty list as `Nothing`. Since `(<|>)` on lists is concatenation, we define our own version of this operator.

``` haskell
or :: [a] -> [a] -> [a]
xs `or` [] = xs
[] `or` ys = ys
xs `or` _  = xs
```

The `unlit` function takes three parameters: the whitespace mode, the source style and the text to convert. It numbers the lines so that errors can report where they occur.

``` haskell
unlit :: WhitespaceMode -> Style -> Text -> Either Error Text
unlit ws ss = fmap unlines . unlit' ws ss Nothing . zip [1..] . lines
```

The helper function `unlit'` is best thought of as a finite state automaton. Its state records what kind of code block (if any) it is currently in.

``` haskell
type State = Maybe Delimiter
```

With this, `unlit'` is defined as follows. Each case looks at the current state `q` and at the delimiter `q'` (if any) on the current line.

``` haskell
unlit' :: WhitespaceMode -> Style -> State -> [(Int, Text)] -> Either Error [Text]
unlit' _ _  Nothing    []  = Right []
unlit' _ _ (Just Bird) []  = Right []
unlit' _ _ (Just o)    []  = Left $ UnexpectedEnd o
unlit' ws ss q ((n, l):ls) = case (q, q') of

  -- outside a block, no delimiter: prose
  (Nothing  , Nothing)   -> continue  $ lineIfKeepAll

  -- inside a block, no delimiter: an untagged line ends a Bird block, otherwise code
  (Just Bird, Nothing)   -> close     $ lineIfKeepAll
  (Just _o  , Nothing)   -> continue  $ [l]

  -- outside a block, with a delimiter: open a block
  (Nothing  , Just Bird) -> open      $ lineIfKeepIndent <> [stripBird' ws l]
  -- the Asciidoc opener spans two lines, so WsKeepAll emits two blank lines
  (Nothing  , Just (Asciidoc Begin _))
    | Just ls' <- asciidocFence ls
                         -> open' ls' $ lineIfKeepAll <> lineIfKeepAll <> lineIfKeepIndent
  (Nothing  , Just c)
    | isBegin c          -> open      $ lineIfKeepAll <> lineIfKeepIndent
    | otherwise          -> Left      $ SpuriousDelimiter n c

  -- inside a block, with a delimiter: Bird lines are code, matching delimiters close
  (Just Bird, Just Bird) -> continue  $ [stripBird' ws l]
  (Just _o  , Just Bird) -> continue  $ [l]
  (Just o   , Just c)
    | o `match` c        -> close     $ lineIfKeepAll
    | otherwise          -> Left      $ SpuriousDelimiter n c

  where
    q'                    = isDelimiter (ss `or` all) l
    continueWith r ls' l' = (l' <>) <$> unlit' ws (ss `or` inferred q') r ls'
    open' ls'             = continueWith q' ls'
    open                  = open' ls
    continue              = continueWith q ls
    close                 = continueWith Nothing ls
    lineIfKeepAll         = case ws of WsKeepAll    -> [""]; WsKeepIndent -> []
    lineIfKeepIndent      = case ws of WsKeepIndent -> [""]; WsKeepAll -> []
```

What do we want `relit` to do?
==============================

Sadly, no, `relit` won't be able to take source code and automatically convert it to literate code. I'm not quite up to the challenge of automatically generating meaningful documentation from arbitrary code... I wish I were.

What `relit` does is read a literate file that uses one style of delimiters and emit the same file using another style. It takes the source style, the target delimiter and the text to convert.

``` haskell
relit :: Style -> Delimiter -> Text -> Either Error Text
relit ss ts = fmap unlines . relit' ss ts Nothing . zip [1..] . lines
```

Again, we interpret the helper function `relit'` as an automaton that remembers the current state. This time, however, we also need to emit code blocks in the target style, for which we define a few helper functions.

TODO: Currently, if a delimiter is indented, running `relit` removes this indentation. This is a bug, but fixing it would require adding indentation information to all delimiters.

``` haskell
emitBird :: Text -> Text
emitBird l | stripStart l == "" = ">"
           | otherwise          = "> " <> l
```

The optional line passed to `emitOpen` and `emitClose` is the line that triggered the transition, when it must be kept: the first line of a Bird block, or the line that ends one.

``` haskell
emitOpen :: Delimiter -> Maybe Text -> [Text]
emitOpen  Bird l = fmap emitBird (maybeToList l)
emitOpen  del  l = emitDelimiter (setBegin Begin del) : maybeToList l
```

``` haskell
emitCode :: Delimiter -> Text -> Text
emitCode Bird l = emitBird l
emitCode _    l = l
```

``` haskell
emitClose :: Delimiter -> Maybe Text -> [Text]
emitClose  Bird l = maybeToList l
emitClose  del  l = emitDelimiter (setBegin End $ setLang' Nothing del) : maybeToList l
```

Using these functions, we can define `relit'`.

``` haskell
relit' :: Style -> Delimiter -> State -> [(Int, Text)] -> Either Error [Text]
relit' _ _   Nothing    [] = Right []
relit' _ ts (Just Bird) [] = Right (emitClose ts Nothing)
relit' _ _  (Just o)    [] = Left $ UnexpectedEnd o
relit' ss ts q ((n, l):ls) = case (q, q') of

  (Nothing  , Nothing)   -> continue

  (Nothing  , Just Bird) -> blockOpen $ Just (stripBird l)
  (Nothing  , Just (Asciidoc Begin _))
    | Just ls' <- asciidocFence ls
                         -> blockOpen' ls' Nothing
  (Nothing  , Just c)
    | isBegin c          -> blockOpen Nothing
    | otherwise          -> Left $ SpuriousDelimiter n c

  (Just Bird, Nothing)   -> blockClose $ Just l
  (Just _o  , Nothing)   -> blockContinue l

  (Just Bird, Just Bird) -> blockContinue $ stripBird l
  -- a '>' line inside another kind of block is ordinary code
  (Just _o  , Just Bird) -> blockContinue l
  (Just o   , Just c)
    | o `match` c        -> blockClose Nothing
    | otherwise          -> Left $ SpuriousDelimiter n c

  where
    q'                 = isDelimiter (ss `or` all) l
    continueWith r ls' = relit' (ss `or` inferred q') ts r ls'
    continue           = (l :)                <$> continueWith q ls
    blockOpen' ls' l'  = (emitOpen  ts l' <>) <$> continueWith q' ls'
    blockOpen      l'  = blockOpen' ls l'
    blockContinue  l'  = (emitCode  ts l' :)  <$> continueWith q ls
    blockClose     l'  = (emitClose ts l' <>) <$> continueWith Nothing ls
```

Error handling
==============

In case of an error, both `unlit` and `relit` return a value of type `Error`. It reports either a spurious delimiter at a given line, or an unexpected end of file inside an unclosed block.

``` haskell
data Error
  = SpuriousDelimiter Int Delimiter
  | UnexpectedEnd     Delimiter
  deriving (Eq, Show)
```

We can get a textual representation of an error using `showError`.

``` haskell
showError :: Error -> Text
showError (UnexpectedEnd       q) = "unexpected end of file: unmatched " <> emitDelimiter q
showError (SpuriousDelimiter n q) = "at line " <> pack (show n) <> ": spurious " <> emitDelimiter q
```
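The following is a minimal, self-contained sketch that shows the automaton in isolation. It is not part of `Unlit.Text`. It assumes only the `text` package, recognises nothing but LaTeX code environments and Bird tags, and mimics `unlit'` in `WsKeepIndent` mode. The names `Delim`, `SketchError`, `recognise` and `unlitSketch` are hypothetical and chosen for this illustration only.

``` haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical, cut-down sketch of the unlit' automaton: it only knows
-- LaTeX code environments and Bird tags, and behaves like WsKeepIndent.
import           Data.Text (Text)
import qualified Data.Text as T

-- Delimiters recognised by this sketch (a subset of Delimiter).
data Delim = LaTeXBegin | LaTeXEnd | BirdTag
  deriving (Eq, Show)

-- Mirrors the two constructors of Error.
data SketchError = Spurious Int Delim | Unterminated Delim
  deriving (Eq, Show)

-- The same tests as isLaTeX and isBird.
recognise :: Text -> Maybe Delim
recognise l
  | "\\begin{code}" `T.isPrefixOf` T.stripStart l = Just LaTeXBegin
  | "\\end{code}"   `T.isPrefixOf` T.stripStart l = Just LaTeXEnd
  | l == ">" || "> " `T.isPrefixOf` l             = Just BirdTag
  | otherwise                                      = Nothing

unlitSketch :: Text -> Either SketchError Text
unlitSketch = fmap T.unlines . go Nothing . zip [1 ..] . T.lines
  where
    -- The state is the block we are currently in, if any.
    go :: Maybe Delim -> [(Int, Text)] -> Either SketchError [Text]
    go Nothing        [] = Right []
    go (Just BirdTag) [] = Right []                -- a Bird block may run to EOF
    go (Just d)       [] = Left (Unterminated d)
    go q ((n, l) : ls) =
      case (q, recognise l) of
        (Nothing, Nothing)               -> go Nothing ls   -- prose is dropped
        (Nothing, Just BirdTag)          -> ("" :) <$> bird -- blank line, then code
        (Nothing, Just LaTeXBegin)       -> ("" :) <$> go (Just LaTeXBegin) ls
        (Just BirdTag, Just BirdTag)     -> bird
        (Just BirdTag, Nothing)          -> go Nothing ls   -- untagged line ends the block
        (Just LaTeXBegin, Just LaTeXEnd) -> go Nothing ls
        (Just _, Nothing)                -> keep            -- ordinary code line
        (Just LaTeXBegin, Just BirdTag)  -> keep            -- '>' inside LaTeX is code
        (_, Just d)                      -> Left (Spurious n d)
      where
        bird = (T.drop 2 l :) <$> go (Just BirdTag) ls      -- strip "> "
        keep = (l :) <$> go q ls
```

For example, with `OverloadedStrings` enabled, `unlitSketch "> x = 1\n"` evaluates to `Right "\nx = 1\n"`. The leading empty line is the separator that `WsKeepIndent` emits when a block opens. Leaving out style inference and the `or` combinator keeps the state machine small enough to check by hand.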