{"id":13630770,"url":"https://github.com/lettier/parsing-with-haskell-parser-combinators","last_synced_at":"2025-03-26T18:31:06.717Z","repository":{"id":147294184,"uuid":"198321068","full_name":"lettier/parsing-with-haskell-parser-combinators","owner":"lettier","description":"🔍 A step-by-step guide to parsing using Haskell parser combinators.","archived":false,"fork":false,"pushed_at":"2019-11-11T06:02:25.000Z","size":43,"stargazers_count":93,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-22T06:31:37.565Z","etag":null,"topics":["functional-programming","functional-programming-examples","haskell","haskell-exercises","haskell-learning","haskell-tutorial","learn-to-code","learning-by-doing","parsec","parser","parser-combinator","parser-combinators","parsers","parsing","programming-exercises","srt","srt-format","srt-parser","srt-subtitles","subtitles"],"latest_commit_sha":null,"homepage":"https://lettier.github.io/parsing-with-haskell-parser-combinators/","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lettier.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-07-23T00:25:28.000Z","updated_at":"2024-12-03T14:27:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"19f77c82-512c-476b-85b7-990160075fa3","html_url":"https://github.com/lettier/parsing-with-haskell-parser-combinators","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lettier%2Fparsing-with-hask
ell-parser-combinators","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lettier%2Fparsing-with-haskell-parser-combinators/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lettier%2Fparsing-with-haskell-parser-combinators/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lettier%2Fparsing-with-haskell-parser-combinators/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lettier","download_url":"https://codeload.github.com/lettier/parsing-with-haskell-parser-combinators/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245712593,"owners_count":20660265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["functional-programming","functional-programming-examples","haskell","haskell-exercises","haskell-learning","haskell-tutorial","learn-to-code","learning-by-doing","parsec","parser","parser-combinator","parser-combinators","parsers","parsing","programming-exercises","srt","srt-format","srt-parser","srt-subtitles","subtitles"],"created_at":"2024-08-01T22:01:58.932Z","updated_at":"2025-03-26T18:31:06.437Z","avatar_url":"https://github.com/lettier.png","language":"Haskell","funding_links":[],"categories":["Haskell","Table of Contents"],"sub_categories":["Repos"],"readme":"\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing With Haskell Parser Combinators\" 
src=\"https://i.imgur.com/hKqlZrP.gif\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n# Parsing With Haskell Parser Combinators\n\nNeed to parse something?\nNever heard of a \"parser combinator\"?\nLooking to learn some Haskell?\nAwesome!\nBelow is everything you'll need to get up and parsing with Haskell parser combinators.\nFrom here you can try tackling esoteric data serialization formats,\ncompiler front ends,\ndomain specific languages—you name it!\n\n- [Building The Demos](#building-the-demos)\n- [Running The Demos](#running-the-demos)\n- [Parser Combinator](#parser-combinator)\n- [Version Number](#version-number)\n- [SRT](#srt)\n- [Exercises](#exercises)\n\n## Building The Demos\n\nIncluded with this guide are two demo programs.\n\n`version-number-parser` parses a file for a version number.\n`srt-file-parser` parses a file for SRT subtitles.\nFeel free to try them out with the files found in `test-input/`.\n\n### Stack\n\nDownload the Haskell tool [Stack](https://docs.haskellstack.org/en/stable/README/)\nand then run the following.\n\n```bash\ngit clone https://github.com/lettier/parsing-with-haskell-parser-combinators\ncd parsing-with-haskell-parser-combinators\nstack build\n```\n\n### Cabal\n\nIf using Cabal, you can run the following.\n\n```bash\ngit clone https://github.com/lettier/parsing-with-haskell-parser-combinators\ncd parsing-with-haskell-parser-combinators\ncabal sandbox init\ncabal --require-sandbox build\ncabal --require-sandbox install\n```\n\n## Running The Demos\n\nAfter building the two demo programs, you can run them like so.\n\n### Stack\n\nTo try the version number parser, run the following.\n\n```bash\ncd parsing-with-haskell-parser-combinators\nstack exec -- version-number-parser\nWhat is the version output file path?\ntest-input/gifcurry-version-output.txt\n```\n\nTo try the SRT file parser, run the following.\n\n```bash\ncd parsing-with-haskell-parser-combinators\nstack exec -- 
srt-file-parser\nWhat is the SRT file path?\ntest-input/subtitles.srt\n```\n\n### Cabal\n\nTo try the version number parser, run the following.\n\n```bash\ncd parsing-with-haskell-parser-combinators\n.cabal-sandbox/bin/version-number-parser\nWhat is the version output file path?\ntest-input/gifcurry-version-output.txt\n```\n\nTo try the SRT file parser, run the following.\n\n```bash\ncd parsing-with-haskell-parser-combinators\n.cabal-sandbox/bin/srt-file-parser\nWhat is the SRT file path?\ntest-input/subtitles.srt\n```\n\n## Parser Combinator\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parser Combinators\" src=\"https://i.imgur.com/MLHPxhx.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nOne of the better ways to learn about the parsing strategy,\n[parser combinator](https://en.wikipedia.org/wiki/Parser_combinator),\nis to look at an implementation of one.\n\n\u003cblockquote\u003e\n\u003cp align=\"right\"\u003e\nParsers built using combinators are straightforward to construct, readable, modular, well-structured, and easily maintainable.\n\u003cbr\u003e\u003cbr\u003e\n\u003csup\u003e\n—\u003ca href=\"https://en.wikipedia.org/wiki/Parser_combinator\"\u003eParser combinator - Wikipedia\u003c/a\u003e\n\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/blockquote\u003e\n\n### ReadP\n\nLet's take a look under the hood of [ReadP](https://hackage.haskell.org/package/base-4.12.0.0/docs/Text-ParserCombinators-ReadP.html),\na parser combinator library found in base.\nSince it is in base, you should already have it.\n\n:bulb: Note, you may want to try out [Parsec](https://hackage.haskell.org/package/parsec) after getting familiar with ReadP.\nIt too is a parser combinator library that others prefer to ReadP.\nAs an added bonus, it is included in\n[GHC's boot libraries](https://gitlab.haskell.org/ghc/ghc/wikis/commentary/libraries/version-history)\nas of GHC version 8.4.1.\n\n#### P Data Type\n\n```haskell\n-- 
(c) The University of Glasgow 2002\n\ndata P a\n  = Get (Char -\u003e P a)\n  | Look (String -\u003e P a)\n  | Fail\n  | Result a (P a)\n  | Final [(a,String)]\n  deriving Functor\n```\n\nWe'll start with the `P` data type.\nThe `a` in `P a` is up to you (the library user) and can be whatever you'd like.\nThe compiler creates a functor instance automatically and there are hand-written instances for\napplicative,\nmonad,\n`MonadFail`,\nand alternative.\n\n:bulb: Note, for more on functors, applicatives, and monads, check out\n[Your easy guide to Monads, Applicatives, \u0026 Functors](https://medium.com/@lettier/your-easy-guide-to-monads-applicatives-functors-862048d61610).\n\n`P` is a [sum type](https://en.wikipedia.org/wiki/Tagged_union) with five cases.\n\n- `Get` consumes a single character from the input string and returns a new `P`.\n- `Look` accepts a duplicate of the input string and returns a new `P`.\n- `Fail` indicates the parser finished without a result.\n- `Result` holds a possible parsing and another `P` case.\n- `Final` is a list of two-tuples. 
The first tuple element is a possible parsing of the input\n  and the second tuple element is the rest of the input string that wasn't consumed by `Get`.\n\n#### Run\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nrun :: P a -\u003e ReadS a\nrun (Get f)      (c:s) = run (f c) s\nrun (Look f)     s     = run (f s) s\nrun (Result x p) s     = (x,s) : run p s\nrun (Final r)    _     = r\nrun _            _     = []\n```\n\n`run` is the heart of the ReadP parser.\nIt does all of the heavy lifting as it recursively runs through all of the parser states that we saw up above.\nYou can see that it takes a `P` and returns a `ReadS`.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ntype ReadS a = String -\u003e [(a,String)]\n```\n\n`ReadS a` is a type alias for `String -\u003e [(a,String)]`.\nSo whenever you see `ReadS a`, think `String -\u003e [(a,String)]`.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nrun :: P a -\u003e String -\u003e [(a,String)]\nrun (Get f)      (c:s) = run (f c) s\nrun (Look f)     s     = run (f s) s\nrun (Result x p) s     = (x,s) : run p s\nrun (Final r)    _     = r\nrun _            _     = []\n```\n\n`run` pattern matches the different cases of `P`.\n\n- If it's `Get`,\n  it calls itself with a new `P` (returned by passing the function `f`, in `Get f`, the next character `c` in the input string)\n  and the rest of the input string `s`.\n- If it's `Look`,\n  it calls itself with a new `P` (returned by passing the function `f`, in `Look f`,  the input string `s`)\n  and the input string.\n  Notice how `Look` doesn't consume any characters from the input string like `Get` does.\n- If it's `Result`,\n  it assembles a two-tuple—containing the parsed result and what's left of the input string—and\n  prepends this to the result of a recursive call that runs with another `P` case and the input string.\n- If it's `Final`, `run` returns a list of two-tuples containing parsed results and input string leftovers.\n- For anything 
else, `run` returns an empty list.\n  For example, if the case is `Fail`, `run` will return an empty list.\n\n```haskell\n\u003e run (Get (\\ a -\u003e Get (\\ b -\u003e Result [a,b] Fail))) \"12345\"\n[(\"12\",\"345\")]\n```\n\nReadP doesn't expose `run` but if it did, you could call it like this.\nThe two `Get`s consume the `'1'` and `'2'`, leaving the `\"345\"` behind.\n\n```haskell\n\u003e run (Get (\\ a -\u003e Get (\\ b -\u003e Result [a,b] Fail))) \"12345\"\n\u003e run (Get (\\ b -\u003e Result ['1',b] Fail)) \"2345\"\n\u003e run (Result ['1','2'] Fail) \"345\"\n\u003e (['1', '2'], \"345\") : run (Fail) \"345\"\n\u003e (['1', '2'], \"345\") : []\n[(\"12\",\"345\")]\n```\n\nRunning through each recursive call, you can see how we arrived at the final result.\n\n```haskell\n\u003e run (Get (\\ a -\u003e Get (\\ b -\u003e Result [a,b] (Final [(['a','b'],\"c\")])))) \"12345\"\n[(\"12\",\"345\"),(\"ab\",\"c\")]\n```\n\nUsing `Final`, you can include a parsed result in the final list of two-tuples.\n\n#### readP_to_S\n\n```haskell\n-- (c) The University of Glasgow 2002\n\n   readP_to_S :: ReadP a -\u003e ReadS a\n-- readP_to_S :: ReadP a -\u003e String -\u003e [(a,String)]\n   readP_to_S (R f) = run (f return)\n```\n\nWhile ReadP doesn't expose `run` directly, it does expose it via `readP_to_S`.\n`readP_to_S` introduces a `newtype` called `ReadP`.\n`readP_to_S` accepts a `ReadP a`, a string, and returns a list of two-tuples.\n\n#### ReadP Newtype\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"ReadP Newtype\" src=\"https://i.imgur.com/7WJPwLC.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nnewtype ReadP a = R (forall b . 
(a -\u003e P b) -\u003e P b)\n```\n\nHere's the definition of `ReadP a`.\nThere are instances for functor, applicative, monad, `MonadFail`, alternative, and `MonadPlus`.\nThe `R` constructor takes a function that takes another function and returns a `P`.\nThe accepted function takes whatever you chose for `a` and returns a `P`.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nreadP_to_S (R f) = run (f return)\n```\n\nRecall that `P` is a monad and `return`'s type is `a -\u003e m a`.\nSo `f` is the `(a -\u003e P b) -\u003e P b` function and `return` is the `(a -\u003e P b)` function.\nUltimately, `run` gets the `P b` it expects.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nreadP_to_S (R f) inputString = run (f return) inputString\n--               ^^^^^^^^^^^                  ^^^^^^^^^^^\n```\n\nIt's left off in the source code but remember that `readP_to_S` and `run` both expect an input string.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ninstance Functor ReadP where\n  fmap h (R f) = R (\\k -\u003e f (k . h))\n```\n\nHere's the functor instance definition for `ReadP`.\n\n```haskell\n\u003e readP_to_S (fmap toLower get) \"ABC\"\n[('a',\"BC\")]\n\n\u003e readP_to_S (toLower \u003c$\u003e get) \"ABC\"\n[('a',\"BC\")]\n```\n\nThis allows us to do something like this.\n`fmap` functor maps `toLower` over the functor `get` which equals `R Get`.\nRecall that the type of `Get` is `(Char -\u003e P a) -\u003e P a` which the `ReadP` constructor (`R`) accepts.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\nfmap h       (R f  ) = R (\\ k -\u003e f   (k . h      ))\nfmap toLower (R Get) = R (\\ k -\u003e Get (k . 
toLower))\n```\n\nHere you see the functor definition rewritten for the `fmap toLower get` example.\n\n#### Applicative P Instance\n\nLooking up above, how did `readP_to_S` return `[('a',\"BC\")]` when we only used `Get` which doesn't terminate `run`?\nThe answer lies in the applicative definition for `P`.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ninstance Applicative P where\n  pure x = Result x Fail\n  (\u003c*\u003e) = ap\n```\n\n`return` equals `pure` so we could rewrite `readP_to_S (R f) = run (f return)` to be `readP_to_S (R f) = run (f pure)`.\nBy using `return` or rather `pure`, `readP_to_S` sets `Result x Fail` as the final case `run` will encounter.\nIf reached,\n`run` will terminate and we'll get our list of parsings.\n\n```haskell\n\u003e readP_to_S (fmap toLower get) \"ABC\"\n\n-- Use the functor instance to transform fmap toLower get.\n\u003e readP_to_S (R (\\ k -\u003e Get (k . toLower))) \"ABC\"\n\n-- Call run which removes R.\n\u003e run ((\\ k -\u003e Get (k . toLower)) pure) \"ABC\"\n\n-- Call function with pure to get rid of k.\n\u003e run (Get (pure . toLower)) \"ABC\"\n\n-- Call run for Get case to get rid of Get.\n\u003e run ((pure . 
toLower) 'A') \"BC\"\n\n-- Call toLower with 'A' to get rid of toLower.\n\u003e run (pure 'a') \"BC\"\n\n-- Use the applicative instance to transform pure 'a'.\n\u003e run (Result 'a' Fail) \"BC\"\n\n-- Call run for the Result case to get rid of Result.\n\u003e ('a', \"BC\") : run (Fail) \"BC\"\n\n-- Call run for the Fail case to get rid of Fail.\n\u003e ('a', \"BC\") : []\n\n-- Prepend.\n[('a',\"BC\")]\n```\n\nHere you see the flow from `readP_to_S` to the parsed result.\n\n#### Alternative P Instance\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ninstance Alternative P where\n  -- ...\n\n  -- most common case: two gets are combined\n  Get f1     \u003c|\u003e Get f2     = Get (\\c -\u003e f1 c \u003c|\u003e f2 c)\n\n  -- results are delivered as soon as possible\n  Result x p \u003c|\u003e q          = Result x (p \u003c|\u003e q)\n  p          \u003c|\u003e Result x q = Result x (p \u003c|\u003e q)\n\n  -- ...\n```\n\nThe `Alternative` instance for `P` allows us to split the flow of the parser into a left and right path.\nThis comes in handy when the input can go none, one, or (more rarely) two of two ways.\n\n```haskell\n\u003e readP_to_S ((get \u003e\u003e= \\ a -\u003e return a) \u003c|\u003e (get \u003e\u003e get \u003e\u003e= \\ b -\u003e return b)) \"ABC\"\n[('A',\"BC\"),('B',\"C\")]\n```\n\nThe `\u003c|\u003e` operator or function introduces a fork in the parser's flow.\nThe parser will travel through both the left and right paths.\nThe end result will contain all of the possible parsings that went left\nand all of the possible parsings that went right.\nIf both paths fail, then the whole parser fails.\n\n:bulb: Note, in other parser combinator implementations,\nwhen using the `\u003c|\u003e` operator,\nthe parser will go left or right but not both.\nIf the left succeeds, the right is ignored.\nThe right is only processed if the left side fails.\n\n```haskell\n\u003e readP_to_S ((get \u003e\u003e= \\ a -\u003e return [a]) \u003c|\u003e look 
\u003c|\u003e (get \u003e\u003e get \u003e\u003e= \\a -\u003e return [a])) \"ABC\"\n[(\"ABC\",\"ABC\"),(\"A\",\"BC\"),(\"B\",\"C\")]\n```\n\nYou can chain the `\u003c|\u003e` operator for however many options or alternatives there are.\nThe parser will return a possible parsing involving each.\n\n#### ReadP Failure\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ninstance Monad ReadP where\n  fail _    = R (\\_ -\u003e Fail)\n  R m \u003e\u003e= f = R (\\k -\u003e m (\\a -\u003e let R m' = f a in m' k))\n```\n\nHere is the `ReadP` monad instance.\nNotice the definition for `fail`.\n\n```haskell\n\u003e readP_to_S ((\\ a b c -\u003e [a,b,c]) \u003c$\u003e get \u003c*\u003e get \u003c*\u003e get) \"ABC\"\n[(\"ABC\",\"\")]\n\n\u003e readP_to_S ((\\ a b c -\u003e [a,b,c]) \u003c$\u003e get \u003c*\u003e fail \"\" \u003c*\u003e get) \"ABC\"\n[]\n\n\u003e readP_to_S (get \u003e\u003e= \\ a -\u003e get \u003e\u003e= \\ b -\u003e get \u003e\u003e= \\ c -\u003e return [a,b,c]) \"ABC\"\n[(\"ABC\",\"\")]\n\n\u003e readP_to_S (get \u003e\u003e= \\ a -\u003e get \u003e\u003e= \\ b -\u003e fail \"\" \u003e\u003e= \\ c -\u003e return [a,b,c]) \"ABC\"\n[]\n```\n\nYou can cause an entire parser path to abort by calling `fail`.\nSince ReadP doesn't provide a direct way to generate a `Result` or `Final` case,\nthe return value will be an empty list.\nIf the failed path is the only path, then the entire result will be an empty list.\nRecall that when `run` matches `Fail`, it returns an empty list.\n\n```haskell\n-- (c) The University of Glasgow 2002\n\ninstance Alternative P where\n  -- ...\n\n  -- fail disappears\n  Fail       \u003c|\u003e p          = p\n  p          \u003c|\u003e Fail       = p\n\n  -- ...\n```\n\nGoing back to the alternative `P` instance,\nyou can see how a failure on either side (but not both) will not fail the whole parser.\n\n```haskell\n\u003e readP_to_S (get \u003e\u003e= \\ a -\u003e get \u003e\u003e= \\ b -\u003e pfail \u003e\u003e= \\ c -\u003e 
return [a,b,c]) \"ABC\"\n[]\n```\n\nInstead of using `fail`, ReadP provides `pfail` which allows you to generate a `Fail` case directly.\n\n## Version Number\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Version Number\" src=\"https://i.imgur.com/mHnqDjf.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n[Gifcurry](https://github.com/lettier/gifcurry),\nthe Haskell-built video editor for GIF makers, shells out to various different programs.\nTo ensure compatibility, it needs the version number for each of the programs it shells out to.\nOne of those programs is ImageMagick.\n\n```bash\nVersion: ImageMagick 6.9.10-14 Q16 x86_64 2018-10-24 https://imagemagick.org\nCopyright: © 1999-2018 ImageMagick Studio LLC\nLicense: https://imagemagick.org/script/license.php\nFeatures: Cipher DPC HDRI Modules OpenCL OpenMP\n```\n\nHere you see the output of `convert --version`.\nHow could you parse this to capture the 6, 9, 10, and 14?\n\nLooking at the output,\nwe know the version number is a collection of numbers separated by either a period or a dash.\nThis definition covers the dates as well so we'll make sure that the first two numbers are separated by a period.\nThat way, if they put a date before the version number, we won't get the wrong result.\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Version Number Parser\" src=\"https://i.imgur.com/3hZDOpI.gif\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n```txt\n1. Consume zero or more characters that are not 0 through 9 and go to 2.\n2. Consume zero or more characters that are 0 through 9, save this number, and go to 3.\n3. Look at the rest of the input and go to 4.\n4. 
If the input\n    - is empty, go to 6.\n    - starts with a period, go to 1.\n    - starts with a dash\n        - and you have exactly one number, go to 5.\n        - and you have more than one number, go to 1.\n    - doesn't start with a period or dash\n        - and you have exactly one number, go to 5.\n        - and you have more than one number, go to 6.\n5. Delete any saved numbers and go to 1.\n6. Return the numbers found.\n```\n\nBefore we dive into the code, here's the algorithm we'll be following.\n\n### Building The Version Number Parser\n\n```haskell\nparseVersionNumber\n  ::  [String]\n  -\u003e  ReadP [String]\nparseVersionNumber\n  nums\n  = do\n  _         \u003c- parseNotNumber\n  num       \u003c- parseNumber\n  let nums' = nums ++ [num]\n  parseSeparator nums' parseVersionNumber\n```\n\n`parseVersionNumber` is the main parser combinator that parses an input string for a version number.\nIt accepts a list of strings and returns a list of strings in the context of the `ReadP` data type.\nThe accepted list of strings is not the input that gets parsed but rather the list of numbers found so far.\nFor the first function call, the list is empty since it hasn't parsed anything yet.\n\n```haskell\nparseVersionNumber\n  nums\n```\n\nStarting from the top,\n`parseVersionNumber` takes a list of strings, which is the list of numbers found so far.\n\n```haskell\n  _         \u003c- parseNotNumber\n```\n\n`parseNotNumber` consumes everything that isn't a number from the input string.\nSince we are not interested in the result, we discard it (`_ \u003c-`).\n\n```haskell\n  num       \u003c- parseNumber\n  let nums' = nums ++ [num]\n```\n\nNext we consume everything that is a number and then add that to the list of numbers found so far.\n\n```haskell\n  parseSeparator nums' parseVersionNumber\n```\n\nAfter `parseVersionNumber` has processed the next number, it passes the list of numbers found and itself to `parseSeparator`.\n\n#### Parsing The 
Separator\n\n```haskell\nparseSeparator\n  ::  [String]\n  -\u003e  ([String] -\u003e ReadP [String])\n  -\u003e  ReadP [String]\nparseSeparator\n  nums\n  f\n  = do\n  next \u003c- look\n  case next of\n    \"\"    -\u003e return nums\n    (c:_) -\u003e\n      case c of\n        '.' -\u003e f nums\n        '-' -\u003e if length nums == 1 then f [] else f nums\n        _   -\u003e if length nums == 1 then f [] else return nums\n```\n\nHere you see `parseSeparator`.\n\n```haskell\n  next \u003c- look\n  case next of\n    \"\"    -\u003e return nums\n    (c:_) -\u003e\n```\n\n`look` allows us to get what's left of the input string without consuming it.\nIf there's nothing left, it returns the numbers found.\nHowever, if there is something left, it analyzes the first character.\n\n```haskell\n      case c of\n        '.' -\u003e f nums\n        '-' -\u003e if length nums == 1 then f [] else f nums\n        _   -\u003e if length nums == 1 then f [] else return nums\n```\n\nIf the next character is a period, call `parseVersionNumber` again with the current list of numbers found.\nIf it's a dash and we have exactly one number, call `parseVersionNumber` with an empty list of numbers since it's a date.\nIf it's a dash and we don't have exactly one number, call `parseVersionNumber` with the list of numbers found so far.\nOtherwise,\ncall `parseVersionNumber` with an empty list if we have exactly one number\nor return the numbers found if we don't have exactly one number.\n\n#### Parsing Non-numbers\n\n```haskell\nparseNotNumber\n  ::  ReadP String\nparseNotNumber\n  =\n  munch (not . isNumber)\n```\n\n`parseNotNumber` uses `munch` which `ReadP` provides.\n`munch` is given the predicate `(not . 
isNumber)` which returns true for any character that isn't 0 through 9.\n\n```haskell\nmunch :: (Char -\u003e Bool) -\u003e ReadP String\n```\n\n`munch` continuously calls `get` if the next character in the input string satisfies the predicate.\nIf it doesn't, `munch` returns the characters that did, if any.\nSince it never calls `pfail` and only calls `get` when a matching character is available, `munch` always succeeds.\n\n:bulb: Note, `parseNumber` is similar to `parseNotNumber`.\nInstead of `not . isNumber`, the predicate is just `isNumber`.\n\n#### Munch Versus Many\n\n```haskell\nparseNotNumber'\n  ::  ReadP String\nparseNotNumber'\n  =\n  many (satisfy (not . isNumber))\n```\n\nInstead of using `munch`,\nyou could write `parseNotNumber` like this,\nusing `many` and `satisfy`, both of which ReadP provides.\nLooking at the type signature for `many`, it accepts a single parser combinator (`ReadP a`).\nIn this instance, it's being given the parser combinator `satisfy`.\n\n```haskell\n\u003e readP_to_S (satisfy (not . isNumber)) \"a\"\n[('a',\"\")]\n\n\u003e readP_to_S (satisfy (not . isNumber)) \"1\"\n[]\n```\n\n`satisfy` takes a predicate and uses `get` to consume the next character.\nIf the accepted predicate returns true, `satisfy` returns the character.\nOtherwise, `satisfy` calls `pfail` and fails.\n\n```haskell\n\u003e readP_to_S (munch (not . isNumber)) \"abc123\"\n[(\"abc\",\"123\")]\n\n\u003e readP_to_S (many (satisfy (not . 
isNumber))) \"abc123\"\n[(\"\",\"abc123\"),(\"a\",\"bc123\"),(\"ab\",\"c123\"),(\"abc\",\"123\")]\n```\n\nUsing `many` can give you unwanted results.\nUltimately, `many` introduces one or more `Result` cases.\nBecause of this, `many` always succeeds.\n\n```haskell\n\u003e readP_to_S (many look) \"abc123\"\n-- Runs forever.\n```\n\n`many` will run your parser until it fails or runs out of input.\nIf your parser never fails (for instance, `look` succeeds without consuming any input), `many` will never return.\n\n```haskell\n\u003e readP_to_S (many (get \u003e\u003e= \\ a -\u003e return (read (a : \"\") :: Int))) \"12345\"\n[([],\"12345\"),([1],\"2345\"),([1,2],\"345\"),([1,2,3],\"45\"),([1,2,3,4],\"5\"),([1,2,3,4,5],\"\")]\n```\n\nFor every index in the result,\nthe parsed result will be the outcome of having run the parser `index` times on the entire input.\n\n```haskell\n\u003e let parser        = get \u003e\u003e= \\ a -\u003e return (read (a : \"\") :: Int)\n\u003e let many' results = return results \u003c|\u003e (parser \u003e\u003e= \\ result -\u003e many' (results ++ [result]))\n\u003e readP_to_S (many' []) \"12345\"\n[([],\"12345\"),([1],\"2345\"),([1,2],\"345\"),([1,2,3],\"45\"),([1,2,3,4],\"5\"),([1,2,3,4,5],\"\")]\n```\n\nHere's an alternate definition for `many`.\nOn the left side of `\u003c|\u003e`,\nit returns the current parser results.\nOn the right side of `\u003c|\u003e`,\nit runs the parser,\nadds that result to the current parser results,\nand calls itself with the updated results.\nThis has an effect like a cumulative sum, where the result at index `i` is the latest parser result appended to the results from\n`i - 1`,\n`i - 2`,\n...,\nand `1`.\n\n### Running The Version Number Parser\n\nNow that we've built the parser, let's run it.\n\n```haskell\n\u003e let inputString =\n\u003e     \"Some Program (C) 1234-56-78 All rights reserved.\\n\\\n\u003e     \\Version: 12.345.6-7\\n\\\n\u003e     \\License: Some open source license.\"\n\u003e readP_to_S (parseVersionNumber []) 
inputString\n[([\"12\",\"345\",\"6\",\"7\"],\"\\nLicense: Some open source license.\")]\n```\n\nYou can see it extracted the version number correctly even with the date coming before it.\n\n## SRT\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"SRT\" src=\"https://i.imgur.com/pMTs3AB.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nNow let's parse something more complicated: SRT files.\n\nFor the release of\n[Gifcurry](https://lettier.github.io/gifcurry)\nsix, I needed to parse\n[SRT (SubRip Text) files](http://www.visualsubsync.org/help/srt).\nSRT files contain subtitles that video processing programs use to display text on top of a video.\nTypically this text is the dialog of a movie translated into various languages.\nBy keeping the text separate from the video,\nthere only needs to be one video, which saves time, storage space, and bandwidth.\nThe video software can swap out the text without having to swap out the video.\nContrast this with burning-in or hard-coding the subtitles where the text becomes a part of the image data that makes up the video.\nIn this case, you would need a video for each collection of subtitles.\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Gifcurry\" src=\"https://i.imgur.com/RUwM8eE.gif\"\u003e\n\u003cbr\u003e\n\u003csup\u003eInner Video © Blender Foundation | www.sintel.org\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nGifcurry can take an SRT file and burn in the subtitles for the video slice you select.\n\n```txt\n7\n00:02:09,400 --\u003e 00:02:13,800\nWhat brings you to\nthe land of the gatekeepers?\n\n8\n00:02:15,000 --\u003e 00:02:17,500\nI'm searching for someone.\n\n9\n00:02:18,000 --\u003e 00:02:22,200\nSomeone very dear?\nA kindred spirit?\n```\n\nHere you see the English subtitles for\n[Sintel](https://durian.blender.org/) (© Blender Foundation | www.sintel.org).\n\n### SRT 
Format\n\n\u003cblockquote\u003e\n\u003cp align=\"right\"\u003e\nSRT is perhaps the most basic of all subtitle formats.\n\u003cbr\u003e\u003cbr\u003e\n\u003csup\u003e\n—\u003ca href=\"https://matroska.org/technical/specs/subtitles/srt.html\"\u003eSRT Subtitle | Matroska\u003c/a\u003e\n\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/blockquote\u003e\n\nThe SRT file format consists of blocks, one for each subtitle, separated by an empty line.\n\n```txt\n2\n```\n\nAt the top of the block is the index.\nThis determines the order of the subtitles.\nHopefully the subtitles are already in order and all of them have unique indexes but this may not be the case.\n\n```txt\n01:04:13,000 --\u003e 02:01:01,640 X1:167 X2:267 Y1:33 Y2:63\n```\n\nAfter the index are the start time, end time, and an optional set of points specifying the rectangle the\nsubtitle text should go in.\n\n```txt\n01:04:13,000\n```\n\nThe timestamp format is `hours:minutes:seconds,milliseconds`.\n\n:bulb: Note the comma instead of the period separating the seconds from the milliseconds.\n\n```txt\nThis is the actual subtitle\ntext. 
It can span multiple lines.\nIt may include formatting\nlike \u003cb\u003ebold\u003c/b\u003e, \u003ci\u003eitalic\u003c/i\u003e,\n\u003cu\u003eunderline\u003c/u\u003e,\nand \u003cfont color=\"#010101\"\u003efont color\u003c/font\u003e.\n```\n\nThe third and last part of a block is the subtitle text.\nIt can span multiple lines and ends when there is an empty line.\nThe text can include formatting tags reminiscent of HTML.\n\n### Building The SRT Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing SRT\" src=\"https://i.imgur.com/N1qlzd6.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n```haskell\nparseSrt\n  ::  ReadP [SrtSubtitle]\nparseSrt\n  =\n  manyTill parseBlock (skipSpaces \u003e\u003e eof)\n```\n\n`parseSrt` is the main parser combinator that handles everything.\nIt parses each block until it reaches the end of the file (`eof`) or input.\nTo be on the safe side,\nit allows for trailing whitespace between the last block and the end of the file.\nTo handle this, it parses zero or more characters of whitespace (`skipSpaces`) before parsing\nthe end of the file (`skipSpaces \u003e\u003e eof`).\nIf there is still input left by the time `eof` is reached, `eof` will fail and `parseSrt` will return nothing.\nTherefore, it's important that `parseBlock` doesn't leave anything but whitespace behind.\n\n#### Building The SRT Block Parser\n\n```haskell\nparseBlock\n  ::  ReadP SrtSubtitle\nparseBlock\n  = do\n  i      \u003c- parseIndex\n  (s, e) \u003c- parseTimestamps\n  c      \u003c- parseCoordinates\n  t      \u003c- parseTextLines\n  return\n    SrtSubtitle\n      { index       = i\n      , start       = s\n      , end         = e\n      , coordinates = c\n      , taggedText  = t\n      }\n```\n\nAs we went over earlier, a block consists of an index, timestamps, possibly some coordinates, and some lines of text.\nIn this version of `parseBlock`, you see the more imperative do notation 
style with the record syntax.\n\n```haskell\nparseBlock'\n  ::  ReadP SrtSubtitle\nparseBlock'\n  =\n      SrtSubtitle\n  \u003c$\u003e parseIndex\n  \u003c*\u003e parseStartTimestamp\n  \u003c*\u003e parseEndTimestamp\n  \u003c*\u003e parseCoordinates\n  \u003c*\u003e parseTextLines\n```\n\nHere's another way you could write `parseBlock`.\nThis is the applicative style.\nJust be sure to get the order right.\nThe constructor takes its fields positionally, so the parsers must appear in the same order as the fields of `SrtSubtitle`.\nFor example, I could've accidentally mixed up the start and end timestamps.\n\n#### Building The SRT Index Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing The Index\" src=\"https://i.imgur.com/bPF76DS.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n```haskell\nparseIndex\n  ::  ReadP Int\nparseIndex\n  =\n      skipSpaces\n  \u003e\u003e  readInt \u003c$\u003e parseNumber\n```\n\nAt the top of the block is the index.\nHere you see `skipSpaces` again.\nAfter skipping over whitespace,\nit parses a run of digits and converts them to an actual integer.\n\n```haskell\nreadInt\n  ::  String\n  -\u003e  Int\nreadInt\n  =\n  read\n```\n\n`readInt` looks like this.\n\n```haskell\n\u003e read \"123\" :: Int\n123\n\u003e read \"1abc\" :: Int\n*** Exception: Prelude.read: no parse\n```\n\nNormally using `read` directly can be dangerous.\n`read` may not be able to convert the input to the specified type.\nHowever, `parseNumber` will only return the 10 numerical digit characters (`['0'..'9']`)\nso using `read` directly becomes safe.\n\n#### Building The SRT Timestamps Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing The Timestamps\" src=\"https://i.imgur.com/yI3o6NM.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nParsing the timestamps is a little more involved than parsing the index.\n\n```haskell\nparseTimestamps\n  ::  ReadP (Timestamp, Timestamp)\nparseTimestamps\n  = do\n  _   \u003c- char 
'\\n'\n  s   \u003c- parseTimestamp\n  _   \u003c- skipSpaces\n  _   \u003c- string \"--\u003e\"\n  _   \u003c- skipSpaces\n  e   \u003c- parseTimestamp\n  return (s, e)\n```\n\nThis is the main combinator for parsing the timestamps.\n\n`char` parses the character you give it or it fails.\nIf it fails then `parseTimestamps` fails, ultimately causing `parseSrt` to fail,\nso there must be a newline character after the index.\n\n`string` is like `char` except instead of just one character, it\nparses the string of characters you give it or it fails.\n\n```haskell\nparseStartTimestamp\n  ::  ReadP Timestamp\nparseStartTimestamp\n  =\n      char '\\n'\n  \u003e\u003e  parseTimestamp\n```\n\n`parseTimestamps` parses both timestamps,\nbut for the applicative style (`parseBlock'`),\nwe need a parser just for the start timestamp.\n\n```haskell\nparseEndTimestamp\n  ::  ReadP Timestamp\nparseEndTimestamp\n  =\n      skipSpaces\n  \u003e\u003e  string \"--\u003e\"\n  \u003e\u003e  skipSpaces\n  \u003e\u003e  parseTimestamp\n```\n\nThis skips the separator between the timestamps (optional whitespace, `--\u003e`, optional whitespace) and returns the end timestamp.\n\n```haskell\nparseTimestamp\n  ::  ReadP Timestamp\nparseTimestamp\n  = do\n  h  \u003c- parseNumber\n  _  \u003c- char ':'\n  m  \u003c- parseNumber\n  _  \u003c- char ':'\n  s  \u003c- parseNumber\n  _  \u003c- char ',' \u003c|\u003e char '.'\n  m' \u003c- parseNumber\n  return\n    Timestamp\n      { hours        = readInt h\n      , minutes      = readInt m\n      , seconds      = readInt s\n      , milliseconds = readInt m'\n      }\n```\n\nThis parses the four numbers that make up the timestamp.\nThe first three numbers are separated by colons and the last one is separated by a comma.\nTo be more forgiving, however, we allow the possibility of there being a period instead of a comma.\n\n```haskell\n\u003e readP_to_S (char '.' \u003c|\u003e char ',') \"...\"\n[('.',\"..\")]\n\n\u003e readP_to_S (char '.' 
\u003c|\u003e char ',') \",..\"\n[(',',\"..\")]\n```\n\n:bulb: Note, when using `char` with `\u003c|\u003e`,\nonly one side can succeed (two `char` enter, one `char` leave)\nsince `char` consumes a single character and two characters cannot occupy the same space.\n\n#### Building The SRT Coordinates Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing The Coordinates\" src=\"https://i.imgur.com/0mpO88C.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nThe coordinates are an optional part of the block but if included, will be on the same line as the timestamps.\n\n```haskell\nparseCoordinates\n  ::  ReadP (Maybe SrtSubtitleCoordinates)\nparseCoordinates\n  =\n  option Nothing $ do\n    _  \u003c- skipSpaces1\n    x1 \u003c- parseCoordinate 'x' 1\n    _  \u003c- skipSpaces1\n    x2 \u003c- parseCoordinate 'x' 2\n    _  \u003c- skipSpaces1\n    y1 \u003c- parseCoordinate 'y' 1\n    _  \u003c- skipSpaces1\n    y2 \u003c- parseCoordinate 'y' 2\n    return\n      $ Just\n        SrtSubtitleCoordinates\n          { x1 = readInt x1\n          , x2 = readInt x2\n          , y1 = readInt y1\n          , y2 = readInt y2\n          }\n```\n\n`option` takes two arguments.\nThe first argument is returned if the second argument, a parser, fails.\nSo if the coordinates parser fails, `parseCoordinates` will return `Nothing`.\nPut another way, the coordinates parser failing does not cause the whole parser to fail.\nThis block will just have `Nothing` for its `coordinates` \"field\".\n\n```haskell\nparseCoordinate\n  ::  Char\n  -\u003e  Int\n  -\u003e  ReadP String\nparseCoordinate\n  c\n  n\n  = do\n  _  \u003c- char (Data.Char.toUpper c) \u003c|\u003e char (Data.Char.toLower c)\n  _  \u003c- string $ show n ++ \":\"\n  parseNumber\n```\n\nThis parser allows the coordinate labels to be in either uppercase or lowercase.\nFor example, `x1:1 X2:2 Y1:3 y2:4` would succeed.\n\n#### Building The SRT Text 
Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing The Text\" src=\"https://i.imgur.com/vMuZsa1.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nParsing the text is the most involved portion due to the HTML-like tag formatting.\n\nTag parsing can be challenging—just ask anyone who parses them with a regular expression.\nTo make this easier on us—and for the user—we'll use a\n[tag soup](https://en.wikipedia.org/wiki/Tag_soup)\nkind of approach.\nThe parser will allow unclosed and/or wrongly nested tags.\nIt will also allow any tag and not just `b`, `u`, `i`, and `font`.\n\n```haskell\nparseTextLines\n  ::  ReadP [TaggedText]\nparseTextLines\n  =\n      char '\\n'\n  \u003e\u003e  (getTaggedText \u003c$\u003e manyTill parseAny parseEndOfTextLines)\n```\n\nWe start out by matching on a newline character.\nAfter that, we functor map or fmap (`\u003c$\u003e`) `getTaggedText` over the subtitle text characters until we reach the end of the text lines.\n\n```haskell\nparseEndOfTextLines\n  ::  ReadP ()\nparseEndOfTextLines\n  =\n  void (string \"\\n\\n\") \u003c|\u003e eof\n```\n\nWe stop collecting characters (`parseAny`) when we reach two newline characters or the end of the file.\nThis signals the end of the block.\n\n```haskell\ngetTaggedText\n  ::  String\n  -\u003e  [TaggedText]\ngetTaggedText\n  s\n  =\n  fst\n    $ foldl\n      folder\n      ([], [])\n      parsed\n  where\n```\n\n`getTaggedText` folds through the parsed text from left to right, returning the accumulated tagged text.\n\n```haskell\n    parsed\n      ::  [String]\n    parsed\n      =\n      case readP_to_S (parseTaggedText []) s of\n        []      -\u003e [s]\n        r@(_:_) -\u003e (fst . 
last) r\n```\n\n`parsed` returns a list of one or more strings.\nIt attempts to parse the input text for tags.\nIf that fails, `parsed` returns the input string inside a list.\nOtherwise, if `parseTaggedText` succeeds, `parsed` returns the last possible parsing (`(fst . last) r`).\n\n```haskell\n    folder\n      ::  ([TaggedText], [Tag])\n      -\u003e  String\n      -\u003e  ([TaggedText], [Tag])\n    folder\n      (tt, t)\n      x\n      | isTag x   = (tt, updateTags t x)\n      | otherwise = (tt ++ [TaggedText { text = x, tags = t}], t)\n```\n\nAs `folder` moves from left to right over the parsed strings, it checks if the current string is a tag.\nIf it is a tag, it updates the current set of active tags (`t`).\nOtherwise, it appends another tagged piece of text associated with the set of active tags.\n\n```haskell\nupdateTags\n  ::  [Tag]\n  -\u003e  String\n  -\u003e  [Tag]\nupdateTags\n  tags\n  x\n  | isClosingTag x = remove compare' tags (makeTag x)\n  | isOpeningTag x = add    compare' tags (makeTag x)\n  | otherwise      = tags\n  where\n    compare'\n      ::  Tag\n      -\u003e  Tag\n      -\u003e  Bool\n    compare'\n      a\n      b\n      =\n      name a /= name b\n```\n\n`updateTags` updates the given `tags` by either removing or adding the given tag (`x`), depending on whether it is a closing or opening tag.\nIf it is neither, it just returns the passed set of tags.\n`add` will overwrite an existing tag if `tags` already has a tag by the same name.\nYou can see this in the `compare'` function given.\n\nTo keep the parser simple, if an opening tag `T` is found, `T` gets added to the list of tags\nor overwrites an existing `T` if already present.\nIf a corresponding closing `/T` is found, then `T` is removed from the list of tags, if present.\nIt doesn't matter if there are two or more `T`s in a row,\none or more `T`s without a closing `/T`,\nand/or there's a closing `/T` without an opening `T`.\n\n```haskell\nmakeTag\n  ::  String\n  -\u003e  
Tag\nmakeTag\n  s\n  =\n  Tag\n    { name       = getTagName       s\n    , attributes = getTagAttributes s\n    }\n```\n\n`makeTag` assembles a tag from the given string (`s`).\nEach `Tag` has a name and zero or more attributes.\n\n```haskell\nparseTaggedText\n  ::  [String]\n  -\u003e  ReadP [String]\nparseTaggedText\n  strings\n  = do\n  s \u003c- look\n  case s of\n    \"\" -\u003e return strings\n    _  -\u003e do\n      r \u003c- munch1 (/= '\u003c') \u003c++ parseClosingTag \u003c++ parseOpeningTag\n      parseTaggedText $ strings ++ [r]\n```\n\n`parseTaggedText` returns the input string broken up into pieces.\nEach piece is either a run of text between tags, a closing tag, or an opening tag.\nAfter it splits off a piece, it adds it to the other pieces and calls itself again.\nIf the remaining input string is empty, it returns the list of strings found.\n\n```haskell\n\u003e readP_to_S (string \"ab\" \u003c++ string \"abc\") \"abcd\"\n[(\"ab\",\"cd\")]\n\n\u003e readP_to_S (string \"ab\" +++ string \"abc\") \"abcd\"\n[(\"ab\",\"cd\"),(\"abc\",\"d\")]\n\n\u003e readP_to_S (string \"ab\" \u003c|\u003e string \"abc\") \"abcd\"\n[(\"ab\",\"cd\"),(\"abc\",\"d\")]\n```\n\nThe `\u003c++` operator is left-biased, meaning that if the left side succeeds, it won't even bother with the right.\nRecall that when we run the parser, we get a list of all the possible parsings.\nAll of these possible parsings are the result of the parser having traveled through all of the possible paths.\nBy using `\u003c++`,\nwe receive the possible parsings from the left path, and those from the right path only if the left side failed.\nIf you'd like all of the possible parsings through the left and right side,\nyou can use the `+++` operator provided by `ReadP`.\n`+++` is just `\u003c|\u003e` which we saw up above.\n\n```haskell\nparseOpeningTag\n  ::  ReadP String\nparseOpeningTag\n  = do\n  _ \u003c- char '\u003c'\n  t \u003c- munch1 (\\ c -\u003e c /= '/' \u0026\u0026 c /= '\u003e')\n  _ 
\u003c- char '\u003e'\n  return $ \"\u003c\" ++ t ++ \"\u003e\"\n```\n\nAn opening tag is an opening angle bracket, some text that doesn't include a forward slash, and the next immediate closing angle bracket.\n\n```haskell\nparseClosingTag\n  ::  ReadP String\nparseClosingTag\n  = do\n  _ \u003c- char '\u003c'\n  _ \u003c- char '/'\n  t \u003c- munch1 (/= '\u003e')\n  _ \u003c- char '\u003e'\n  return $ \"\u003c/\" ++ t ++ \"\u003e\"\n```\n\nA closing tag is an opening angle bracket, a forward slash, some text, and the next immediate closing angle bracket.\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsing Tags\" src=\"https://i.imgur.com/5HJWKPA.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\n```haskell\ngetTagAttributes\n  ::  String\n  -\u003e  [TagAttribute]\ngetTagAttributes\n  s\n  =\n  if isOpeningTag s\n    then\n      case readP_to_S (parseTagAttributes []) s of\n        []    -\u003e []\n        (x:_) -\u003e fst x\n    else\n      []\n```\n\nOpening tags can have attributes.\nFor example, `\u003cfont color=\"#101010\"\u003e`.\nEach attribute is a two-tuple, key-value pair.\nIn the above example, `color` would be the key and `#101010` would be the value.\n\n```haskell\ngetTagName\n  ::  String\n  -\u003e  String\ngetTagName\n  s\n  =\n  case readP_to_S parseTagName s of\n    []    -\u003e \"\"\n    (x:_) -\u003e toLower' $ fst x\n```\n\nThis returns the tag name in lowercase.\n\n```haskell\nparseTagName\n  ::  ReadP String\nparseTagName\n  = do\n  _ \u003c- char '\u003c'\n  _ \u003c- munch (== '/')\n  _ \u003c- skipSpaces\n  n \u003c- munch1 (\\ c -\u003e c /= ' ' \u0026\u0026 c /= '\u003e')\n  _ \u003c- munch  (/= '\u003e')\n  _ \u003c- char '\u003e'\n  return n\n```\n\nThe tag name is the first string of non-whitespace characters\nafter the opening angle bracket,\na possible forward slash,\nand some possible whitespace\nand before some more whitespace\nand/or the closing angle 
bracket.\n\n```haskell\nparseTagAttributes\n  ::  [TagAttribute]\n  -\u003e  ReadP [TagAttribute]\nparseTagAttributes\n  tagAttributes\n  = do\n  s \u003c- look\n  case s of\n    \"\" -\u003e return tagAttributes\n    _  -\u003e do\n      let h = head s\n      case h of\n        '\u003e' -\u003e return tagAttributes\n        '\u003c' -\u003e trimTagname \u003e\u003e parseTagAttributes'\n        _   -\u003e parseTagAttributes'\n  where\n    parseTagAttributes'\n      ::  ReadP [TagAttribute]\n    parseTagAttributes'\n      = do\n      tagAttribute \u003c- parseTagAttribute\n      parseTagAttributes\n        ( add\n            (\\ a b -\u003e fst a /= fst b)\n            tagAttributes\n            tagAttribute\n        )\n```\n\n`parseTagAttributes` recursively goes through the input string, collecting up the key-value pairs.\nAt the start of the tag (`\u003c`), it first trims the tag name before tackling the attributes.\nIt stops parsing for attributes when it reaches the closing angle bracket (`\u003e`).\nIf a tag happens to have duplicate attributes (based on the key),\n`add` will ensure only the latest one remains in the list.\n\n```haskell\ntrimTagname\n  :: ReadP ()\ntrimTagname\n  =\n      char '\u003c'\n  \u003e\u003e skipSpaces\n  \u003e\u003e munch1 (\\ c -\u003e c /= ' ' \u0026\u0026 c /= '\u003e')\n  \u003e\u003e return ()\n```\n\nThis trims or discards the tag name.\n\n```haskell\nparseTagAttribute\n  ::  ReadP TagAttribute\nparseTagAttribute\n  = do\n  _ \u003c- skipSpaces\n  k \u003c- munch1 (/= '=')\n  _ \u003c- string \"=\\\"\"\n  v \u003c- munch1 (/= '\\\"')\n  _ \u003c- char '\\\"'\n  _ \u003c- skipSpaces\n  return (toLower' k, v)\n```\n\nThe attribute key is any string of non-whitespace characters before the equal sign.\nThe attribute value is any characters after the equal sign and double quote and before the next immediate double quote.\n\n```haskell\nisTag\n  ::  String\n  -\u003e  Bool\nisTag\n  s\n  =\n  isOpeningTag s || isClosingTag 
s\n```\n\nA string is a tag if it is either an opening tag or a closing tag.\n\n```haskell\nisOpeningTag\n  ::  String\n  -\u003e  Bool\nisOpeningTag\n  s\n  =\n  isPresent $ readP_to_S parseOpeningTag s\n```\n\nA string is an opening tag if the opening tag parser succeeds.\n\n```haskell\nisClosingTag\n  ::  String\n  -\u003e  Bool\nisClosingTag\n  s\n  =\n  isPresent $ readP_to_S parseClosingTag s\n```\n\nA string is a closing tag if the closing tag parser succeeds.\n\n### Running The SRT Parser\n\n\u003cspan\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"Parsed SRT Results\" src=\"https://i.imgur.com/owAu628.jpg\"\u003e\n\u003cbr\u003e\n\u003csup\u003e\u003c/sup\u003e\n\u003c/p\u003e\n\u003c/span\u003e\n\nNow that we've assembled the parser, let's try it out.\n\n```haskell\n\u003e let srt =\n\u003e       \" 1\\n\\\n\u003e       \\0:0:0,1 --\u003e 0:1:0.2  x1:1 X2:3  y1:4 y2:10\\n\\\n\u003e       \\\u003cfont color=\\\"red\\\" color=\\\"blue\\\"\u003eThis is some \u003cb\u003e\u003cu\u003e\u003ci\u003e\\n \\\n\u003e       \\subtitle \\n\\\n\u003e       \\\u003c/u\u003etext.\u003c/b\u003e  \"\n\u003e readP_to_S parseSrt srt\n[([ SrtSubtitle\n      { index = 1\n      , start = Timestamp {hours = 0, minutes = 0, seconds = 0, milliseconds = 1}\n      , end   = Timestamp {hours = 0, minutes = 1, seconds = 0, milliseconds = 2}\n      , coordinates = Just (SrtSubtitleCoordinates {x1 = 1, x2 = 3, y1 = 4, y2 = 10})\n      , taggedText =  [ TaggedText\n                        { text = \"This is some \"\n                        , tags = [ Tag {name = \"font\", attributes = [(\"color\",\"blue\")]}\n                                 ]\n                        }\n                      , TaggedText\n                          { text = \"\\n subtitle \\n\"\n                          , tags = [ Tag {name = \"font\", attributes = [(\"color\",\"blue\")]}\n                                   , Tag {name = \"b\",    attributes = []}\n                                   , Tag {name 
= \"u\",    attributes = []}\n                                   , Tag {name = \"i\",    attributes = []}\n                                   ]\n                          }\n                      , TaggedText\n                          { text = \"text.\"\n                          , tags = [ Tag {name = \"font\", attributes = [(\"color\",\"blue\")]}\n                                   , Tag {name = \"b\",    attributes = []}\n                                   , Tag {name = \"i\",    attributes = []}\n                                   ]\n                          }\n                      , TaggedText\n                          { text = \"  \"\n                          , tags = [ Tag {name = \"font\", attributes = [(\"color\",\"blue\")]}\n                                   , Tag {name = \"i\",    attributes = []}\n                                   ]\n                          }\n                      ]\n      }\n  ]\n, \"\"\n)]\n```\n\nHere you see the result of parsing a test string.\nNotice the errors in the test string like the use of a period instead of a comma or the duplicate tag attribute.\n\n## Exercises\n\n- Write a program that can convert an SRT file to a JSON file.\n- Rewrite the version number parser using Parsec instead of ReadP.\n- Rewrite the SRT parser using Parsec instead of ReadP.\n\n## Copyright\n\n(C) 2019 David Lettier\n\u003cbr\u003e\n[lettier.com](https://www.lettier.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flettier%2Fparsing-with-haskell-parser-combinators","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flettier%2Fparsing-with-haskell-parser-combinators","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flettier%2Fparsing-with-haskell-parser-combinators/lists"}