{"id":13687466,"url":"https://github.com/haskell-github-trust/replace-megaparsec","last_synced_at":"2025-12-11T23:28:51.396Z","repository":{"id":45484816,"uuid":"201959141","full_name":"haskell-github-trust/replace-megaparsec","owner":"haskell-github-trust","description":"Stream editing with Haskell Megaparsec parsers","archived":false,"fork":false,"pushed_at":"2024-05-22T11:12:53.000Z","size":149,"stargazers_count":79,"open_issues_count":13,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-23T03:45:20.740Z","etag":null,"topics":["find-and-replace","haskell","haskell-library","megaparsec","stream-editor"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haskell-github-trust.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-08-12T15:36:18.000Z","updated_at":"2024-05-22T11:12:56.000Z","dependencies_parsed_at":"2024-01-14T16:12:18.213Z","dependency_job_id":"707a7d8d-2d4d-429e-b3fe-14cf407b3874","html_url":"https://github.com/haskell-github-trust/replace-megaparsec","commit_stats":null,"previous_names":["jamesdbrock/replace-megaparsec"],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haskell-github-trust%2Freplace-megaparsec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haskell-github-trust%2Freplace-megaparsec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haskell-github-trust%2Freplace-megaparsec/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haskell-github-trust%2Freplace-megaparsec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haskell-github-trust","download_url":"https://codeload.github.com/haskell-github-trust/replace-megaparsec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250366678,"owners_count":21418768,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["find-and-replace","haskell","haskell-library","megaparsec","stream-editor"],"created_at":"2024-08-02T15:00:55.123Z","updated_at":"2025-12-11T23:28:46.344Z","avatar_url":"https://github.com/haskell-github-trust.png","language":"Haskell","funding_links":[],"categories":["Haskell"],"sub_categories":[],"readme":"# replace-megaparsec\n\n[![Hackage](https://img.shields.io/hackage/v/replace-megaparsec.svg?style=flat)](https://hackage.haskell.org/package/replace-megaparsec)\n[![Stackage Nightly](http://stackage.org/package/replace-megaparsec/badge/nightly)](http://stackage.org/nightly/package/replace-megaparsec)\n[![Stackage LTS](http://stackage.org/package/replace-megaparsec/badge/lts)](http://stackage.org/lts/package/replace-megaparsec)\n\n* [Usage Examples](#usage-examples)\n* [In the Shell](#in-the-shell)\n* [Alternatives](#alternatives)\n* [Benchmarks](#benchmarks)\n* [Hypothetically Asked Questions](#hypothetically-asked-questions)\n\n__replace-megaparsec__ is for finding text patterns, and also\nreplacing or splitting on the found patterns.\nThis activity is traditionally done with regular 
expressions,\nbut __replace-megaparsec__ uses\n[__megaparsec__](http://hackage.haskell.org/package/megaparsec)\nparsers instead for the pattern matching.\n\n__replace-megaparsec__ can be used in the same sort of “pattern capture”\nor “find all” situations in which one would use Python\n[`re.findall`](https://docs.python.org/3/library/re.html#re.findall)\nor\nPerl [`m//`](https://perldoc.perl.org/functions/m.html),\nor\nUnix [`grep`](https://www.gnu.org/software/grep/).\n\n__replace-megaparsec__ can be used in the same sort of “stream editing”\nor “search-and-replace” situations in which one would use Python\n[`re.sub`](https://docs.python.org/3/library/re.html#re.sub),\nor\nPerl [`s///`](https://perldoc.perl.org/functions/s.html),\nor Unix\n[`sed`](https://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html),\nor\n[`awk`](https://www.gnu.org/software/gawk/manual/gawk.html).\n\n__replace-megaparsec__ can be used in the same sort of “string splitting”\nsituations in which one would use Python\n[`re.split`](https://docs.python.org/3/library/re.html#re.split)\nor Perl\n[`split`](https://perldoc.perl.org/functions/split.html).\n\nSee [__replace-attoparsec__](https://hackage.haskell.org/package/replace-attoparsec)\nfor the\n[__attoparsec__](http://hackage.haskell.org/package/attoparsec)\nversion.\n\n## Why would we want to do pattern matching and substitution with parsers instead of regular expressions?\n\n* Haskell parsers have a nicer syntax than\n  [regular expressions](https://en.wikipedia.org/wiki/Regular_expression),\n  which are notoriously\n  [difficult to read](https://en.wikipedia.org/wiki/Write-only_language).\n\n* Regular expressions can do “group capture” on sections of the matched\n  pattern, but they can only return stringy lists of the capture groups. 
Parsers\n  can construct typed data structures based on the capture groups, guaranteeing\n  no disagreement between the pattern rules and the rules that we're using\n  to build data structures based on the pattern matches.\n\n  For example, consider\n  scanning a string for numbers. A lot of different things can look like a number,\n  and can have leading plus or minus signs, or be in scientific notation, or\n  have commas, or whatever. If we try to parse all of the numbers out of a string\n  using regular expressions, then we have to make sure that the regular expression\n  and the string-to-number conversion function agree about exactly what is\n  and what isn't a numeric string. We can get into an awkward situation in which\n  the regular expression says it has found a numeric string but the\n  string-to-number conversion function fails. A typed parser will perform both\n  the pattern match and the conversion, so it will never be in that situation.\n  [Parse, don't validate.](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)\n\n* Regular expressions are only able to pattern-match\n  [regular](https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy)\n  grammars.\n  Megaparsec parsers are able to pattern-match context-free grammars, and\n  even context-sensitive grammars, if needed. See below for\n  an example of lifting a `Parser` into a `State` monad for context-sensitive\n  pattern-matching.\n\n* The replacement expression for a traditional regular expression-based\n  substitution command is usually just a string template in which\n  the *Nth* “capture group” can be inserted with the syntax `\\N`. 
With\n  this library, instead of a template, we get\n  an `editor` function which can perform any computation, including IO.\n\n# Usage Examples\n\nThe examples depend on these imports.\n\n```haskell\nimport Data.Void\nimport Replace.Megaparsec\nimport Text.Megaparsec\nimport Text.Megaparsec.Char\nimport Text.Megaparsec.Char.Lexer\n```\n\n## Split strings with `splitCap`\n\n### Find all pattern matches, capture the matched text and the parsed result\n\nSeparate the input string into sections which can be parsed as a hexadecimal\nnumber with a prefix `\"0x\"`, and sections which can't. Parse the numbers.\n\n```haskell\nlet hexparser = chunk \"0x\" *\u003e hexadecimal :: Parsec Void String Integer\nsplitCap (match hexparser) \"0xA 000 0xFFFF\"\n```\n```haskell\n[Right (\"0xA\",10), Left \" 000 \", Right (\"0xFFFF\",65535)]\n```\n\n### Find all pattern matches, capture only the locations of the matched patterns\n\nFind all of the sections of the stream which are letters. Capture a list of\nthe offsets of the beginning of every pattern match.\n\n```haskell\nimport Data.Either\nlet letterOffset = getOffset \u003c* some letterChar :: Parsec Void String Int\nrights $ splitCap letterOffset \" a  bc \"\n```\n```haskell\n[1,4]\n```\n### Pattern match balanced parentheses\n\nFind groups of balanced nested parentheses. This is an example of a\n“context-free” grammar, a pattern that can't be expressed by a regular\nexpression. 
We can express the pattern with a recursive parser.\n\n```haskell\nimport Data.Functor (void)\nimport Data.Bifunctor (second)\nlet parens :: Parsec Void String ()\n    parens = do\n        char '('\n        manyTill\n            (void (noneOf \"()\") \u003c|\u003e void parens)\n            (char ')')\n        pure ()\n\nsecond fst \u003c$\u003e splitCap (match parens) \"(()) (()())\"\n```\n```haskell\n[Right \"(())\",Left \" \",Right \"(()())\"]\n```\n\n## Edit strings with `streamEdit`\n\nThe following examples show how to search for a pattern in a string of text\nand then edit the string of text to substitute in some replacement text\nfor the matched patterns.\n\n### Pattern match and replace with a constant\n\nReplace all carriage-return-newline occurrences with newline.\n\n```haskell\nlet crnl = chunk \"\\r\\n\" :: Parsec Void String String\nstreamEdit crnl (const \"\\n\") \"1\\r\\n2\\r\\n\"\n```\n```haskell\n\"1\\n2\\n\"\n```\n\n### Pattern match and edit the matches\n\nReplace alphabetic characters with the next character in the alphabet.\n\n```haskell\nlet somelet = some letterChar :: Parsec Void String String\nstreamEdit somelet (fmap succ) \"HAL 9000\"\n```\n```haskell\n\"IBM 9000\"\n```\n\n### Pattern match and maybe edit the matches, or maybe leave them alone\n\nFind all of the string sections *`s`* which can be parsed as a\nhexadecimal number *`r`*,\nand if *`r≤16`*, then replace *`s`* with a decimal number. 
Uses the\n[`match`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html#v:match)\ncombinator.\n\n```haskell\nlet hexparser = chunk \"0x\" *\u003e hexadecimal :: Parsec Void String Integer\nstreamEdit (match hexparser) (\\(s,r) -\u003e if r\u003c=16 then show r else s) \"0xA 000 0xFFFF\"\n```\n```haskell\n\"10 000 0xFFFF\"\n```\n\n### Pattern match and edit the matches with IO with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)\n\nFind an environment variable in curly braces and replace it with its\nvalue from the environment.\n\n```haskell\nimport System.Environment (getEnv)\nlet bracevar = char '{' *\u003e manyTill anySingle (char '}') :: ParsecT Void String IO String\nstreamEditT bracevar getEnv \"- {HOME} -\"\n```\n```haskell\n\"- /home/jbrock -\"\n```\n\n### Context-sensitive pattern match and edit the matches with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)\n\nCapitalize the third letter in a string. The `capThird` parser searches for\nindividual letters, and it needs to remember how many times it has run so\nthat it can match successfully only on the third time that it finds a letter.\nTo enable the parser to remember how many times it has run, we'll\ncompose the parser with a `State` monad from\nthe `mtl` package. (Run in `ghci` with `cabal v2-repl -b mtl`). Because it has\nstateful memory, this parser is an example of a “context-sensitive” grammar.\n\n```haskell\nimport qualified Control.Monad.State.Strict as MTL\nimport Control.Monad.State.Strict (get, put, evalState)\nimport Data.Char (toUpper)\n\nlet capThird :: ParsecT Void String (MTL.State Int) String\n    capThird = do\n        x \u003c- letterChar\n        i \u003c- get\n        let i' = i+1\n        put i'\n        if i'==3 then pure [x] else empty\n\nflip evalState 0 $ streamEditT capThird (pure . 
fmap toUpper) \"a a a a a\"\n```\n```haskell\n\"a a A a a\"\n```\n\n\n### Pattern match, edit the matches, and count the edits with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)\n\nFind and capitalize no more than three letters in a string, and return the\nedited string along with the number of letters capitalized. To enable the\neditor function to remember how many letters it has capitalized, we'll\nrun `streamEditT` in the `State` monad from the `mtl` package. Use this\ntechnique to get the same functionality as Python\n[`re.subn`](https://docs.python.org/3/library/re.html#re.subn).\n\n```haskell\nimport qualified Control.Monad.State.Strict as MTL\nimport Control.Monad.State.Strict (get, put, runState)\nimport Data.Char (toUpper)\n\nlet editThree :: Char -\u003e MTL.State Int String\n    editThree x = do\n        i \u003c- get\n        if i\u003c3\n            then do\n                put $ i+1\n                pure [toUpper x]\n            else pure [x]\n\nflip runState 0 $ streamEditT letterChar editThree \"a a a a a\"\n```\n```haskell\n(\"A A A a a\",3)\n```\n\n\n### Non-greedy pattern repetition\n\nThis is not a feature of this library, but it’s\na useful technique to know.\n\nHow do we do non-greedy repetition of a pattern `p`, like we would in Regex\nby writing `p*?`?\n\nBy using the\n[`manyTill_`](https://hackage.haskell.org/package/parser-combinators/docs/Control-Monad-Combinators.html#v:manyTill_) combinator. 
To repeat pattern `p` non-greedily, write\n`manyTill_ p q` where `q` is the entire rest of the parser.\n\nFor example, this parse fails because `many` repeats the pattern `letterChar`\ngreedily.\n\n```haskell\nflip parseMaybe \"aab\" $ do\n  a \u003c- many letterChar\n  b \u003c- single 'b'\n  pure (a,b)\n```\n```haskell\nNothing\n```\n\nTo repeat pattern `letterChar` non-greedily, use `manyTill_`.\n\n```haskell\nflip parseMaybe \"aab\" $ do\n  (a,b) \u003c- manyTill_ letterChar $ do\n    single 'b'\n  pure (a,b)\n```\n```haskell\nJust (\"aa\",'b')\n```\n\n\n# In the Shell\n\nIf we're going to have a viable `sed` replacement then we want to be able\nto use it easily from the command line. This\n[Stack script interpreter](https://docs.haskellstack.org/en/stable/GUIDE/#script-interpreter)\nscript will find decimal numbers in a stream and replace them with their double.\n\n```haskell\n#!/usr/bin/env stack\n{- stack\n  script\n  --resolver lts-16\n  --package megaparsec\n  --package replace-megaparsec\n-}\n-- https://docs.haskellstack.org/en/stable/GUIDE/#script-interpreter\n\nimport Data.Void\nimport Text.Megaparsec\nimport Text.Megaparsec.Char\nimport Text.Megaparsec.Char.Lexer\nimport Replace.Megaparsec\n\nmain = interact $ streamEdit (decimal :: Parsec Void String Int) (show . (*2))\n```\n\nIf you have\n[The Haskell Tool Stack](https://docs.haskellstack.org/en/stable/README/)\ninstalled then you can just copy-paste this into a file named `doubler.hs` and\nrun it. 
(On the first run Stack may need to download the dependencies.)\n\n```bash\n$ chmod u+x doubler.hs\n$ echo \"1 6 21 107\" | ./doubler.hs\n2 12 42 214\n```\n\n\n# Alternatives\n\nSome libraries that one might consider instead of this one.\n\n\u003chttp://hackage.haskell.org/package/regex-applicative\u003e\n\n\u003chttp://hackage.haskell.org/package/pcre-heavy\u003e\n\n\u003chttp://hackage.haskell.org/package/lens-regex-pcre\u003e\n\n\u003chttp://hackage.haskell.org/package/regex\u003e\n\n\u003chttp://hackage.haskell.org/package/pipes-parse\u003e\n\n\u003chttp://hackage.haskell.org/package/stringsearch\u003e\n\n\u003chttp://hackage.haskell.org/package/substring-parser\u003e\n\n\u003chttp://hackage.haskell.org/package/pcre-utils\u003e\n\n\u003chttp://hackage.haskell.org/package/template\u003e\n\n# Benchmarks\n\nThese benchmarks are intended to measure the wall-clock speed\nof *everything except the actual pattern-matching*. Speed of the\npattern-matching is the responsibility of the\n[__megaparsec__](http://hackage.haskell.org/package/megaparsec) and\n[__attoparsec__](http://hackage.haskell.org/package/attoparsec)\nlibraries.\n\nThe benchmark task is to find all of the one-character patterns `x` in a\ntext stream and replace them by a function which returns the constant\nstring `oo`. So, like the regex `s/x/oo/g`.\n\nWe have two benchmark input cases, which we call __dense__ and __sparse__.\n\nThe __dense__ case is ten megabytes of alternating spaces and `x`s\nlike\n\n```\nx x x x x x x x x x x x x x x x x x x x x x x x x x x x\n```\n\nThe __sparse__ case is ten megabytes of spaces with a single `x` in the middle\nlike\n\n```\n                         x\n```\n\nEach benchmark program reads the input from `stdin`, replaces `x` with `oo`,\nand writes the result to `stdout`. 
The time elapsed is measured by `perf stat`,\nand the best observed time is recorded.\n\nSee [replace-benchmark](https://github.com/jamesdbrock/replace-benchmark)\nfor details.\n\n| Program                                           | dense *ms*  | sparse *ms* |\n| :---                                              |      ---: |     ---:  |\n| Python 3.10.9 [`re.sub`](https://docs.python.org/3/library/re.html#re.sub) *repl* function | 557.22 | 35.47 |\n| Perl  v5.36.0 [`s///ge`](https://perldoc.perl.org/functions/s.html) function | 1208.66 | 12.61 |\n| [`Replace.Megaparsec.streamEdit`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEdit) `String` | 2921.25 | 2911.81 |\n| [`Replace.Megaparsec.streamEdit`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEdit) `ByteString` | 3743.25 | 757.21 |\n| [`Replace.Megaparsec.streamEdit`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEdit) `Text` | 3818.47 | 881.69 |\n| [`Replace.Attoparsec.ByteString.streamEdit`](https://hackage.haskell.org/package/replace-attoparsec/docs/Replace-Attoparsec-ByteString.html#v:streamEdit) | 3006.38 | 179.66 |\n| [`Replace.Attoparsec.Text.streamEdit`](https://hackage.haskell.org/package/replace-attoparsec/docs/Replace-Attoparsec-Text.html#v:streamEdit) | 3062.43 | 300.13 |\n| [`Replace.Attoparsec.Text.Lazy.streamEdit`](https://hackage.haskell.org/package/replace-attoparsec/docs/Replace-Attoparsec-Text-Lazy.html#v:streamEdit) | 3102.15 | 241.58 |\n| [`Text.Regex.Applicative.replace`](http://hackage.haskell.org/package/regex-applicative/docs/Text-Regex-Applicative.html#v:replace) `String` | 13875.25 | 4330.52 |\n| [`Text.Regex.PCRE.Heavy.gsub`](http://hackage.haskell.org/package/pcre-heavy/docs/Text-Regex-PCRE-Heavy.html#v:gsub) `Text` | ∞ | 113.27 |\n| 
[`Control.Lens.Regex.ByteString.match`](https://hackage.haskell.org/package/lens-regex-pcre/docs/Control-Lens-Regex-ByteString.html#v:match) | ∞ | 117.05 |\n| [`Control.Lens.Regex.Text.match`](https://hackage.haskell.org/package/lens-regex-pcre/docs/Control-Lens-Regex-Text.html#v:match) | ∞ | 35.97 |\n\n# Hypothetically Asked Questions\n\n1. *Could we write this library for __parsec__?*\n\n   No, because the\n   [`match`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html#v:match)\n   combinator doesn't exist for __parsec__. (I can't find it anywhere.\n   [Can it be written?](http://www.serpentine.com/blog/2014/05/31/attoparsec/#from-strings-to-buffers-and-cursors))\n\n2. *Is this a good idea?*\n\n   You may have\n   [heard it suggested](https://stackoverflow.com/questions/57667534/how-can-i-use-a-parser-in-haskell-to-find-the-locations-of-some-substrings-in-a/57712672#comment101804063_57667534)\n   that monadic parsers are better for pattern-matching when\n   the input stream is mostly signal, and regular expressions are better\n   when the input stream is mostly noise.\n\n   The premise of this library is that monadic parsers are great for finding\n   small signal patterns in a stream of otherwise noisy text.\n\n   Our reluctance to forgo the speedup opportunities afforded by restricting\n   ourselves to regular grammars is an old superstition about\n   opportunities which\n   [remain mostly unexploited anyway](https://swtch.com/~rsc/regexp/regexp1.html).\n   The performance compromise of allowing stack memory allocation (a.k.a. pushdown\n   automata, a.k.a. 
context-free grammar) was once considered\n   [controversial for *general-purpose* programming languages](https://vanemden.wordpress.com/2014/06/18/how-recursion-got-into-programming-a-comedy-of-errors-3/).\n   I think we\n   can now resolve that controversy the same way for pattern matching languages.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaskell-github-trust%2Freplace-megaparsec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaskell-github-trust%2Freplace-megaparsec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaskell-github-trust%2Freplace-megaparsec/lists"}