https://github.com/kseo/tagsoup-megaparsec
A Tag token parser and Tag specific parsing combinators
- Host: GitHub
- URL: https://github.com/kseo/tagsoup-megaparsec
- Owner: kseo
- License: bsd-3-clause
- Created: 2016-07-12T08:08:26.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2020-01-29T17:20:47.000Z (almost 5 years ago)
- Last Synced: 2024-10-11T23:22:17.549Z (27 days ago)
- Language: Haskell
- Homepage: https://hackage.haskell.org/package/tagsoup-megaparsec
- Size: 11.7 KB
- Stars: 10
- Watchers: 3
- Forks: 8
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# tagsoup-megaparsec
[![Hackage](https://img.shields.io/hackage/v/tagsoup-megaparsec.svg?style=flat)](https://hackage.haskell.org/package/tagsoup-megaparsec)
[![Build Status](https://travis-ci.org/kseo/tagsoup-megaparsec.svg?branch=master)](https://travis-ci.org/kseo/tagsoup-megaparsec)

A Tag token parser and Tag specific parsing combinators, inspired by [parsec-tagsoup][parsec-tagsoup] and [tagsoup-parsec][tagsoup-parsec]. This library helps you build a megaparsec parser using TagSoup's Tag as tokens.

[parsec-tagsoup]: https://hackage.haskell.org/package/parsec-tagsoup
[tagsoup-parsec]: https://hackage.haskell.org/package/tagsoup-parsec

## Usage
### DOM parser
We can build a DOM parser using TagSoup's `Tag` as the token type in Megaparsec. Let's start by importing the required modules.
```haskell
import Data.Text ( Text )
import qualified Data.Text as T
import Data.HashMap.Strict ( HashMap )
import qualified Data.HashMap.Strict as HMS
import Text.HTML.TagSoup
import Text.Megaparsec
import Text.Megaparsec.ShowToken
import Text.Megaparsec.TagSoup
```

Here are the data types used to represent our DOM. A `Node` is either an `ElementNode` or a `TextNode`. The `TextNode` data constructor takes a `Text`, and the `ElementNode` data constructor takes an `Element`, whose fields are `elementName`, `elementAttrs` and `elementChildren`.

```haskell
type AttrName = Text
type AttrValue = Text

data Element = Element
  { elementName :: !Text
  , elementAttrs :: !(HashMap AttrName AttrValue)
  , elementChildren :: [Node]
  } deriving (Eq, Show)

data Node =
    ElementNode Element
  | TextNode Text
  deriving (Eq, Show)
```

Our `Parser` is defined as a type synonym for `TagParser Text`. `TagParser` takes a type argument representing the string type, and we chose `Text` here. Any `StringLike` type, such as `String` or `ByteString`, works as well.

```haskell
type Parser = TagParser Text
```

There is nothing new in defining a parser except that our token is `Tag Text` instead of `Char`. We can use any Megaparsec combinators as usual. Our `node` parser is either `element` or `text`, so we use the choice combinator `(<|>)`.

```haskell
node :: Parser Node
node = ElementNode <$> element
   <|> TextNode <$> text
```

The tagsoup-megaparsec library provides some `Tag`-specific combinators.

* `tagText`: parse a chunk of text.
* `anyTagOpen`/`anyTagClose`: parse any opening or closing tag.

The `text` and `element` parsers are built using these combinators.

NOTE: We don't need to worry about the text blocks containing only whitespace characters because all the parsers provided by tagsoup-megaparsec are lexeme parsers.
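
To see why whitespace-only text blocks would otherwise matter, it helps to look at the raw token stream TagSoup produces. The sketch below uses only TagSoup's `parseTags` and the `Tag` constructors; `isBlankText` and `significantTags` are hypothetical helper names. It filters blank text tokens up front, which only approximates the effect of the library's lexeme parsers and is not how tagsoup-megaparsec is implemented.

```haskell
import Data.Char (isSpace)
import Text.HTML.TagSoup (Tag (..), parseTags)

-- Hypothetical helper: True for text tokens made up solely of whitespace.
isBlankText :: Tag String -> Bool
isBlankText (TagText s) = all isSpace s
isBlankText _           = False

-- Hypothetical helper: tokenize with TagSoup, then drop blank text tokens.
significantTags :: String -> [Tag String]
significantTags = filter (not . isBlankText) . parseTags
```

For the input `"<ul>\n  <li>a</li>\n</ul>"`, `parseTags` emits `TagText` tokens for the indentation and line breaks between tags, while `significantTags` keeps only the two opening tags, the text `"a"`, and the two closing tags.
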
```haskell
text :: Parser Text
text = fromTagText <$> tagText

element :: Parser Element
element = do
  -- Consume an opening tag, its child nodes, then a closing tag.
  TagOpen tagName attrs <- anyTagOpen
  children <- many node
  closeTag@(TagClose tagName') <- anyTagClose
  -- The closing tag must match the opening tag's name.
  if tagName == tagName'
     then return $ Element tagName (HMS.fromList attrs) children
     else fail $ "unexpected close tag " ++ showToken closeTag
```

Now it's time to define our driver. `parseDOM` takes a `Text` and returns either a `ParseError` or a `[Node]`. We use the `many` combinator to express that there are zero or more occurrences of `node`. We use TagSoup's `parseTags` to create the tokens and pass them to Megaparsec's `parse` function.

```haskell
parseDOM :: Text -> Either ParseError [Node]
parseDOM html = parse (many node) "" tags
  where tags = parseTags html
```
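
The matching of opening and closing tags performed by `element` can also be sketched without any parsing library, which may help when reasoning about what the combinators do. The following is a minimal, dependency-free illustration and not the library's implementation: `Tok` is a hypothetical stand-in for TagSoup's `Tag`, `Tree` a simplified stand-in for `Node`, and `nodes` and `parseTokens` are names invented for this example.

```haskell
-- Hypothetical stand-in for TagSoup's Tag type.
data Tok
  = Open String [(String, String)]
  | Close String
  | Txt String
  deriving (Eq, Show)

-- Simplified stand-in for the Node/Element types above.
data Tree
  = Elem String [(String, String)] [Tree]
  | Leaf String
  deriving (Eq, Show)

-- Parse zero or more sibling nodes; return them with the unconsumed tokens.
nodes :: [Tok] -> Either String ([Tree], [Tok])
nodes (Txt s : rest) = do
  (siblings, rest') <- nodes rest
  Right (Leaf s : siblings, rest')
nodes (Open name attrs : rest) = do
  (children, afterChildren) <- nodes rest
  case afterChildren of
    Close name' : rest'
      | name == name' -> do
          (siblings, rest'') <- nodes rest'
          Right (Elem name attrs children : siblings, rest'')
      | otherwise -> Left ("unexpected close tag " ++ name')
    _ -> Left ("missing close tag for " ++ name)
-- A closing tag or the end of input ends the current run of siblings.
nodes toks = Right ([], toks)

-- Driver: succeed only if every token was consumed.
parseTokens :: [Tok] -> Either String [Tree]
parseTokens toks = case nodes toks of
  Right (trees, [])  -> Right trees
  Right (_, tok : _) -> Left ("unexpected token " ++ show tok)
  Left err           -> Left err
```

With this sketch, `parseTokens [Open "p" [], Txt "hi", Close "p"]` yields `Right [Elem "p" [] [Leaf "hi"]]`, while `parseTokens [Open "p" [], Close "div"]` yields `Left "unexpected close tag div"`, mirroring the name check in `element`.
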