{"id":47444102,"url":"https://github.com/JuliaWeb/Gumbo.jl","last_synced_at":"2026-04-06T13:00:57.304Z","repository":{"id":16680994,"uuid":"19437010","full_name":"JuliaWeb/Gumbo.jl","owner":"JuliaWeb","description":"Julia wrapper around Google's gumbo C library for parsing HTML","archived":false,"fork":false,"pushed_at":"2025-01-02T19:16:10.000Z","size":146,"stargazers_count":159,"open_issues_count":12,"forks_count":26,"subscribers_count":7,"default_branch":"master","last_synced_at":"2026-03-05T18:53:51.043Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"nightscout/cgm-remote-monitor","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliaWeb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-05-04T21:53:08.000Z","updated_at":"2026-02-23T14:19:37.000Z","dependencies_parsed_at":"2024-11-25T22:18:34.418Z","dependency_job_id":"f31f2881-cfe4-493f-9c01-9df66e55afb2","html_url":"https://github.com/JuliaWeb/Gumbo.jl","commit_stats":{"total_commits":165,"total_committers":22,"mean_commits":7.5,"dds":0.5818181818181818,"last_synced_commit":"afc2b2b83501d483e416d86063d98d567968fea7"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/JuliaWeb/Gumbo.jl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaWeb%2FGumbo.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaWeb%2FGumbo.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaWeb%2FGumbo.jl/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaWeb%2FGumbo.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliaWeb","download_url":"https://codeload.github.com/JuliaWeb/Gumbo.jl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaWeb%2FGumbo.jl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31473271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-06T08:36:52.050Z","status":"ssl_error","status_checked_at":"2026-04-06T08:36:51.267Z","response_time":112,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-23T06:00:59.902Z","updated_at":"2026-04-06T13:00:57.288Z","avatar_url":"https://github.com/JuliaWeb.png","language":"Julia","funding_links":[],"categories":["Web Security"],"sub_categories":["HTTP and Web Frameworks"],"readme":"# Gumbo.jl\n\n[![version](https://juliahub.com/docs/Gumbo/version.svg)](https://juliahub.com/ui/Packages/Gumbo/mllB2) [![Build Status](https://travis-ci.org/JuliaWeb/Gumbo.jl.svg?branch=master)](https://travis-ci.org/JuliaWeb/Gumbo.jl) [![codecov.io](http://codecov.io/github/JuliaWeb/Gumbo.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaWeb/Gumbo.jl?branch=master) [![pkgeval](https://juliahub.com/docs/Gumbo/pkgeval.svg)](https://juliahub.com/ui/Packages/Gumbo/mllB2) 
[![deps](https://juliahub.com/docs/Gumbo/deps.svg)](https://juliahub.com/ui/Packages/Gumbo/mllB2?t=2)\n\nGumbo.jl is a Julia wrapper around\n[the gumbo library](https://github.com/google/gumbo-parser) for\nparsing HTML.\n\n\u003e [!WARNING]  \n\u003e The underlying C library is currently unmaintained. Use at your own risk.\n\nGetting started is very easy:\n\n```julia\njulia\u003e using Gumbo\n\njulia\u003e parsehtml(\"\u003ch1\u003e Hello, world! \u003c/h1\u003e\")\nHTML Document:\n\u003c!DOCTYPE \u003e\nHTMLElement{:HTML}:\n\u003cHTML\u003e\n  \u003chead\u003e\u003c/head\u003e\n  \u003cbody\u003e\n    \u003ch1\u003e\n       Hello, world!\n    \u003c/h1\u003e\n  \u003c/body\u003e\n\u003c/HTML\u003e\n```\n\nRead on for further documentation.\n\n## Installation\n\n```jl\nusing Pkg\nPkg.add(\"Gumbo\")\n```\n\nor activate `Pkg` mode in the REPL by typing `]`, and then:\n\n```\nadd Gumbo\n```\n\n## Basic usage\n\nThe workhorse is the `parsehtml` function, which takes a single\nargument, a valid UTF-8 string, which is interpreted as HTML data to be\nparsed, e.g.:\n\n```julia\nparsehtml(\"\u003ch1\u003e Hello, world! \u003c/h1\u003e\")\n```\n\nParsing an HTML file named `filename` can be done using:\n\n```julia\njulia\u003e parsehtml(read(filename, String))\n```\n\nThe result of a call to `parsehtml` is an `HTMLDocument`, a type which\nhas two fields: `doctype`, which is the doctype of the parsed document\n(this will be the empty string if no doctype is provided), and `root`,\nwhich is a reference to the `HTMLElement` that is the root of the\ndocument.\n\nNote that gumbo is a very permissive HTML parser, designed to\ngracefully handle the insanity that passes for HTML out on the wild,\nwild web. It will return a valid HTML document for *any* input, doing\nall sorts of algorithmic gymnastics to twist what you give it into\nvalid HTML.\n\nIf you want an HTML validator, this is probably not your library. 
That\nsaid, `parsehtml` does take an optional `Bool` keyword argument,\n`strict`, which, if `true`, causes an `InvalidHTMLError` to be thrown\nif the call to the gumbo C library produces any errors.\n\n## HTML types\n\nThis library defines a number of types for representing HTML.\n\n### `HTMLDocument`\n\n`HTMLDocument` is what is returned from a call to `parsehtml`. It has a\n`doctype` field, which contains the doctype of the parsed document,\nand a `root` field, which is a reference to the root of the document.\n\n### `HTMLNode`s\n\nA document contains a tree of HTML nodes, which are represented as\nsubtypes of the `HTMLNode` abstract type. The first of these is\n`HTMLElement`.\n\n### `HTMLElement`\n\n```julia\nmutable struct HTMLElement{T} \u003c: HTMLNode\n    children::Vector{HTMLNode}\n    parent::HTMLNode\n    attributes::Dict{String, String}\nend\n```\n\n`HTMLElement` is probably the most interesting and frequently used\ntype. An `HTMLElement` is parameterized by a symbol representing its\ntag. So an `HTMLElement{:a}` is a different type from an\n`HTMLElement{:body}`, etc. An empty `HTMLElement` of a given tag can be\nconstructed as follows:\n\n```julia\njulia\u003e HTMLElement(:div)\nHTMLElement{:div}:\n\u003cdiv\u003e\u003c/div\u003e\n```\n\n`HTMLElement`s have a `parent` field, which refers to another\n`HTMLNode`. `parent` will always be an `HTMLElement`, unless the\nelement has no parent (as is the case with the root of a document), in\nwhich case it will be a `NullNode`, a special type of `HTMLNode` which\nexists for just this purpose. 
Empty `HTMLElement`s constructed as in\nthe example above will also have a `NullNode` for a parent.\n\n`HTMLElement`s also have `children`, which is a vector of\n`HTMLNode`s containing the children of this element, and\n`attributes`, which is a `Dict` mapping attribute names to values.\n\n`HTMLElement`s implement `getindex`, `setindex!`, and `push!`;\nindexing into or pushing onto an `HTMLElement` operates on its\nchildren array.\n\nThere are a number of convenience methods for working with `HTMLElement`s:\n\n- `tag(elem)`\n  get the tag of this element as a symbol\n\n- `attrs(elem)`\n  return the attributes dict of this element\n\n- `children(elem)`\n  return the children array of this element\n\n- `getattr(elem, name)`\n  get the value of attribute `name` or throw a `KeyError`. Also\n  supports being called with a default value (`getattr(elem, name,\n  default)`) or function (`getattr(f, elem, name)`).\n\n- `setattr!(elem, name, value)`\n  set the value of attribute `name` to `value`\n\n### `HTMLText`\n\n```jl\nmutable struct HTMLText \u003c: HTMLNode\n    parent::HTMLNode\n    text::String\nend\n```\n\nRepresents text appearing in an HTML document. For example:\n\n```julia\njulia\u003e doc = parsehtml(\"\u003ch1\u003e Hello, world! \u003c/h1\u003e\")\nHTML Document:\n\u003c!DOCTYPE \u003e\nHTMLElement{:HTML}:\n\u003cHTML\u003e\n  \u003chead\u003e\u003c/head\u003e\n  \u003cbody\u003e\n    \u003ch1\u003e\n       Hello, world!\n    \u003c/h1\u003e\n  \u003c/body\u003e\n\u003c/HTML\u003e\n\njulia\u003e doc.root[2][1][1]\nHTML Text:  Hello, world!\n```\n\nThis type is quite simple, just a reference to its parent and the\nactual text it represents (this is also accessible by a `text`\nfunction). 
You can construct `HTMLText` instances as follows:\n\n```jl\njulia\u003e HTMLText(\"Example text\")\nHTML Text: Example text\n```\n\nJust as with `HTMLElement`s, the parent of an instance so constructed\nwill be a `NullNode`.\n\n\n## Tree traversal\n\nUse the iterators defined in\n[AbstractTrees.jl](https://github.com/Keno/AbstractTrees.jl/), e.g.:\n\n```julia\njulia\u003e using AbstractTrees\n\njulia\u003e using Gumbo\n\njulia\u003e doc = parsehtml(\"\"\"\n                     \u003chtml\u003e\n                       \u003cbody\u003e\n                         \u003cdiv\u003e\n                           \u003cp\u003e\u003c/p\u003e \u003ca\u003e\u003c/a\u003e \u003cp\u003e\u003c/p\u003e\n                         \u003c/div\u003e\n                         \u003cdiv\u003e\n                            \u003cspan\u003e\u003c/span\u003e\n                         \u003c/div\u003e\n                        \u003c/body\u003e\n                     \u003c/html\u003e\n                     \"\"\");\n\njulia\u003e for elem in PreOrderDFS(doc.root) println(tag(elem)) end\nHTML\nhead\nbody\ndiv\np\na\np\ndiv\nspan\n\njulia\u003e for elem in PostOrderDFS(doc.root) println(tag(elem)) end\nhead\np\na\np\ndiv\nspan\ndiv\nbody\nHTML\n\njulia\u003e for elem in StatelessBFS(doc.root) println(tag(elem)) end\nHTML\nhead\nbody\ndiv\ndiv\np\na\np\nspan\n\njulia\u003e\n```\n\n## TODOS\n\n- support CDATA\n- support comments\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaWeb%2FGumbo.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJuliaWeb%2FGumbo.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaWeb%2FGumbo.jl/lists"}