{"id":13582216,"url":"https://github.com/msoap/html2data","last_synced_at":"2025-07-27T19:06:13.675Z","repository":{"id":54363568,"uuid":"49379940","full_name":"msoap/html2data","owner":"msoap","description":"Library and cli for extracting data from HTML via CSS selectors","archived":false,"fork":false,"pushed_at":"2024-09-30T21:05:08.000Z","size":7496,"stargazers_count":69,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-24T03:57:57.552Z","etag":null,"topics":["cli","css-selector","extract-data","golang","homebrew","html","library","parser","scrapping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msoap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-10T18:40:23.000Z","updated_at":"2025-02-23T17:27:24.000Z","dependencies_parsed_at":"2024-01-15T03:57:16.250Z","dependency_job_id":"ae935417-ca4f-4538-a0a2-4f64a772ebcd","html_url":"https://github.com/msoap/html2data","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msoap%2Fhtml2data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msoap%2Fhtml2data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msoap%2Fhtml2data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msoap%2Fhtml2data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/msoap","download_url":"https://codeload.github.com/msoap/html2data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246884767,"owners_count":20849554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","css-selector","extract-data","golang","homebrew","html","library","parser","scrapping"],"created_at":"2024-08-01T15:02:30.068Z","updated_at":"2025-04-02T20:08:47.570Z","avatar_url":"https://github.com/msoap.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"html2data\n=========\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/msoap/html2data.svg)](https://pkg.go.dev/github.com/msoap/html2data)\n[![Go](https://github.com/msoap/html2data/actions/workflows/go.yml/badge.svg)](https://github.com/msoap/html2data/actions/workflows/go.yml)\n[![Coverage Status](https://coveralls.io/repos/github/msoap/html2data/badge.svg?branch=master)](https://coveralls.io/github/msoap/html2data?branch=master)\n[![Sourcegraph](https://sourcegraph.com/github.com/msoap/html2data/-/badge.svg)](https://sourcegraph.com/github.com/msoap/html2data?badge)\n[![Report Card](https://goreportcard.com/badge/github.com/msoap/html2data)](https://goreportcard.com/report/github.com/msoap/html2data)\n\nLibrary and cli-utility for extracting data from HTML via CSS selectors\n\nInstall\n-------\n\nInstall package and command line utility:\n\n    go install github.com/msoap/html2data/cmd/html2data@latest\n\nInstall package only:\n\n    go get -u github.com/msoap/html2data\n\nMethods\n-------\n\n  * 
`FromReader(io.Reader)` - create document for parsing\n  * `FromURL(URL, [config URLCfg])` - create document from http(s) URL\n  * `FromFile(file)` - create document from local file\n  * `doc.GetData(css map[string]string)` - get texts by CSS selectors\n  * `doc.GetDataFirst(css map[string]string)` - get texts by CSS selectors, get first entry for each selector or \"\"\n  * `doc.GetDataNested(outerCss string, css map[string]string)` - extract nested data by CSS-selectors from another CSS-selector\n  * `doc.GetDataNestedFirst(outerCss string, css map[string]string)` - extract nested data by CSS-selectors from another CSS-selector, get first entry for each selector or \"\"\n  * `doc.GetDataSingle(css string)` - get one result by one CSS selector\n\n  or with config:\n\n  * `doc.GetData(css map[string]string, html2data.Cfg{DontTrimSpaces: true})`\n  * `doc.GetDataNested(outerCss string, css map[string]string, html2data.Cfg{DontTrimSpaces: true})`\n  * `doc.GetDataSingle(css string, html2data.Cfg{DontTrimSpaces: true})`\n\nPseudo-selectors\n----------------\n\n  * `:attr(attr_name)` - get an attribute instead of text, for example get URLs from links: `a:attr(href)`\n  * `:html` - get HTML instead of text\n  * `:get(N)` - get the N-th element from the list\n\nExample\n-------\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"log\"\n\n    \"github.com/msoap/html2data\"\n)\n\nfunc main() {\n    doc := html2data.FromURL(\"http://example.com\")\n    // or with config\n    // doc := html2data.FromURL(\"http://example.com\", html2data.URLCfg{UA: \"userAgent\", TimeOut: 10, DontDetectCharset: false})\n    if doc.Err != nil {\n        log.Fatal(doc.Err)\n    }\n\n    // get title\n    title, _ := doc.GetDataSingle(\"title\")\n    fmt.Println(\"Title is:\", title)\n\n    title, _ = doc.GetDataSingle(\"title\", html2data.Cfg{DontTrimSpaces: true})\n    fmt.Println(\"Title as is, with spaces:\", title)\n\n    texts, _ := doc.GetData(map[string]string{\"h1\": \"h1\", 
\"links\": \"a:attr(href)\"})\n    // get all H1 headers:\n    if textOne, ok := texts[\"h1\"]; ok {\n        for _, text := range textOne {\n            fmt.Println(text)\n        }\n    }\n    // get all urls from links\n    if links, ok := texts[\"links\"]; ok {\n        for _, text := range links {\n            fmt.Println(text)\n        }\n    }\n}\n```\n\nCommand line utility\n--------------------\n\n[![Homebrew formula exists](https://img.shields.io/badge/homebrew-🍺-d7af72.svg)](https://github.com/msoap/html2data#install-1)\n\n### Usage\n\n    html2data [options] URL \"css selector\"\n    html2data [options] URL :name1 \"css1\" :name2 \"css2\"...\n    html2data [options] file.html \"css selector\"\n    cat file.html | html2data \"css selector\"\n\n### Options\n\n  * `-user-agent=\"Custom UA\"` -- set custom user-agent\n  * `-find-in=\"outer.css.selector\"` -- search in the specified elements instead document\n  * `-json` -- get result as JSON\n  * `-dont-trim-spaces` -- get text as is\n  * `-dont-detect-charset` -- don't detect charset and convert text\n  * `-timeout=10` -- setting timeout when loading the URL\n\n### Install\n\nDownload binaries from: [releases](https://github.com/msoap/html2data/releases) (OS X/Linux/Windows/RaspberryPi)\n\nOr install from homebrew (MacOS):\n\n    brew tap msoap/tools\n    brew install html2data\n    # update:\n    brew upgrade html2data\n\nUsing snap (Ubuntu or any Linux distribution with snap):\n\n    # install stable version:\n    sudo snap install html2data\n    \n    # install the latest version:\n    sudo snap install --edge html2data\n    \n    # update\n    sudo snap refresh html2data\n\nFrom source:\n\n    go get -u github.com/msoap/html2data/cmd/html2data\n\n### examples\n\nGet title of page:\n\n    html2data https://go.dev/ title\n\nLast blog posts:\n\n    html2data https://go.dev/blog/ 'div#blogindex p.blogtitle a'\n\nGetting RSS URL:\n\n    html2data https://go.dev/blog/ 
'link[type=\"application/atom+xml\"]:attr(href)'\n\nMore examples from [wiki](https://github.com/msoap/html2data/wiki/Examples).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsoap%2Fhtml2data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsoap%2Fhtml2data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsoap%2Fhtml2data/lists"}