{"id":13395168,"url":"https://github.com/mgdm/htmlq","last_synced_at":"2025-04-23T20:51:23.256Z","repository":{"id":38340961,"uuid":"185476675","full_name":"mgdm/htmlq","owner":"mgdm","description":"Like jq, but for HTML.","archived":false,"fork":false,"pushed_at":"2024-05-29T03:40:49.000Z","size":59,"stargazers_count":7272,"open_issues_count":42,"forks_count":118,"subscribers_count":34,"default_branch":"master","last_synced_at":"2025-04-08T17:14:23.407Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mgdm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-07T20:55:20.000Z","updated_at":"2025-04-08T10:45:47.000Z","dependencies_parsed_at":"2022-07-09T04:15:06.307Z","dependency_job_id":"69c025bb-f1cc-4036-859d-4b016efbd010","html_url":"https://github.com/mgdm/htmlq","commit_stats":{"total_commits":42,"total_committers":12,"mean_commits":3.5,"dds":0.5238095238095238,"last_synced_commit":"739cd363543cd5c36a2d7bcbbb3ab7e811205611"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgdm%2Fhtmlq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgdm%2Fhtmlq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgdm%2Fhtmlq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgdm%2Fhtmlq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mgdm","download_url":"https://codeload.github.com/mgdm/htmlq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250514755,"owners_count":21443208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T17:01:44.810Z","updated_at":"2025-04-23T20:51:23.230Z","avatar_url":"https://github.com/mgdm.png","language":"Rust","readme":"# htmlq\nLike [`jq`](https://stedolan.github.io/jq/), but for HTML. Uses [CSS selectors](https://developer.mozilla.org/en-US/docs/Learn/CSS/Introduction_to_CSS/Selectors) to extract bits of content from HTML files.\n\n## Installation\n\n### [Cargo](https://crates.io/crates/htmlq)\n\n```sh\ncargo install htmlq\n```\n\n### [FreeBSD pkg](https://www.freshports.org/textproc/htmlq)\n\n```sh\npkg install htmlq\n```\n\n### [Homebrew](https://formulae.brew.sh/formula/htmlq)\n\n```sh\nbrew install htmlq\n```\n\n### [Scoop](https://scoop.sh/)\n\n```sh\nscoop install htmlq\n```\n\n## Usage\n\n```console\n$ htmlq -h\nhtmlq 0.4.0\nMichael Maclean \u003cmichael@mgdm.net\u003e\nRuns CSS selectors on HTML\n\nUSAGE:\n    htmlq [FLAGS] [OPTIONS] [--] [selector]...\n\nFLAGS:\n    -B, --detect-base          Try to detect the base URL from the \u003cbase\u003e tag in the document. 
If not found, default to\n                               the value of --base, if supplied\n    -h, --help                 Prints help information\n    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace\n    -p, --pretty               Pretty-print the serialised output\n    -t, --text                 Output only the contents of text nodes inside selected elements\n    -V, --version              Prints version information\n\nOPTIONS:\n    -a, --attribute \u003cattribute\u003e         Only return this attribute (if present) from selected elements\n    -b, --base \u003cbase\u003e                   Use this URL as the base for links\n    -f, --filename \u003cFILE\u003e               The input file. Defaults to stdin\n    -o, --output \u003cFILE\u003e                 The output file. Defaults to stdout\n    -r, --remove-nodes \u003cSELECTOR\u003e...    Remove nodes matching this expression before output. May be specified multiple\n                                        times\n\nARGS:\n    \u003cselector\u003e...    
The CSS expression to select [default: html]\n$\n```\n\n## Examples\n\n### Using with cURL to find part of a page by ID\n\n```console\n$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'\n\u003cdiv class=\"four columns mt3 mt0-l\" id=\"get-help\"\u003e\n        \u003ch4\u003eGet help!\u003c/h4\u003e\n        \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"https://doc.rust-lang.org\"\u003eDocumentation\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"https://users.rust-lang.org\"\u003eAsk a Question on the Users Forum\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"http://ping.rust-lang.org\"\u003eCheck Website Status\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e\n        \u003cdiv class=\"languages\"\u003e\n            \u003clabel class=\"hidden\" for=\"language-footer\"\u003eLanguage\u003c/label\u003e\n            \u003cselect id=\"language-footer\"\u003e\n                \u003coption title=\"English (US)\" value=\"en-US\"\u003eEnglish (en-US)\u003c/option\u003e\n\u003coption title=\"French\" value=\"fr\"\u003eFrançais (fr)\u003c/option\u003e\n\u003coption title=\"German\" value=\"de\"\u003eDeutsch (de)\u003c/option\u003e\n\n            \u003c/select\u003e\n        \u003c/div\u003e\n      \u003c/div\u003e\n```\n\n### Find all the links in a page\n\n```console\n$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a\n/\n/tools/install\n/learn\n/tools\n/governance\n/community\nhttps://blog.rust-lang.org/\n/learn/get-started\nhttps://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html\nhttps://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html\n[...]\n```\n\n### Get the text content of a post\n\n```console\n$ curl --silent https://nixos.org/nixos/about.html | htmlq  --text .main\n\n          About NixOS\n\nNixOS is a GNU/Linux distribution that aims to\nimprove the state of the art in system configuration management.  
In\nexisting distributions, actions such as upgrades are dangerous:\nupgrading a package can cause other packages to break, upgrading an\nentire system is much less reliable than reinstalling from scratch,\nyou can’t safely test what the results of a configuration change will\nbe, you cannot easily undo changes to the system, and so on.  We want\nto change that.  NixOS has many innovative features:\n\n[...]\n```\n\n### Remove a node before output\n\nThere's a big SVG image in this page that I don't need, so here's how to remove it.\n\n```console\n$ curl --silent https://nixos.org/ | htmlq '.whynix' --remove-nodes svg\n\u003cul class=\"whynix\"\u003e\n      \u003cli\u003e\n\n        \u003ch2\u003eReproducible\u003c/h2\u003e\n        \u003cp\u003e\n          Nix builds packages in isolation from each other. This ensures that they\n          are reproducible and don't have undeclared dependencies, so \u003cstrong\u003eif a\n            package works on one machine, it will also work on another\u003c/strong\u003e.\n        \u003c/p\u003e\n      \u003c/li\u003e\n      \u003cli\u003e\n\n        \u003ch2\u003eDeclarative\u003c/h2\u003e\n        \u003cp\u003e\n          Nix makes it \u003cstrong\u003etrivial to share development and build\n            environments\u003c/strong\u003e for your projects, regardless of what programming\n          languages and tools you’re using.\n        \u003c/p\u003e\n      \u003c/li\u003e\n      \u003cli\u003e\n\n        \u003ch2\u003eReliable\u003c/h2\u003e\n        \u003cp\u003e\n          Nix ensures that installing or upgrading one package \u003cstrong\u003ecannot\n            break other packages\u003c/strong\u003e. 
It allows you to \u003cstrong\u003eroll back to\n            previous versions\u003c/strong\u003e, and ensures that no package is in an\n          inconsistent state during an upgrade.\n        \u003c/p\u003e\n      \u003c/li\u003e\n    \u003c/ul\u003e\n```\n\n### Pretty print HTML\n\n(This is a bit of a work in progress)\n\n```console\n$ curl --silent https://mgdm.net | htmlq --pretty '#posts'\n\u003csection id=\"posts\"\u003e\n  \u003ch2\u003eI write about...\n  \u003c/h2\u003e\n  \u003cul class=\"post-list\"\u003e\n    \u003cli\u003e\n      \u003ctime datetime=\"2019-04-29 00:%i:1556496000\" pubdate=\"\"\u003e\n        29/04/2019\u003c/time\u003e\u003ca href=\"/weblog/nettop/\"\u003e\n        \u003ch3\u003eDebugging network connections on macOS with nettop\n        \u003c/h3\u003e\u003c/a\u003e\n      \u003cp\u003eUsing nettop to find out what network connections a program is trying to make.\n      \u003c/p\u003e\n    \u003c/li\u003e\n[...]\n```\n\n### Syntax highlighting with [`bat`](https://github.com/sharkdp/bat)\n\n```console\n$ curl --silent example.com | htmlq 'body' | bat --language html\n```\n\n\u003e \u003cimg alt=\"Syntax highlighted output\" width=\"700\" src=\"https://user-images.githubusercontent.com/2346707/132808980-db8991ff-9177-4cb7-a018-39ad94282374.png\" /\u003e\n","funding_links":[],"categories":["Rust","Command Line","\u003ca name=\"data\"\u003e\u003c/a\u003edata","Terminal","Developer Tools","HTML"],"sub_categories":["Like jq","Smart Shell","Command Line Tools","Open USP Tsukubai"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgdm%2Fhtmlq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgdm%2Fhtmlq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgdm%2Fhtmlq/lists"}