{"id":13667177,"url":"https://github.com/orf/html-query","last_synced_at":"2025-05-15T09:05:12.275Z","repository":{"id":64167590,"uuid":"572720106","full_name":"orf/html-query","owner":"orf","description":"jq, but for HTML","archived":false,"fork":false,"pushed_at":"2025-03-31T09:58:09.000Z","size":1293,"stargazers_count":650,"open_issues_count":7,"forks_count":9,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-04T02:08:33.958Z","etag":null,"topics":["html","json","parser","rust"],"latest_commit_sha":null,"homepage":"https://orf.github.io/html-query/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-30T22:12:30.000Z","updated_at":"2025-03-31T09:56:42.000Z","dependencies_parsed_at":"2023-02-13T05:30:55.588Z","dependency_job_id":"2daa2723-8972-46a5-b09f-ac4daa8a25b0","html_url":"https://github.com/orf/html-query","commit_stats":{"total_commits":90,"total_committers":2,"mean_commits":45.0,"dds":0.2222222222222222,"last_synced_commit":"af3623e9d7f9aaab930b09c363a1da9531e8ece9"},"previous_names":["orf/hq"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orf%2Fhtml-query","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orf%2Fhtml-query/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orf%2Fhtml-query/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orf%2Fhtml-que
ry/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orf","download_url":"https://codeload.github.com/orf/html-query/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248638702,"owners_count":21137700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","json","parser","rust"],"created_at":"2024-08-02T07:00:33.072Z","updated_at":"2025-04-12T22:24:32.582Z","avatar_url":"https://github.com/orf.png","language":"HTML","funding_links":[],"categories":["HTML"],"sub_categories":[],"readme":"# hq\n\n[![Crates.io](https://img.shields.io/crates/v/html-query.svg)](https://crates.io/crates/html-query)\n\njq, but for HTML. [Try it in your browser here](https://orf.github.io/html-query/)\n\n![](./images/readme-example.gif)\n\n`hq` reads HTML and converts it into a JSON object based on a series of CSS selectors. The query is written\nin a JSON-like syntax, except that the values are CSS selectors. 
For example:\n\n```\n{posts: .athing | [ {title: .titleline \u003e a, url: .titleline \u003e a | @(href)} ] }\n```\n\nThis will select all `.athing` elements and create an array (`| [{...}]`) containing one object per selected element.\nFor each element, it selects the text of the `.titleline \u003e a` element and that element's `href` attribute (`| @(href)`).\n\nThe end result is the following structure:\n\n```json\n{\n  \"posts\": [\n    {\n      \"title\": \"...\",\n      \"url\": \"...\"\n    }\n  ]\n}\n```\n\n## Install\n\n`brew install hq`, or `cargo install html-query`\n\n## Special query syntax\n\n### Text\n\n`.foo | @text`\n\nThis will select the text content of the first element matching `.foo`.\n\n### Selecting attributes\n\n`.foo | @(href)`\n\nThis will select the `href` attribute of the first element matching `.foo`.\n\n### Parents\n\n`.foo | @parent`\n\nThis will return the parent of the first element matching `.foo`.\n\n### Siblings\n\n`.foo | @sibling(1)`\n\nThis will return the next sibling (offset `1`) of the first element matching `.foo`.\n\n## Examples\n\n### Full Hacker News story extraction\n\n```\n{posts: .athing | [{href: .titleline \u003e a | @(href), title: .titleline \u003e a, meta: @sibling(1) | {user: .hnuser, posted: .age | @(title) }}]}\n```\n\nThis selects each `.athing` element and extracts the title along with the URL from the `href` attribute. It then selects\nthe next _sibling_ of each `.athing` element and extracts the user and post time from it:\n\n```json\n{\n  \"posts\": [\n    {\n      \"title\": \"...\",\n      \"href\": \"...\",\n      \"meta\": {\n        \"posted\": \"...\",\n        \"user\": \"...\"\n      }\n    }\n  ]\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forf%2Fhtml-query","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forf%2Fhtml-query","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forf%2Fhtml-query/lists"}