https://github.com/mgdm/htmlq


# htmlq
Like [`jq`](https://stedolan.github.io/jq/), but for HTML. Uses [CSS selectors](https://developer.mozilla.org/en-US/docs/Learn/CSS/Introduction_to_CSS/Selectors) to extract bits of content from HTML files.

## Installation

### [Cargo](https://crates.io/crates/htmlq)

```sh
cargo install htmlq
```

### [FreeBSD pkg](https://www.freshports.org/textproc/htmlq)

```sh
pkg install htmlq
```

### [Homebrew](https://formulae.brew.sh/formula/htmlq)

```sh
brew install htmlq
```

### [Scoop](https://scoop.sh/)

```sh
scoop install htmlq
```

## Usage

```console
$ htmlq -h
htmlq 0.4.0
Michael Maclean
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] [--] [selector]...

FLAGS:
    -B, --detect-base          Try to detect the base URL from the <base> tag in the document. If not found, default to
                               the value of --base, if supplied
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information

OPTIONS:
    -a, --attribute <attribute>         Only return this attribute (if present) from selected elements
    -b, --base <base>                   Use this URL as the base for links
    -f, --filename <FILE>               The input file. Defaults to stdin
    -o, --output <FILE>                 The output file. Defaults to stdout
    -r, --remove-nodes <SELECTOR>...    Remove nodes matching this expression before output. May be specified multiple
                                        times

ARGS:
    <selector>...    The CSS expression to select [default: html]
$
```
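
The options listed above can be combined to work on saved files instead of a pipe. The following is a minimal sketch, assuming an HTML file already exists on disk; the function name `extract_to_file` and the file names in the usage note are hypothetical.

```sh
# Hypothetical helper: run a CSS selector against a saved HTML file and
# write the matching markup to another file, using the --filename and
# --output options from the help text instead of stdin and stdout.
extract_to_file() {
    input=$1     # path to an existing HTML file
    output=$2    # path the matches should be written to
    selector=$3  # any CSS selector, e.g. '#get-help'
    htmlq --filename "$input" --output "$output" "$selector"
}
```

For example, `extract_to_file page.html help.html '#get-help'` would read `page.html` and write the selected element to `help.html`.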

## Examples

### Using with cURL to find part of a page by ID

```console
$ curl --silent https://www.rust-lang.org/ | htmlq '#get-help'
Get help!

Language

English (en-US)
Français (fr)
Deutsch (de)
```

### Find all the links in a page

```console
$ curl --silent https://www.rust-lang.org/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.org/
/learn/get-started
https://blog.rust-lang.org/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
```
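
Many of the links above are relative (`/learn`, `/tools`). The sketch below combines `--attribute` with `--base` and removes duplicates with `sort -u`. It assumes that `--base` resolves relative `href` values against the given URL, which is how the help text reads; the function name `page_links` is hypothetical.

```sh
# Hypothetical helper: print each distinct link target on a page once.
# Assumption: --base makes relative hrefs such as /learn absolute by
# resolving them against the page URL.
page_links() {
    url=$1
    curl --silent "$url" | htmlq --attribute href --base "$url" a | sort -u
}
```

Calling `page_links https://www.rust-lang.org/` would then print one link per line in sorted order.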

### Get the text content of a post

```console
$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main

About NixOS

NixOS is a GNU/Linux distribution that aims to
improve the state of the art in system configuration management. In
existing distributions, actions such as upgrades are dangerous:
upgrading a package can cause other packages to break, upgrading an
entire system is much less reliable than reinstalling from scratch,
you can’t safely test what the results of a configuration change will
be, you cannot easily undo changes to the system, and so on. We want
to change that. NixOS has many innovative features:

[...]
```
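
Text output often contains runs of blank lines that come from whitespace-only text nodes. A minimal sketch of pairing `--text` with `--ignore-whitespace`, both taken from the help text; the function name `page_text` is hypothetical.

```sh
# Hypothetical helper: print the text inside elements matching a selector,
# skipping text nodes that consist entirely of whitespace.
page_text() {
    url=$1
    selector=$2
    curl --silent "$url" | htmlq --text --ignore-whitespace "$selector"
}
```

For example, `page_text https://nixos.org/nixos/about.html .main` runs the same query as above with whitespace-only nodes left out.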

### Remove a node before output

There's a big SVG image in this page that I don't need, so here's how to remove it.

```console
$ curl --silent https://nixos.org/ | htmlq '.whynix' --remove-nodes svg
  • Reproducible

    Nix builds packages in isolation from each other. This ensures that they
    are reproducible and don't have undeclared dependencies, so if a
    package works on one machine, it will also work on another.

  • Declarative

    Nix makes it trivial to share development and build
    environments for your projects, regardless of what programming
    languages and tools you’re using.

  • Reliable

    Nix ensures that installing or upgrading one package cannot
    break other packages. It allows you to roll back to
    previous versions, and ensures that no package is in an
    inconsistent state during an upgrade.
```
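
The help text notes that `--remove-nodes` may be given more than once. The sketch below removes two kinds of node and uses `--`, shown in the usage line, to mark where the options end so the selector is not read as another expression to remove. The function name `select_without_media` and the choice of `script` as a second expression are illustrative assumptions.

```sh
# Hypothetical helper: select elements from HTML on stdin while dropping
# svg and script nodes. Each --remove-nodes is given one expression, and
# "--" separates the options from the selector argument.
select_without_media() {
    selector=$1
    htmlq --remove-nodes svg --remove-nodes script -- "$selector"
}
```

It reads from stdin, so it slots into a pipeline: `curl --silent https://nixos.org/ | select_without_media '.whynix'`.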

### Pretty print HTML

(This is a bit of a work in progress)

```console
$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
I write about...

  • Debugging network connections on macOS with nettop

    Using nettop to find out what network connections a program is trying to make.

  • [...]
```

### Syntax highlighting with [`bat`](https://github.com/sharkdp/bat)

```console
$ curl --silent example.com | htmlq 'body' | bat --language html
```

> Syntax highlighted output