https://github.com/turbot/steampipe-plugin-html
Use SQL to instantly query HTML resources. Open source CLI. No DB required.
- Host: GitHub
- URL: https://github.com/turbot/steampipe-plugin-html
- Owner: turbot
- License: apache-2.0
- Created: 2022-11-09T01:18:18.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-25T22:05:36.000Z (over 2 years ago)
- Last Synced: 2025-04-13T01:58:43.696Z (12 months ago)
- Topics: postgresql, postgresql-fdw, sql, steampipe, steampipe-plugin
- Language: Go
- Homepage: https://hub.steampipe.io/plugins/turbot/html
- Size: 161 KB
- Stars: 3
- Watchers: 8
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
## HTML plugin for Steampipe
Web pages often contain data in HTML tables. This plugin's `html_table` table downloads the tables on a web page and saves each one as a CSV file, which you can then query with the [CSV plugin](https://hub.steampipe.io/plugins/turbot/steampipe-plugin-csv).
Web pages also contain links, which you can query with this plugin's `html_link` table.
The file `./config/html.spc` defines the local directory in which the generated CSV files are saved.
## Get started
Install Go, then build the plugin and run a query:
```
$ git clone https://github.com/turbot/steampipe-plugin-html
$ cd steampipe-plugin-html
$ cp ./config/html.spc ~/.steampipe/config
$ make
$ steampipe query
> select
    base_name,
    name,
    path,
    columns
  from
    html_table
  where
    url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states_by_population'
    and base_name = 'wiki';
```
```
+-----------+--------+---------------+------------------------------------------------------------------------
| base_name | name | path | columns
+-----------+--------+---------------+------------------------------------------------------------------------
| wiki | wiki_0 | /home/jon/csv | "Rankinstates&territories,2019","Rankinstates&territories,2010","State"
+-----------+--------+---------------+------------------------------------------------------------------------
```
In this example the plugin found one table on the page and saved it as `/home/jon/csv/wiki_0.csv` (the `/home/jon/csv` directory is the path specified in `./config/html.spc`).
Here is a query of that table, which the CSV plugin exposes as `wiki_0`.
```
with data as (
  select
    "State" as state,
    "Rankinstates&territories,2010"::int as rank2010,
    "Rankinstates&territories,2019"::int as rank2019,
    replace("Percentchange,20102019[note1]", '%', '')::numeric as pct_change,
    replace("PercentofthetotalU.S.population,2018[note3]", '%', '')::numeric as pct_of_us_pop
  from
    wiki_0
)
select
  state,
  rank2010,
  rank2019,
  -- positive when a state moved up the ranking (to a smaller rank number) between 2010 and 2019
  - (rank2019 - rank2010) as rank_change,
  pct_change,
  pct_of_us_pop
from
  data
order by
  pct_change desc;
```
```
+---------------+----------+----------+-------------+------------+---------------+
| state | rank2010 | rank2019 | rank_change | pct_change | pct_of_us_pop |
+---------------+----------+----------+-------------+------------+---------------+
| Utah | 35 | 30 | 5 | 16.0 | 0.96 |
| Texas | 2 | 2 | 0 | 15.3 | 8.68 |
| Colorado | 22 | 21 | 1 | 14.5 | 1.72 |
| NewYork | 3 | 3 | 0 | 14.2 | 6.44 |
| Nevada | 36 | 33 | 3 | 14.1 | 0.92 |
| Idaho | 40 | 39 | 1 | 14.0 | 0.53 |
| Arizona | 16 | 14 | 2 | 13.9 | 2.17 |
| NorthDakota | 49 | 48 | 1 | 13.3 | 0.23 |
```