# har-to-sqlite

Load a [HAR](https://en.wikipedia.org/wiki/HAR_(file_format)) (HTTP Archive Format) file into a SQLite database.

## Install

`har-to-sqlite` depends on [Deno](https://deno.land/). Install with:

```bash
deno install --allow-read --allow-write -n har-to-sqlite https://deno.land/x/har_to_sqlite@<version>/cli.ts
```

## Usage

```bash
# Load a single .har file into archived.db
$ har-to-sqlite input1.har archived.db

# Load several .har files at once; the SQLite database is always the last argument
$ har-to-sqlite input2.har input3.har input4.har archived.db
$ har-to-sqlite inputs/*.har archived.db

# Give the archive a name with the -n/--name flag (stored in the archives.name column)
$ har-to-sqlite -n "Wiki #1" input.har wiki-scrape.db
```
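
After a load, the database should contain the three tables described under "Generated Tables" below. A quick sanity check with the `sqlite3` shell:

```bash
$ sqlite3 archived.db ".tables"
archives  entries  pages
```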

## API Reference

### JS API

There is a JS API as well, but it isn't documented yet; in the meantime, see how `cli.ts` uses it.

### Generated Tables

`har-to-sqlite` creates three tables to store information: `archives`, `pages`, and `entries`. Every `.har` file becomes a single row in `archives`. Each archive has several "pages" (found in `.log.pages` in the underlying JSON object), where every page is a single row in `pages`. Every page has several "entries" (found in `.log.entries`), where every entry is one network request captured by the archive and becomes a single row in `entries`. `har-to-sqlite` generates these tables in the given `.db` file with the following schemas; an example query joining all three follows the schemas below.

#### `archives`

| Column name | Type | Description |
| ----------------- | ----------------------------------- | ------------------------------------------------------------------------- |
| `rowid` | `INTEGER PRIMARY KEY AUTOINCREMENT` | - |
| `name` | `TEXT` | Optional name given to the archive, set with the `--name`/`-n` flag. |
| `version` | `TEXT` | `.log.version` |
| `creator_name` | `TEXT` | `.log.creator.name` |
| `creator_version` | `TEXT` | `.log.creator.version` |
| `browser_name` | `TEXT` | `.log.browser.name` |
| `browser_version` | `TEXT` | `.log.browser.version` |

#### `pages`

| Column name | Type | Description |
| -------------------- | ----------------------------------- | ------------------------------------------------------------- |
| `rowid` | `INTEGER PRIMARY KEY AUTOINCREMENT` | - |
| `archive` | `INTEGER` | Foreign key to `archives.rowid`, the archive this page belongs to. |
| `started` | `TEXT` | `.log.pages[].startedDateTime` |
| `id` | `TEXT` | `.log.pages[].id` |
| `title` | `TEXT` | `.log.pages[].title` |
| `timing_contentload` | `INTEGER` | `.log.pages[].pageTimings.onContentLoad` |
| `timing_load` | `INTEGER` | `.log.pages[].pageTimings.onLoad` |

#### `entries`

| Column name | Type | Description |
| ----------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `page` | `INTEGER` | Foreign key to `pages.rowid`, the page this entry belongs to. |
| `started` | `TEXT` | `.log.entries[].startedDateTime` |
| `request_method` | `TEXT` | `.log.entries[].request.method` |
| `request_url` | `TEXT` | `.log.entries[].request.url` |
| `request_headers` | `TEXT` | `.log.entries[].request.headers` (raw JSON array) |
| `request_headers_obj` | `TEXT` | `.log.entries[].request.headers` (JSON object mapping each header's `.name` to its `.value`; duplicate header names may overwrite one another) |
| `request_cookies` | `TEXT` | `.log.entries[].request.cookies` (raw JSON array) |
| `request_cookies_obj` | `TEXT` | `.log.entries[].request.cookies` (JSON object, same `.name`/`.value` mapping as above) |
| `request_querystring` | `TEXT` | `.log.entries[].request.queryString` (raw JSON array) |
| `request_querystring_obj` | `TEXT` | `.log.entries[].request.queryString` (JSON object, same `.name`/`.value` mapping as above) |
| `request_body_size` | `INTEGER` | `.log.entries[].request.bodySize` |
| `request_headers_size` | `INTEGER` | `.log.entries[].request.headersSize` |
| `request_postdata_mimetype` | `TEXT` | `.log.entries[].request.postData.mimeType` |
| `request_postdata_params` | `TEXT` | `.log.entries[].request.postData.params` (raw JSON array) |
| `request_postdata_params_obj` | `TEXT` | `.log.entries[].request.postData.params` (JSON object, same `.name`/`.value` mapping as above) |
| `request_postdata_text` | `TEXT` | `.log.entries[].request.postData.text` |
| `request_postdata_comment` | `TEXT` | `.log.entries[].request.postData.comment` |
| `response_status` | `INTEGER` | `.log.entries[].response.status` |
| `response_status_text` | `TEXT` | `.log.entries[].response.statusText` |
| `response_http_version` | `TEXT` | `.log.entries[].response.httpVersion` |
| `response_headers` | `TEXT` | `.log.entries[].response.headers` (raw JSON array) |
| `response_headers_obj` | `TEXT` | `.log.entries[].response.headers` (JSON object, same `.name`/`.value` mapping as above) |
| `response_cookies` | `TEXT` | `.log.entries[].response.cookies` (raw JSON array) |
| `response_cookies_obj` | `TEXT` | `.log.entries[].response.cookies` (JSON object, same `.name`/`.value` mapping as above) |
| `response_redirect_url` | `TEXT` | `.log.entries[].response.redirectURL` |
| `response_headers_size` | `INTEGER` | `.log.entries[].response.headersSize` |
| `response_body_size` | `INTEGER` | `.log.entries[].response.bodySize` |
| `response_content_mimetype` | `TEXT` | `.log.entries[].response.content.mimeType` |
| `response_content_size` | `INTEGER` | `.log.entries[].response.content.size` |
| `response_content` | `BLOB` | A binary BLOB of `.log.entries[].response.content.text`, encoded based on `.log.entries[].response.content.encoding` |
| `timings_blocked` | `INTEGER` | `.log.entries[].timings.blocked` |
| `timings_dns` | `INTEGER` | `.log.entries[].timings.dns` |
| `timings_connect` | `INTEGER` | `.log.entries[].timings.connect` |
| `timings_ssl` | `INTEGER` | `.log.entries[].timings.ssl` |
| `timings_send` | `INTEGER` | `.log.entries[].timings.send` |
| `timings_wait` | `INTEGER` | `.log.entries[].timings.wait` |
| `timings_receive` | `INTEGER` | `.log.entries[].timings.receive` |
| `server_ip_address` | `TEXT` | `.log.entries[].serverIPAddress` |
| `connection` | `TEXT` | `.log.entries[].connection` |
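
To see how the three tables fit together, here's a sketch of a query that counts the captured requests per page across every archive (assuming a database named `archived.db`, as in the usage examples above):

```sql
-- Count captured network requests per page, across all archives.
-- entries.page points at pages.rowid; pages.archive points at archives.rowid.
SELECT
  archives.name AS archive,
  pages.title   AS page,
  COUNT(*)      AS requests
FROM entries
JOIN pages    ON entries.page  = pages.rowid
JOIN archives ON pages.archive = archives.rowid
GROUP BY pages.rowid
ORDER BY requests DESC;
```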

## Common Recipes

Write every MP4 found in a generated HAR SQLite database to disk as `[rowid].mp4`, using the `sqlite3` shell's built-in `writefile()`:

```bash
sqlite3 output.db "select writefile(rowid || '.mp4', response_content) from entries where response_content_mimetype = 'video/mp4';"
```
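
The `*_obj` columns pair well with SQLite's built-in JSON functions (the JSON1 extension, compiled into most `sqlite3` builds). A sketch that pulls the `Referer` header out of every request; note that header names are stored exactly as captured, so the key may be lowercase depending on the browser:

```bash
sqlite3 output.db "select request_url, json_extract(request_headers_obj, '$.Referer') from entries;"
```

The timing columns make it easy to surface slow endpoints, for example the ten requests that spent the longest waiting on a response:

```bash
sqlite3 output.db "select request_url, timings_wait from entries order by timings_wait desc limit 10;"
```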

## TODO

- [ ] entries cache
- [ ] fix globbing (deno globs don't like regex-compatible filepaths)
- [ ] Automated tests