https://github.com/dyalog/bb
Brian Becker's Sandbox
apl dyalog dyalog-apl dyalog-library
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/dyalog/bb
- Owner: Dyalog
- License: mit
- Created: 2014-07-29T18:01:37.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2025-02-28T03:48:00.000Z (11 months ago)
- Last Synced: 2025-02-28T11:21:40.120Z (11 months ago)
- Topics: apl, dyalog, dyalog-apl, dyalog-library
- Language: APL
- Homepage:
- Size: 1.57 MB
- Stars: 7
- Watchers: 10
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: changes.dyalog
- License: LICENSE
README
bb
==
Brian's Playground
This repository contains stuff I've tinkered with. Some of it may be interesting, and some of it may be incomplete. The contents are provided "as is".
----------
## `xhtml.dyalog` namespace
Contains utilities to:
- convert HTML to XHTML that can then be parsed by `⎕XML`
- search and extract elements from the result of `⎕XML`
### `HTMLtoXHTML`
`xhtml ← xhtml.HTMLtoXHTML html`
`html` is a character vector containing HTML
`xhtml` is a matrix form of the XHTML
`HTMLtoXHTML` assumes that the HTML is reasonably well formed (e.g. open tags have corresponding closing tags). It handles most, but probably not all, HTMLisms, such as elements that do not require a closing tag.
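To make "matrix form" more concrete, here is a small sketch that calls `⎕XML` directly on a hand-written, already well-formed fragment. It assumes that the result of `HTMLtoXHTML` has the same five-column layout that `⎕XML` produces, which the README implies but does not state outright; the sample markup is invented for illustration.

```apl
⍝ Illustration only: a tiny well-formed fragment parsed directly with ⎕XML
x ← ⎕XML '<table class="results"><tr><td>bloof</td></tr></table>'

⍴x       ⍝ 3 5: one row per element, five columns
x[;1]    ⍝ nesting level of each element, starting at 0
x[;2]    ⍝ element names
x[;3]    ⍝ character content of each element
x[;4]    ⍝ attributes: one 2-column name/value matrix per element
```

The levels in the first column are what the `levels` part of an `Xfind` specification refers to.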
### `Xfind`
`boolvec ← xml xhtml.Xfind spec`
`xml` is an XML matrix (could be XHTML, but doesn't have to be)
`spec` is a delimited-string search specification (first character is the delimiter) in the form `/levels/elements/content/attribute/value` where:
- `levels`, if non-empty, specifies the level(s) to consider in the search. For example:
  - `3` specifies level 3 elements only
  - `3-` specifies level 3 and lower (down to 0)
  - `3+` specifies level 3 and higher
  - `3-5` specifies levels 3 through 5
- `elements` is a space-delimited list of elements to select
- `content` is case-insensitive content to search for using `⍷`
- `attribute` is an attribute name to match exactly (case-sensitive)
- `value` is a case-insensitive attribute value to search for using `⍷`. If no `attribute` is specified, all attributes will be searched.
`boolvec` is a Boolean vector marking matching elements
Examples:
```
xml xhtml.Xfind '//table//class/results' ⍝ find all table elements with a class attribute containing 'results'
xml xhtml.Xfind '/2////foobar' ⍝ find all level 2 elements with any attribute containing 'foobar'
xml xhtml.Xfind '/3+/th td/bloof' ⍝ find all th or td elements at level 3 or higher containing 'bloof'
```
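The way a specification breaks down into fields can be sketched with a one-line splitter. `SplitSpec` is a hypothetical helper written for this illustration; it is not part of `xhtml.dyalog`, and `Xfind` may parse its argument differently. It assumes the default `⎕ML`, so that `⊃` means "first".

```apl
⍝ Hypothetical helper, not part of xhtml.dyalog:
⍝ split a spec into fields, using its first character as the delimiter
SplitSpec ← {1↓¨(⍵=⊃⍵)⊂⍵}

SplitSpec '//table//class/results'   ⍝ 5 fields: '' 'table' '' 'class' 'results'
SplitSpec '/2////foobar'             ⍝ 5 fields: (,'2') '' '' '' 'foobar'
SplitSpec '/3+/th td/bloof'          ⍝ 3 fields: '3+' 'th td' 'bloof'
SplitSpec ',3+,th td,bloof'          ⍝ same 3 fields, with ',' as the delimiter
```

Read in order, the fields are `levels`, `elements`, `content`, `attribute` and `value`. An empty field places no restriction on that part of the search, and the third example shows that trailing fields can be left off.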
### `Xsel`
`elements ← xml xhtml.Xsel boolvec`
`xml` is an XML matrix (could be XHTML, but doesn't have to be)
`boolvec` is a Boolean vector with as many elements as rows in `xml`
`elements` is a nested vector of elements marked by `boolvec` and their descendants
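One way to picture "marked elements and their descendants" is in terms of the level column of the XML matrix: the descendants of a row are the rows that follow it for as long as their level stays greater than its own. The sketch below shows that idea under stated assumptions. `Desc` and `subtrees` are hypothetical names, the sample markup is invented, and the actual `Xsel` may be implemented differently and may return its result in a different shape.

```apl
⎕IO ← 1
x ← ⎕XML '<div><table><tr><td>a</td></tr></table><p>b</p></div>'
depth ← x[;1]                          ⍝ 0 1 2 3 1

⍝ Hypothetical helper: ⍺ is the level vector, ⍵ the index of a marked row.
⍝ Returns that row's index followed by the indices of its descendants.
Desc ← {⍵+0,⍳+/∧\(⍵↓⍺)>⍺[⍵]}

sel ← x[;2]∊⊂'table'                   ⍝ 0 1 0 0 0: mark the table element
subtrees ← {x[depth Desc ⍵;]}¨⍸sel     ⍝ one sub-matrix per marked element
```

Here `subtrees` has a single item, a 3-row matrix holding the `table`, `tr` and `td` rows. The `p` element is left out because its level drops back to 1, which ends the run of descendants.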
### Typical Use Case
In general, you'll convert some HTML to XHTML and then search for and extract the elements of interest to you. For example:
```
resp ← HttpCommand.Get 'someurl.com/somefile.html' ⍝ make a request
'request failed' ⎕SIGNAL (0 200≢resp.(rc HttpStatus))/777 ⍝ check that it succeeded
h ← resp.Data ⍝ grab the response data
x ← xhtml.HTMLtoXHTML h ⍝ convert to XHTML
mytables ← x xhtml.Xsel x xhtml.Xfind '//table//class/results' ⍝ extract all the table elements with a class attribute containing "results"
```