https://github.com/transpect/css-tools
Parse styles in an XHTML document and expand as XML attributes (CSSa)
- Host: GitHub
- URL: https://github.com/transpect/css-tools
- Owner: transpect
- License: bsd-2-clause
- Created: 2015-11-01T15:30:09.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-10-16T09:06:36.000Z (over 1 year ago)
- Last Synced: 2024-10-17T21:54:24.666Z (over 1 year ago)
- Topics: css
- Language: XSLT
- Homepage:
- Size: 636 KB
- Stars: 8
- Watchers: 16
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# css:expand
Parse CSS styles of an XHTML document and expand them as XML attributes ([CSSa](https://github.com/le-tex/CSSa))
## Example 1
Consider this document as input:
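```html
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>css-expand example</title>
    <style>
      .red {color:red}
    </style>
  </head>
  <body>
    <p class="red">This text has the color red.</p>
  </body>
</html>
```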
Invoke `css:expand` in your XProc pipeline. Note
that you have to include [xproc-util](https://github.com/transpect/xproc-util).
See [test-css-expand.xpl](https://github.com/transpect/css-tools/blob/master/xpl/test-css-expand.xpl) for an example pipeline.
After running `css:expand`, internal and external CSS style information is expanded as XML attributes.
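```html
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:css="http://www.w3.org/1996/css">
  <head>
    <title>css-expand example</title>
    <style>
      .red {color:red}
    </style>
  </head>
  <body>
    <p class="red" css:color="red">This text has the color red.</p>
  </body>
</html>
```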
## Example 2
This example contains an `xml-stylesheet` processing instruction as an additional CSS source. It also demonstrates the handling of
shorthand properties and pseudo-elements.
See [example2.xhtml](https://github.com/transpect/css-tools/blob/master/example/example2.xhtml) and [style2.css](https://github.com/transpect/css-tools/blob/master/example/style2.css).
Expanded output (`body` only):
```html
R
Test
```
## Example 3: Styling XML
This example again uses an `xml-stylesheet` processing instruction.
See [example3.xml](https://github.com/transpect/css-tools/blob/master/example/example3.xml) and [style3.css](https://github.com/transpect/css-tools/blob/master/example/style3.css).
Expanded output:
```xml
Introduction
Para
Section
First para
Last para
```
You can then postprocess the `@css:pseudo.…` attributes in order to actually prepend/append the `::before` and `::after` pseudo-element content to the element’s text.
# css:parse
The same `css.xpl` library also provides a `css:parse` step. It only generates an XML representation of the CSS that is included via `link` or `style` elements. Note that the input is expected to be in the XHTML namespace; support for reading HTML5-serialized input is planned.
The output of the [REx](http://bottlecaps.de/rex/)-generated parser for the `style` above looks like this:
```xml
.
red
{
color
:
red
}
```
This parser output will then be transformed to an XML representation like this:
```xml
.red {color:red}
key('class', 'red')
```
There is no schema yet for this XML representation.
The text content of each selector element is an XPath expression. These expressions are used to generate
an XSLT stylesheet that, when applied to the input document, will yield the expanded document.
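
To make the expansion idea concrete, here is a rough Python analogue. It is not the generated XSLT, only a sketch under stated assumptions: it handles plain class selectors only, the function name `expand_class_rules` and the shape of the `rules` dictionary are invented for illustration, and the namespace URI bound to `css:` should be checked against the CSSa documentation. The dictionary lookup by class name plays a role loosely comparable to `key('class', 'red')`.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: namespace URI for the css: attribute prefix (verify against CSSa).
CSS_NS = "http://www.w3.org/1996/css"


def expand_class_rules(xhtml: str, rules: dict) -> ET.Element:
    """Copy the declarations of matching class rules onto elements
    as attributes in the css namespace (hypothetical helper)."""
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        # An element may carry several space-separated class tokens.
        for cls in elem.get("class", "").split():
            for prop, value in rules.get(cls, {}).items():
                elem.set(f"{{{CSS_NS}}}{prop}", value)
    return root


doc = (
    f'<html xmlns="{XHTML_NS}"><body>'
    '<p class="red">This text has the color red.</p>'
    '</body></html>'
)
# Parsed form of ".red {color:red}", keyed by class name.
expanded = expand_class_rules(doc, {"red": {"color": "red"}})
para = expanded.find(f".//{{{XHTML_NS}}}p")
print(para.get(f"{{{CSS_NS}}}color"))  # red
```

The sketch ignores specificity, the cascade, shorthand properties and pseudo-elements, all of which the real step has to deal with.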
The purpose of expansion is to find out the actual styling that is applied to a document location, for example prior to Schematron checks for device compatibility or accessibility. Of course, checking and possibly filtering the parsed CSS alone may be sufficient for quality control. In our EPUB builder, the CSS will be parsed and re-serialized by default.
## Rationale for switching to an EBNF-based parser
We wanted to be able to parse stuff like `calc()`, `scale()`, `translateX()`, `rgba()` in a less cumbersome way than with regexes.
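
The difficulty with regexes is nesting: function notation can contain parenthesised groups or further functions, which a grammar handles by recursion. The following Python sketch illustrates that point with a tiny hand-written recursive-descent parser. It is not the REx grammar used here; `parse_value` and its tuple-based output are made up for this illustration.

```python
import re

# Tokens: "name(" opening a function, a single "(", ")" or ",",
# or any other run of non-space characters.
TOKEN = re.compile(r"\s*([A-Za-z_][\w-]*\(|[(),]|[^\s(),]+)")


def parse_value(text: str) -> list:
    """Parse a CSS value into nested (name, arguments) tuples.
    A bare parenthesised group gets the empty name ""."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_items(closing: bool) -> list:
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ")":
                if not closing:
                    raise ValueError("unbalanced ')'")
                return items
            if tok.endswith("("):
                # Recurse for the arguments of a function or group.
                items.append((tok[:-1], parse_items(True)))
            elif tok != ",":
                items.append(tok)
        if closing:
            raise ValueError("missing ')'")
        return items

    return parse_items(False)


print(parse_value("calc(100% - (2 * 10px))"))
# [('calc', ['100%', '-', ('', ['2', '*', '10px'])])]
```

A single regex would have to anticipate every nesting depth in advance, whereas the recursive call deals with any depth.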
## Comments and whitespace
There are customers who want to retain the comments in the re-serialized CSS. Comments in rules (in selectors and properties) have proved to be particularly difficult to handle. With the previous regex-based approach, we just split between the rules and if there were comments anywhere within a rule, they were pulled out of it and went immediately before it in the serialization. However, this is no longer possible with the REx-generated parser.
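
The earlier behaviour described above (comments inside a rule are pulled out and emitted immediately before it) can be sketched in Python as follows. This is an illustration only, not the former implementation: `hoist_rule_comments` and both regexes are assumptions, the sketch handles only flat rules without nested braces, it collapses whitespace inside rules, and it drops comments that follow the last rule.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# A flat rule: selector text followed by a declaration block without nested braces.
RULE = re.compile(r"[^{}]+\{[^{}]*\}")


def hoist_rule_comments(css: str) -> str:
    """Re-serialize flat CSS rules, moving any comment found in a rule
    (or directly before its selector) to the line before that rule."""
    out = []
    for match in RULE.finditer(css):
        rule = match.group(0).strip()
        comments = COMMENT.findall(rule)
        bare = COMMENT.sub("", rule)
        # Simplification: collapse the whitespace left behind by removed comments.
        bare = re.sub(r"\s+", " ", bare).strip()
        out.extend(comments)
        out.append(bare)
    return "\n".join(out)


css = ".red { /* warn */ color: red }\n.a /* x */ .b {margin:0}"
print(hoist_rule_comments(css))
# /* warn */
# .red { color: red }
# /* x */
# .a .b {margin:0}
```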
REx knows a pragma, [`/* ws:definition */`](https://github.com/transpect/css-tools/commit/9e62b1da02856e72a02d07e60fa9c14be9eecc89#diff-2d7bf9880e9266456c02e71a533a4755L59), that allows any named production rule to be treated as ignorable whitespace. We tried that, but apart from not allowing comments to be retained, it conflicted with the descendant combinator, which happens to be just whitespace. So we (@fr4nze and @gimsieke) tried to use the `/* ws:explicit */` pragma in selectors, but this led to lexical ambiguities when comments were also part of the ignorable whitespace rules. Even if we had solved this problem, we would still have lost the comments.

So we are now trying this: whitespace is significant, but allowed in many places. Comments are allowed in fewer places, in particular not in the middle of a selector or in properties. If the parser encounters such a comment, it will raise an error. [This error will be caught](https://github.com/transpect/css-tools/commit/9e62b1da02856e72a02d07e60fa9c14be9eecc89#diff-b9619c16ea8c9baad938469e14e126aaR74), and the CSS input will be resubmitted with all comments stripped away by means of a regex. An appropriate error message will be produced on the report port, allowing users to move comments to safer places in the input. If comments may be ignored altogether, this error message may be ignored as well.
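
The catch-and-resubmit strategy can be sketched in Python like this. Everything named here is hypothetical: `toy_strict_parse` merely stands in for the generated parser (it accepts comments between rules and rejects them inside a rule), `CssParseError` is an invented exception type, and the returned message list stands in for the report port.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Optional leading comments, then one flat rule captured as "rule".
RULE = re.compile(r"(?:\s*/\*.*?\*/)*\s*(?P<rule>[^{}]+\{[^{}]*\})", re.DOTALL)


class CssParseError(Exception):
    """Raised by the stand-in parser for a comment inside a rule."""


def toy_strict_parse(css: str) -> list:
    """Hypothetical stand-in for the real parser: returns rule texts."""
    rules = []
    for match in RULE.finditer(css):
        rule = match.group("rule").strip()
        if "/*" in rule:
            raise CssParseError(f"comment inside rule: {rule!r}")
        rules.append(rule)
    return rules


def parse_with_fallback(css: str, parse=toy_strict_parse):
    """Try a strict parse; on failure strip all comments and retry.
    Returns (result, messages)."""
    try:
        return parse(css), []
    except CssParseError as err:
        stripped = COMMENT.sub("", css)
        message = f"comments were stripped before re-parsing ({err})"
        return parse(stripped), [message]


rules, messages = parse_with_fallback(".red { /* why */ color:red }")
print(len(rules), len(messages))  # 1 1
```

Note that a plain regex cannot tell a real comment from `/*` inside a quoted string, so a production version would need more care at that point.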