https://github.com/adobe/helix-importer
Foundation tools for importing website content into that can be consumed in an Helix project.
https://github.com/adobe/helix-importer
helix library
Last synced: 3 months ago
JSON representation
Foundation tools for importing website content into that can be consumed in an Helix project.
- Host: GitHub
- URL: https://github.com/adobe/helix-importer
- Owner: adobe
- License: apache-2.0
- Created: 2020-09-09T12:58:03.000Z (almost 6 years ago)
- Default Branch: main
- Last Pushed: 2026-02-24T16:54:50.000Z (4 months ago)
- Last Synced: 2026-02-24T20:59:50.783Z (4 months ago)
- Topics: helix, library
- Language: JavaScript
- Homepage:
- Size: 5.16 MB
- Stars: 13
- Watchers: 22
- Forks: 19
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# Helix Importer
Foundation tools for importing website content into that can be consumed in an Helix project.
Basic concept of the importer: for an input url, transform the DOM and convert it into a Markdown / docx file.
## Importer
An importer must extends [PageImporter](src/importer/PageImporter.js) and implement the `fetch` and `process` method. The general idea is that `fetch` receives the url to import and is responsible to return the HTML. `process` receives the corresponding Document in order to filter / rearrange / reshuffle the DOM before it gets processed by the Markdown transformer. `process` computes and defines the list of [PageImporterResource](src/importer/PageImporterResource.ts) (could be more than one), each resource being transformed as a Markdown document.
Goal of the importer is to get rid of the generic DOM elements like the header / footer, the nav... and all elements that are common to all pages in order to get the unique piece(s) of content per page.
### HTML2x helpers
[HTML2x](src/importer/HTML2x.js) methods (`HTML2md` and `HTML2docx`) are convienence methods to run an import. As input, they take:
- `URL`: URL of the page to import
- `document`: the DOM element to import - a Document object or a string (see `createDocumentFromString` for the string case)
- `transformerCfg`: object with the transformation "rules". Object can be either:
- `{ transformDOM: ({ url, document, html, params }) => { ... return element-to-convert }, generateDocumentPath: ({ url, document, html, params }) => { ... return path-to-target; }}` for a single mapping between one input document / one output file
- `{ transform: ({ url, document, html, params }) => { ... return [{ element: first-element-to-convert, path: first-path-to-target }, ...] }` for a mapping one input document / multiple output files (useful to generate multiple docx from a single web page)
- `config`: object with several config properties
- `createDocumentFromString`: this config is required if you use the methods in a non-browser context and want to pass `document` param as string. This method receives the HTML to parse as a string and must return a Document object.
- `setBackgroundImagesFromCSS`: set to false to disable the `background-image` inlining in the DOM.
### Importer UI
The Helix Importer has a dedicated browser UI: see https://github.com/adobe/helix-importer-ui
## Installation
```shell
npm i https://github.com/adobe/helix-importer
```
TODO: publish npm module
## Usage
```js
import { ... } from '@adobe/helix-importer';
```