https://github.com/adobe/helix-importer

Foundation tools for importing website content into a format that can be consumed in a Helix project.

Host: GitHub
URL: https://github.com/adobe/helix-importer
Owner: adobe
License: apache-2.0
Created: 2020-09-09T12:58:03.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2026-02-24T16:54:50.000Z (5 months ago)
Last Synced: 2026-02-24T20:59:50.783Z (5 months ago)
Topics: helix, library
Language: JavaScript
Homepage:
Size: 5.16 MB
Stars: 13
Watchers: 22
Forks: 19
Open Issues: 13
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md

# Helix Importer

Foundation tools for importing website content into a format that can be consumed in a Helix project.

The basic concept of the importer: for an input URL, transform the DOM and convert it into a Markdown or docx file.

## Importer

An importer must extend [PageImporter](src/importer/PageImporter.js) and implement the `fetch` and `process` methods. The general idea is that `fetch` receives the URL to import and is responsible for returning the HTML. `process` receives the corresponding Document so that it can filter, rearrange or reshuffle the DOM before it gets processed by the Markdown transformer. `process` also computes and defines the list of [PageImporterResource](src/importer/PageImporterResource.ts) objects (there can be more than one), each resource being transformed into a Markdown document.

The goal of the importer is to get rid of the generic DOM elements (the header, the footer, the nav, and every other element common to all pages) in order to keep only the unique piece(s) of content of each page.

### HTML2x helpers

[HTML2x](src/importer/HTML2x.js) methods (`HTML2md` and `HTML2docx`) are convenience methods to run an import. As input, they take:

- `URL`: URL of the page to import

- `document`: the DOM element to import - a Document object or a string (see `createDocumentFromString` for the string case)

- `transformerCfg`: object with the transformation "rules". The object can be either:

  - `{ transformDOM: ({ url, document, html, params }) => { ... return element-to-convert }, generateDocumentPath: ({ url, document, html, params }) => { ... return path-to-target; }}` for a one-to-one mapping between an input document and an output file

  - `{ transform: ({ url, document, html, params }) => { ... return [{ element: first-element-to-convert, path: first-path-to-target }, ...] }}` for a mapping from one input document to multiple output files (useful to generate multiple docx files from a single web page)

- `config`: object with several config properties

  - `createDocumentFromString`: required if you use the methods in a non-browser context and want to pass the `document` param as a string. This method receives the HTML to parse as a string and must return a Document object.

  - `setBackgroundImagesFromCSS`: set to false to disable the `background-image` inlining in the DOM.
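The two shapes of `transformerCfg` described above can be sketched as plain objects. This is a minimal illustration, not the library's reference configuration: the selectors (`header, footer, nav`, `main`, `article`), the path naming rules and the names `singleOutputCfg` / `multiOutputCfg` are assumptions made for the example. Only the property names (`transformDOM`, `generateDocumentPath`, `transform`) and the `{ url, document, html, params }` argument come from the description above.

```js
// Single output: one input document maps to one output file.
const singleOutputCfg = {
  // Remove elements shared by every page, then return the element to convert.
  // The selectors are assumptions; adapt them to the site being imported.
  transformDOM: ({ document }) => {
    document.querySelectorAll('header, footer, nav').forEach((el) => el.remove());
    return document.querySelector('main') || document.body;
  },
  // Hypothetical naming rule: reuse the URL pathname, drop a trailing ".html",
  // and map directory URLs ("/blog/") to an "index" document.
  generateDocumentPath: ({ url }) => {
    const { pathname } = new URL(url);
    return pathname.replace(/\.html$/, '').replace(/\/$/, '/index');
  },
};

// Multiple outputs: one input document maps to several output files.
const multiOutputCfg = {
  // Return one { element, path } entry per <article> found in the page.
  // Splitting on <article> is an assumption made for this sketch.
  transform: ({ url, document }) => {
    const { pathname } = new URL(url);
    const base = pathname.replace(/\.html$/, '');
    return Array.from(document.querySelectorAll('article')).map((element, i) => ({
      element,
      path: `${base}-${i + 1}`,
    }));
  },
};
```

Either object would be passed as the `transformerCfg` input, after the URL and the document, following the order of the inputs listed above. Both sketches only rely on standard DOM calls (`querySelector`, `querySelectorAll`, `remove`) and the global `URL` class, so they should behave the same in a browser or with a Document produced by `createDocumentFromString`.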

### Importer UI

The Helix Importer has a dedicated browser UI: see https://github.com/adobe/helix-importer-ui

## Installation

```shell
npm i https://github.com/adobe/helix-importer
```

TODO: publish npm module

## Usage

```js
import { ... } from '@adobe/helix-importer';
```
