https://github.com/chubes4/html-to-blocks-converter

WordPress plugin that converts raw HTML to Gutenberg blocks, inspired by Gutenberg's client-side rawHandler

# HTML to Blocks Converter

A WordPress plugin **and Composer package** that converts raw HTML to Gutenberg block arrays using WordPress Core's HTML API.

It works in two modes:

- **Plugin mode:** activate the plugin and it automatically converts raw HTML to blocks on `wp_insert_post()` and REST editor reads for public REST-enabled post types.
- **Package mode:** `composer require chubes4/html-to-blocks-converter` and load WordPress. Composer autoload registers the same conversion library and automatic hooks through the version registry. Consumers can also call `html_to_blocks_raw_handler()` directly.

## Description

This plugin provides server-side HTML-to-blocks conversion using WordPress Core's HTML API (`WP_HTML_Processor`) for spec-compliant HTML5 parsing. Inspired by Gutenberg's client-side `rawHandler` function from [`packages/blocks/src/api/raw-handling`](https://github.com/WordPress/gutenberg/tree/trunk/packages/blocks/src/api/raw-handling), it enables programmatic content creation with proper block structure, and ensures the block editor sees proper blocks for supported post types even when `post_content` contains raw HTML.

### Use Cases

- Migrating legacy content to Gutenberg blocks
- Importing content from external sources via REST API
- Programmatically creating posts with block-based content
- Converting HTML from headless CMS or content pipelines

## Supported Block Transforms

The plugin converts high-confidence static HTML patterns to their corresponding Gutenberg blocks:

| HTML signal | Block type |
|-------------|------------|
| `<h1>` - `<h6>` | `core/heading` |
| `<p>` and plain text | `core/paragraph` |
| `<ul>`, `<ol>` | `core/list` with `core/list-item` children |
| `<blockquote>` | `core/quote` |
| pullquote markup | `core/pullquote` |
| `<img>` and related image elements | `core/image` |
| gallery-like wrappers with multiple images | `core/gallery` with `core/image` children |
| `<video>` / `<audio>` with a source | `core/video` / `core/audio` |
| recognized provider `<iframe>` embeds | `core/embed` |
| downloadable file anchors | `core/file` |
| media-text wrappers | `core/media-text` |
| WordPress button anchors | `core/buttons` with `core/button` children |
| `<details>` | `core/details` |
| verse markup | `core/verse` |
| `<pre><code>` | `core/code` |
| `<pre>` | `core/preformatted` |
| `<hr>` | `core/separator` |
| `<table>` | `core/table` |
| WordPress shortcodes | `core/shortcode` |
| high-confidence semantic/layout wrappers | `core/group`, `core/columns`, `core/column`, `core/cover`, `core/spacer` |

Nested lists and blockquotes with multiple paragraphs are fully supported.

For the source-of-truth status of supported transforms, observed fallbacks,
future candidates, and context-required block families, see the
[Core Block Coverage Matrix](docs/core-block-coverage.md).

For Site Editor and block theme boundaries, including which block families
should not be inferred from raw HTML alone, see
[Site Editor Boundary](docs/site-editor-boundary.md).

For the supported subset h2bc intentionally keeps aligned with Gutenberg's
`rawHandler`, see [Gutenberg rawHandler Parity](docs/gutenberg-rawhandler-parity.md).

Unsupported top-level elements are preserved as `core/html` instead of guessed.
When that fallback is used, h2bc fires `html_to_blocks_unsupported_html_fallback`
with the unsupported HTML fragment, fallback context, and generated block so
production pipelines can log, warn, or fail on unexpected fallback usage.
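
A pipeline that wants to log fallbacks could listen on that hook roughly as follows. This is a minimal sketch under assumptions: the three callback parameters follow the order given in the description above, the fragment is treated as a string, the block is treated as a parsed block array, and `my_h2bc_fallback_message()` is a hypothetical helper that is not part of the plugin.

```php
<?php
/**
 * Hypothetical helper: builds a one-line log message for a fallback event.
 * Assumes $block is a parsed block array with a 'blockName' key.
 */
function my_h2bc_fallback_message( $html, $context, $block ) {
	$block_name = ( is_array( $block ) && isset( $block['blockName'] ) )
		? $block['blockName']
		: 'unknown';

	// Keep the log line short: only the first 80 characters of the fragment.
	$snippet = is_string( $html ) ? substr( trim( $html ), 0, 80 ) : '';

	return sprintf( 'h2bc fallback to %s: %s', $block_name, $snippet );
}

// Register only when WordPress is loaded. The argument count of 3 is an
// assumption based on the three values the README says the hook receives.
if ( function_exists( 'add_action' ) ) {
	add_action(
		'html_to_blocks_unsupported_html_fallback',
		function ( $html, $context, $block ) {
			error_log( my_h2bc_fallback_message( $html, $context, $block ) );
		},
		10,
		3
	);
}
```

A stricter pipeline could throw an exception in the callback instead of logging, so that unexpected fallbacks fail the import.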

## Installation

1. Download the plugin zip file
2. Navigate to Plugins > Add New > Upload Plugin
3. Upload the zip file and activate

Or clone directly to your plugins directory:

```bash
cd wp-content/plugins
git clone https://github.com/chubes4/html-to-blocks-converter.git
```

Or install as a Composer package:

```bash
composer require chubes4/html-to-blocks-converter
```

Composer autoloads `library.php`, which registers the conversion library
through an Action-Scheduler-style version registry. The winning library version
loads the raw handler and the automatic write/read hooks so bundled consumers get
the same HTML → blocks behavior as the standalone plugin.

When h2bc is bundled through php-scoper, callbacks registered with WordPress hook
APIs must resolve inside the scoped namespace. Build hook callback strings from
`__NAMESPACE__` so the same source works as the standalone plugin and as a scoped
dependency.
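
The `__NAMESPACE__` approach can be sketched as below. The namespace, the `callback()` helper, and the `example_filter_post_data()` function are all hypothetical names chosen for illustration; they are not taken from the plugin source.

```php
<?php
// Hypothetical namespace. When bundled, php-scoper would add its own prefix,
// and __NAMESPACE__ would then report the prefixed name automatically.
namespace My_Plugin\H2BC_Example;

/**
 * Hypothetical pass-through filter callback used only for this sketch.
 */
function example_filter_post_data( $data ) {
	return $data;
}

/**
 * Builds a callable string that resolves inside the current namespace.
 * Falls back to the bare name if the file is ever used without a namespace.
 */
function callback( $function_name ) {
	return '' === __NAMESPACE__
		? $function_name
		: __NAMESPACE__ . '\\' . $function_name;
}

// Register only when WordPress is loaded.
if ( \function_exists( 'add_filter' ) ) {
	\add_filter( 'wp_insert_post_data', callback( 'example_filter_post_data' ) );
}
```

Because the string is computed at runtime, it stays correct whether the file runs under its original namespace or under a scoper-generated prefix. A hard-coded `'example_filter_post_data'` string would point at a global function that no longer exists after scoping.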

## Usage

The plugin hooks into `wp_insert_post_data` and automatically converts HTML content to blocks for supported post types. No configuration is required for public REST-enabled post types.

### Programmatic Usage

```php
// Content will be automatically converted to blocks
wp_insert_post([
    'post_title'   => 'My Post',
    'post_content' => '<h1>Hello World</h1><p>This is my content.</p>',
    'post_status'  => 'publish',
    'post_type'    => 'post',
]);
```

### REST API Usage

```bash
curl -X POST https://yoursite.com/wp-json/wp/v2/posts \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Post",
    "content": "<h1>Hello World</h1><p>This is my content.</p>",
    "status": "publish"
  }'
```

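### Direct Conversion

```php
$html = '<h1>Title</h1><p>Paragraph with <strong>bold</strong> text.</p>';

// Convert the HTML string to an array of block arrays.
$blocks = html_to_blocks_raw_handler(['HTML' => $html]);

// Serialize the blocks to block markup suitable for post_content.
$block_content = serialize_blocks($blocks);
```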
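
Before serializing, it can be useful to check which block types a conversion produced, for example to spot `core/html` fallbacks. The following is a minimal sketch, assuming the returned arrays follow the parsed-block shape that `serialize_blocks()` accepts (`blockName` and `innerBlocks` keys); `h2bc_example_count_block_names()` is a hypothetical helper, not part of the plugin.

```php
<?php
/**
 * Hypothetical helper: counts block names recursively, including inner blocks.
 * Blocks without a name are counted under '(freeform)'.
 */
function h2bc_example_count_block_names( array $blocks, array $counts = array() ) {
	foreach ( $blocks as $block ) {
		$name = isset( $block['blockName'] ) ? $block['blockName'] : '(freeform)';

		$counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1;

		// Descend into children such as core/list-item inside core/list.
		if ( ! empty( $block['innerBlocks'] ) ) {
			$counts = h2bc_example_count_block_names( $block['innerBlocks'], $counts );
		}
	}

	return $counts;
}

// Runs only when the plugin or package is loaded.
if ( function_exists( 'html_to_blocks_raw_handler' ) ) {
	$blocks = html_to_blocks_raw_handler( array( 'HTML' => '<h2>Title</h2><p>Text</p>' ) );
	$counts = h2bc_example_count_block_names( $blocks );

	if ( isset( $counts['core/html'] ) ) {
		error_log( 'Conversion produced ' . $counts['core/html'] . ' core/html fallback block(s).' );
	}
}
```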

## REST API Read Path (v0.4.0+)

The plugin also converts HTML to blocks when the block editor loads a post via the REST API. When `context=edit` is requested, any post with HTML in `content.raw` (no `