https://github.com/mathsgod/html-image-extractor

Extract embedded data: URI images from HTML and replace with placeholders

# html-image-extractor

A PHP library to extract embedded `data:` URI images from HTML, replace them with temporary placeholders, and restore them with real URLs once the images have been saved or uploaded.

## Requirements

- PHP 8.1+

## Installation

```bash
composer require mathsgod/html-image-extractor
```

## How it works

```
HTML (with data: URIs)
↓ extract()
HTML with __IMG_xxx__ placeholders + image data map
↓ save / upload images, get URLs
↓ restore()
Final HTML (with real URLs)
```
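To make the flow above concrete, here is a minimal, self-contained sketch of the same idea in plain PHP. It is not the library's implementation: the function names `sketchExtract()` and `sketchRestore()`, the regular expression, and the counter-based placeholder IDs are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative sketch only, NOT the library's code.
 * Replaces each base64 data: URI image with a placeholder and collects the image data.
 *
 * @return array{0: string, 1: array<string, array{mimeType: string, data: string}>}
 */
function sketchExtract(string $html): array
{
    $images = [];
    $pattern = '~data:(image/[a-z0-9.+-]+);base64,([A-Za-z0-9+/=]+)~i';

    $modified = preg_replace_callback(
        $pattern,
        function (array $m) use (&$images): string {
            // Hypothetical ID scheme (a running counter); the real IDs may look different.
            $id = '__IMG_' . count($images) . '__';
            $images[$id] = ['mimeType' => strtolower($m[1]), 'data' => $m[2]];
            return $id;
        },
        $html
    );

    return [(string) $modified, $images];
}

/** Swap every placeholder for its final URL. */
function sketchRestore(string $html, array $urlMap): string
{
    return strtr($html, $urlMap);
}

$html = '<p><img src="data:image/png;base64,iVBORw0KGgo="></p>';
[$withPlaceholders, $images] = sketchExtract($html);
// $withPlaceholders is now '<p><img src="__IMG_0__"></p>'

$final = sketchRestore($withPlaceholders, ['__IMG_0__' => 'https://example.com/uploads/a.png']);
```

Keeping the placeholder step separate from the upload step means the HTML can be stored or processed while the images are still being saved.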

## Usage

### Basic — save to local directory

```php
use HtmlImageExtractor\HtmlImageExtractor;

$extractor = new HtmlImageExtractor();

// Step 1: extract embedded images
$extractor->extract($html);

$modifiedHtml = $extractor->getHtml(); // HTML with __IMG_xxx__ placeholders
$images = $extractor->getImages(); // image data map
echo $extractor->count() . ' image(s) found';

// Step 2: save to disk and get URL map
$urlMap = $extractor->saveToDir(
    saveDir: __DIR__ . '/uploads',
    baseUrl: 'https://example.com/uploads'
);

// Step 3: restore placeholders with real URLs
$finalHtml = $extractor->restore($urlMap);
```

### Advanced — custom upload (e.g. cloud storage)

```php
use HtmlImageExtractor\HtmlImageExtractor;

$extractor = new HtmlImageExtractor();
$extractor->extract($html);

// Build the URL map yourself after uploading
$urlMap = [];
foreach ($extractor->getImages() as $id => $info) {
    // $info['mimeType'] — e.g. "image/png"
    // $info['data'] — base64 encoded image data
    // $info['extension'] — e.g. "png"
    $url = myCloudUpload(base64_decode($info['data']), $info['mimeType']);
    $urlMap[$id] = $url;
}

$finalHtml = $extractor->restore($urlMap);
```
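`myCloudUpload()` in the example above is your own function, not part of this library. The sketch below is a hypothetical stand-in for local testing: it writes to a temporary directory instead of a real bucket, and the `cdn.example.com` host, the `fake-bucket` directory, and the MIME-to-extension map are all assumptions. It also decodes the base64 payload in strict mode so that malformed data is caught before anything is uploaded.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for myCloudUpload(): stores the bytes in a local
 * temp directory and returns a made-up public URL.
 */
function myCloudUpload(string $bytes, string $mimeType): string
{
    // Assumed mapping for this sketch only.
    $extensions = [
        'image/png'  => 'png',
        'image/jpeg' => 'jpg',
        'image/gif'  => 'gif',
        'image/webp' => 'webp',
    ];
    if (!isset($extensions[$mimeType])) {
        throw new InvalidArgumentException("Unsupported MIME type: $mimeType");
    }

    $dir = sys_get_temp_dir() . '/fake-bucket';
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    // Content-addressed name: identical images end up under the same object name.
    $name = hash('sha256', $bytes) . '.' . $extensions[$mimeType];
    if (file_put_contents("$dir/$name", $bytes) === false) {
        throw new RuntimeException("Cannot write file: $name");
    }

    return 'https://cdn.example.com/' . $name;
}

// Strict mode makes base64_decode() return false on invalid input.
$bytes = base64_decode('iVBORw0KGgo=', true);
if ($bytes === false) {
    throw new UnexpectedValueException('Image data is not valid base64');
}

$url = myCloudUpload($bytes, 'image/png');
```

In the loop above, the strict decode and its check would replace the bare `base64_decode($info['data'])` call.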

## API

| Method | Description |
|--------|-------------|
| `extract(string $html): static` | Extract all `data:` URI images and replace with placeholders. Returns `$this` for chaining. |
| `getHtml(): string` | Get the HTML with placeholders (after `extract()`). |
| `getImages(): array` | Get extracted image data keyed by placeholder ID. Each entry has `mimeType`, `data` (base64), `extension`. |
| `count(): int` | Number of images found in the last `extract()` call. |
| `saveToDir(string $saveDir, string $baseUrl): array` | Save images to a local directory. Returns a `urlMap` ready for `restore()`. |
| `restore(array $urlMap): string` | Replace placeholders with real URLs. Returns final HTML. |
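
The table does not say what `restore()` does with an ID that is missing from `$urlMap`, so it may be worth checking the final HTML for leftover placeholders before publishing it. The helper below is a hypothetical sketch, not part of the library, and it assumes placeholders consist of `__IMG_`, letters or digits, and a closing `__`; the README only shows the form `__IMG_xxx__`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: list placeholders that are still present in the HTML.
 * Assumes the ID part contains only letters and digits.
 *
 * @return string[]
 */
function findLeftoverPlaceholders(string $html): array
{
    preg_match_all('/__IMG_[A-Za-z0-9]+__/', $html, $matches);

    // De-duplicate and reindex so the result is a plain list.
    return array_values(array_unique($matches[0]));
}

$finalHtml = '<img src="https://example.com/uploads/a.png"><img src="__IMG_ab12__">';
$leftover = findLeftoverPlaceholders($finalHtml);

if ($leftover !== []) {
    error_log('Unreplaced placeholders: ' . implode(', ', $leftover));
}
```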

## Supported image formats

`jpeg`, `png`, `gif`, `webp`, `svg`, `bmp`, `tiff`, `avif`
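
If you want to pre-check a MIME type against this list in your own code, a sketch like the following may help. The function `formatFromMimeType()` and the way it derives a format name from the MIME subtype are assumptions for illustration; the library may detect formats differently.

```php
<?php
declare(strict_types=1);

// Format names copied from the list above.
const SUPPORTED_FORMATS = ['jpeg', 'png', 'gif', 'webp', 'svg', 'bmp', 'tiff', 'avif'];

/**
 * Hypothetical helper: map a MIME type such as "image/svg+xml" to one of the
 * format names above, or null if it is not on the list.
 */
function formatFromMimeType(string $mimeType): ?string
{
    // Capture the subtype and ignore an optional "+xml" suffix (as in image/svg+xml).
    if (!preg_match('~^image/([a-z0-9.-]+)(?:\+xml)?$~i', $mimeType, $m)) {
        return null;
    }

    $format = strtolower($m[1]);

    return in_array($format, SUPPORTED_FORMATS, true) ? $format : null;
}

$svg  = formatFromMimeType('image/svg+xml'); // 'svg'
$jpeg = formatFromMimeType('image/jpeg');    // 'jpeg'
$icon = formatFromMimeType('image/x-icon');  // null, not on the list
```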

## License

MIT