https://github.com/mathsgod/html-image-extractor
Extract embedded data: URI images from HTML and replace with placeholders
- Host: GitHub
- URL: https://github.com/mathsgod/html-image-extractor
- Owner: mathsgod
- License: mit
- Created: 2026-04-16T08:24:32.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-16T08:28:49.000Z (2 months ago)
- Last Synced: 2026-04-17T00:17:45.379Z (2 months ago)
- Language: PHP
- Size: 4.88 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# html-image-extractor
A PHP library that extracts embedded `data:` URI images from HTML, replaces them with temporary placeholders, and later swaps those placeholders for real URLs once the images have been saved or uploaded.
## Requirements
- PHP 8.1+
## Installation
```bash
composer require mathsgod/html-image-extractor
```
## How it works
```
HTML (with data: URIs)
        ↓  extract()
HTML with __IMG_xxx__ placeholders + image data map
        ↓  save / upload images, get URLs
        ↓  restore()
Final HTML (with real URLs)
```
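The flow above can be sketched in plain PHP without the library. This is an illustration only, not the library's actual implementation: the function names `sketchExtract` and `sketchRestore`, the regular expression, and the counter-based placeholder IDs are all assumptions made for the example.

```php
<?php

/**
 * Illustrative sketch only (not the library's code): pull base64 data: URIs
 * out of the HTML and swap in __IMG_n__ placeholders.
 *
 * @return array{0: string, 1: array<string, array{mimeType: string, data: string}>}
 */
function sketchExtract(string $html): array
{
    $images = [];
    $pattern = '/data:(image\/[a-z0-9.+-]+);base64,([A-Za-z0-9+\/=]+)/i';

    $out = preg_replace_callback($pattern, function (array $m) use (&$images): string {
        // Hypothetical ID scheme: a simple counter. The real library's IDs may differ.
        $id = '__IMG_' . count($images) . '__';
        $images[$id] = ['mimeType' => strtolower($m[1]), 'data' => $m[2]];
        return $id;
    }, $html);

    return [$out, $images];
}

/** Replace each placeholder with its final URL. */
function sketchRestore(string $html, array $urlMap): string
{
    // strtr() with an array tries the longest keys first, so __IMG_1__
    // never clobbers part of __IMG_10__.
    return strtr($html, $urlMap);
}

$html = '<p><img src="data:image/png;base64,iVBORw0KGgo="></p>';
[$withPlaceholders, $images] = sketchExtract($html);
$final = sketchRestore($withPlaceholders, ['__IMG_0__' => 'https://example.com/uploads/a.png']);
```

Keeping the image payloads out of the HTML between the two steps means the intermediate document stays small, and the URL map can be built by any storage backend.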
## Usage
### Basic — save to local directory
```php
use HtmlImageExtractor\HtmlImageExtractor;

$extractor = new HtmlImageExtractor();

// Step 1: extract embedded images
$extractor->extract($html);
$modifiedHtml = $extractor->getHtml();   // HTML with __IMG_xxx__ placeholders
$images       = $extractor->getImages(); // image data map
echo $extractor->count() . ' image(s) found';

// Step 2: save to disk and get URL map
$urlMap = $extractor->saveToDir(
    saveDir: __DIR__ . '/uploads',
    baseUrl: 'https://example.com/uploads'
);

// Step 3: restore placeholders with real URLs
$finalHtml = $extractor->restore($urlMap);
```
### Advanced — custom upload (e.g. cloud storage)
```php
use HtmlImageExtractor\HtmlImageExtractor;

$extractor = new HtmlImageExtractor();
$extractor->extract($html);

// Build the URL map yourself after uploading
$urlMap = [];
foreach ($extractor->getImages() as $id => $info) {
    // $info['mimeType']:  e.g. "image/png"
    // $info['data']:      base64-encoded image data
    // $info['extension']: e.g. "png"
    // myCloudUpload() stands in for your own upload function
    $url = myCloudUpload(base64_decode($info['data']), $info['mimeType']);
    $urlMap[$id] = $url;
}

$finalHtml = $extractor->restore($urlMap);
```
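When uploading yourself, one possible approach is to name each object after a hash of its bytes, so identical images end up at a single URL. The sketch below relies only on the documented shape of a `getImages()` entry (`mimeType`, `data`, `extension`). The helper `buildUrlMap`, the `$upload` callback, the `cdn.example.com` host, and the `__IMG_demo__` ID are hypothetical and stand in for your own storage client.

```php
<?php

/**
 * Hypothetical helper: turn an image map (shape as returned by getImages())
 * into a placeholder => URL map, delegating the actual upload to $upload.
 *
 * @param array<string, array{mimeType: string, data: string, extension: string}> $images
 * @param callable $upload  fn(string $bytes, string $name, string $mimeType): string
 * @return array<string, string>
 */
function buildUrlMap(array $images, callable $upload): array
{
    $urlMap = [];
    foreach ($images as $id => $info) {
        // Strict mode returns false on malformed base64 instead of garbage bytes.
        $bytes = base64_decode($info['data'], true);
        if ($bytes === false) {
            throw new InvalidArgumentException("Invalid base64 data for {$id}");
        }
        // Content-addressed name: identical images map to the same object key.
        $name = sha1($bytes) . '.' . $info['extension'];
        $urlMap[$id] = $upload($bytes, $name, $info['mimeType']);
    }
    return $urlMap;
}

// Stand-in uploader for demonstration; replace with a real storage call.
$fakeUpload = fn (string $bytes, string $name, string $mime): string
    => 'https://cdn.example.com/' . $name;

$images = [
    '__IMG_demo__' => [
        'mimeType'  => 'image/gif',
        'data'      => base64_encode('GIF89a'),
        'extension' => 'gif',
    ],
];
$urlMap = buildUrlMap($images, $fakeUpload);
```

The resulting `$urlMap` has the same placeholder-to-URL shape as the map built in the loop above, so it could be passed to `restore()` in the same way.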
## API
| Method | Description |
|--------|-------------|
| `extract(string $html): static` | Extract all `data:` URI images and replace with placeholders. Returns `$this` for chaining. |
| `getHtml(): string` | Get the HTML with placeholders (after `extract()`). |
| `getImages(): array` | Get extracted image data keyed by placeholder ID. Each entry has `mimeType`, `data` (base64), `extension`. |
| `count(): int` | Number of images found in the last `extract()` call. |
| `saveToDir(string $saveDir, string $baseUrl): array` | Save images to a local directory. Returns a `urlMap` ready for `restore()`. |
| `restore(array $urlMap): string` | Replace placeholders with real URLs. Returns final HTML. |
## Supported image formats
`jpeg`, `png`, `gif`, `webp`, `svg`, `bmp`, `tiff`, `avif`
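
As a rough illustration of how a MIME type might map to the format names listed above, here is a minimal sketch. The function `sketchExtensionFor` and its lookup table are assumptions for the example; the library's own mapping (for instance, whether JPEG files get `jpeg` or `jpg`) may differ.

```php
<?php

/**
 * Hypothetical lookup from MIME type to the format names listed above.
 * The library's own mapping may differ; this only illustrates the idea.
 */
function sketchExtensionFor(string $mimeType): ?string
{
    static $map = [
        'image/jpeg'    => 'jpeg',
        'image/png'     => 'png',
        'image/gif'     => 'gif',
        'image/webp'    => 'webp',
        'image/svg+xml' => 'svg',
        'image/bmp'     => 'bmp',
        'image/tiff'    => 'tiff',
        'image/avif'    => 'avif',
    ];

    // Normalise case and whitespace; unknown types yield null.
    return $map[strtolower(trim($mimeType))] ?? null;
}

$svg     = sketchExtensionFor('image/svg+xml');
$png     = sketchExtensionFor(' IMAGE/PNG ');
$unknown = sketchExtensionFor('application/pdf');
```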
## License
MIT