https://github.com/netwrix/flarewell
Say goodbye to MadCap Flare and convert your project to markdown!
- Host: GitHub
- URL: https://github.com/netwrix/flarewell
- Owner: netwrix
- Created: 2025-05-14T03:42:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-06T13:24:23.000Z (about 1 year ago)
- Last Synced: 2026-01-27T23:48:25.408Z (5 months ago)
- Language: Python
- Size: 785 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# HTML to Markdown Converter - Claude Instructions
A Python tool that converts HTML documentation (particularly from MadCap Flare) to Markdown format while preserving folder structure and centralizing images with intelligent deduplication.
## Core Functionality
- **Input**: HTML files (`.html`, `.htm`, `.xhtml`)
- **Output**: Markdown files (`.md`)
- **Directory Structure**: Preserved except for images
- **Image Handling**: Centralized in `static/img/{productname}` directory
- **Filename Convention**: All lowercase with underscores replacing spaces
- **Path References**: Absolute paths from parent output directory
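The filename convention above can be sketched as a small helper. This is a minimal illustration, not the tool's actual code: the function names `normalize_filename` and `to_markdown_path` are hypothetical, and it assumes that only the file name is normalized while directory names keep their original case, as the `Product1/` folders in the output structure suggest.

```python
from pathlib import PurePosixPath


def normalize_filename(stem: str) -> str:
    """Lowercase a file stem and replace spaces with underscores."""
    return stem.lower().replace(" ", "_")


def to_markdown_path(relative_html_path: str) -> str:
    """Map a source-relative HTML path to its .md counterpart.

    Assumption: directories are left untouched; only the file name changes.
    """
    path = PurePosixPath(relative_html_path)
    new_name = normalize_filename(path.stem) + ".md"
    return str(path.with_name(new_name))


if __name__ == "__main__":
    print(to_markdown_path("Product1/guide/Getting Started.htm"))
```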
## Key Features
- **Image deduplication**
  - Detects identical images using content hashing
  - Stores only one copy of duplicate images
  - Tracks usage in `image-manifest.json`
- **Link conversion**
  - Updates all internal `.html` links to `.md`
  - Maintains anchor links between documents
  - Resolves cross-file references automatically
- **Image centralization**
  - All images stored in `/static/img/{mirror_doc_directory}`
  - One image folder per product
  - Only referenced images are copied
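Content-hash deduplication can be sketched as follows. The README does not say which hash function the tool uses, so SHA-256 is an assumption here, and `file_digest` and `deduplicate` are hypothetical names. The returned `manifest` dictionary only illustrates the kind of data `image-manifest.json` might track; its real schema is not documented above.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def deduplicate(image_paths):
    """Group images by content.

    Returns (canonical, manifest):
      canonical maps every path to the first path seen with identical bytes;
      manifest maps each digest to the list of paths that share it.
    """
    first_seen = {}
    canonical = {}
    manifest = {}
    for path in image_paths:
        digest = file_digest(path)
        first_seen.setdefault(digest, path)
        canonical[path] = first_seen[digest]
        manifest.setdefault(digest, []).append(str(path))
    return canonical, manifest
```

Only the paths that appear as values in `canonical` would need to be copied; every other reference can point at its canonical copy.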
## Installation & Setup
```bash
# 1. Clone repository
git clone [repository_url]
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install beautifulsoup4 markdownify
```
## Usage
```bash
python app.py /path/to/html/docs /path/to/output
```
```bash
python app.py /path/to/html/docs /path/to/output --verbose
```
## Output Structure
```
output/                      # Specified output directory
├── Product1/                # Markdown files (structure preserved)
│   ├── guide/
│   │   └── intro.md
│   └── api/
│       └── reference.md
└── Product2/
    └── docs/
        └── overview.md
static/                      # Parallel to output directory
└── img/                     # Centralized images (not 'images')
    ├── image-manifest.json  # Deduplication tracking
    ├── Product1/
    │   ├── guide/
    │   │   └── screenshot.png
    │   └── api/
    │       └── diagram.png
    └── Product2/
        └── docs/
            └── logo.png
```
## Implementation Details
- **Pass 1: analysis**
  - Scan for images and build reference map
  - Create anchor mappings for cross-references
  - Build deduplication hash table
- **Pass 2: conversion**
  - Convert HTML to Markdown
  - Update all link references
  - Copy unique images to static directory
  - Generate `image-manifest.json`
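The link-update step can be illustrated with a standard-library sketch. It is not the tool's implementation: `rewrite_link` is a hypothetical helper, and it assumes that external URLs are left untouched while internal links ending in `.html`, `.htm`, or `.xhtml` are pointed at `.md`, keeping any `#anchor` fragment.

```python
import re
from urllib.parse import urlsplit

# Matches the HTML extensions listed under Core Functionality.
HTML_SUFFIX = re.compile(r"\.(?:html|htm|xhtml)$", re.IGNORECASE)


def rewrite_link(href: str) -> str:
    """Point an internal HTML link at its Markdown counterpart."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return href  # external link (http:, mailto:, //host/...): leave as-is
    path, sep, fragment = href.partition("#")
    new_path, count = HTML_SUFFIX.subn(".md", path)
    if count == 0:
        return href  # not an HTML document link (e.g. "#top" or an image)
    return new_path + sep + fragment


if __name__ == "__main__":
    for link in ("guide/intro.htm#setup", "https://example.com/a.html", "#top"):
        print(link, "->", rewrite_link(link))
```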
## Critical Requirements
- Never modify source files
- Preserve all internal links
- Handle MadCap Flare-specific HTML structures
- Maintain readable Markdown output
- Optimize image storage through deduplication
- Generate comprehensive image manifest
## Error Handling
- **Missing images**
  - Log warning but continue processing
  - Record in `image-manifest.json`
  - Preserve image reference in Markdown
- **Malformed HTML**
  - Attempt best-effort conversion
  - Log parsing errors with file path
  - Continue with next file
- **File conflicts**
  - Check for existing files
  - Option to overwrite or skip
  - Log conflicts
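The log-and-continue behavior described above might look like the loop below. This is a sketch under stated assumptions: `convert_all` and its parameters `convert_one` and `output_for` are hypothetical callables supplied by the caller, not functions from `app.py`.

```python
import logging

logger = logging.getLogger("converter")


def convert_all(html_files, convert_one, output_for, overwrite=False):
    """Convert each file, logging failures and conflicts instead of aborting.

    convert_one(source, target) performs one conversion and may raise.
    output_for(source) returns the target pathlib.Path for a source file.
    """
    results = {"converted": [], "skipped": [], "failed": []}
    for source in html_files:
        target = output_for(source)
        if target.exists() and not overwrite:
            logger.warning("Skipping %s: %s already exists", source, target)
            results["skipped"].append(source)
            continue
        try:
            convert_one(source, target)
        except Exception as exc:  # keep going with the next file
            logger.error("Failed to convert %s: %s", source, exc)
            results["failed"].append(source)
            continue
        results["converted"].append(source)
    return results
```

Returning the three lists lets the caller print a summary at the end of the run rather than stopping at the first bad file.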
## Performance Considerations
- **Expected Speed**: ~1-2 seconds per file
- **Memory Usage**: Scales with image deduplication table
- **Disk Usage**: Reduced through image deduplication
- **Large Documentation Sets**: Two-pass processing for efficiency
## Troubleshooting Guide
**Image not referenced in HTML or missing from source**
1. Verify image exists in source
2. Check if referenced in HTML
3. Review `image-manifest.json`
4. Confirm `static/img` structure

**Cross-reference anchors not found**
1. Check anchor mappings in verbose output
2. Verify target document exists
3. Confirm anchor ID consistency
## Command Reference
| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `input_dir` | Required | Source HTML directory | - |
| `output_dir` | Required | Destination for Markdown | - |
| `--verbose, -v` | Flag | Show detailed progress | False |
| `--overwrite` | Flag | Overwrite existing files | False |
| `--skip-images` | Flag | Convert without copying images | False |
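The options in the table map naturally onto `argparse`. Whether `app.py` actually uses `argparse` is an assumption; the sketch below only mirrors the documented arguments and defaults, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Convert HTML documentation to Markdown.",
    )
    parser.add_argument("input_dir", help="Source HTML directory")
    parser.add_argument("output_dir", help="Destination for Markdown")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Show detailed progress")
    parser.add_argument("--overwrite", action="store_true",
                        help="Overwrite existing files")
    parser.add_argument("--skip-images", action="store_true",
                        help="Convert without copying images")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

All three flags use `store_true`, so each defaults to `False` as the table states.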
## Testing Checklist
- [ ] Basic HTML to Markdown conversion
- [ ] Image deduplication across multiple files
- [ ] Cross-file link resolution
- [ ] MadCap Flare specific elements
- [ ] Large documentation set performance
- [ ] Edge cases (empty files, broken HTML)
## Future Enhancements
- Support for custom CSS preservation
- Batch processing with progress bar
- Configuration file support
- Plugin system for custom transformations