https://github.com/webdevtodayjason/clone-it

CLI tool to grab an entire website and clone it to HTML for site backups.

# 🌐 Clone-It - Website Cloning Tool

A user-friendly command-line tool for cloning websites using wget with an interactive menu interface.

## Features

- **Interactive Menu**: Easy-to-use CLI interface with colored output
- **Domain Validation**: Ensures valid domain format before proceeding
- **Project Organization**: Creates organized folder structure in `projects/domain.com`
- **Live Output**: Shows real-time wget progress and output
- **Post-Clone Options**: View files, open folder, or clone another site
- **Error Handling**: Graceful error handling with helpful messages

## Prerequisites

- **wget**: The tool requires wget to be installed
  - macOS: `brew install wget`
  - Ubuntu/Debian: `sudo apt-get install wget`
  - CentOS/RHEL: `sudo yum install wget`

## Installation

1. Clone or download the script to your desired location
2. Make it executable: `chmod +x clone-it.sh`
3. (Optional) Run the installer for easier access: `./install.sh`

## Usage

### Direct execution:
```bash
./clone-it.sh
```

### After installation:
```bash
clone-it
```

## How It Works

1. **Menu Interface**: The script presents a clean, colored menu asking "What are we cloning today?"
2. **Domain Input**: Enter the domain you want to clone (e.g., `example.com`)
3. **Validation**: The script validates the domain format
4. **Link Conversion Option**: Choose whether to convert internal links to .html extension
5. **Directory Creation**: Creates `projects/domain.com/` folder structure
6. **Cloning Process**: Runs wget with the following parameters:
```bash
wget --mirror -w 2 -p --html-extension --convert-links https://domain.com/
```
7. **Link Processing**: If enabled, converts internal links to work with .html extensions
8. **Live Output**: Shows the wget output in real-time within a bordered window
9. **Completion Menu**: Offers options to:
   - Clone another website
   - Open the project folder
   - View project contents
   - Exit
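
The validation and directory steps above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `is_valid_domain` and `project_dir_for` and the exact pattern are assumptions.

```bash
# Hypothetical helpers; clone-it.sh may implement these steps differently.

# Succeed if the argument looks like a bare domain (no scheme, no path).
# Assumed rule: dot-separated labels of letters, digits and hyphens,
# ending in a top-level label of two or more letters.
is_valid_domain() {
  printf '%s\n' "$1" |
    grep -Eq '^([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}$'
}

# Print the folder that a clone of the given domain would be stored in.
project_dir_for() {
  printf 'projects/%s\n' "$1"
}
```

A caller might combine the two with something like `is_valid_domain "$domain" && mkdir -p "$(project_dir_for "$domain")"`, and print an error message otherwise.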

## wget Parameters Explained

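- `--mirror`: Turns on recursion with unlimited depth and time-stamping, producing a complete mirror of the site
- `-w 2`: Waits 2 seconds between requests (respectful crawling)
- `-p`: Downloads all page requisites (images, CSS, etc.)
- `--html-extension`: Appends `.html` to downloaded HTML files that lack it (newer wget releases name this option `--adjust-extension`)
- `--convert-links`: Rewrites links in the downloaded files so they work for offline browsing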
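
For readers who prefer self-describing options, the same command can be written with wget's long option names (`--wait` for `-w`, `--page-requisites` for `-p`). The sketch below only prints the command instead of running it. The function name `clone_command` is hypothetical and is not part of clone-it.

```bash
# Hypothetical dry-run helper: print the wget command for a domain
# using long option names, without contacting the network.
clone_command() {
  printf 'wget --mirror --wait=2 --page-requisites --html-extension --convert-links https://%s/\n' "$1"
}
```

Calling `clone_command example.com` prints the command line, which can be reviewed before running it by hand.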

## Project Structure

```
CLONE-IT/
├── clone-it.sh          # Main cloning script
├── fix-links.sh         # Link conversion utility
├── install.sh           # Installation helper
├── README.md            # This file
└── projects/            # Created when first used
    └── domain.com/      # Individual site folders
        └── domain.com/  # Actual site files
```

## Error Handling

- Checks for wget installation before running
- Validates domain format
- Handles existing directories with user confirmation
- Reports wget exit codes and errors
- Graceful handling of user cancellations
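
As an illustration of the checks listed above, the sketch below shows one way to test for a required program, ask for confirmation, and turn a wget exit status into a message. The function names are assumptions and the real script may differ. The exit-status meanings are those documented in the wget manual.

```bash
# Hypothetical helpers; not taken from clone-it.sh.

# Succeed if the named program is on PATH, otherwise print an error.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    printf 'Error: %s is not installed.\n' "$1" >&2
    return 1
  fi
}

# Ask a yes/no question on stderr; succeed only on an answer starting with y or Y.
confirm() {
  local answer
  printf '%s [y/N] ' "$1" >&2
  read -r answer || return 1
  case "$answer" in
    [Yy]*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Map a wget exit status to a short description.
describe_wget_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "Generic error" ;;
    2) echo "Parse error" ;;
    3) echo "File I/O error" ;;
    4) echo "Network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "Username/password authentication failure" ;;
    7) echo "Protocol error" ;;
    8) echo "Server issued an error response" ;;
    *) echo "Unknown exit code $1" ;;
  esac
}
```

A typical use would be `require_command wget || exit 1` at startup, `confirm "Directory exists. Overwrite?"` before reusing a folder, and `describe_wget_exit "$?"` immediately after the wget call.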

## Colors and UI

The script uses ANSI color codes for better user experience:
- 🔵 Blue: Process information
- 🟢 Green: Success messages
- 🟡 Yellow: Warnings and prompts
- 🔴 Red: Error messages
- 🔷 Cyan: Headers and menus
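
A colour scheme like this is usually built from standard ANSI SGR escape sequences. The following is a minimal sketch under that assumption. The variable and function names are illustrative, and the exact codes used by clone-it.sh are not documented here.

```bash
# Illustrative colour helpers; names and exact codes are assumptions.
ESC=$(printf '\033')
BLUE="${ESC}[0;34m"
GREEN="${ESC}[0;32m"
YELLOW="${ESC}[1;33m"
RED="${ESC}[0;31m"
CYAN="${ESC}[0;36m"
NC="${ESC}[0m"   # reset to the terminal's default colour

# say COLOR MESSAGE...: print the message in the given colour, then reset.
say() {
  local color="$1"
  shift
  printf '%s%s%s\n' "$color" "$*" "$NC"
}

info()    { say "$BLUE" "$@"; }
success() { say "$GREEN" "$@"; }
warn()    { say "$YELLOW" "$@"; }
error()   { say "$RED" "$@" >&2; }
header()  { say "$CYAN" "$@"; }
```

Resetting with `NC` after every message keeps a colour from leaking into later output.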

## Link Conversion Feature

The tool includes a link conversion feature to fix a common issue with cloned websites:

**The Problem**: When wget clones a site, it adds `.html` extensions to files that didn't originally have them. A page served at `/office-visits` is saved as `/office-visits.html`, but the HTML may still link to the original path without the extension, so internal links break.

**The Solution**: When you choose "Y" for "Convert all internal links to .html?", the script will:
- Scan all HTML files in the cloned site
- Convert internal links from `/page-name` to `/page-name.html`
- Handle both absolute and relative links
- Preserve existing `.html` links unchanged
- Create backups when using the standalone fix-links utility
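
To make the idea concrete, here is a simplified sketch of such a rewrite using `sed`. It is not the code in `fix-links.sh`: the names `LINK_RULE`, `add_html_extension` and `fix_links_in_dir` are hypothetical, and the rule covers only double-quoted `href` values. It skips any value containing `.`, `:`, `#` or `?`, or ending in `/`, so relative links such as `../team` are left untouched.

```bash
# Simplified, assumed rewrite rule; the real utility may work differently.
# Matches href="..." values with no dot, colon, '#' or '?' that do not end in '/'.
LINK_RULE='s|href="([^":#?.]*[^":#?./])"|href="\1.html"|g'

# Filter: read HTML on stdin, write the rewritten HTML to stdout.
add_html_extension() {
  sed -E "$LINK_RULE"
}

# Rewrite every .html file under a directory in place,
# keeping a .bak copy of each original next to it.
fix_links_in_dir() {
  find "$1" -type f -name '*.html' -exec sed -E -i.bak "$LINK_RULE" {} +
}
```

For example, `add_html_extension < page.html` prints the converted page, while `fix_links_in_dir projects/example.com` would update a whole clone and leave `.bak` backups behind.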

### Standalone Link Fixer

For sites already cloned without link conversion, use the separate utility:

```bash
./fix-links.sh # Interactive mode
./fix-links.sh example.com # Direct mode
```

## Tips

- The script waits 2 seconds between requests to avoid overloading servers
- Large sites may take considerable time to clone completely
- Check robots.txt and site terms of service before cloning
- The cloned site will work offline with converted links
- Use link conversion if the original site used clean URLs without extensions

## Troubleshooting

- **"wget not found"**: Install wget using your package manager
- **"Invalid domain"**: Ensure domain format like `example.com` (no http://)
- **"Permission denied"**: Make sure the script is executable (`chmod +x`)
- **Slow cloning**: This is normal for large sites due to the respectful 2-second delay

## License

Free to use and modify as needed.