https://github.com/joelstephen97/tracr
useful for searching for images in a recursive fashion given starting url
https://github.com/joelstephen97/tracr
beautifulsoup4 image-extraction python3
Last synced: about 1 year ago
JSON representation
useful for searching for images in a recursive fashion given starting url
- Host: GitHub
- URL: https://github.com/joelstephen97/tracr
- Owner: joelstephen97
- Created: 2025-03-27T22:44:20.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-03-27T22:46:16.000Z (about 1 year ago)
- Last Synced: 2025-03-27T23:28:36.130Z (about 1 year ago)
- Topics: beautifulsoup4, image-extraction, python3
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Tracr Tool
Tracer is a Python tool that follows a given URL, scrapes images from the website along with their metadata (EXIF), and organizes them into a folder tree reflecting the link hierarchy. If the link contains further links, Tracer will recursively follow them up to a specified depth (default is 5, configurable via command-line arguments).
## Features
- **Recursive Traversal:** Follow links up to a configurable maximum depth.
- **Image Downloading:** Downloads all images found on each page.
- **Metadata Extraction:** Saves image metadata (format, size, mode, and EXIF data if available) into a corresponding text file.
- **Folder Organization:** Creates a folder structure that mirrors the link hierarchy.
## Requirements
- Python 3.x
- [requests](https://pypi.org/project/requests/)
- [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
- [Pillow](https://pypi.org/project/Pillow/)
## Installation
1. **Clone the Repository:**
```bash
git clone
cd tracr
```
2. **Create a Virtual Environment:**
On Windows, open PowerShell and run:
```powershell
python -m venv venv
```
3. **Activate the Virtual Environment:**
- **Using PowerShell:**
If you encounter an execution policy error, open PowerShell as Administrator and run:
```powershell
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
```
Then activate the virtual environment:
```powershell
.\venv\Scripts\Activate.ps1
```
- **Using Command Prompt:**
```cmd
venv\Scripts\activate
```
Once activated, you should see `(venv)` prefixed on your command line.
4. **Install Dependencies:**
With the virtual environment activated, install all required packages using:
```bash
pip install -r requirements.txt
```
## Usage
Run the Tracer tool with the following command:
```bash
python tracer.py --depth --output
```
- ``: The URL where the tracer begins.
- `--depth `: (Optional) The maximum depth to traverse. Default is 5.
- `--output `: (Optional) The folder where output will be stored. Default is `output`.
Example:
```bash
python tracer.py https://example.com --depth 5 --output tracer_output
```
## Virtual Environment Management
### Activating the Virtual Environment
- **PowerShell:**
```powershell
.\venv\Scripts\Activate.ps1
```
- **Command Prompt:**
```cmd
venv\Scripts\activate
```
### Deactivating the Virtual Environment
To deactivate the virtual environment, simply run:
```bash
deactivate
```