https://github.com/xorbit01/webpalm

🕸️ Crawl in the web network


🌐 Advanced Web Traversal & Data Extraction 🚀

[![GitHub release](https://img.shields.io/github/v/release/XORbit01/webpalm?color=blue&label=release)]()
[![GitHub license](https://img.shields.io/github/license/XORbit01/webpalm?color=green)]()
[![GitHub issues](https://img.shields.io/github/issues/XORbit01/webpalm?color=red)]()
[![GitHub stars](https://img.shields.io/github/stars/XORbit01/webpalm?color=yellow)]()
[![GitHub forks](https://img.shields.io/github/forks/XORbit01/webpalm?color=orange)]()
[![GitHub watchers](https://img.shields.io/github/watchers/XORbit01/webpalm?color=blue)]()

🔍 **Crawl websites efficiently, extract structured data, and visualize connections.** 🕵️‍♂️


---

## 🗺️ Table of Contents
- [`📦 Installation`](#-installation)
- [`⚡ Features`](#-features)
- [`🚀 Usage`](#-usage)
- [`📌 Examples`](#-examples)
- [`📜 Regex Patterns`](#-regex-patterns)
- [`🤝 Contributing`](#-contributing)

---

## ⚡ Features

- 🌳 **Structured Web-Tree Generation**

- 🕵️ **Regex-Based Data Extraction**

- ⚡ **High-Speed Multi-threading**

- 📂 **Multiple Export Formats**

- 🎨 **Colorized Output & Robust Error Handling**

---

## 📦 Installation

### 📥 Download Binary
👉 [Download Latest Release](https://github.com/XORbit01/webpalm/releases/latest)

### 📥 Compile from Source
```sh
git clone https://github.com/XORbit01/webpalm.git
cd webpalm
go build -o webpalm && ./webpalm
```

### 📥 Install via Go
```sh
go install github.com/XORbit01/webpalm/v2@latest
```

---

## 🚀 Usage

```sh
webpalm -h
```

### ⚙️ Common Flags
```yaml
🌎 -i, --include # Include only specific domains (e.g., google.com, facebook.com)
🔗 -u, --url # Target website
📏 -l, --level # Depth of traversal
❌ -x, --exclude # Exclude status codes (e.g., 404, 500)
💾 -o, --output # Save results (JSON, XML, TXT)
🚀 -w, --worker # Multi-threading workers
🔍 --regexes # Extract data using regex
```

---

## 📌 Examples

### 🌲 Generate a Website Map
```sh
webpalm -u https://example.com -l2
```

### 💬 Extract Comments from Pages
```sh
webpalm -u https://example.com -l1 --regexes comments="\<\!--.*?-->" -o results.json
```
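
To preview what the `comments` pattern above would capture before running a crawl, it can be tried against a sample page in Go. This is a minimal sketch that assumes WebPalm evaluates patterns with Go's standard `regexp` package (plausible for a Go tool, but not confirmed here). The `extractComments` helper and the sample HTML are hypothetical. Under Go's `regexp`, `.` does not match newlines by default, so the sketch adds the `(?s)` flag to also catch multi-line comments.

```go
package main

import (
	"fmt"
	"regexp"
)

// commentRe matches HTML comments non-greedily, so adjacent comments
// are returned as separate matches. (?s) lets "." span newlines.
var commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)

// extractComments is a hypothetical helper, not part of WebPalm.
func extractComments(page string) []string {
	return commentRe.FindAllString(page, -1)
}

func main() {
	page := "<p>hi</p><!-- TODO: remove debug --><div>\n<!-- multi\nline --></div>"
	for _, c := range extractComments(page) {
		fmt.Printf("%q\n", c)
	}
}
```

Dropping `(?s)` from the pattern would leave the second, multi-line comment unmatched.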

### 🚀 Crawl with Multi-threading
```sh
webpalm -u https://example.com -l3 -w 50
```

### 💾 Export Results
```sh
webpalm -u https://example.com -l2 -o output.xml
```

---

## 📜 Regex Patterns

| 🔍 Purpose | 📜 Regex Pattern |
|-----------|--------------|
| 📧 Emails | `[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+` |
| 💬 Comments | `\<\!--.*?-->` |
| 🔑 Tokens | `[a-zA-Z0-9]{32}` |
| 🔐 Passwords | `\bpassword\b.{0,10}` |

📌 *Quote or escape special characters so your shell does not interpret them.*

---

## 🤝 Contributing
💡 Pull requests are welcome! Open an issue before major changes.

📢 **Discord:** `xorbit.`

---