https://github.com/xorbit01/webpalm
Crawl in the web network
- Host: GitHub
- URL: https://github.com/xorbit01/webpalm
- Owner: XORbit01
- License: gpl-3.0
- Created: 2023-04-22T14:47:32.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-20T20:28:34.000Z (7 months ago)
- Last Synced: 2025-08-02T22:15:46.467Z (2 months ago)
- Topics: crawler, crawling, data, data-science, datamining, go, golang, hack, mining, osint, redteam, spider, tool
- Language: Go
- Size: 5.07 MB
- Stars: 371
- Watchers: 3
- Forks: 39
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Advanced Web Traversal & Data Extraction

**Crawl websites efficiently, extract structured data, and visualize connections.**

---
## Table of Contents

- [`Features`](#features)
- [`Installation`](#installation)
- [`Usage`](#usage)
- [`Examples`](#examples)
- [`Regex Patterns`](#regex-patterns)
- [`Contributing`](#contributing)

---
## Features

- **Structured Web-Tree Generation**
- **Regex-Based Data Extraction**
- **High-Speed Multi-threading**
- **Multiple Export Formats**
- **Colorized Output & Robust Error Handling**
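
To make "web tree" and "multi-threading" concrete, here is a minimal sketch of a depth-limited, concurrent traversal. It is an illustration only, not webpalm's actual implementation: the `links` map is a hypothetical stand-in for fetching a page and extracting its links, and `Node`, `crawl`, and `flatten` are names invented for this example. It also assumes that depth counts link hops from the start URL, which this README does not spell out.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Node is one page in the resulting tree (hypothetical type).
type Node struct {
	URL      string
	Children []*Node
}

// links is an in-memory stand-in for "fetch the page and extract its links".
// A real crawler would issue HTTP requests instead.
var links = map[string][]string{
	"https://example.com":             {"https://example.com/about", "https://example.com/blog"},
	"https://example.com/blog":        {"https://example.com/blog/post-1", "https://example.com"},
	"https://example.com/blog/post-1": {"https://example.com/deep"},
}

// crawl builds a tree rooted at start, following links for at most maxDepth
// hops. Each level is fetched by at most `workers` goroutines at a time.
func crawl(start string, maxDepth, workers int) *Node {
	if workers < 1 {
		workers = 1 // an unbuffered semaphore would block forever
	}
	root := &Node{URL: start}
	seen := map[string]bool{start: true}
	frontier := []*Node{root}

	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		results := make([][]string, len(frontier))
		sem := make(chan struct{}, workers) // bounds concurrency
		var wg sync.WaitGroup
		for i, n := range frontier {
			wg.Add(1)
			sem <- struct{}{}
			go func(i int, url string) {
				defer wg.Done()
				defer func() { <-sem }()
				results[i] = links[url] // read-only map access is safe here
			}(i, n.URL)
		}
		wg.Wait()

		// Attach unseen links as children. This step is sequential,
		// so the seen map needs no lock.
		var next []*Node
		for i, n := range frontier {
			for _, u := range results[i] {
				if seen[u] {
					continue
				}
				seen[u] = true
				child := &Node{URL: u}
				n.Children = append(n.Children, child)
				next = append(next, child)
			}
		}
		frontier = next
	}
	return root
}

// flatten renders the tree as indented lines in pre-order.
func flatten(n *Node, depth int) []string {
	lines := []string{strings.Repeat("  ", depth) + n.URL}
	for _, c := range n.Children {
		lines = append(lines, flatten(c, depth+1)...)
	}
	return lines
}

func main() {
	for _, line := range flatten(crawl("https://example.com", 2, 4), 0) {
		fmt.Println(line)
	}
}
```

Fetching level by level keeps the shared `seen` set out of the goroutines, so only the slow part (the fetch) runs concurrently. Pages beyond the depth limit, such as `/deep` in the sample data, never enter the tree.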
---
## Installation

### Download Binary

[Download Latest Release](https://github.com/XORbit01/webpalm/releases/latest)

### Compile from Source

```sh
git clone https://github.com/XORbit01/webpalm.git
cd webpalm
go build -o webpalm && ./webpalm
```

### Install via Go

```sh
go install github.com/XORbit01/webpalm/v2@latest
```

---
## Usage

```sh
webpalm -h
```

### Common Flags

```yaml
-i, --include   # Include only specific domains (e.g., google.com, facebook.com)
-u, --url       # Target website
-l, --level     # Depth of traversal
-x, --exclude   # Exclude status codes (e.g., 404, 500)
-o, --output    # Save results (JSON, XML, TXT)
-w, --worker    # Number of concurrent workers
    --regexes   # Extract data using regex
```

---
## Examples

### Generate a Website Map

```sh
webpalm -u https://example.com -l2
```

### Extract Comments from Pages

```sh
webpalm -u https://example.com -l1 --regexes comments="\<\!--.*?-->" -o results.json
```

### Crawl with Multi-threading

```sh
webpalm -u https://example.com -l3 -w 50
```

### Export Results

```sh
webpalm -u https://example.com -l2 -o output.xml
```

---
## Regex Patterns

| Purpose | Regex Pattern |
|-----------|--------------|
| Emails | `[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+` |
| Comments | `\<\!--.*?-->` |
| Tokens | `[a-zA-Z0-9]{32}` |
| Passwords | `\bpassword\b.{0,10}` |

*Escape special characters if needed.*
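
To see what these patterns actually capture before pointing them at a live site, they can be tried against a sample page. The sketch below uses Go's standard `regexp` package (RE2 syntax). That webpalm evaluates `--regexes` with the same engine is an assumption, and `samplePage`, `patterns`, and `extract` are names made up for this example. The comment pattern is written without the backslashes, which matches the same text in Go.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// samplePage is made-up HTML used only to exercise the patterns.
const samplePage = `<html><!-- TODO: remove debug login -->
<p>Contact: admin@example.com</p>
<script>var key = "0123456789abcdef0123456789abcdef";</script>
<!-- password=hunter2 -->
</html>`

// patterns mirrors the table above; the keys are illustrative labels.
var patterns = map[string]*regexp.Regexp{
	"emails":    regexp.MustCompile(`[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+`),
	"comments":  regexp.MustCompile(`<!--.*?-->`),
	"tokens":    regexp.MustCompile(`[a-zA-Z0-9]{32}`),
	"passwords": regexp.MustCompile(`\bpassword\b.{0,10}`),
}

// extract returns every match of each named pattern found in body.
func extract(body string) map[string][]string {
	found := make(map[string][]string)
	for name, re := range patterns {
		if m := re.FindAllString(body, -1); m != nil {
			found[name] = m
		}
	}
	return found
}

func main() {
	found := extract(samplePage)

	// Sort the labels so the output order is stable across runs.
	names := make([]string, 0, len(found))
	for name := range found {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		for _, match := range found[name] {
			fmt.Printf("%-9s %q\n", name, match)
		}
	}
}
```

Two behaviours are worth knowing when adapting the patterns. In Go's `regexp`, `.` does not match a newline by default, so the comment pattern only finds single-line comments unless it is prefixed with `(?s)`. The token pattern has no anchors, so it also matches any 32-character alphanumeric run inside a longer string.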
---
## Contributing

Pull requests are welcome! Open an issue before major changes.

**Discord:** `xorbit.`

---