https://github.com/rahulsdevloper/gosearch-search-engine-scraper

Get search results from Google, Bing, DuckDuckGo, etc. easily using GoSearch

golang golang-library golang-module golang-package golang-scraper golang-search golang-searcher scraper scraper-engine search search-engine search-engine-scraper

# 🔍 Search Engine Scraper - GoSearch

![GoSearch](https://github.com/user-attachments/assets/6a5cd17b-1494-4804-942e-2d104d1d533d)


High-performance, anti-detection search engine scraper, built with advanced Go concurrency patterns and powered by Chromedp.


✨ Features • 🚀 Install • 🔧 Usage • 🌟 Examples • 🧠 Advanced • 🐞 Debug

---

## ✨ Key Features



- Supports Google, Bing, and DuckDuckGo
- Bypasses CAPTCHAs and blocks
- Chrome-based scraping
- Filtering by domain, keyword, and more
- Keyword extraction and ad detection
- Rate limiting controls to avoid being throttled

## 🚀 Installation

### From Binary

```bash
# Download the latest release
curl -sSL https://github.com/RahulSDevloper/Search-Engine-Scraper---Golang/releases/download/v1.0.0/gosearch-linux-amd64 -o gosearch
chmod +x gosearch
./gosearch --query "golang programming"
```

### From Source

```bash
git clone https://github.com/RahulSDevloper/Search-Engine-Scraper---Golang.git
cd Search-Engine-Scraper---Golang
go build -ldflags="-s -w" -o gosearch
./gosearch --query "golang programming"
```

### Using Docker

```bash
docker pull rahulsdevloper/gosearch:latest
docker run rahulsdevloper/gosearch --query "golang programming"
```

## 🔧 Usage

```
Usage: gosearch [OPTIONS] [QUERY]

Options:
  --query string            Search query
  --engine string           Search engine (google, bing, duckduckgo, all) (default "google")
  --max int                 Maximum results to fetch (default 10)
  --ads                     Include advertisements in results
  --timeout duration        Search timeout (default 30s)
  --proxy string            Proxy URL (e.g., http://user:pass@host:port)
  --headless                Use headless browser (recommended for avoiding detection)
  --lang string             Language code (default "en")
  --region string           Region code (default "us")
  --format string           Output format (json, csv, table) (default "json")
  --output string           Output file (default: stdout)
  --page int                Result page number (default 1)
  --min-words int           Minimum word count in description
  --max-words int           Maximum word count in description
  --domain string           Filter results by domain (include)
  --exclude-domain string   Filter results by domain (exclude)
  --keyword string          Filter results by keyword
  --type string             Filter by result type (organic, special, etc.)
  --site string             Limit results to specific site
  --filetype string         Limit results to specific file type
  --verbose                 Enable verbose logging
  --debug                   Enable debug mode (saves HTML responses)
  --log string              Log file path
  --stats string            Statistics output file
  --help                    Show help
```

## 🌟 Examples

### Basic Search with Google 🔍

```bash
./gosearch --query "golang programming"
```

### Search with Advanced Filters 🧰

```bash
./gosearch --query "machine learning" --engine bing --domain edu --format table
```
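To make the filter flags more concrete, here is a minimal sketch of how `--domain`, `--exclude-domain`, `--keyword`, `--min-words` and `--max-words` could be applied to a result list after scraping. The `Result` and `Filter` types, the `Description` field, and the suffix-based domain match are assumptions made for illustration; GoSearch's actual types and matching rules may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Result is a hypothetical stand-in for a scraped search result.
type Result struct {
	Title, URL, Description string
}

// Filter mirrors the CLI filter flags; zero values mean "no constraint".
type Filter struct {
	Domain, ExcludeDomain, Keyword string
	MinWords, MaxWords             int
}

// hostMatches reports whether the URL's host equals domain or ends with
// "."+domain, so "edu" matches "cs.stanford.edu".
func hostMatches(rawURL, domain string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	domain = strings.ToLower(strings.TrimPrefix(domain, "."))
	return host == domain || strings.HasSuffix(host, "."+domain)
}

// Keep reports whether r passes every configured constraint.
func (f Filter) Keep(r Result) bool {
	if f.Domain != "" && !hostMatches(r.URL, f.Domain) {
		return false
	}
	if f.ExcludeDomain != "" && hostMatches(r.URL, f.ExcludeDomain) {
		return false
	}
	if f.Keyword != "" {
		text := strings.ToLower(r.Title + " " + r.Description)
		if !strings.Contains(text, strings.ToLower(f.Keyword)) {
			return false
		}
	}
	words := len(strings.Fields(r.Description))
	if f.MinWords > 0 && words < f.MinWords {
		return false
	}
	if f.MaxWords > 0 && words > f.MaxWords {
		return false
	}
	return true
}

func main() {
	results := []Result{
		{Title: "Intro to ML", URL: "https://cs.stanford.edu/ml", Description: "A university course on machine learning basics"},
		{Title: "ML blog", URL: "https://example.com/ml", Description: "Notes on machine learning"},
	}
	f := Filter{Domain: "edu", Keyword: "machine learning"}
	for _, r := range results {
		if f.Keep(r) {
			fmt.Println(r.Title, r.URL)
		}
	}
}
```

Filtering after the fetch keeps the scraping code independent of the filter logic, at the cost of sometimes returning fewer than `--max` results.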

### Multi-Engine Search with Headless Browser 🌐

```bash
./gosearch --query "climate science" --engine all --headless --output results.json
```

### Filetype Specific Search 📄

```bash
./gosearch --query "research papers" --filetype pdf --site edu --max 20
```
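Most search engines understand the `site:` and `filetype:` query operators, so one plausible way for flags like `--site` and `--filetype` to take effect is to append those operators to the query string. The sketch below shows that idea only; `buildQuery` is a hypothetical helper, and whether GoSearch builds its queries this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery appends "site:" and "filetype:" operators to the base query.
// Empty values are skipped. Hypothetical helper, not part of GoSearch.
func buildQuery(base, site, filetype string) string {
	parts := []string{strings.TrimSpace(base)}
	if site != "" {
		parts = append(parts, "site:"+site)
	}
	if filetype != "" {
		parts = append(parts, "filetype:"+filetype)
	}
	return strings.Join(parts, " ")
}

func main() {
	// Equivalent of: --query "research papers" --filetype pdf --site edu
	fmt.Println(buildQuery("research papers", "edu", "pdf"))
	// prints: research papers site:edu filetype:pdf
}
```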

## 🧠 Advanced Techniques


### Using as a Library

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/RahulSDevloper/Search-Engine-Scraper---Golang/pkg/engines"
	"github.com/RahulSDevloper/Search-Engine-Scraper---Golang/pkg/models"
)

func main() {
	// Create a new Google search engine
	engine := engines.NewGoogleSearchEngine()

	// Configure search request with optimization strategy
	request := models.SearchRequest{
		Query:       "golang concurrency patterns",
		MaxResults:  10,
		Timeout:     30 * time.Second,
		UseHeadless: true,
		Debug:       true,
	}

	// Execute search with context for cancellation
	ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
	defer cancel()

	results, err := engine.Search(ctx, request)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}

	// Process and analyze results
	for i, result := range results {
		fmt.Printf("%d. %s\n%s\n\n", i+1, result.Title, result.URL)
	}
}
```
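The single-engine call above can be fanned out across several engines, which is roughly what `--engine all` implies. The following is a self-contained sketch of that pattern, not the library's implementation: `Searcher`, `fakeEngine`, `searchAll` and `runDemo` are hypothetical names, and the fake engines only simulate latency so the example runs without network access.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Searcher is a hypothetical, simplified interface standing in for a real engine.
type Searcher interface {
	Search(ctx context.Context, query string) ([]string, error)
}

// fakeEngine simulates an engine that answers after a fixed delay.
type fakeEngine struct {
	name  string
	delay time.Duration
}

func (f fakeEngine) Search(ctx context.Context, query string) ([]string, error) {
	select {
	case <-time.After(f.delay):
		return []string{f.name + ": " + query}, nil
	case <-ctx.Done(): // the shared deadline expired or the caller cancelled
		return nil, ctx.Err()
	}
}

// searchAll queries every engine in its own goroutine and collects
// results and errors per engine name. One slow engine does not block the rest.
func searchAll(ctx context.Context, engines map[string]Searcher, query string) (map[string][]string, map[string]error) {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string][]string)
		errs    = make(map[string]error)
	)
	for name, eng := range engines {
		wg.Add(1)
		go func(name string, eng Searcher) {
			defer wg.Done()
			res, err := eng.Search(ctx, query)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[name] = err
				return
			}
			results[name] = res
		}(name, eng)
	}
	wg.Wait()
	return results, errs
}

// runDemo runs two fast engines and one that exceeds the shared deadline.
func runDemo() (succeeded, failed int) {
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()
	engines := map[string]Searcher{
		"google": fakeEngine{name: "google", delay: time.Millisecond},
		"bing":   fakeEngine{name: "bing", delay: 2 * time.Millisecond},
		"slow":   fakeEngine{name: "slow", delay: 5 * time.Second},
	}
	results, errs := searchAll(ctx, engines, "climate science")
	return len(results), len(errs)
}

func main() {
	ok, failed := runDemo()
	fmt.Println(ok, "engines succeeded,", failed, "failed")
}
```

Sharing one context across all goroutines means a single deadline bounds the whole search, while per-engine errors stay visible instead of failing the entire request.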

### Custom Rate Limiting

```yaml
# ~/.config/gosearch/config.yaml
rate_limits:
  google: 10 # requests per minute
  bing: 15
  duckduckgo: 20

proxy_rotation:
  enabled: true
  proxies:
    - http://proxy1:8080
    - http://proxy2:8080
  rotation_strategy: round-robin # or random
```
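As an illustration of what these settings mean at runtime, the sketch below turns a requests-per-minute budget into a minimum spacing between requests and hands out proxies in round-robin order. `interval`, `Limiter` and `Rotator` are hypothetical names written for this example using only the standard library; they are not GoSearch's internals, and the `random` strategy is not covered.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// interval converts a requests-per-minute budget into the minimum
// spacing between requests. A non-positive budget means no spacing.
func interval(perMinute int) time.Duration {
	if perMinute <= 0 {
		return 0
	}
	return time.Minute / time.Duration(perMinute)
}

// Limiter spaces successive Wait calls at least `every` apart.
type Limiter struct {
	mu    sync.Mutex
	every time.Duration
	next  time.Time // earliest time the next request may start
}

func NewLimiter(perMinute int) *Limiter {
	return &Limiter{every: interval(perMinute)}
}

// Wait reserves the next free slot and sleeps until it arrives.
func (l *Limiter) Wait() {
	l.mu.Lock()
	now := time.Now()
	if l.next.Before(now) {
		l.next = now
	}
	sleep := l.next.Sub(now)
	l.next = l.next.Add(l.every)
	l.mu.Unlock()
	time.Sleep(sleep)
}

// Rotator returns proxies in round-robin order and is safe for concurrent use.
type Rotator struct {
	mu      sync.Mutex
	proxies []string
	i       int
}

// Next returns the next proxy, or "" when none are configured.
func (r *Rotator) Next() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.proxies) == 0 {
		return ""
	}
	p := r.proxies[r.i%len(r.proxies)]
	r.i++
	return p
}

func main() {
	fmt.Println("google spacing:", interval(10)) // 10 requests per minute

	rot := &Rotator{proxies: []string{"http://proxy1:8080", "http://proxy2:8080"}}
	lim := NewLimiter(600) // 600 per minute keeps the demo fast
	for i := 0; i < 3; i++ {
		lim.Wait()
		fmt.Println("request", i+1, "via", rot.Next())
	}
}
```

With the values from the config above, `google: 10` works out to one request every six seconds, and `duckduckgo: 20` to one every three.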

## 🐞 Debugging


### No Results Found?

If you're not getting any results, try these solutions:

1. **Use Headless Mode** to avoid detection
   ```bash
   ./gosearch --query "your search" --headless
   ```

2. **Use a Proxy** to route through a clean IP address
   ```bash
   ./gosearch --query "your search" --proxy http://your-proxy-server:port
   ```

3. **Enable Debug Mode** to examine the HTML response
   ```bash
   ./gosearch --query "your search" --debug
   ```

### Debugging Process Flow

```mermaid
graph TD
    A[Run Search] --> B{Results Found?}
    B -->|Yes| C[Process Results]
    B -->|No| D[Enable Debug Mode]
    D --> E[Check HTML Responses]
    E --> F{Captcha Present?}
    F -->|Yes| G[Use Headless + Proxy]
    F -->|No| H[Check Selectors]
    H --> I[Update Selectors]
    I --> A
    G --> A
```
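The decision flow above can also be written as a small function, which may be handy when automating retries around the CLI. This is only a sketch: `nextStep` and `looksLikeCaptcha` are hypothetical helpers, and the marker strings are illustrative guesses at what a block page might contain, not GoSearch's actual detection logic.

```go
package main

import (
	"fmt"
	"strings"
)

// captchaMarkers are illustrative substrings that often appear on block
// pages. They are assumptions for this sketch, not an exhaustive list.
var captchaMarkers = []string{"captcha", "unusual traffic"}

// looksLikeCaptcha does a case-insensitive substring check on saved HTML.
func looksLikeCaptcha(html string) bool {
	lower := strings.ToLower(html)
	for _, m := range captchaMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// nextStep mirrors the flowchart: results are processed; otherwise the
// saved HTML (from --debug) decides between retrying with headless mode
// plus a proxy, or reviewing the selectors.
func nextStep(resultCount int, html string) string {
	switch {
	case resultCount > 0:
		return "process results"
	case looksLikeCaptcha(html):
		return "retry with --headless and --proxy"
	default:
		return "check and update selectors"
	}
}

func main() {
	fmt.Println(nextStep(0, "<html><body>Please solve this CAPTCHA</body></html>"))
	fmt.Println(nextStep(0, "<html><body><div class=\"new-layout\"></div></body></html>"))
}
```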

## 📊 Performance Benchmarks

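| Engine | Results/Second | Memory Usage | Detection Avoidance |
|--------|----------------|--------------|---------------------|
| Google | 6.5 | Low | High |
| Bing | 8.2 | Low | Medium |
| DuckDuckGo | 7.3 | Low | Very High |
| All (Concurrent) | 4.8 | Medium | Medium |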

## 📚 Design Philosophy

The Search Engine Scraper follows these core principles:

1. **Resilience First**: Designed to handle the constantly changing DOM structures of search engines
2. **Performance Focused**: Optimized for speed while maintaining low resource usage
3. **Privacy Conscious**: Minimal footprint to avoid detection
4. **Developer Friendly**: Clean API for integration into other Go applications

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

Made with love by RahulSDevloper

⭐ Star this project if you find it useful! ⭐