Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iarsham/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/iarsham/scrapify
- Owner: iarsham
- License: mit
- Created: 2024-08-29T12:07:38.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2024-09-02T19:05:00.000Z (5 months ago)
- Last Synced: 2024-10-25T03:44:04.799Z (3 months ago)
- Topics: 403-bypass, arkose, cloudflare, crawler, golang, http-client, scraper
- Language: Go
- Homepage:
- Size: 24.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Scrapify: A Go Library for Web Scraping
This library provides tools for building web scrapers in Go. It allows you to create custom HTTP requests with advanced features for bypassing basic anti-scraping measures.
## Features
- **Customizable Headers**: Set various headers like User-Agent and sec-ch-ua to mimic a real browser.
- **TLS Configuration**: Customize the Transport Layer Security (TLS) configuration for secure connections.
- **Browser Emulation**: Specify different browsers (Chrome, Firefox, Edge) to influence the cipher suites offered.
- **Default Headers**: Includes a set of common headers to improve compatibility.
## Installation
```bash
go get -u github.com/iarsham/scrapify@latest
```
## Usage
```go
package main

import (
	"fmt"
	"net/http"

	"github.com/iarsham/scrapify"
)

func main() {
	// Route requests through scrapify's transport, configured for the Chrome profile.
	client := &http.Client{
		Transport: scrapify.NewTransport(scrapify.Chrome),
	}
	req, err := http.NewRequest(http.MethodGet, "https://cloudflare.com", nil)
	if err != nil {
		panic(err)
	}
	// Apply scrapify's default browser-like headers to the request.
	scrapify.SetHeaders(req, nil)
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode)
}
```
The same transport can be plugged into a colly collector:
```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2" // assumed import path; use the colly version scrapify is built against
	"github.com/iarsham/scrapify"
)

func main() {
	c := colly.NewCollector()
	c.WithTransport(scrapify.NewTransport(scrapify.Chrome))

	c.OnRequest(func(r *colly.Request) {
		scrapify.SetCollyHeaders(r, nil)
	})

	c.OnHTML("body", func(e *colly.HTMLElement) {
		fmt.Println(e.Text)
	})

	c.OnResponse(func(r *colly.Response) {
		fmt.Println(r.StatusCode)
	})

	if err := c.Visit("https://chatgpt.com"); err != nil {
		panic(err)
	}
}
```
## Contributing
We welcome contributions to this library. Please feel free to submit pull requests with improvements and bug fixes.
## License
This library is licensed under the MIT License. See the LICENSE file for details.