https://github.com/noahcardoza/cloudproxy

# CloudProxy

Proxy server to bypass Cloudflare protection

:warning: This project is in beta. Some things may not work, and the API can change at any time.
See the known issues section.

# 2captcha Alternatives

## [CapSolver.com](https://dashboard.capsolver.com/passport/register?inviteCode=45JrWKeetsQa)



Capsolver offers an affordable and quick automatic captcha solving solution with a success rate of 99.15% and the ability to solve a variety of captchas, including reCAPTCHA V2, hCaptcha, FunCaptcha, and more. Integration with various API clients is also supported, and a free trial balance is available with upgraded personal details.

## Discord

If you need help feel free to swing by my [Discord](https://discord.gg/gTq2VmUMsE)!

## How it works

CloudProxy starts a proxy server that waits for requests in an idle state, using few resources.
When a request arrives, it uses [puppeteer](https://github.com/puppeteer/puppeteer) with the
[stealth plugin](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth)
to create a headless browser (Chrome). It opens the URL with the user's parameters and waits until the
Cloudflare challenge is solved (or the timeout expires). The HTML and the cookies are sent back to the
user, and those cookies can then be used to bypass Cloudflare with other HTTP clients.

**NOTE**: Web browsers consume a lot of memory. If you are running CloudProxy on a machine with little RAM,
do not make many requests at once. Each request launches a new browser unless you use a session ID, which is strongly recommended. If you do use sessions, close them as soon as you are done with them.

## Installation

CloudProxy requires Node.js.

Run `PUPPETEER_PRODUCT=chrome npm install` to install CloudProxy dependencies.

## Usage

First run `npm run build`. Once the TypeScript is compiled, you can use `npm start` to start CloudProxy.

Example request:

```bash
curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "request.get",
"url":"http://www.google.com/",
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.0 Safari/537.36",
"maxTimeout": 60000,
"headers": {
"X-Test": "Testing 123..."
}
}'
```
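
The same request can also be issued from a script. What follows is a minimal Python sketch, not part of CloudProxy itself: `build_get_command` and `send_command` are hypothetical helper names, and the endpoint assumes the default port `8191` on localhost, as in the curl example. It also assumes `maxTimeout` is expressed in milliseconds, which the value `60000` suggests.

```python
import json
import urllib.request

# Assumed endpoint: default PORT (8191) and the /v1 path from the curl example.
CLOUDPROXY_URL = "http://localhost:8191/v1"


def build_get_command(url, max_timeout=60000, headers=None, session=None):
    """Build the JSON body for a `request.get` command (hypothetical helper)."""
    command = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout}
    if headers:
        command["headers"] = headers
    if session:
        command["session"] = session
    return command


def send_command(command, endpoint=CLOUDPROXY_URL):
    """POST a command to CloudProxy and return the decoded JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: maxTimeout is in milliseconds. Allow 10 extra seconds
    # on the client side for the browser to start.
    timeout_seconds = command.get("maxTimeout", 60000) / 1000 + 10
    with urllib.request.urlopen(request, timeout=timeout_seconds) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With CloudProxy running locally, `send_command(build_get_command("http://www.google.com/"))` would send the equivalent of the curl request, minus the custom user agent and header.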

### Commands

#### + `sessions.create`

This will launch a new browser instance which will retain cookies until you destroy it
with `sessions.destroy`. This comes in handy so you don't have to keep solving challenges
over and over and you won't need to keep sending cookies for the browser to use.

This also speeds up the requests since it won't have to launch a new browser instance for
every request.

Parameter | Notes
|--|--|
session | Optional. The session ID that you want to be assigned to the instance. If one isn't set, a random UUID will be assigned.
userAgent | Optional. Will be used by the headless browser.

#### + `sessions.list`

Returns a list of all the active sessions. This is mostly useful for debugging, if you are curious to see
how many sessions are running. Always make sure to properly close each
session when you are done with it, as too many open sessions may slow your computer down.

Example response:

```json
{
"sessions": [
"session_id_1",
"session_id_2",
"session_id_3..."
]
}
```
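
One way to make sure a session is always closed is to wrap it in a context manager. The sketch below is an illustration in Python, not part of CloudProxy: `cloudproxy_session` is a hypothetical helper, and `send` stands for any function that POSTs a command dictionary to the `/v1` endpoint and returns the parsed reply. The session ID is chosen by the caller, so the sketch does not depend on the shape of the `sessions.create` reply.

```python
import contextlib


@contextlib.contextmanager
def cloudproxy_session(send, session_id):
    """Create a session, yield its ID, and always destroy it afterwards.

    `send` is a caller-supplied function (an assumption of this sketch)
    that delivers one command dictionary to CloudProxy.
    """
    send({"cmd": "sessions.create", "session": session_id})
    try:
        yield session_id
    finally:
        # Runs even if the body raises, so the browser is not left running.
        send({"cmd": "sessions.destroy", "session": session_id})
```

Inside the `with` block, each `request.get` command would carry `"session": session_id` so that it reuses the same browser instance and its cookies.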

#### + `sessions.destroy`

This will properly shut down a browser instance and remove all files associated with it
to free up resources for a new session. Whenever you no longer need a session, you
should make sure to close it.

Parameter | Notes
|--|--|
session | The session ID that you want to be destroyed.

#### + `request.get`

Parameter | Notes
|--|--|
url | Mandatory
session | Optional. Will send the request from an existing browser instance. If one is not sent, it will create a temporary instance that will be destroyed immediately after the request is completed.
headers | Optional. To specify user headers.
maxTimeout | Optional. Max timeout to solve the challenge
cookies | Optional. Will be used by the headless browser. Follow [this](https://github.com/puppeteer/puppeteer/blob/v3.3.0/docs/api.md#pagesetcookiecookies) format
encode | Optional. Adds `'Content-Type': 'application/x-www-form-urlencoded'` to the header list. This can be useful if you need to send JSON in `postData`.

Example response from running the `curl` above:

```json
{
"solution": {
"url": "https://www.google.com/?gws_rd=ssl",
"status": 200,
"headers": {
"status": "200",
"date": "Thu, 16 Jul 2020 04:15:49 GMT",
"expires": "-1",
"cache-control": "private, max-age=0",
"content-type": "text/html; charset=UTF-8",
"strict-transport-security": "max-age=31536000",
"p3p": "CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"",
"content-encoding": "br",
"server": "gws",
"content-length": "61587",
"x-xss-protection": "0",
"x-frame-options": "SAMEORIGIN",
"set-cookie": "1P_JAR=2020-07-16-04; expires=Sat, 15-Aug-2020 04:15:49 GMT; path=/; domain=.google.com; Secure; SameSite=none\nNID=204=QE3Ocq15XalczqjuDy52HeseG3zAZuJzID3R57g_oeQHyoV5DuvDhpWc4r9IcPoeIYmkr_ZTX_MNOU8IAbtXmVO7Bmq0adb-hpIHaTBIdBk3Ofifp4gO6vZleVuFYfj7ePkHeHdzGoX-en0FvKtd9iofX4O6RiAdEIAnpL7Wge4; expires=Fri, 15-Jan-2021 04:15:49 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=none",
"alt-svc": "h3-29=\":443\"; ma=2592000,h3-27=\":443\"; ma=2592000,h3-25=\":443\"; ma=2592000,h3-T050=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\""
},
"response":"...",
"cookies": [
{
"name": "NID",
"value": "204=QE3Ocq15XalczqjuDy52HeseG3zAZuJzID3R57g_oeQHyoV5DuvDhpWc4r9IcPoeIYmkr_ZTX_MNOU8IAbtXmVO7Bmq0adb-hpIHaTBIdBk3Ofifp4gO6vZleVuFYfj7ePkHeHdzGoX-en0FvKtd9iofX4O6RiAdEIAnpL7Wge4",
"domain": ".google.com",
"path": "/",
"expires": 1610684149.307722,
"size": 178,
"httpOnly": true,
"secure": true,
"session": false,
"sameSite": "None"
},
{
"name": "1P_JAR",
"value": "2020-07-16-04",
"domain": ".google.com",
"path": "/",
"expires": 1597464949.307626,
"size": 19,
"httpOnly": false,
"secure": true,
"session": false,
"sameSite": "None"
}
],
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.0 Safari/537.36"
},
"status": "ok",
"message": "",
"startTimestamp": 1594872947467,
"endTimestamp": 1594872949617,
"version": "1.0.0"
}
```
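
The `cookies` and `userAgent` fields of the solution are what another HTTP client needs in order to reuse the solved challenge. Here is a minimal Python sketch, with hypothetical helper names, that turns them into request headers. It assumes the follow-up client should send the same user agent the headless browser used, since Cloudflare clearance cookies are commonly tied to it, and it does not filter cookies by domain or path.

```python
def cookie_header(solution):
    """Join the solution's cookies into a single Cookie header value."""
    return "; ".join(
        f"{cookie['name']}={cookie['value']}" for cookie in solution["cookies"]
    )


def replay_headers(solution):
    """Headers for a follow-up request made without the headless browser."""
    return {
        "User-Agent": solution["userAgent"],
        "Cookie": cookie_header(solution),
    }
```

The resulting dictionary can be passed as the headers of any HTTP client request to the same site.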

#### + `request.post`

This is the same as `request.get`, but it takes one additional parameter, `postData`.

Example request:

```bash
curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "request.post",
"url":"http://www.google.com/",
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.0 Safari/537.36",
"maxTimeout": 60000,
"postData": { "string": "string", "number": 10, "boolean": false },
"headers": {
"X-Test": "Testing 123..."
}
}'
```

Parameter | Notes
|--|--|
postData | Must be an object.

## Downloading Images and PDFs (small files)

If you need to fetch an image, a PDF, or another small file, pass the `download` parameter to
`request.get` and set it to `true`. Rather than reading the HTML and returning text, CloudProxy will
return the file's buffer **base64** encoded, which you can then decode and save.

This method isn't recommended for videos or other large files. Those should be streamed back to
the client, and at the moment nothing is set up to do so. If this is something you need, feel
free to create an issue and/or submit a PR.
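
Decoding a downloaded file on the client side could look like the following Python sketch. The helper names are hypothetical, and it assumes the base64 text arrives in the `solution.response` field of the reply. That location is not stated here, so check an actual reply before relying on it.

```python
import base64


def build_download_command(url, max_timeout=60000):
    """Build a `request.get` command with the `download` parameter set."""
    return {
        "cmd": "request.get",
        "url": url,
        "download": True,
        "maxTimeout": max_timeout,
    }


def decode_download(reply):
    """Return the file bytes from a reply.

    Assumption: the base64 payload is stored in reply["solution"]["response"].
    """
    return base64.b64decode(reply["solution"]["response"])


def save_download(reply, path):
    """Write the decoded bytes to `path` and return the number of bytes."""
    data = decode_download(reply)
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)
```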

## Environment variables

To set the environment vars in Linux run `export LOG_LEVEL=debug` and then start CloudProxy in the same shell.

Name | Default | Notes
|--|--|--|
LOG_LEVEL | info | Used to change the verbosity of the logging.
LOG_HTML | false | Used for debugging. If `true` all html that passes through the proxy will be logged to the console.
PORT | 8191 | Change this if you already have a process running on port `8191`.
HOST | 0.0.0.0 | This shouldn't need to be messed with but if you insist, it's here!
CAPTCHA_SOLVER | None | Selects which captcha solving method is used when a captcha is encountered.
HEADLESS | true | Set this to `false` to debug the browser by not running it in headless mode.

## Captcha Solvers

Sometimes Cloudflare does not stop at mathematical computations and browser tests; it also requires
the user to solve a captcha. If this is the case, CloudProxy will return the captcha page. But that's
not very helpful to you, is it?

CloudProxy can be customized to solve captchas automatically by setting the environment variable
`CAPTCHA_SOLVER` to the file name of one of the adapters inside the [/captcha](src/captcha) directory.

### [CaptchaHarvester](https://github.com/NoahCardoza/CaptchaHarvester)

This method makes use of the [CaptchaHarvester](https://github.com/NoahCardoza/CaptchaHarvester) project, which allows users to collect their own tokens from ReCaptcha V2/V3 and hCaptcha for free.

To use this method you must set these ENV variables:

```bash
CAPTCHA_SOLVER=harvester
HARVESTER_ENDPOINT=https://127.0.0.1:5000/token
```

**Note**: above I set `HARVESTER_ENDPOINT` to the default configuration
of the captcha harvester's server, but that could change if
you customize the command line flags. Simply put, `HARVESTER_ENDPOINT`
should be set to the URI of the route that returns a token in plain text when called.

### [hcaptcha-solver](https://github.com/JimmyLaurent/hcaptcha-solver)

This method makes use of the [hcaptcha-solver](https://github.com/JimmyLaurent/hcaptcha-solver) project, which attempts to solve hCaptcha by randomly selecting images.

To use this solver you must first install it and then set it as the `CAPTCHA_SOLVER`.

```bash
npm i hcaptcha-solver
CAPTCHA_SOLVER=hcaptcha-solver
```

### Other Options

Everyone likes more options to choose from. Help contribute to the project by submitting
PRs for other third-party captcha solvers or your own projects.
PRs are welcome for any and all captcha solving methods and services.

## Docker

You may edit the `./Dockerfile` as well as `./docker-compose.yml` as you see fit.

```bash
# To build the image & run it using `docker compose` (detached mode)
docker compose up -d

# To stop & remove containers:
docker compose down

# You may also build and run manually. However, the configuration is
# already set in the compose file, so you don't have to remember it.
docker build -t cloudproxy:latest .
docker run --restart=always --name cloudproxy -p 8191:8191 -d cloudproxy:latest
```

## TypeScript

I'm quite new to TypeScript. If you spot any funny business or anything that is or isn't being
used properly feel free to submit a PR or open an issue.

## Known issues / Roadmap

The current implementation seems to be working on the sites I have been testing it on. However, if you find it unable to access a site, open an issue and I'd be happy to investigate.

That being said, the project uses the [puppeteer stealth plugin](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth). If Cloudflare is able to detect the headless browser, that is more that project's domain to fix.

TODO:

* Fix remaining issues in the code (see TODOs in code)
* Make the maxTimeout more accurate (count the time to open the first page / maybe count the captcha solve time?)
* Hide sensitive information in logs
* Reduce Docker image size
* Docker image for ARM architecture
* Install instructions for Windows

## Credits

Based off of ngosang's [FlareSolverr](https://github.com/ngosang/FlareSolverr).

For help contact @`MacHacker#7322` (Discord)

Has CloudProxy saved or made you money on your project? Consider buying me a coffee!

[![Buy Me A Coffee](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/noahcardoza)