https://github.com/imWildCat/scylla
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
https://github.com/imWildCat/scylla
crawler proxy-pool python python3 scylla
Last synced: 19 days ago
JSON representation
Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era
- Host: GitHub
- URL: https://github.com/imWildCat/scylla
- Owner: imWildCat
- License: apache-2.0
- Created: 2018-04-10T09:55:11.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2025-02-20T16:27:00.000Z (about 2 months ago)
- Last Synced: 2025-03-18T19:27:26.740Z (25 days ago)
- Topics: crawler, proxy-pool, python, python3, scylla
- Language: Python
- Homepage:
- Size: 680 KB
- Stars: 3,987
- Watchers: 77
- Forks: 477
- Open Issues: 47
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-starts - imWildCat/scylla - Intelligent proxy pool for Humans™ (Maintainer needed) (Python)
- awesome-hacking-lists - imWildCat/scylla - Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era (Python)
- awesome-github - 爬虫代理池
README
 [](https://travis-ci.org/imWildCat/scylla)
[](https://codecov.io/gh/imWildCat/scylla)
[](https://scylla.wildcat.io/en/latest/?badge=latest)
[](https://badge.fury.io/py/scylla)
[](https://hub.docker.com/r/wildcat/scylla/)
[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=5DXFA7WGWPZBN)# Scylla
An intelligent proxy pool for humanities, to extract content from the internet and build your own Large Language Models in this new AI era.
Key features:
- Automatic proxy ip crawling and validation
- Easy-to-use JSON API
- Simple but beautiful web-based user interface (eg. geographical
distribution of proxies)
- Get started with only **1 command** minimally
- Simple HTTP Forward proxy server
- [Scrapy] and [requests] integration with only 1 line of code
minimally
- Headless browser crawlingGet started
===========Installation
------------### Install with Docker (highly recommended)
```bash
docker run -d -p 8899:8899 -p 8081:8081 -v /var/www/scylla:/var/www/scylla --name scylla wildcat/scylla:latest
```### Install directly via pip
```bash
pip install scylla
scylla --help
scylla # Run the crawler and web server for JSON API
```### Install from source
```bash
git clone https://github.com/imWildCat/scylla.git
cd scyllapip install -r requirements.txt
cd frontend
npm install
cd ..make assets-build
python -m scylla
```Usage
-----This is an example of running a service locally (`localhost`), using
port `8899`.Note: You might have to wait for 1 to 2 minutes in order to get some proxy ips populated in the database for the first time you use Scylla.
### JSON API
#### Proxy IP List
```bash
http://localhost:8899/api/v1/proxies
```Optional URL parameters:
| Parameters | Default value | Description |
| ----------- | ------------- | ------------------------------------------------------------ |
| `page` | `1` | The page number |
| `limit` | `20` | The number of proxies shown on each page |
| `anonymous` | `any` | Show anonymous proxies or not. Possible values:`true`, only anonymous proxies; `false`, only transparent proxies |
| `https` | `any` | Show HTTPS proxies or not. Possible values:`true`, only HTTPS proxies; `false`, only HTTP proxies |
| `countries` | None | Filter proxies for specific countries. Format example: ``US``, or multi-countries: `US,GB` |Sample result:
```json
{
"proxies": [{
"id": 599,
"ip": "91.229.222.163",
"port": 53281,
"is_valid": true,
"created_at": 1527590947,
"updated_at": 1527593751,
"latency": 23.0,
"stability": 0.1,
"is_anonymous": true,
"is_https": true,
"attempts": 1,
"https_attempts": 0,
"location": "54.0451,-0.8053",
"organization": "AS57099 Boundless Networks Limited",
"region": "England",
"country": "GB",
"city": "Malton"
}, {
"id": 75,
"ip": "75.151.213.85",
"port": 8080,
"is_valid": true,
"created_at": 1527590676,
"updated_at": 1527593702,
"latency": 268.0,
"stability": 0.3,
"is_anonymous": true,
"is_https": true,
"attempts": 1,
"https_attempts": 0,
"location": "32.3706,-90.1755",
"organization": "AS7922 Comcast Cable Communications, LLC",
"region": "Mississippi",
"country": "US",
"city": "Jackson"
},
...
],
"count": 1025,
"per_page": 20,
"page": 1,
"total_page": 52
}
```#### System Statistics
```bash
http://localhost:8899/api/v1/stats
```Sample result:
```json
{
"median": 181.2566407083,
"valid_count": 1780,
"total_count": 9528,
"mean": 174.3290085201
}
```### HTTP Forward Proxy Server
By default, Scylla will start a HTTP Forward Proxy Server on port
`8081`. This server will select one proxy updated recently from the
database and it will be used for forward proxy. Whenever an HTTP request
comes, the proxy server will select a proxy randomly.Note: HTTPS requests are not supported at present.
The example for `curl` using this proxy server is shown below:
```bash
curl http://api.ipify.org -x http://127.0.0.1:8081
```You could also use this feature with [requests][]:
```python
requests.get('http://api.ipify.org', proxies={'http': 'http://127.0.0.1:8081'})
```### Web UI
Open `http://localhost:8899` in your browser to see the Web UI of this
project.#### Proxy IP List
```
http://localhost:8899/
```Screenshot:

#### Globally Geographical Distribution Map
```
http://localhost:8899/#/geo
```Screenshot:

API Documentation
=================Please read [Module
Index](https://scylla.wildcat.io/en/latest/py-modindex.html).Roadmap
=======Please see [Projects](https://github.com/imWildCat/scylla/projects).
Development and Contribution
============================```bash
git clone https://github.com/imWildCat/scylla.git
cd scyllapip install -r requirements.txt
npm install
make assets-build
```Testing
=======If you wish to run tests locally, the commands are shown below:
```bash
pip install -r tests/requirements-test.txt
pytest tests/
```You are welcomed to add more test cases to this project, increasing the
robustness of this project.Naming of This Project
======================[Scylla](http://prisonbreak.wikia.com/wiki/Scylla) is derived from the
name of a group of memory chips in the American TV series, [Prison
Break](https://en.wikipedia.org/wiki/Prison_Break). This project was
named after this American TV series to pay tribute to it.Help
======================
[How to install Python Scylla on CentOS7](https://digcodes.com/how-to-install-python-scylla-on-centos7/)Donation
========If you find this project useful, could you please donate some money to
it?No matter how much the money is, Your donation will inspire the author
to develop new features continuously! 🎉 Thank you!The ways for donation are shown below:
GitHub Sponsor
------I super appreciate if you can join my sponsors here.
PayPal
------[](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=5DXFA7WGWPZBN)
License
=======Apache License 2.0. For more details, please read the
[LICENSE](https://github.com/imWildCat/scylla/blob/master/LICENSE) file.[Alipay and WeChat Donation]: https://user-images.githubusercontent.com/2396817/40589594-cfb0e49e-61e7-11e8-8f7d-c55a29676c40.png
[Scrapy]: https://scrapy.org
[requests]: http://docs.python-requests.org/