{"id":15030325,"url":"https://github.com/imwildcat/scylla","last_synced_at":"2025-05-13T00:06:52.264Z","repository":{"id":37406037,"uuid":"128911431","full_name":"imWildCat/scylla","owner":"imWildCat","description":"Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era","archived":false,"fork":false,"pushed_at":"2025-02-20T16:27:00.000Z","size":696,"stargazers_count":4003,"open_issues_count":47,"forks_count":476,"subscribers_count":77,"default_branch":"main","last_synced_at":"2025-05-13T00:06:38.687Z","etag":null,"topics":["crawler","proxy-pool","python","python3","scylla"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imWildCat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-10T09:55:11.000Z","updated_at":"2025-05-10T13:53:08.000Z","dependencies_parsed_at":"2023-11-18T16:28:25.764Z","dependency_job_id":"9bcaaf91-a5f9-48d8-bb41-51f2b4f9e417","html_url":"https://github.com/imWildCat/scylla","commit_stats":{"total_commits":368,"total_committers":16,"mean_commits":23.0,"dds":"0.19836956521739135","last_synced_commit":"b051fd586f2e3268bb07f8d94a0b27dce01dea12"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imWildCat%2Fscylla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imWildCat%2Fscylla/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/imWildCat%2Fscylla/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imWildCat%2Fscylla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imWildCat","download_url":"https://codeload.github.com/imWildCat/scylla/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253843215,"owners_count":21972873,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","proxy-pool","python","python3","scylla"],"created_at":"2024-09-24T20:13:06.402Z","updated_at":"2025-05-13T00:06:52.243Z","avatar_url":"https://github.com/imWildCat.png","language":"Python","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=5DXFA7WGWPZBN","https://github.com/sponsors/imWildCat"],"categories":[],"sub_categories":[],"readme":"![banner_scylla](https://github.com/imWildCat/scylla/assets/2396817/62498a29-8105-4281-8eb0-73436d4ed5b0) [![Build Status](https://travis-ci.org/imWildCat/scylla.svg?branch=master)](https://travis-ci.org/imWildCat/scylla)\n[![codecov](https://codecov.io/gh/imWildCat/scylla/branch/master/graph/badge.svg)](https://codecov.io/gh/imWildCat/scylla)\n[![Documentation Status](https://readthedocs.org/projects/scylla-py/badge/?version=latest)](https://scylla.wildcat.io/en/latest/?badge=latest)\n[![PyPI version](https://badge.fury.io/py/scylla.svg)](https://badge.fury.io/py/scylla)\n[![Docker 
Pull](https://img.shields.io/docker/pulls/wildcat/scylla.svg)](https://hub.docker.com/r/wildcat/scylla/)\n[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=5DXFA7WGWPZBN)\n\n\n# Scylla\n\nAn intelligent proxy pool for humanities, to extract content from the internet and build your own Large Language Models in this new AI era.\n\nKey features:\n\n- Automatic proxy ip crawling and validation\n- Easy-to-use JSON API\n- Simple but beautiful web-based user interface (eg. geographical\n    distribution of proxies)\n- Get started with only **1 command** minimally\n- Simple HTTP Forward proxy server\n- [Scrapy] and [requests] integration with only 1 line of code\n    minimally\n- Headless browser crawling\n\n\nGet started\n===========\n\nInstallation\n------------\n\n### Install with Docker (highly recommended)\n\n```bash\ndocker run -d -p 8899:8899 -p 8081:8081 -v /var/www/scylla:/var/www/scylla --name scylla wildcat/scylla:latest\n```\n\n### Install directly via pip\n\n```bash\npip install scylla\nscylla --help\nscylla # Run the crawler and web server for JSON API\n```\n\n### Install from source\n\n```bash\ngit clone https://github.com/imWildCat/scylla.git\ncd scylla\n\npip install -r requirements.txt\n\ncd frontend\nnpm install\ncd ..\n\nmake assets-build\n\npython -m scylla\n```\n\nUsage\n-----\n\nThis is an example of running a service locally (`localhost`), using\nport `8899`.\n\nNote: You might have to wait for 1 to 2 minutes in order to get some proxy ips populated in the database for the first time you use Scylla.\n\n### JSON API\n\n#### Proxy IP List\n\n```bash\nhttp://localhost:8899/api/v1/proxies\n```\n\nOptional URL parameters:\n\n| Parameters  | Default value | Description                                                  |\n| ----------- | ------------- | ------------------------------------------------------------ |\n| `page`      | `1`           | The page number       
                                       |\n| `limit`     | `20`          | The number of proxies shown on each page                     |\n| `anonymous` | `any`         | Show anonymous proxies or not. Possible values：`true`, only anonymous proxies; `false`, only transparent proxies |\n| `https`     | `any` | Show HTTPS proxies or not. Possible values：`true`, only HTTPS proxies; `false`, only HTTP proxies |\n| `countries`   | None | Filter proxies for specific countries. Format example: ``US``, or multi-countries: `US,GB` |\n\nSample result:\n\n```json\n{\n    \"proxies\": [{\n        \"id\": 599,\n        \"ip\": \"91.229.222.163\",\n        \"port\": 53281,\n        \"is_valid\": true,\n        \"created_at\": 1527590947,\n        \"updated_at\": 1527593751,\n        \"latency\": 23.0,\n        \"stability\": 0.1,\n        \"is_anonymous\": true,\n        \"is_https\": true,\n        \"attempts\": 1,\n        \"https_attempts\": 0,\n        \"location\": \"54.0451,-0.8053\",\n        \"organization\": \"AS57099 Boundless Networks Limited\",\n        \"region\": \"England\",\n        \"country\": \"GB\",\n        \"city\": \"Malton\"\n    }, {\n        \"id\": 75,\n        \"ip\": \"75.151.213.85\",\n        \"port\": 8080,\n        \"is_valid\": true,\n        \"created_at\": 1527590676,\n        \"updated_at\": 1527593702,\n        \"latency\": 268.0,\n        \"stability\": 0.3,\n        \"is_anonymous\": true,\n        \"is_https\": true,\n        \"attempts\": 1,\n        \"https_attempts\": 0,\n        \"location\": \"32.3706,-90.1755\",\n        \"organization\": \"AS7922 Comcast Cable Communications, LLC\",\n        \"region\": \"Mississippi\",\n        \"country\": \"US\",\n        \"city\": \"Jackson\"\n    },\n    ...\n    ],\n    \"count\": 1025,\n    \"per_page\": 20,\n    \"page\": 1,\n    \"total_page\": 52\n}\n```\n\n#### System Statistics\n\n```bash\nhttp://localhost:8899/api/v1/stats\n```\n\nSample result:\n\n```json\n{\n    \"median\": 
181.2566407083,\n    \"valid_count\": 1780,\n    \"total_count\": 9528,\n    \"mean\": 174.3290085201\n}\n```\n\n### HTTP Forward Proxy Server\n\nBy default, Scylla starts an HTTP forward proxy server on port\n`8081`. For each incoming HTTP request, this server randomly selects a\nrecently updated proxy from the database and forwards the request\nthrough it.\n\nNote: HTTPS requests are not supported at present.\n\nAn example of using this proxy server with `curl`:\n\n```bash\ncurl http://api.ipify.org -x http://127.0.0.1:8081\n```\n\nYou can also use this feature with [requests][]:\n\n```python\nimport requests\n\nrequests.get('http://api.ipify.org', proxies={'http': 'http://127.0.0.1:8081'})\n```\n\n### Web UI\n\nOpen `http://localhost:8899` in your browser to see the Web UI of this\nproject.\n\n#### Proxy IP List\n\n```\nhttp://localhost:8899/\n```\n\nScreenshot:\n\n![screenshot-proxy-list](https://user-images.githubusercontent.com/2396817/40653600-946eae6e-6333-11e8-8bbd-9d2f347c5461.png)\n\n#### Global Geographical Distribution Map\n\n```\nhttp://localhost:8899/#/geo\n```\n\nScreenshot:\n\n![screenshot-geo-distribution](https://user-images.githubusercontent.com/2396817/40653599-9458b6b8-6333-11e8-8e6e-1d90271fc083.png)\n\nAPI Documentation\n=================\n\nPlease read [Module\nIndex](https://scylla.wildcat.io/en/latest/py-modindex.html).\n\nRoadmap\n=======\n\nPlease see [Projects](https://github.com/imWildCat/scylla/projects).\n\nDevelopment and Contribution\n============================\n\n```bash\ngit clone https://github.com/imWildCat/scylla.git\ncd scylla\n\npip install -r requirements.txt\n\nnpm install\nmake assets-build\n```\n\nTesting\n=======\n\nIf you wish to run tests locally, the commands are shown below:\n\n```bash\npip install -r tests/requirements-test.txt\npytest tests/\n```\n\nYou are welcome to add more test cases to make this project more\nrobust.\n\nNaming of This 
Project\n======================\n\nThe name [Scylla](http://prisonbreak.wikia.com/wiki/Scylla) comes from\na group of memory chips in the American TV series, [Prison\nBreak](https://en.wikipedia.org/wiki/Prison_Break). This project is\nnamed in tribute to that series.\n\nHelp\n======================\n[How to install Python Scylla on CentOS7](https://digcodes.com/how-to-install-python-scylla-on-centos7/)\n\n\nDonation\n========\n\nIf you find this project useful, please consider donating to it.\n\nNo matter the amount, your donation will inspire the author\nto keep developing new features! 🎉 Thank you!\n\nYou can donate in the following ways:\n\nGitHub Sponsor\n------\n\nI would greatly appreciate it if you joined my sponsors here.\n\n\u003chttps://github.com/sponsors/imWildCat\u003e\n\nPayPal\n------\n\n[![paypal_donation](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=5DXFA7WGWPZBN)\n\n\nLicense\n=======\n\nApache License 2.0. For more details, please read the\n[LICENSE](https://github.com/imWildCat/scylla/blob/master/LICENSE) file.\n\n[Alipay and WeChat Donation]: https://user-images.githubusercontent.com/2396817/40589594-cfb0e49e-61e7-11e8-8f7d-c55a29676c40.png\n\n\n  [Scrapy]: https://scrapy.org\n  [requests]: http://docs.python-requests.org/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimwildcat%2Fscylla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimwildcat%2Fscylla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimwildcat%2Fscylla/lists"}