{"id":27398835,"url":"https://github.com/abdulrahman-mh/get-proxy","last_synced_at":"2025-06-23T02:06:04.209Z","repository":{"id":286815192,"uuid":"834805702","full_name":"abdulrahman-mh/get-proxy","owner":"abdulrahman-mh","description":"Collecting, validating, and caching free proxies, very fast!","archived":false,"fork":false,"pushed_at":"2024-07-30T21:31:28.000Z","size":141,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T01:55:16.753Z","etag":null,"topics":["aiohttp","asyncio","concurrency","free-proxy-list","proxy","proxy-checker","proxy-list","proxy-scraper","python","rate-limit","rate-limiting","scraping","simultaneously"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abdulrahman-mh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-28T12:20:58.000Z","updated_at":"2025-04-08T22:48:48.000Z","dependencies_parsed_at":"2025-04-08T14:39:27.969Z","dependency_job_id":null,"html_url":"https://github.com/abdulrahman-mh/get-proxy","commit_stats":null,"previous_names":["abdelrahman-mh/get-proxy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abdulrahman-mh/get-proxy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahman-mh%2Fget-proxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahman-mh%2Fget-proxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahman-mh%2F
get-proxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahman-mh%2Fget-proxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abdulrahman-mh","download_url":"https://codeload.github.com/abdulrahman-mh/get-proxy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahman-mh%2Fget-proxy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261397372,"owners_count":23152488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiohttp","asyncio","concurrency","free-proxy-list","proxy","proxy-checker","proxy-list","proxy-scraper","python","rate-limit","rate-limiting","scraping","simultaneously"],"created_at":"2025-04-14T01:55:14.756Z","updated_at":"2025-06-23T02:05:59.196Z","avatar_url":"https://github.com/abdulrahman-mh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Welcome 👋\n\n**What you will find here**: Unlimited collecting, validating, and caching of free proxies. Collect from any endpoint, including text APIs, JSON APIs, and web pages, by simply adding URLs to the **proxy_sources.txt** file. Collecting (scraping), validation, and caching are handled automatically. 
All of this is done very fast ✨\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./docs/screenshot.png\" alt=\"Description of Image\" /\u003e\n\u003c/p\u003e\n\n## Features\n\nWe support validating **HTTP** and **HTTPS** proxies (Socks4 \u0026 5 coming soon).\n\n- ✨ **Unique IP!**: Ensure only proxies with unique IP addresses are returned.\n\n- ⚡ **Asynchronous Power**: Scrape URLs and validate proxies asynchronously and at the same time, which results in very fast processing 🚀.\n\n- 🧹 **Scraping \u0026 Collecting**: Extract proxies from the URLs listed in proxy_sources.txt using regular expressions for webpage, JSON, and text content.\n\n- ✅ **Validating**: Validate proxies concurrently. We don't wait for all URLs to finish; validation happens as soon as each proxy is ready 💪.\n\n- 💾 **Caching**: Optionally cache valid proxies and set a duration for automatic revalidation.\n\n- 🐞 **Monitoring**: Track runtime details, including valid/invalid proxies, scraping status, source-specific proxy counts, and errors.\n\n## Table of Contents\n\n- [Examples](#examples-)\n- [Use ProxyConfig](#use-proxyconfig)\n- [Supported Content Types](#supported-content-types)\n- [JSON APIs](#json-apis)\n- [How To Use It](#how-to-use-it)\n- [Config](#config-)\n- [Options](#options)\n- [TO-DO List](#to-do-list-)\n\n## Examples 💡\n\nHere's a basic example without any options or configuration:\n\n```py\nimport asyncio\nfrom get_proxy import ProxyFetcher  # import the module\n\n\nasync def main():\n    async with ProxyFetcher() as proxy_fetcher:\n        valid_proxies = await proxy_fetcher.get_valid_proxies()\n        # process the proxies as needed\n        print(valid_proxies)\n\n\nasyncio.run(main())\n```\n\n### Use ProxyConfig():\n\nLet's enable proxy caching and set the cache duration to 5 minutes.\n\nProxies are reused as long as the cache is valid; otherwise they are revalidated.\n\n```py\nimport asyncio\nfrom get_proxy import ProxyFetcher, ProxyConfig\n\n\nasync def main():\n    config = ProxyConfig(\n        
cache_enabled=True,\n        enforce_unique_ip=False,\n        cache_duration_minutes=5,\n    )\n    proxy_fetcher = ProxyFetcher(config)\n    proxies = await proxy_fetcher.get_valid_proxies()\n    print(proxies)\n\n    # close the fetcher when you are done\n    await proxy_fetcher.close()\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n## Supported Content Types\n\nWe handle various types of content: webpages, JSON APIs, and text APIs.\n\n### Webpages\n\n- [https://free-proxy-list.net/](https://free-proxy-list.net/)\n- [https://www.sslproxies.org/](https://www.sslproxies.org/)\n\n### Text APIs\n\n- [https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies\u0026protocol=http\u0026proxy_format=protocolipport\u0026format=text\u0026timeout=20000](https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies\u0026protocol=http\u0026proxy_format=protocolipport\u0026format=text\u0026timeout=20000)\n- [https://spys.me/proxy.txt](https://spys.me/proxy.txt)\n\n### JSON APIs\n\nJSON sources might provide IP and port numbers in different fields. Here’s how to configure them:\n\n1. **Add the URL to your proxy sources file.**\n2. 
**Add the following after the URL: `json=true\u0026ip=\u003cip_field\u003e\u0026port=\u003cport_field\u003e`**\n   - Replace `\u003cip_field\u003e` with the key for the IP address.\n   - Replace `\u003cport_field\u003e` with the key for the port number.\n   - Make sure there is a space between the URL and the parameters.\n\n**Example:**\n\nIf your JSON response looks like this:\n\n```json\n[\n  {\n    \"IP\": \"203.0.113.2\",\n    \"PORT\": \"80\",\n    \"foo\": \"bar\"\n  },\n  {\"...\"},\n]\n```\n\nand your URL is `http://example.com/api/free-proxy?format=json`, you should write:\n\n```text\nhttp://example.com/api/free-proxy?format=json json=true\u0026ip=IP\u0026port=PORT\n```\n\n\u003e **INFO:** Ensure there is a space between the URL and the parameters.\n\n## How To Use It:\n\n- **Requirements**📋:\n\n  - aiohttp\n\n- Clone the repo and navigate to the working directory:\n\n```bash\ngit clone https://github.com/abdulrahman-mh/get-proxy\ncd get-proxy\n```\n\n- Set up the working directory:\n\n```bash\n# create a Python venv (optional) and activate it\npython3 -m venv .venv \u0026\u0026 source .venv/bin/activate\n\n# install the requirements\npip install -r requirements.txt\n```\n\n- Try it:\n\n```bash\npython3 get_proxy.py\n```\n\n## Reference 📚\n\n### `ProxyFetcher()`:\n\n```python\nProxyFetcher(config: ProxyConfig = ProxyConfig())\n```\n\n**Options**\n\n- **`config`**: A `ProxyConfig` instance 
(default: `ProxyConfig()`)\n\n**Methods**\n\n- **`ProxyFetcher.get_valid_proxies() -\u003e list[str]`**: Returns a list of valid proxies, ready to use.\n  - Asynchronous; must be called with the `await` keyword.\n\n### `ProxyConfig()`:\n\n```python\nProxyConfig(\n    prefix: str = \"http://\",\n    user_agent: str = \"Mozil...\",\n    ip_check_api: str = \"http://httpbin.org/ip\",\n    request_timeout: int = 15,\n    retry: int = 0,\n    concurrency_limit: int = 500,\n    proxy_sources_file: str = \"proxy_sources.txt\",\n    proxy_cache_file: str = \"proxy_cache.txt\",\n    cache_enabled: bool = False,\n    cache_duration_minutes: int = 20,\n    enforce_unique_ip: bool = True,\n    strict_x_forwarded_for: bool = False\n)\n```\n\n**Options**\n\n- **`prefix`**: Proxy URL prefix (default: `\"http://\"`).\n- **`user_agent`**: User-agent string (default: `\"Mozil...\"`).\n- **`ip_check_api`**: API for public IP check and proxy validation (default: `\"http://httpbin.org/ip\"`).\n- **`request_timeout`**: Timeout for proxy validity checks (default: `15` seconds).\n- **`retry`**: Number of retries for failed proxy requests (default: `0`).\n- **`concurrency_limit`**: Maximum concurrent proxy validation requests (default: `500`).\n- **`proxy_sources_file`**: File containing proxy source URLs (default: `\"proxy_sources.txt\"`).\n- **`proxy_cache_file`**: File for storing cached proxies (default: `\"proxy_cache.txt\"`).\n- **`cache_enabled`**: Whether to enable caching (default: `False`).\n- **`cache_duration_minutes`**: Duration for caching proxies (default: `20` minutes).\n- **`enforce_unique_ip`**: Ensure each proxy has a unique IP (default: `True`).\n- **`strict_x_forwarded_for`**: Enforce strict handling of `X-Forwarded-For` headers, since some proxies do not really hide your IP 
(default: `False`).\n\n## For Developers 🛠️\n\nPRs are welcome!\n\n### To-Do List 📝:\n\n- [ ] Add an option to limit the number of working proxies that are returned.\n- [ ] **Design Patterns**:\n  - Use **caching** to store configurations during initialization, avoiding repeated checks at runtime.\n  - Consider patterns like **Strategy** or **Factory** to manage varying behaviors based on configuration.\n  - Implement a method for handling proxy limits and use **asyncio.as_completed()** for processing results as they finish, instead of **asyncio.gather()**.\n  - Apply these patterns to improve configuration handling for options like **enforce_unique_ip** and **cache_enabled**.\n- [ ] **Socks 4 \u0026 5**: Add support for Socks4 and Socks5 proxies.\n- [ ] Separate proxy **scraping** and **validating**.\n- [ ] Add type annotations and hints to the code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdulrahman-mh%2Fget-proxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdulrahman-mh%2Fget-proxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdulrahman-mh%2Fget-proxy/lists"}