Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/clemmy/proxyfarm
Command line utility that intelligently scrapes proxy lists from various sources.
- Host: GitHub
- URL: https://github.com/clemmy/proxyfarm
- Owner: clemmy
- License: mit
- Created: 2017-03-30T17:04:10.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-01T11:54:09.000Z (almost 8 years ago)
- Last Synced: 2024-06-14T21:01:18.986Z (7 months ago)
- Topics: list, node, proxy, scrape
- Language: JavaScript
- Homepage:
- Size: 1.33 MB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# Proxy Farm
`proxyfarm` is a Node.js script that scrapes proxy lists from websites without caring about their underlying HTML structure. This allows proxy lists to be easily harvested from a large number of sources without implementing custom scraping logic for each one. It does this by using a PhantomJS driver together with the [JavaScript Selection API](https://developer.mozilla.org/en-US/docs/Web/API/Selection), which strips away all HTML tags and makes regex matching trivial. The resulting proxy lists can be used with tools like [scrapy-proxies](https://github.com/aivarsk/scrapy-proxies) to bypass IP restrictions and improve web crawling speed.
![demo](demo.gif)
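The core of the approach can be sketched in a few lines. The example below is illustrative only: it assumes the `phantom` npm package as the PhantomJS driver and a simple IPv4 matching regex, neither of which necessarily matches what `proxyfarm` uses internally.

```javascript
// Illustrative sketch: render a page headlessly, flatten it to plain text via
// the Selection API, then regex-match proxy entries. Assumes the `phantom`
// npm package (`npm install phantom`); proxyfarm's actual driver may differ.
var phantom = require('phantom');

function scrapeProxies(url) {
  var instance;
  return phantom.create()
    .then(function (ph) {
      instance = ph;
      return ph.createPage();
    })
    .then(function (page) {
      return page.open(url).then(function () {
        // Runs inside the page: select the whole <body> and serialize the
        // selection as a string, which drops all markup and leaves only the
        // visible text.
        return page.evaluate(function () {
          var selection = window.getSelection();
          selection.selectAllChildren(document.body);
          return selection.toString();
        });
      });
    })
    .then(function (text) {
      instance.exit();
      // Match IPv4 addresses followed by a port, whether separated by a
      // colon or by whitespace (e.g. table cells flattened to text).
      return text.match(/\b(?:\d{1,3}\.){3}\d{1,3}[\s:]+\d{2,5}\b/g) || [];
    });
}

scrapeProxies('http://example.com/proxy-list').then(function (proxies) {
  console.log(proxies.join('\n'));
});
```

Because the Selection API returns only the rendered text, the same regex works regardless of whether a source presents proxies in a table, a list, or plain text.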
## Getting Started
Simply clone the repository, run `npm install`, and then run `node --harmony proxyfarm --in sources.txt --out proxies.txt`.
NPM module coming soon!
### Arguments
| Parameter | Description |
| --- | --- |
| in | A text file with line delimited urls to scrape proxies from. See [defaults/sources.txt](defaults/sources.txt) for an example. |
| out | The path to save the scraped proxy list to, in `host:port` format. |
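For illustration, the two files might look like the following. The URLs and addresses are placeholders, and the exact output format may differ from what the tool actually emits.

A hypothetical `sources.txt`:

```
http://example.com/free-proxy-list
http://example.org/proxies
```

and the resulting `proxies.txt`, assuming `host:port` output:

```
203.0.113.5:8080
198.51.100.23:3128
```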
### Prerequisites
- Node.js v6.x or later
## Running the tests
Coming soon!
## Contributing
There are many ways that you can contribute:
- **Improving documentation** - Submit a pull request with the fixes.
- **Requesting a feature** - Simply create a new issue describing the feature.
- **Suggesting a proxy list source** - Create a new issue mentioning the new source.
- **Reporting a bug** - Found a problem? Create an issue with your environment, a screenshot of the error, and reproduction steps.
- **Fixing a bug** - All help is appreciated!
## Future Roadmap
- Validating the scraped proxy list
- Detecting anonymity, speed, and country of the proxy list
- Automatic crawling of websites rather than manually specifying all proxy lists
- Handling of AJAX pages
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.