Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/octoparse/Octoparse
A free, client-side web scraper that turns websites into structured data without having to use code.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/octoparse/Octoparse
- Owner: octoparse
- Created: 2016-06-27T10:47:50.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-08-23T09:00:42.000Z (over 8 years ago)
- Last Synced: 2024-08-03T01:24:27.910Z (4 months ago)
- Size: 7.81 KB
- Stars: 43
- Watchers: 2
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rainmana - octoparse/Octoparse - A free, client-side web scraper that turns websites into structured data without having to use code. (Others)
README
# Octoparse, a free web scraper
Octoparse is free, client-side web scraping software for Windows that turns unstructured or semi-structured data from websites into a structured dataset, with no coding required.
### [Website](http://www.octoparse.com/) [Getting Started](http://www.octoparse.com/Tutorial) [Download](http://www.octoparse.com/download/) [Contact Us](http://m.me/Octoparse)
![image](http://www.octoparse.com/media/2325/octoparse.jpg)
## [Collect Data from The Web](http://www.octoparse.com/download/)
If you can use a web browser, you can use Octoparse. The behavior of a crawler run in Octoparse is determined by the [rules](http://www.octoparse.com/blog/what-is-configuration-rule-in-octoparse/) configured for it. The extraction rule tells Octoparse which website to open, where the data you plan to crawl is located, what kind of data you want, and so on. Octoparse simulates web browsing behavior such as opening a web page, logging into an account, entering text, and pointing at and clicking web elements. The tool lets users get data easily by clicking the information in the built-in browser.
![image](http://www.octoparse.com/media/1924/%E5%9B%BE%E7%89%8725.png)
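To make the idea of an extraction rule more concrete, here is a minimal Python sketch of the three things a rule has to answer: which page to open, where the data is, and what kind of data to keep. It is only a conceptual illustration and not Octoparse's actual rule format, which is configured by pointing and clicking in the app. The names `ExtractionRule`, `FieldRule`, `apply_rule`, the sample page, and the example URL are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldRule:
    """Hypothetical description of one field to extract."""
    name: str                      # column name in the output dataset
    xpath: str                     # where the value lives, relative to each item
    pattern: Optional[str] = None  # optional regex to narrow down the text


@dataclass
class ExtractionRule:
    """Hypothetical rule: which page to open, where items are, what to keep."""
    url: str
    item_xpath: str
    fields: List[FieldRule] = field(default_factory=list)


def apply_rule(rule: ExtractionRule, page_source: str) -> List[Dict[str, str]]:
    """Apply a rule to an already-downloaded, well-formed page."""
    root = ET.fromstring(page_source.strip())
    rows = []
    for item in root.findall(rule.item_xpath):
        row = {}
        for f in rule.fields:
            node = item.find(f.xpath)
            text = "".join(node.itertext()).strip() if node is not None else ""
            if f.pattern:
                match = re.search(f.pattern, text)
                text = match.group(0) if match else ""
            row[f.name] = text
        rows.append(row)
    return rows


# Stand-in for a fetched page, so the sketch runs without network access.
SAMPLE_PAGE = """
<html><body>
  <div class="product"><h2>Kettle</h2><span class="price">Price: $24.99</span></div>
  <div class="product"><h2>Toaster</h2><span class="price">Price: $31.50</span></div>
</body></html>
"""

rule = ExtractionRule(
    url="http://example.com/products",  # placeholder; never fetched in this sketch
    item_xpath=".//div[@class='product']",
    fields=[
        FieldRule("title", "h2"),
        FieldRule("price", "span[@class='price']", pattern=r"\d+\.\d+"),
    ],
)

rows = apply_rule(rule, SAMPLE_PAGE)
for row in rows:
    print(row)
```

The sketch parses a small well-formed snippet with the standard library's limited XPath support. Real pages are rarely valid XML and often load content dynamically, which is the part a browser-based tool like Octoparse handles for you.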
## [Why use Octoparse](http://www.octoparse.com/product/)
### Point-and-Click Interface
- Simply point and click on web data
- Automatically extract all the data in a similar layout
- No coding required for 98% of websites
### Deal with almost all websites, dynamic or static
- Extract text, image URLs, links, etc.
- Extract data from listing pages, sites with infinite scrolling, pagination, etc.
- Extract data from dropdown menus
- Extract data behind a login
- Extract data loaded with AJAX, JavaScript, etc.
### Extract data from websites precisely
- Automatically generates XPath
- Built-in XPath tool
- Built-in RegEx tool
### [Cloud Service](http://www.octoparse.com/pricing/)
- Extract data using cloud servers 24/7
- Extract and store your data in the cloud platform
- Automatic IP rotation to avoid IP blacklisting
- Scheduled extraction tasks
### Export data in any format you like
Store the data Octoparse extracts on our cloud platform. Or export the data in any format you like:
- API
- CSV
- Excel
- HTML
- TXT
- Database (MySQL, SQL Server, Oracle)
## Links
- [Website](http://www.octoparse.com/)
- [Facebook Page](https://www.facebook.com/Octoparse/)
- [Facebook group](https://www.facebook.com/groups/1700643603550408/)
- [Twitter](https://twitter.com/Octoparse)
- [G+](https://plus.google.com/u/0/106959685400674392220/posts)
- [YouTube channel](https://www.youtube.com/channel/UCweDWm1QY2G67SDAKX7nreg)