https://github.com/speed/newcrawler
Free Web Scraping Tool with Java
- Host: GitHub
- URL: https://github.com/speed/newcrawler
- Owner: speed
- Created: 2015-12-03T07:37:35.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-11-25T09:09:47.000Z (about 1 year ago)
- Last Synced: 2024-02-14T23:39:40.133Z (11 months ago)
- Topics: crawler, docker, scraping, spider
- Language: JavaScript
- Homepage: http://www.newcrawler.com
- Size: 144 MB
- Stars: 584
- Watchers: 31
- Forks: 115
- Open Issues: 25
Metadata Files:
- Readme: README.md
README
NewCrawler
=========================

Free Web Scraping Tool
NewCrawler Quick Start
==============

>www.newcrawler.com
Linux
----

Installing software packages on CentOS / Fedora servers:
>x86
>curl -fsSL https://raw.githubusercontent.com/speed/newcrawler/master/install_i586.sh | sh
>x64
>curl -fsSL https://raw.githubusercontent.com/speed/newcrawler/master/install_x86_64.sh | sh
Installing software packages on Ubuntu / Debian servers:
>x86
>curl -fsSL https://raw.githubusercontent.com/speed/newcrawler/master/install_Debian_i586.sh | sh
>x64
>curl -fsSL https://raw.githubusercontent.com/speed/newcrawler/master/install_Debian_x86_64.sh | sh
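The four installer URLs above differ only in distribution family and CPU architecture. The sketch below picks one from `uname -m`. `newcrawler_installer_url` is a hypothetical helper, not something NewCrawler ships, and treating `i386`/`i686` as `i586` and `amd64` as `x86_64` is an assumption about which published script fits those machines.

```shell
#!/bin/sh
# Hypothetical helper (not part of NewCrawler): print the installer URL for a
# distribution family and an architecture string as reported by `uname -m`.
newcrawler_installer_url() {
  family="$1"
  arch="${2:-$(uname -m)}"   # default to the current machine's architecture
  # Assumed mapping of architecture names onto the two published script suffixes.
  case "$arch" in
    x86_64|amd64) suffix=x86_64 ;;
    i386|i486|i586|i686) suffix=i586 ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  case "$family" in
    centos|fedora) prefix=install_ ;;
    debian|ubuntu) prefix=install_Debian_ ;;
    *) echo "unsupported distribution: $family" >&2; return 1 ;;
  esac
  echo "https://raw.githubusercontent.com/speed/newcrawler/master/${prefix}${suffix}.sh"
}
```

For example, on an Ubuntu machine: `url="$(newcrawler_installer_url debian)" && curl -fsSL "$url" | sh`. Resolving the URL first means nothing is piped to `sh` when the architecture is not recognized.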
Installing NewCrawler, Chrome, and MySQL software packages on CentOS / Fedora servers:
>x64
>curl -fsSL https://raw.githubusercontent.com/speed/newcrawler/master/install_NewCrawler_Chrome_MySQL_x86_64.sh | sh
# OS version and NewCrawler directory
[root@localhost ~]# rpm -q centos-release
centos-release-7-0.1406.el7.centos.2.5.x86_64
[root@localhost ~]# ls
install.sh  newcrawler
[root@localhost ~]# ls newcrawler
db  jetty  jre  phantomjs  start.sh  stop.sh  war

The default file database needs no configuration. To switch to MySQL instead, edit the connection settings:
# Edit 'war/WEB-INF/classes/datanucleus.properties'
javax.jdo.option.ConnectionURL=jdbc:mysql://127.0.0.1:3306/newcrawler?characterEncoding=UTF-8
javax.jdo.option.ConnectionUserName=root
javax.jdo.option.ConnectionPassword=123456
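To script this change instead of editing the file by hand, a minimal sketch follows. `set_property` is a hypothetical helper, not part of NewCrawler; it assumes plain `key=value` lines with no spaces around `=`, as in the example above, and does not handle other `.properties` syntax such as `key: value` or line continuations.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NewCrawler): set key=value in a
# .properties file, replacing any existing entry for the key (duplicates
# collapse to one) or appending a new line if the key is absent.
# Assumes plain "key=value" lines with no spaces around "=".
set_property() {
  file="$1"
  key="$2"
  value="$3"
  tmp="$(mktemp)" || return 1
  # Pass key/value through the environment so awk does not interpret
  # backslash escapes, and match with index() so the dots in keys are
  # compared literally rather than as regex wildcards.
  K="$key" V="$value" awk '
    BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"] }
    index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, from the directory that contains `newcrawler` (path assumed from the listing above): `set_property newcrawler/war/WEB-INF/classes/datanucleus.properties javax.jdo.option.ConnectionUserName root`.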
Windows
----

>x86
>https://github.com/speed/windows-32bit-jetty-jre
>x64
>https://github.com/speed/windows-64bit-jetty-jre
Google App Engine
----

>https://github.com/speed/newcrawler-gae-shell
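Docker
----

>docker pull newcrawler/spider
>docker run -itd -p 8500:8500 --name=newcrawler newcrawler/spider
>docker logs -f newcrawler

To share the host's network stack instead, replace `-p 8500:8500` with `--net=host`; Docker ignores published ports in host networking mode.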
Docker (Aliyun registry)
----

>docker run -itd -p 8500:8500 --name=newcrawler registry.cn-shenzhen.aliyuncs.com/speed/spider
Start NewCrawler
----

>sh newcrawler/start.sh &

Then open http://127.0.0.1:8500 in a browser.
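Because `start.sh` runs in the background, the web UI may take a few seconds to come up. A minimal polling sketch, assuming `curl` is installed; `wait_for_newcrawler`, the 30-attempt default, and the 2-second interval are illustrative choices, not NewCrawler settings.

```shell
#!/bin/sh
# Hypothetical helper: poll the NewCrawler web UI until it responds or the
# attempts run out. Defaults assume the http://127.0.0.1:8500 address above.
wait_for_newcrawler() {
  url="${1:-http://127.0.0.1:8500}"
  attempts="${2:-30}"
  n=1
  while :; do
    # -f: fail on HTTP status >= 400; -s: no progress output; --max-time caps each try.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Give up after the last attempt without sleeping again.
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep 2
  done
}
```

For example: `sh newcrawler/start.sh &` followed by `wait_for_newcrawler && echo "NewCrawler is up"`.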
Shut Down NewCrawler
----

>sh newcrawler/stop.sh
Upgrade NewCrawler
----

>sh newcrawler/upgrade.sh
Install Chrome
----

https://github.com/speed/selenium

[![ScreenShot](https://raw.githubusercontent.com/speed/resources/master/images/NewCrawler_Video.jpg)](http://www.newcrawler.com/demo.html)
NewCrawler Cluster
=========================

![ScreenShot](https://raw.githubusercontent.com/speed/resources/master/images/NewCrawler%20Cluster2.png)