https://github.com/hktalent/scrapysite
ScrapySite: a Go web crawler (spider) for scraping and intelligence gathering
- Host: GitHub
- URL: https://github.com/hktalent/scrapysite
- Owner: hktalent
- Created: 2022-04-10T04:48:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-11T21:54:42.000Z (about 1 year ago)
- Last Synced: 2024-06-20T02:00:40.776Z (5 months ago)
- Topics: crawler, elasticsearch, go, scraping, site, spider, web
- Language: Go
- Homepage: https://crawler.51pwn.com
- Size: 118 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
# ScrapySite
ScrapySite
# How to build
```bash
git clone git@github.com:hktalent/scrapysite.git
cd scrapysite
go build main.go
# or build for all platforms
make all -f Makefile.cross-compiles
ls -lah release/
# or build with the default Makefile
make all
ls -lah bin/
ls -lah main
```
# How to use Elasticsearch
Check the existing indices: http://127.0.0.1:9200/_cat/indices?v
1. Create the index
```bash
./tools/CreateEs.sh scrapy
```
http://127.0.0.1:9200/scrapy_index/_doc/
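
To confirm the index is in place before crawling, a HEAD request against the index can be used. The following is a minimal sketch, not part of ScrapySite: `index_url` and `index_exists` are hypothetical helpers, and the index name `scrapy_index` is an assumption inferred from the URL above.

```bash
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on 127.0.0.1:9200 unless ES_URL is set.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# index_url (hypothetical helper) prints the URL of an index,
# dropping one trailing slash from ES_URL if present.
index_url() {
  printf '%s/%s\n' "${ES_URL%/}" "$1"
}

# index_exists (hypothetical helper) sends a HEAD request to the index.
# Elasticsearch answers 200 when the index exists and 404 when it does not.
index_exists() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' -I "$(index_url "$1")")
  [ "$code" = "200" ]
}
```

With Elasticsearch running, `index_exists scrapy_index && echo "index ready"` reports whether the index was created.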
# How to use
```bash
./main -url="http://www.xxx1.cn;http://www.xx2.cn" -resUrl="http://127.0.0.1:9200/st_index/_doc/"
```
Search the indexed results: http://127.0.0.1:9200/st_index/_search?q=edu&pretty=true
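
The same search can be run from the command line. This is a minimal sketch under stated assumptions: `search_url` and `search_index` are hypothetical helpers, not part of ScrapySite, and the query term is passed through without URL encoding, so it should be a plain word such as `edu`.

```bash
#!/usr/bin/env bash
# Assumption: Elasticsearch listens on 127.0.0.1:9200 unless ES_URL is set.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# search_url (hypothetical helper) prints a URI search URL for an index.
# Arguments: index name, query term (not URL-encoded here).
search_url() {
  local index="$1" query="$2"
  printf '%s/%s/_search?q=%s&pretty=true\n' "${ES_URL%/}" "$index" "$query"
}

# search_index (hypothetical helper) fetches the matching documents as JSON.
# Requires a running Elasticsearch instance.
search_index() {
  curl -s "$(search_url "$1" "$2")"
}
```

For example, `search_index st_index edu` requests the same URL as the one shown above.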