https://github.com/namgold/crawl-tool
- Host: GitHub
- URL: https://github.com/namgold/crawl-tool
- Owner: namgold
- Created: 2020-04-05T10:09:47.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-11T00:56:07.000Z (about 2 years ago)
- Last Synced: 2023-03-09T07:31:52.968Z (almost 2 years ago)
- Language: JavaScript
- Size: 77.1 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
# Crawl tool
This is a crawl tool for static websites. Describe the website in a JSON file, and the tool will crawl it and extract the data.
Example:

#### **`input.json`**
```javascript
{
  "data": [
    {
      "key": "news",
      "url": {
        "link": "http://bkenglish.edu.vn/tin-tuc.html/p-{1..15}"
      },
      "url2": "http://bkenglish.edu.vn/tin-tuc.html/p-{1..15}",
      "selector": ".news",
      "isSelectAll": true,
      "data": [
        {
          "key": "title",
          "seletor": "h3"
        },
        {
          "key": "description",
          "seletor": ".tomtat",
          "seletorToValue": "innerText"
        },
        {
          "url": {
            "selector": "h3 a",
            "seletorToValue": "href"
          },
          "seletor": ".news-content",
          "data": [
            {
              "key": "content",
              "selector": "div ~ div"
            },
            {
              "key": "createdDate",
              "selector": "div"
            }
          ]
        }
      ]
    }
  ]
}
```

#### **`output.json`**
```javascript
{
  "news": [
    {
      "title": "Khai giang chuong trinh online toeic",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "date": "03/04/2020",
      "content": ""...
    },
    {
      "title": "Khai giang chuong trinh online toeic 2",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "content": ""...
    },
    {
      "title": "Khai giang chuong trinh online toeic 3",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "content": ""...
    },
    {
      "title": "Khai giang chuong trinh online toeic",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "content": ""...
    },
    {
      "title": "Khai giang chuong trinh online toeic 2",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "content": ""...
    },
    {
      "title": "Khai giang chuong trinh online toeic 3",
      "description": "Neu ban muon nhanh chong thi de ra truong thi nhanh tay dang ky chuong trinh hoc huu ich",
      "image": "https://...",
      "views": "1234",
      "content": ""...
    }
  ]
}
```

# JSON configuration
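
The `url` values in the example above contain a `{1..15}` placeholder. This README does not define that syntax. Assuming it denotes an inclusive range of page numbers (as in shell brace expansion), the expansion could be sketched roughly as follows. The `expandUrlRange` name is hypothetical and is not taken from the tool's source.

```javascript
// Hypothetical helper, not part of crawl-tool's documented API.
// Assumption: "{a..b}" stands for every integer from a to b, inclusive.
function expandUrlRange(pattern) {
  const match = pattern.match(/\{(\d+)\.\.(\d+)\}/);
  // No placeholder: the pattern is already a single concrete URL.
  if (!match) return [pattern];

  const start = Number(match[1]);
  const end = Number(match[2]);
  const urls = [];
  for (let page = start; page <= end; page++) {
    // Substitute the page number for the first placeholder only.
    urls.push(pattern.replace(match[0], String(page)));
  }
  return urls;
}

const pages = expandUrlRange("http://bkenglish.edu.vn/tin-tuc.html/p-{1..15}");
console.log(pages.length); // 15
console.log(pages[0]);     // http://bkenglish.edu.vn/tin-tuc.html/p-1
```

Under this assumption, each expanded URL would be crawled with the same `selector` and `data` rules.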
## key

Type: String

The key name under which the extracted value is stored in the output.

Example:

#### **`input.json`**
```javascript
{
"key": "title",
...
}
```
#### **`output.json`**
```javascript
{
"title": "..."
}
```
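
To illustrate how `key` shapes the output, here is a minimal sketch that builds an output object from a list of config entries. The `buildOutput` function and the `extract` callback are hypothetical stand-ins for the tool's real extraction logic, which this README does not show. Merging the children of an entry that has no `key` into its parent is also an assumption, inferred from the `content` field in the example output.

```javascript
// Hypothetical sketch; crawl-tool's real implementation is not shown in this README.
// `extract(entry)` stands in for whatever reads a value from the page.
function buildOutput(entries, extract) {
  const output = {};
  for (const entry of entries) {
    if (entry.key) {
      // The entry's `key` becomes the property name in the output.
      output[entry.key] = extract(entry);
    } else if (Array.isArray(entry.data)) {
      // Assumption: an entry without `key` merges its children into the parent.
      Object.assign(output, buildOutput(entry.data, extract));
    }
  }
  return output;
}

// Stub extractor that returns a placeholder instead of reading a real page.
const fakeExtract = (entry) => `<value of ${entry.selector}>`;

const result = buildOutput(
  [
    { key: "title", selector: "h3" },
    { data: [{ key: "content", selector: "div ~ div" }] },
  ],
  fakeExtract
);
console.log(JSON.stringify(result));
// {"title":"<value of h3>","content":"<value of div ~ div>"}
```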