https://github.com/thiti-dev/scraperor-v2
Makes scraping easy with a human-understandable $pointer that describes where to find the data you are interested in
- Host: GitHub
- URL: https://github.com/thiti-dev/scraperor-v2
- Owner: Thiti-Dev
- Created: 2023-01-01T11:40:20.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-01T12:12:02.000Z (about 3 years ago)
- Last Synced: 2025-02-24T04:30:56.601Z (12 months ago)
- Topics: dotnet-api, dotnet-core, dotnet7, htmlagilitypack, scraper-api
- Language: C#
- Homepage: https://soon.wait.a.bit
- Size: 13.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## 🎓 Scraperor-v2 [scraping service]
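You already know what this is: a scraping API built on .NET 7 and HtmlAgilityPack. Send a target URL plus a ```pointer``` describing the elements you want, and it returns their text.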
## 💡 Usage
##### ENDPOINT
```url
{{POST}}: ($DOMAIN)/api/scrape
```
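A minimal client sketch for the endpoint above, in Python with only the standard library. It assumes the service accepts a JSON body sent with a `Content-Type: application/json` header, which the README does not state explicitly. `build_scrape_request` and `scrape` are hypothetical helper names, and the domain is whatever you substitute for `$DOMAIN`.

```python
import json
import urllib.request


def build_scrape_request(domain: str, website: str, pointer: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/scrape endpoint."""
    body = json.dumps({"website": website, "pointer": pointer}).encode("utf-8")
    return urllib.request.Request(
        url=domain.rstrip("/") + "/api/scrape",
        data=body,
        # Assumption: the service expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(domain: str, website: str, pointer: dict) -> dict:
    """Send the request and decode the JSON reply into a dict."""
    request = build_scrape_request(domain, website, pointer)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a locally hosted instance the call might look like `scrape("http://localhost:5000", "https://github.com/Thiti-Dev", pointer)`, where the host and port are placeholders.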
### Example
##### Example body -> (Extract the bio text from a GitHub user page)
```json
{
  "website": "https://github.com/Thiti-Dev",
  "pointer": {
    "look_for": {
      "tag": "div",
      "has_classes": [
        "user-profile-bio"
      ],
      "then_look_for": {
        "tag": "div"
      }
    }
  }
}
```
###### Response
```json
{
  "success": true,
  "contents": [
    "My github's bio, it can be any as I can change it anytime lol but for now at this commit date it was `I'm backkkk`"
  ]
}
```
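A small sketch of reading such a response on the client side, in Python. The README only shows a successful reply, so the failure handling here is an assumption: the sketch treats a missing or false `success` field as an error. `extract_contents` is a hypothetical helper name.

```python
import json


def extract_contents(response_text: str) -> list:
    """Return the scraped strings from a response body shaped like the one above."""
    payload = json.loads(response_text)
    # Assumption: an unsuccessful scrape is signalled by "success" being false
    # or absent; the README does not document the error shape.
    if not payload.get("success"):
        raise RuntimeError("scrape did not succeed")
    return list(payload.get("contents", []))
```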
##### Example body -> (Extract the definitions of the word ```kind``` from the Longdo dictionary)
```json
{
  "website": "https://dict.longdo.com/search/kind",
  "pointer": {
    "look_for": {
      "tag": "tr",
      "has_classes": ["lang-rows", "lang-TH"],
      "then_look_for": {
        "tag": "table",
        "has_classes": [
          "search-result-table"
        ],
        "then_look_for": {
          "tag": "td",
          "then_look_for": {
            "tag": "a"
          }
        }
      }
    }
  }
}
```
###### Response
```json
{
  "success": true,
  "contents": [
    "ใจบุญ",
    "เกื้อกูล",
    "เมตตา",
    "กรุณา"
  ]
}
```
## 📕 CookBook
- The ```then_look_for``` prop can be nested to any depth
- You can omit the ```tag``` property if you intend to match any element (the `*` wildcard)
- These two are in the implementation backlog (too lazy for now, feel free to open PRs)
  - Custom Attribute-$LOOKUP
  - ID-$LOOKUP
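
Since deeply nested ```then_look_for``` objects get tedious to write by hand, the sketch below builds a pointer from a flat list of steps, in Python. It only produces the JSON shape shown in the examples above; `step` and `build_pointer` are hypothetical helper names that are not part of the service. Leaving out `tag` in a step gives the wildcard form described in the list.

```python
from typing import Optional, Sequence


def step(tag: Optional[str] = None, has_classes: Sequence[str] = ()) -> dict:
    """Describe one lookup level; omit `tag` to match any element."""
    node = {}
    if tag is not None:
        node["tag"] = tag
    if has_classes:
        node["has_classes"] = list(has_classes)
    return node


def build_pointer(steps: Sequence[dict]) -> dict:
    """Nest the steps, outermost first, under successive then_look_for keys."""
    if not steps:
        raise ValueError("at least one step is required")
    nested = None
    # Walk from the innermost step outwards so each level wraps the previous one.
    for current in reversed(steps):
        node = dict(current)
        if nested is not None:
            node["then_look_for"] = nested
        nested = node
    return {"look_for": nested}
```

For example, `build_pointer([step("div", ["user-profile-bio"]), step("div")])` reproduces the pointer from the GitHub bio example.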