https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
- Host: GitHub
- URL: https://github.com/tetreum/xupopter_runner
- Owner: tetreum
- Created: 2023-01-30T18:49:58.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-02-21T14:31:13.000Z (11 months ago)
- Last Synced: 2024-02-21T16:05:27.755Z (11 months ago)
- Topics: crawler, scrapper, scrapping, webscraper
- Language: TypeScript
- Homepage:
- Size: 110 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# Xupopter Runner
Executes crawling recipes coming from [Xupopter Chrome Extension](https://github.com/tetreum/xupopter_chrome_extension).

## Usage
You can either use the Docker container (recommended, as it contains both the backend and a runner) or run it manually.
### Docker
```yaml
version: "3.3"
services:
  xupopter-runner:
    image: ghcr.io/tetreum/xupopter_runner:latest
    container_name: xupopter-runner
    ports:
      - 8089:8089
    environment:
      - JWT_ACCESS_SECRET=SAME_SECRET_AS_XUPOPTER_CLIENT # Use the same secret as in the Xupopter client's .env
    volumes:
      - /path/to/config:/app/config # Make sure your local config directory exists
      - /where/i/want/to/store/scrapped_data:/app/public # Make sure your local data directory exists
```

The runner will be available at `http://localhost:8089`
### Non-docker
`JWT_ACCESS_SECRET=test npm start`
The runner will be available at `http://localhost:8089`
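
Requests to the runner carry a bearer token, and the token in this README's request sample has an HS256 header. The following is a minimal sketch of producing such a token in Node.js for local testing, assuming the runner verifies HS256 signatures against `JWT_ACCESS_SECRET`. The helper name `signHs256` and the `sub` and `exp` claims are illustrative assumptions; the claims the runner actually checks are not documented here.

```typescript
import { createHmac } from "node:crypto";

// Encode a UTF-8 string as unpadded base64url, the encoding JWT segments use.
function base64url(input: string): string {
  return Buffer.from(input, "utf8").toString("base64url");
}

// Hypothetical helper (not part of xupopter_runner): signs a JWT with HMAC-SHA256.
// Assumption: the runner validates the signature with JWT_ACCESS_SECRET.
function signHs256(payload: Record<string, unknown>, secret: string): string {
  const header = base64url(JSON.stringify({ typ: "JWT", alg: "HS256" }));
  const body = base64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// "test" matches the secret used in the npm start command above.
// The claims below are placeholders chosen for illustration only.
const token = signHs256(
  { sub: "local-test", exp: Math.floor(Date.now() / 1000) + 3600 },
  "test",
);
console.log(`Authorization: Bearer ${token}`);
```

The printed header line can be used in place of the `Authorization` header of a request to the runner. In a real deployment the token would normally be issued by the Xupopter client that shares the secret.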
Request sample:
```curl
curl --request POST \
--url http://localhost:8089/ \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJPbmxpbmUgSldUIEJ1aWxkZXIiLCJpYXQiOjE3MDEzNDk3NzUsImV4cCI6MTczMjg4NTc3NSwiYXVkIjoid3d3LmV4YW1wbGUuY29tIiwic3ViIjoianJvY2tldEBleGFtcGxlLmNvbSIsIkdpdmVuTmFtZSI6IkpvaG5ueSIsIlN1cm5hbWUiOiJSb2NrZXQiLCJFbWFpbCI6Impyb2NrZXRAZXhhbXBsZS5jb20iLCJSb2xlIjpbIk1hbmFnZXIiLCJQcm9qZWN0IEFkbWluaXN0cmF0b3IiXX0.Ah4sSyoF1QUD65RyMCRjYKta9dOWdEEyCNvd00CqBzM' \
--header 'Content-Type: application/json' \
--data '{
  "id": "home-crawler",
  "recipe": {
    "id": "a57ddd92-f32f-4bab-98d5-747a7193d924",
    "name": "Local test",
    "expected_output": "item",
    "schema": 1,
    "blocks": [
      {
        "id": "f3b85729-b967-4a3a-8cb4-f5c8d465be34",
        "type": "start",
        "details": {
          "type": "url",
          "source": "https://localhost:8080/"
        }
      },
      {
        "id": "40a9215d-7888-4ae1-b192-a2ae9ce21097",
        "type": "extract",
        "details": {
          "name": "title",
          "selector": "#results-container [class=\"item row border-bottom p-2\"] h3",
          "property": "text"
        }
      }
    ]
  }
}'
```

Runner results can be downloaded from `http://localhost:8089/public/SENT_ID/result.json`, where `SENT_ID` is the `id` sent in the request.
Example: `http://localhost:8089/public/home-crawler/result.json`
A debugging log will also be available for each run: `http://localhost:8089/public/home-crawler/info.log`
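
The request and download URLs above can be sketched from a TypeScript client as follows. The type shapes are inferred from the request sample only, so the real recipe schema may allow more block types and fields. The names `RunRequest`, `runUrls`, and `submitRecipe` are hypothetical and not part of the runner, and `fetch` is assumed to be available globally (Node.js 18 or later).

```typescript
// Shapes inferred from the README's request sample; treat them as assumptions.
type StartBlock = {
  id: string;
  type: "start";
  details: { type: "url"; source: string };
};

type ExtractBlock = {
  id: string;
  type: "extract";
  details: { name: string; selector: string; property: string };
};

interface Recipe {
  id: string;
  name: string;
  expected_output: string;
  schema: number;
  blocks: Array<StartBlock | ExtractBlock>;
}

interface RunRequest {
  id: string; // Becomes the SENT_ID segment of the download URLs.
  recipe: Recipe;
}

// Hypothetical helper: derives the result and log URLs for a given run id.
function runUrls(baseUrl: string, runId: string): { result: string; log: string } {
  const root = `${baseUrl.replace(/\/+$/, "")}/public/${encodeURIComponent(runId)}`;
  return { result: `${root}/result.json`, log: `${root}/info.log` };
}

// Hypothetical helper: posts a recipe to the runner with a bearer token.
async function submitRecipe(
  baseUrl: string,
  token: string,
  request: RunRequest,
): Promise<Response> {
  return fetch(baseUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(request),
  });
}

console.log(runUrls("http://localhost:8089", "home-crawler"));
```

This README does not say whether the POST returns before or after the crawl finishes, so a client may need to retry the result URL until the file exists.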