https://github.com/saladtechnologies/crawling-service
A simple web service for managing web crawls
https://github.com/saladtechnologies/crawling-service
Last synced: 3 months ago
JSON representation
A simple web service for managing web crawls
- Host: GitHub
- URL: https://github.com/saladtechnologies/crawling-service
- Owner: SaladTechnologies
- License: mit
- Created: 2023-10-13T15:26:35.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-05T21:31:12.000Z (almost 2 years ago)
- Last Synced: 2025-07-18T09:54:02.772Z (3 months ago)
- Language: TypeScript
- Size: 101 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# crawling-service
A simple web service for managing web crawls## Docs
Docs are available at `/docs` when the service is running.
## Docker
```
saladtechnologies/crawling-service:latest
```## Provisioning AWS Resources
You will need [OpenTofu](https://opentofu.org/) or terraform for this.
`resources.tf` expects an aws profile named "tofu" that has adequate permissions to create all of the necessary resources. You will need to rename your s3 bucket in `resources.tf` as well.```bash
tofu init
tofu apply
```This also creates an IAM user with the appropriate permissions, and exports the access key and secret key. To copy them to the clipboard, run:
**Access Key Id**
```bash
tofu output --raw crawler-service-access-key | xclip -selection clipboard
```**Secret Access Key**
```bash
tofu output --raw crawler-service-secret-key | xclip -selection clipboard
```## Build
```bash
docker buildx build \
-t saladtechnologies/crawling-service:latest \
--platform linux/amd64 \
--output type=docker \
--provenance=false \
.
```## Run
```bash
docker run --rm -it \
-p 3000:3000 \
-e PORT=3000 \
-e HOST="0.0.0.0" \
-e AWS_DEFAULT_REGION=us-east-2 \
-e AWS_PROFILE=crawler-service \
-e S3_BUCKET_NAME=salad-crawler-page-data \
-e CRAWL_TABLE_NAME=crawls \
-e PAGES_TABLE_NAME=pages \
-v ~/.aws:/root/.aws \
saladtechnologies/crawling-service:latest
```OR
```bash
docker compose up
```