https://github.com/saladtechnologies/crawling-service

A simple web service for managing web crawls

Host: GitHub
URL: https://github.com/saladtechnologies/crawling-service
Owner: SaladTechnologies
License: mit
Created: 2023-10-13T15:26:35.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-12-05T21:31:12.000Z (almost 2 years ago)
Last Synced: 2025-07-18T09:54:02.772Z (3 months ago)
Language: TypeScript
Size: 101 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# crawling-service
A simple web service for managing web crawls

## Docs

Docs are available at `/docs` when the service is running.

## Docker

```
saladtechnologies/crawling-service:latest
```

## Provisioning AWS Resources

You will need [OpenTofu](https://opentofu.org/) or Terraform for this. If you use Terraform, substitute `terraform` for `tofu` in the commands below.
`resources.tf` expects an AWS profile named `tofu` with adequate permissions to create all of the necessary resources. You will also need to change the S3 bucket name in `resources.tf`, since S3 bucket names must be globally unique.

```bash
tofu init
tofu apply
```

This also creates an IAM user with the appropriate permissions and exposes its access key ID and secret access key as outputs. To copy them to the clipboard (the commands use `xclip`, so they assume Linux with X11), run:

**Access Key Id**
```bash
tofu output --raw crawler-service-access-key | xclip -selection clipboard
```

**Secret Access Key**
```bash
tofu output --raw crawler-service-secret-key | xclip -selection clipboard
```
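
The run example in the next section passes `AWS_PROFILE=crawler-service` and mounts `~/.aws` into the container, so one option is to write these outputs straight into a profile of that name instead of going through the clipboard. The following is a minimal sketch, assuming the AWS CLI is installed. The function name `save_crawler_profile` is hypothetical and not part of the repo, and the default profile and region are taken from the run example.

```bash
# Hypothetical helper (not part of the repo): writes the IAM user's keys
# from the tofu outputs into a named AWS CLI profile.
save_crawler_profile() {
  local profile="${1:-crawler-service}"  # profile name used by the run example
  local region="${2:-us-east-2}"         # region used by the run example
  local access_key secret_key

  # Read both outputs first so a failed lookup aborts before anything is written.
  access_key="$(tofu output --raw crawler-service-access-key)" || return 1
  secret_key="$(tofu output --raw crawler-service-secret-key)" || return 1

  aws configure set aws_access_key_id "$access_key" --profile "$profile" &&
    aws configure set aws_secret_access_key "$secret_key" --profile "$profile" &&
    aws configure set region "$region" --profile "$profile"
}
```

Call `save_crawler_profile` from the directory containing `resources.tf` after `tofu apply` has finished.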

## Build

```bash
docker buildx build \
-t saladtechnologies/crawling-service:latest \
--platform linux/amd64 \
--output type=docker \
--provenance=false \
.
```

## Run

```bash
docker run --rm -it \
-p 3000:3000 \
-e PORT=3000 \
-e HOST="0.0.0.0" \
-e AWS_DEFAULT_REGION=us-east-2 \
-e AWS_PROFILE=crawler-service \
-e S3_BUCKET_NAME=salad-crawler-page-data \
-e CRAWL_TABLE_NAME=crawls \
-e PAGES_TABLE_NAME=pages \
-v ~/.aws:/root/.aws \
saladtechnologies/crawling-service:latest
```

Or start it with Docker Compose:

```bash
docker compose up
```
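
Once the container is up, a quick way to confirm it is serving requests is to poll the `/docs` endpoint mentioned in the Docs section. This is a sketch, assuming `curl` is installed and the service is published on port 3000 as in the `docker run` example. `wait_for_docs` is a hypothetical helper name, not something the repo provides.

```bash
# Hypothetical helper: polls the docs endpoint once per second until it
# answers with a non-error status or the attempts run out.
wait_for_docs() {
  local url="${1:-http://localhost:3000/docs}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP status >= 400, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "docs reachable at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "docs not reachable at $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_docs http://localhost:3000/docs 10` gives the service up to roughly ten seconds to start before reporting a failure.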
