Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lblod/db-cleanup-service

Microservice that runs cleanup queries (both queries matching specific patterns and random queries) at set intervals defined by a cron pattern.
https://github.com/lblod/db-cleanup-service

mu-service

Last synced: 21 days ago
JSON representation

Microservice that runs cleanup queries (both queries matching specific patterns and random queries) at set intervals defined by a cron pattern.

Awesome Lists containing this project

README

        

# db-cleanup-service

Microservice that runs cleanup queries (both queries matching specific patterns and random queries) at set intervals defined by a cron pattern.

## Installation

To add the service to your stack, add the following snippet to `docker-compose.yml`:

```yaml
services:
dbcleanup:
image: lblod/db-cleanup-service:x.y.z
```

## Environment Variables

| Name | Description | Type | Default value |
| --------------------------- | ----------------------------------------------------------------------------- | --------- | ---------------------------------- |
| `MU_SPARQL_ENDPOINT` | The endpoint that will receive SPARQL queries | UrlString | 'http://database:8890/sparql' |
| `MU_APPLICATION_GRAPH` | Graph containing the cleanup jobs | UrlString | 'http://mu.semte.ch/graphs/public' |
| `CRON_PATTERN` | Default cron pattern for cleanup jobs that do not define one | String | '0 20 1 * *' |
| `PING_DB_INTERVAL` | Interval (sec) to wait before pinging to confirm if the database is running | Int | 2 |
| `SCHEDULE_ON_SERVICE_START` | Allows the cleanup jobs to be automatically scheduled when the service starts | Bool | true |

## Configuration

The cleanup service will execute cleanup jobs that are specified in the SPARQL endpoint. Each job should have the type `cleanup:Job` and at least the following properties:
- `mu:uuid`: an identifier for the job, typically the last part of its URI
- `dcterms:title`: a title describing the job
- `cleanup:selectPattern`: the pattern to match; resource to be deleted should be named `?resource`. This is used in COUNT queries and the cleanup query (a `DELETE...WHERE` query). Needs to specified in conjunction with `cleanup:deletePattern`.
- `cleanup:deletePattern`: the pattern to be deleted. Needs to specified in conjunction with `cleanup:selectPattern`.
- `cleanup:cronPattern` (optional): a cron pattern to schedule the job execution. If not provided, the default pattern will be used.
- `cleanup:randomQuery`: If a random query needs execution, specify the query here. Mutually exclusive with other patterns.
These jobs are located in the public graph (`http://mu.semte.ch/graphs/public`), and additional migrations can be added to the migration folder of your stack or directly through your SPARQL editor.

For example:

```sparql
PREFIX cleanup:
PREFIX mu:
PREFIX dcterms:

:job a cleanup:Job ;
mu:uuid "10724bc2-c9d0-4a35-a499-91a8b7cb023b" ;
dcterms:title "clean up dangling file uploads" ;
cleanup:selectPattern """
GRAPH {
?resource a ;
?source ;
?modified ;
?p ?o .

?source ?sourcep ?sourceo .

BIND(NOW() - xsd:dayTimeDuration("P1D") AS ?oneDayAgo)
FILTER(?modified <= ?oneDayAgo)
FILTER(NOT EXISTS { ?foo ?resource })
}
""" ;
cleanup:deletePattern """
GRAPH {
?resource ?p ?o .
?source ?sourcep ?sourceo .
}
""";
cleanup:cronPattern "0 0 * * *"; # Runs daily at midnight
```

Another example, which executes a random query:

```sparql
PREFIX cleanup:
PREFIX mu:
PREFIX dcterms:

:job a cleanup:Job ;
mu:uuid "ecf0c526-04bb-4e16-86a2-85b5a62cb849" ;
dcterms:title "Flush a triple in a graph" ;
cleanup:randomQuery """
DELETE DATA {
GRAPH {
.
}
}
""" ;
cleanup:cronPattern "0 0 * * *" . # Runs daily at midnight
```

**Note that a graph is specified in each pattern; this is needed in order to run the query.**

## Scheduling Feature

The service supports scheduling cleanup jobs using cron patterns. Each job can specify its own cron pattern using the `cleanup:cronPattern` property. If no pattern is specified, a default pattern from the environment variable `CRON_PATTERN` will be used.

### Default Cron Pattern

You can set a default cron pattern by defining the `CRON_PATTERN` environment variable in your configuration:

```env
CRON_PATTERN="0 5 1 * * *" # Example: runs at 5:00 AM on the first day of every month
```

### How Scheduling Works

1. **Job Definition:** Each job can specify a `cleanup:cronPattern`. If not provided, the default pattern is used.
2. **Job Scheduling:** Jobs are scheduled based on their cron pattern.
3. **Execution:** When the scheduled time arrives, the job executes, deleting resources that match the `selectPattern` and `deletePattern`.

### Locking Mechanism

To ensure that only one cleanup job runs at a time and to prevent database conflicts, the service uses a locking mechanism provided by the `async-await-mutex-lock` library.

#### How Locking Works

1. **Acquiring the Lock:** Before a job executes, it acquires a lock. This ensures that no other job can run concurrently.
2. **Job Execution:** The job executes its cleanup logic while holding the lock.
3. **Releasing the Lock:** Once the job completes or an error occurs, the lock is released, allowing the next scheduled job to run.

## REST API

### POST /cleanup

Triggers the cleanup cronjobs.

#### Response

- `201 Created` if the process was started and cronjobs were created.

### POST /disableCronjobs

Disables all cronjobs that had been triggered at an earlier point in time.

#### Response

- `200 OK` if the cronjobs were disabled successfully.

### GET /disableCronjob

Disables a single cronjob.

#### Parameters

This GET call accepts only one parameter.

- `cronJobID`: The call in this case would be `/disableCronjob?cronJobID=a53a0b8b-fdaf-41d5-a39b-20ddb3a36e6b`.

#### Response

- `406 Not Acceptable` if no or multiple parameters were passed.
- `400 Bad Request` if passed parameter is invalid.
- `200 OK` if the cronjob was successfully disabled.

### GET /runCronjob

Manually runs a single cronjob.

#### Parameters

This GET call accepts only one parameter.

- `cronJobID`: The call in this case would be `/runCronjob?cronJobID=a53a0b8b-fdaf-41d5-a39b-20ddb3a36e6b`.

#### Response

- `406 Not Acceptable` if no or multiple parameters were passed.
- `400 Bad Request` if passed parameter is invalid.
- `200 OK` if the cronjob was successfully executed.

## Development

To set up a development environment, use the following configuration in `docker-compose.yml`:

```yaml
services:
dbcleanup:
image: lblod/db-cleanup-service:x.y.z
ports:
- 8888:80
environment:
NODE_ENV: "development"
volumes:
- /path/to/the/project:/app/
```