# cf-rclone-buildpack

Cloud Foundry buildpack to manage buckets (S3, GCS, ...) based on [rclone](https://rclone.org/)

Functionalities of this buildpack:

* Automatically configure rclone from AWS and GCP service broker services
* Provide a web interface to explore the contents of the buckets
* Enable serving of remote objects via HTTP
* Clone data from one bucket to another, keeping them in sync periodically
* Use an rclone server with an HTTP API

## Using it

Example `manifest.yml`:

```manifest.yml
---
applications:
- name: rclone
  memory: 512M
  instances: 1
  stack: cflinuxfs3
  random-route: true
  buildpacks:
  - https://github.com/SpringerPE/cf-rclone-buildpack.git
  services:
  - jose-rclone-gcs
  - jose-rclone-aws
  env:
    AUTH_USER: "admin"
    AUTH_PASSWORD: "admin"
    CLONE_SOURCE_SERVICE: "jose-rclone-aws"
    CLONE_DESTINATION_SERVICE: "jose-rclone-gcs"
    CLONE_MODE: sync
    CLONE_TIMER: 600
```

With this configuration, the program will run [rclone sync](https://rclone.org/commands/rclone_sync/)
to synchronize data from the bucket of the service `jose-rclone-aws` to the
bucket of the service `jose-rclone-gcs` every 10 minutes. As each service offers
only one bucket, you do not need to know the bucket name.

### Environment variables

The web service always requires authentication. If **AUTH_USER** is not defined,
it defaults to `admin`. If **AUTH_PASSWORD** is not defined, it is autogenerated,
printed to stdout (you can see it with `cf logs`) and stored in
`/home/vcap/auth/${AUTH_USER}.password`.

**GCS_PROJECT_NUMBER** is predefined, but if you have your own project in GCP you will need to redefine it.

**CLONE_SOURCE_SERVICE** and **CLONE_DESTINATION_SERVICE** should match the
name of the services bound to the application and both need to be set in
order to run the clone operation.

**CLONE_MODE** is one of:
* `copy` (default): copies data from one bucket to another, only adding files to the destination bucket. It does not delete files in either the source or the destination bucket. See [rclone copy](https://rclone.org/commands/rclone_copy/)
* `sync`: synchronizes data from source to destination, making both identical and modifying the destination only. **Destination is updated to match source, including deleting files if necessary**. See [rclone sync](https://rclone.org/commands/rclone_sync/)
* `move`: moves the contents of the source bucket to the destination bucket. **Source contents will be deleted as soon as they are copied to destination**. See [rclone move](https://rclone.org/commands/rclone_move/)

> Be careful with `CLONE_MODE=sync` or `CLONE_MODE=move`, **those are destructive options**
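
As a reminder of which modes are destructive, here is a minimal sketch of a check you could run before deploying. The function `describe_mode` is a hypothetical helper written for this example and is not part of the buildpack.

```shell
#!/bin/bash
# Illustrative only: describe_mode is a hypothetical helper, not buildpack code.
# It maps a CLONE_MODE value to a short note about its side effects.
describe_mode() {
    local mode="${1:-copy}"     # copy is the documented default
    case "$mode" in
        copy) echo "copy: non-destructive" ;;
        sync) echo "sync: may delete files in destination" ;;
        move) echo "move: deletes files in source after copying" ;;
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

describe_mode ""      # empty value falls back to copy
describe_mode sync
```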

**CLONE_TIMER** specifies the number of seconds to wait before re-running the
clone operation. The default is `0`, which means the clone process does not run
periodically, just once after the program starts. The wait begins once the
previous run has finished and jobs are not queued, so if the clone process takes
one hour, the next run starts 10 minutes after it ends (with `CLONE_TIMER: 600`,
as in the manifest above).
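
The "run, then wait" behaviour of **CLONE_TIMER** can be pictured with the sketch below. This is not the buildpack's actual code: `run_periodically`, its `max_runs` argument and `fake_clone` are assumptions made to keep the example finite and runnable.

```shell
#!/bin/bash
# Hypothetical sketch of a "run, then wait" loop; names are illustrative.
# Usage: run_periodically TIMER MAX_RUNS COMMAND [ARGS...]
run_periodically() {
    local timer="$1" max_runs="$2"   # max_runs only exists to keep the demo finite
    shift 2
    local runs=0
    while true; do
        "$@"                          # blocks until the job has finished
        runs=$((runs + 1))
        # A timer of 0 means "run once", like the documented default
        if [ "$timer" -le 0 ] || [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$timer"                # the wait starts only after the job ends
    done
    echo "$runs"                      # number of runs performed
}

# Stand-in for the real clone operation
fake_clone() { :; }

run_periodically 0 5 fake_clone   # timer 0: a single run
run_periodically 1 3 fake_clone   # timer 1s: three runs with a pause between them
```

Because the sleep happens after the job returns, a slow job never overlaps with the next one. The interval between starts is the job duration plus the timer.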

Extra rclone parameters can be defined via environment variables.
See https://rclone.org/docs/#environment-variables, but be aware that the
automatic CLONE process uses the rclone API so most likely those environment
variables will be ignored.

> **This buildpack does not allow more than one instance**. If you deploy more
> than one, the extra instances will fail.

### What if ...

#### ... my service(bucket) is not defined/available in the current platform

Just copy the environment variable `VCAP_SERVICES` from the other CF platform
and create a file called `VCAP_SERVICES` in the root of the application with
the contents of the variable. At startup, the buildpack will merge the contents
of the file with the environment variable and set up the rclone configuration.

#### ... my bucket is not provided by CF service brokers, there is no VCAP_SERVICES variable

Create an rclone configuration file `rclone.conf` with the parameters of the
bucket, something like:

```
# S3 example, please fill the access key and key id
[s3-service]
type = s3
provider = AWS
access_key_id =
secret_access_key =
region = eu-central-1
location_constraint = eu-central-1
acl = private
env_auth = false

# GCS Example. Please put the `auth.json` file in the app folder
[gcs-service]
type = google cloud storage
client_id =
client_secret =
project_number =
service_account_file = /home/vcap/app/auth.json
storage_class = REGIONAL
location = europe-west4
```

Note that the bucket names are not defined in this configuration. You
have to define them in the environment variables **CLONE_SOURCE_BUCKET** or
**CLONE_DESTINATION_BUCKET**, and set the variable **CLONE_SOURCE_SERVICE**
or **CLONE_DESTINATION_SERVICE** to the name of the entry between brackets
(`s3-service` or `gcs-service`, without the brackets, in this example).
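
To make the relationship between the SERVICE and BUCKET variables concrete, the sketch below joins them into the `remote:bucket` form that rclone uses for paths. The helper `remote_path` and the bucket names `bucket1` and `bucket2` are placeholders for this example, and how the buildpack combines the variables internally is an assumption here.

```shell
#!/bin/bash
# Hypothetical helper: join a config section name and a bucket name
# into an rclone remote path of the form "remote:bucket".
remote_path() {
    local service="$1" bucket="$2"
    printf '%s:%s\n' "$service" "$bucket"
}

# Example values; bucket1 and bucket2 are placeholders
CLONE_SOURCE_SERVICE="s3-service"
CLONE_SOURCE_BUCKET="bucket1"
CLONE_DESTINATION_SERVICE="gcs-service"
CLONE_DESTINATION_BUCKET="bucket2"

remote_path "$CLONE_SOURCE_SERVICE" "$CLONE_SOURCE_BUCKET"
remote_path "$CLONE_DESTINATION_SERVICE" "$CLONE_DESTINATION_BUCKET"
```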

#### ... I need something else, other actions or more

Just create a file `post-start.sh`, like this:
```
#!/bin/bash
# $RCLONE is defined env variable, just use it to execute commands

# Example command
$RCLONE rc core/version

# Sync these 2 buckets
$RCLONE -vv rc sync/sync srcFs=s3-service:bucket1 dstFs=gcs-service:bucket2

# alternative way to do it (_async == true)
$RCLONE rc sync/sync --json '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }'
```

The variables `CLONE_SOURCE_BUCKET` and `CLONE_DESTINATION_BUCKET` are
automatically defined if their counterpart SERVICE variables are provided.

If a `post-start.sh` file is found, no automatic clone operation will be performed.
You can define any kind of logic in this file, with sync or async operations.
The file will be executed in the background automatically at startup.

## Remote objects via HTTP

Just open in a browser `https://rclone.example.app/[SERVICE_NAME:]BUCKET_NAME/`
changing `SERVICE_NAME` and `BUCKET_NAME` to the correct values.

Or using curl:

```
# Note the square brackets are escaped with \
# curl -u admin:password 'https://rclone.example.app/\[SERVICE_NAME:\]BUCKET_NAME/'
curl -u admin:password 'https://rclone.example.app/\[s3-service:\]bucket1/'
```

## Use rclone as server

You can define many buckets and use the rclone API to trigger actions on those
buckets (and also retrieve their contents using HTTP). All calls must be made using POST.

https://rclone.org/rc/#accessing-the-remote-control-via-http

```
curl -u admin:password -H "Content-Type: application/json" -X POST -d '{"potato":2,"sausage":1}' http://rclone.example.com/rc/noop
```

Real world example, perform a sync between 2 buckets:

```
curl -u admin:password -H "Content-Type: application/json" -X POST -d '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }' http://rclone.example.com/sync/sync
```

# Known issues

All known issues relate to the new web UI. It is a fairly new piece of software
(Aug 2019) and is currently in alpha.

* When logging in, you have to enter the credentials twice. The second time,
in the program interface, click first on *Verify* and then on *Login*.

* It allows you to visualize the contents of the buckets, see current
operations and view/delete objects. In order to see the contents of a bucket,
go to *Explorer*, type `:` and click *Open*
(yep, you need to know the name of the bucket!).

* The graph does not get refreshed after the transaction is done.

# Development

Buildpack implemented using bash scripts to make it easy to understand and change.

https://docs.cloudfoundry.org/buildpacks/understand-buildpacks.html

The buildpack uses the `deps` and `cache` folders according to their intended purposes,
so the first time the buildpack is used it will download all resources, and on
subsequent runs it will use the cached resources.

# Author

(c) 2019 Jose Riguera Lopez
Springernature Engineering Enablement

MIT License