https://github.com/elemental-lf/chowder
All-in-one Docker image of ClamAV with Celery worker, REST API and clamd
anti-virus celery clamav docker kubernetes rest-api
- Host: GitHub
- URL: https://github.com/elemental-lf/chowder
- Owner: elemental-lf
- License: other
- Created: 2019-02-11T18:35:58.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-18T15:58:08.000Z (over 7 years ago)
- Last Synced: 2025-01-18T08:46:03.562Z (over 1 year ago)
- Topics: anti-virus, celery, clamav, docker, kubernetes, rest-api
- Language: Python
- Homepage:
- Size: 89.8 KB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# All-in-one Docker image of ClamAV with Celery worker, REST API and clamd
This repository contains a Docker image which includes the ClamAV engine and multiple different ways to access the
engine. It is intended to be deployed with Kubernetes but can also be used with Docker.
## Modes of operation
When a container is created from the image, the mode it should run in needs to be specified. There
are four possible modes:
* `freshclam`: In this mode the container runs the freshclam daemon. It updates the anti-virus databases
  in the `/var/lib/clamav` directory.
* `celery-worker`: Celery is a distributed task queue framework for Python. In this mode a Celery worker is
  started which publishes one task with the following signature:

  `scan(fs: str, file: str, timeout: int = 3600, clamscan_options: Dict[str, str] = None, unlink: bool = False)`

  The parameters for `scan` are:

  * `fs`: PyFilesystem URL of the filesystem that contains the file
  * `file`: Name of the file to be scanned
  * `timeout`: Timeout for the `clamscan` call
  * `clamscan_options`: A dictionary of options that are passed directly to `clamscan`. The key of each
    dictionary item is the option name (without the leading dash or dashes). The value is the argument
    of the respective option. If an option has no argument the value should be set to `None`.
  * `unlink`: If this boolean value is set to `True` the file is unlinked after being scanned.

  The task returns a tuple consisting of a boolean value indicating whether a virus was found (`True`) or
  not (`False`) and a multi-line string containing the output of `clamscan`.

  Resources are accessed via [PyFilesystem](https://www.pyfilesystem.org/). Support for accessing S3 object
  stores via [`fs-s3fs`](https://fs-s3fs.readthedocs.io/) is included.

  All resources are scanned with `clamscan` to circumvent the 4GB limit of `clamd` and of the REST API,
  which also connects to `clamd`. This has the disadvantage that the whole anti-virus pattern database needs to
  be loaded by each invocation of `clamscan`, which takes about 20 seconds (on my hardware). Furthermore, S3
  objects need to be downloaded in full into the local filesystem before they can be scanned.

  To configure the Celery workers to connect to Celery backends the Celery configuration needs to be mounted as
  `/celery-worker/config/celeryconfig.py` inside the container. It contains configuration variable assignments
  as per the Celery [documentation](http://docs.celeryproject.org/en/latest/userguide/configuration.html). To
  get the results of the scans a results backend is needed.

  The task needs to be called by name. It is possible to use `send_task` for this or to define a `signature`.
* `clamd`: This mode starts the `clamd` daemon inside the container. It listens on TCP port 3310 and on the
  Unix domain socket `/var/run/clamav/clamd.sock`. The TCP port can be exposed to the outside world
  if wanted. The Unix domain socket is currently not used. This mode is untested apart from observing
  a successful startup of `clamd`.
* `rest`: In this mode [Solita's ClamAV REST proxy](https://github.com/solita/clamav-rest) is started. It connects
  to `clamd` via TCP on `localhost`, port 3310 so a companion `clamd` container in the same network namespace
  is needed. This mode is untested apart from observing a successful startup of the proxy.
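
As a rough illustration of calling the `scan` task by name, the sketch below builds the keyword arguments from the documented signature and submits them with `send_task`. The task name `scan`, the helper names, the `osfs:///data` URL and the file name are assumptions made for the example; the name the worker actually registers should be checked against your deployment. The `clamscan` options shown (`max-filesize`, `no-summary`) are ordinary `clamscan` flags picked only for illustration.

```python
from typing import Any, Dict, Optional, Sequence

# Assumption: the registered task name is not stated in this README;
# replace it with the name your worker actually reports.
SCAN_TASK_NAME = "scan"


def build_scan_kwargs(
    fs: str,
    file: str,
    timeout: int = 3600,
    clamscan_options: Optional[Dict[str, Optional[str]]] = None,
    unlink: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments that mirror the documented signature."""
    return {
        "fs": fs,
        "file": file,
        "timeout": timeout,
        "clamscan_options": clamscan_options,
        "unlink": unlink,
    }


def submit_scan(app: Any, **scan_kwargs: Any) -> Any:
    """Send the task by name; `app` is a configured `celery.Celery` instance."""
    return app.send_task(SCAN_TASK_NAME, kwargs=build_scan_kwargs(**scan_kwargs))


def describe_result(result: Sequence[Any]) -> str:
    """Summarise the (virus_found, clamscan_output) pair returned by the task.

    With a JSON result serializer the tuple arrives as a list, so the
    pair is unpacked instead of being type-checked.
    """
    virus_found, output = result
    status = "INFECTED" if virus_found else "CLEAN"
    return f"{status}\n{output}"


def main() -> None:
    # Imported here so the helpers above can be used without Celery installed.
    from celery import Celery

    app = Celery("chowder-client")
    app.config_from_object("celeryconfig")  # same settings as the worker
    pending = submit_scan(
        app,
        fs="osfs:///data",
        file="upload.bin",
        # Option names without leading dashes; None marks a flag without argument.
        clamscan_options={"max-filesize": "100M", "no-summary": None},
    )
    print(describe_result(pending.get(timeout=3700)))
```

`main()` is not called automatically because it needs a reachable broker and results backend. The client-side timeout is set slightly above the task's own default so that the worker's timeout is the one that triggers first.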
The mode needs to be supplied as a single argument to the container's entry-point. This is done via the
Kubernetes `args` option in container specifications. When using `docker-compose` or Docker Swarm
this would be `command`.
## Usage with Kubernetes
To deploy Chowder with Kubernetes it is best to use the provided Helm chart. It can be found in `charts/chowder`.
If you're not using Helm the manifest templates in `charts/chowder/templates` will still be a good starting point
for building your own manifests.
The Helm chart comes with a few configuration options:
First of all, it is possible to activate or deactivate each of the at most four containers that comprise each
pod of the deployment. The `freshclam` container should normally always be present. If it does not exist the
anti-virus pattern databases which have been baked into the image at the time of its build are used and not updated.
The other options reflect the modes of operation listed above.
The configuration for the Celery worker needs to be supplied under the key `containers.celeryWorker.config`. It is
injected into the container via a `ConfigMap`.
```yaml
containers:
  clamd:
    enabled: false
  freshclam:
    enabled: true
  celeryWorker:
    enabled: true
    config: |
      [... Celery Worker configuration ...]
  rest:
    enabled: false
```
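
The Celery worker configuration supplied under `containers.celeryWorker.config` is the content of `celeryconfig.py`. A minimal sketch of such a file might look like the following. The RabbitMQ hostname and credentials are placeholders, and the choice of an AMQP broker with the `rpc://` results backend is an assumption for the example, not something the chart prescribes.

```python
# celeryconfig.py, mounted at /celery-worker/config/celeryconfig.py
# Assumption: a RabbitMQ broker at a hypothetical hostname with placeholder
# credentials; substitute the values of your own environment.
broker_url = "amqp://chowder:change-me@rabbitmq.example.internal:5672//"

# A results backend is needed to read the scan results. "rpc://" sends the
# results back to the caller over the AMQP broker.
result_backend = "rpc://"

# JSON keeps task arguments and results independent of Python pickling.
task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
```

Clients that submit scans need the same broker and results backend settings as the worker.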
To use the REST API or talk to `clamd` directly the corresponding services can be activated. The port number the
respective service listens on can also be configured.
```yaml
services:
  rest:
    enabled: false
    type: ClusterIP
    port: 8080
  clamd:
    enabled: false
    type: ClusterIP
    port: 3310
```
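
Once the `clamd` service is enabled, a quick way to check that it is reachable is the `PING` command of the `clamd` protocol, which is answered with `PONG`. The sketch below uses only the Python standard library. The function name is made up for the example, and the host name passed to it depends on your release and namespace. Because the `clamd` mode is described above as untested, treat this as a starting point only.

```python
import socket


def ping_clamd(host: str, port: int = 3310, timeout: float = 5.0) -> bool:
    """Return True if clamd answers PING with PONG.

    The "z" prefix tells clamd that the command is terminated by a NUL
    byte; the reply is terminated the same way.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"zPING\0")
        reply = b""
        while not reply.endswith(b"\0"):
            chunk = sock.recv(64)
            if not chunk:  # connection closed before the terminator arrived
                break
            reply += chunk
    return reply.rstrip(b"\0") == b"PONG"
```

A call such as `ping_clamd("my-release-chowder", 3310)` would then be issued from another pod in the cluster, where `my-release-chowder` stands in for the real service name.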
By default the deployment consists of five pods. `clamd` and the REST API each have an internal scaling mechanism,
so one pod can handle a number of connections simultaneously. The Celery worker, however, is started with just one
worker process per pod, so it needs to be scaled by increasing the number of `replicas`. This can be done
automatically by enabling the horizontal pod autoscaler described below.
```yaml
replicaCount: 5
```
With the standard settings the Helm chart will use the `latest` image. For production deployments it is recommended
to specify a release version instead of using `latest`. In that case the `pullPolicy` can be set to `IfNotPresent`.
```yaml
image:
  repository: elementalnet/chowder
  tag: latest
  pullPolicy: Always
```
For scanning files directly a data volume can be mounted into the Celery worker container:
```yaml
containers:
  celeryWorker:
    dataVolume:
      enabled: false
      # Mount path inside the Celery worker container
      mountPath: /data
      reference:
        persistentVolumeClaim:
          claimName: your-pvc
```
It is possible to specify resources for the containers. Currently all containers get the same resource allocation. This
might turn out to be suboptimal and separate resource specifications might be needed in the future. A horizontal
pod autoscaler can be enabled to adjust the number of `replicas` automatically.
```yaml
resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
horizontalPodAutoscaler:
  # Remember to set resources above if you enable this
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```
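
To get a feeling for how these autoscaler values interact, the following sketch reproduces the basic scaling rule documented for the Kubernetes horizontal pod autoscaler: the desired replica count is the current count multiplied by the ratio of observed to target CPU utilization, rounded up and clamped to the configured bounds. CPU utilization is measured relative to the CPU `requests`, which is why resources have to be set. The function name is made up for the example, and the sketch deliberately ignores the autoscaler's tolerance band and stabilization behaviour.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 50,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA rule: ceil(current * observed / target), then clamp."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Five pods averaging 80% of their CPU request with a 50% target.
print(desired_replicas(5, 80))   # 8
# The same five pods at 120% would want 12 replicas but are capped.
print(desired_replicas(5, 120))  # 10
```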
The last three options relate to pod placement:
```yaml
nodeSelector: {}
tolerations: []
affinity: {}
```
## Usage with Docker
Currently there are no examples on how to use this image with `docker` or `docker-compose` or on how to deploy
it inside Docker Swarm. Contributions are welcome.
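
Until tested examples are available, the following untested sketch shows how a `docker run` command line could be assembled from what is described in this README: the image name, the mode as the single argument, and the mount point of the Celery configuration. The helper name and its parameters are invented for the example. It does not cover the shared `/var/lib/clamav` directory or the shared network namespace that the `rest` mode needs.

```python
from typing import List, Optional

# The four modes listed under "Modes of operation".
MODES = ("freshclam", "celery-worker", "clamd", "rest")


def docker_run_command(
    mode: str,
    tag: str = "latest",
    celery_config: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `docker run` with the mode as last argument.

    `celery_config` is an absolute host path to a celeryconfig.py file; it
    is bind-mounted read-only at the location the worker expects.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["docker", "run", "--rm"]
    if celery_config is not None:
        cmd += ["-v", f"{celery_config}:/celery-worker/config/celeryconfig.py:ro"]
    cmd += [f"elementalnet/chowder:{tag}", mode]
    return cmd


print(" ".join(docker_run_command("clamd")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` or joined into a shell command line.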
## Available images
A pre-built Docker image is present on Docker Hub under https://hub.docker.com/r/elementalnet/chowder. The current
master branch is available under the tags `latest` and `master`. Releases are available with their respective
version as the tag. All images are built automatically via Travis CI.
## Credits
This work is in part based on https://github.com/UKHomeOffice/docker-clamav. Thank you!