https://github.com/mskcc/beagle
Voyager Backend
- Host: GitHub
- URL: https://github.com/mskcc/beagle
- Owner: mskcc
- License: apache-2.0
- Created: 2019-06-21T17:30:34.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-04-28T20:07:33.000Z (about 1 year ago)
- Last Synced: 2025-04-28T20:43:01.581Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 73.1 MB
- Stars: 2
- Watchers: 9
- Forks: 5
- Open Issues: 115
Metadata Files:
- Readme: README.md
- License: LICENSE
# Beagle
Beagle is a backend service for managing files, pipelines and runs.

## Beagle Responsibilities
- Users
  - Authentication using MSKCC LDAP
  - Every user has the same permissions
- Files
  - List files in Beagle DB
  - Search files (filename, metadata, file-type, file-group)
  - Create File in Beagle DB
- FileMetadata
  - Metadata is associated with a file.
  - Metadata versioning: changes are tracked and can be reverted.
  - Metadata validation using JSON Schema.
- Pipelines
  - Use pipelines hosted on GitHub
  - Create runs from pipelines
- Run
  - Create a run (choose pipeline, choose inputs)
  - Submit the job to the Rabix executor
  - Receive job status updates from Rabix
  - List outputs generated by a run
- LIMS integration
  - Periodically fetch new samples from LIMS and create File objects in Beagle DB
  - Try to pair files and create runs
  - Notify if there are errors with files or file metadata
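The metadata versioning behaviour listed above (changes are tracked and can be reverted) can be illustrated with a minimal sketch. This is not Beagle's actual model: the `VersionedMetadata` class, its methods, and the example field names are all hypothetical. It assumes an append-only history in which a revert is recorded as a new version.

```python
from copy import deepcopy


class VersionedMetadata:
    """Hypothetical append-only metadata history (not Beagle's real model)."""

    def __init__(self, initial):
        # Each entry is a full snapshot; list index + 1 is the version number.
        self._versions = [deepcopy(initial)]

    @property
    def version(self):
        """Number of the latest version (starts at 1)."""
        return len(self._versions)

    @property
    def current(self):
        """A copy of the latest metadata, so callers cannot mutate history."""
        return deepcopy(self._versions[-1])

    def update(self, changes):
        """Record a new version with `changes` merged into the latest one."""
        merged = {**self._versions[-1], **deepcopy(changes)}
        self._versions.append(merged)
        return self.version

    def revert(self, version):
        """Revert by appending a copy of an older version; history is kept."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version: {version}")
        self._versions.append(deepcopy(self._versions[version - 1]))
        return self.version


if __name__ == "__main__":
    # Field names below are illustrative only.
    meta = VersionedMetadata({"sampleId": "S1", "type": "Tumor"})
    meta.update({"type": "Normal"})
    meta.revert(1)
    print(meta.version, meta.current)
```

Recording a revert as a new version, instead of deleting later versions, keeps the full change history available for auditing.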
## `beagle_cli.py`
- Command-line utility that handles authentication and access to Beagle endpoints.
## Setup
- Requirements
  - PostgreSQL==11
  - RabbitMQ
  - Python 3
- Instructions
  - `virtualenv beagle`
  - `pip install -r requirements.txt`
  - Set up your environment using the [environment page](docs/ENVIRONMENT_VARIABLES.md)
  - `python manage.py migrate`
  - `python manage.py runserver`
- Async
  - Celery is used for scheduling tasks related to ETL from LIMS and submission to the CWL executor
  - `celery -A beagle_etl beat -l info -f beat.log` (starts the periodic task scheduler)
  - `celery -A beagle_etl worker -l info -Q -f beagle-worker.log` (starts the worker)
  - `celery -A beagle_etl worker --concurrency 1 -l info -Q -f scheduler-worker.log`
  - `celery -A beagle_etl worker -l info -Q -f beagle-runner.log`
A more detailed specification is available on the [wiki page](https://github.com/mskcc/beagle/wiki/Beagle).
# Development Instance
A development instance can be easily set up using `conda` with the following commands:
- Clone this repo:
```
git clone https://github.com/mskcc/beagle.git
cd beagle
```
- Install dependencies in the current directory with `conda`:
```
make install
```
- If using an M1 Mac, install with:
```
make install-m1
```
- Activate the conda environment:
```
conda activate beagle
```
- Initialize the PostgreSQL database:
```
make db-init
```
- Initialize the Django database and set an admin ('superuser') account:
```
make django-init
```
- Start Postgres, RabbitMQ, and Celery servers:
```
make start-services
```
- Start the main Django development server:
```
make runserver
```
The included Makefile pre-populates most of the environment variables Beagle needs, using default settings. You can override these settings by passing them as variable assignments when you invoke `make` on the command line, for example:
```
make db-init BEAGLE_DB_NAME=db-dev
```
Some environment variables needed for full functionality are not included; you should save these separately and `source` them before running the Makefile. These variables are:
```
BEAGLE_LIMS_USERNAME
BEAGLE_LIMS_PASSWORD
BEAGLE_LIMS_URL
BEAGLE_AUTH_LDAP_SERVER_URI
```
Beagle can run without these, but it will not be able to access the IGO LIMS or the LDAP server used for authentication.
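Before starting the services, it can help to check which of these optional variables are set. The following is a minimal sketch, not part of Beagle: the `missing_vars` helper is hypothetical, and only the four variable names come from the list above.

```python
import os

# Variable names taken from the list above; everything else is illustrative.
OPTIONAL_VARS = (
    "BEAGLE_LIMS_USERNAME",
    "BEAGLE_LIMS_PASSWORD",
    "BEAGLE_LIMS_URL",
    "BEAGLE_AUTH_LDAP_SERVER_URI",
)


def missing_vars(environ=None):
    """Return the optional variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in OPTIONAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("LIMS/LDAP features will be unavailable; missing:", ", ".join(missing))
    else:
        print("All optional LIMS/LDAP variables are set.")
```

Run it in the same shell where you `source` your variables, so the script sees the environment the Makefile will see.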