{"id":14128469,"url":"https://github.com/Ensembl/ensembl-hive","last_synced_at":"2025-08-03T23:31:22.108Z","repository":{"id":12315585,"uuid":"14950379","full_name":"Ensembl/ensembl-hive","owner":"Ensembl","description":"EnsEMBL Hive - a system for creating and running pipelines on a distributed compute resource","archived":false,"fork":false,"pushed_at":"2024-02-16T16:03:17.000Z","size":74822,"stargazers_count":50,"open_issues_count":3,"forks_count":28,"subscribers_count":17,"default_branch":"version/2.6","last_synced_at":"2024-05-02T17:14:37.149Z","etag":null,"topics":["docker","docker-swarm","ehive","ensembl","high-performance-computing","htcondor","java","lsf","mysql","pbs-pro","pbspro","perl","pipeline","postgresql","python","sge","slurm","sqlite","workflow-management-system"],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ensembl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2013-12-05T10:20:06.000Z","updated_at":"2024-03-12T12:37:57.000Z","dependencies_parsed_at":"2023-01-16T20:00:24.415Z","dependency_job_id":"29febc2d-9a60-4395-93df-7bf697316193","html_url":"https://github.com/Ensembl/ensembl-hive","commit_stats":null,"previous_names":[],"tags_count":96,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ensembl%2Fensembl-hive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ensembl%2Fensembl-hive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ensembl%2Fensembl-hive/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ensembl%2Fensembl-hive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ensembl","download_url":"https://codeload.github.com/Ensembl/ensembl-hive/tar.gz/refs/heads/version/2.6","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228571844,"owners_count":17938772,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-swarm","ehive","ensembl","high-performance-computing","htcondor","java","lsf","mysql","pbs-pro","pbspro","perl","pipeline","postgresql","python","sge","slurm","sqlite","workflow-management-system"],"created_at":"2024-08-15T16:01:45.113Z","updated_at":"2024-12-07T06:31:20.909Z","avatar_url":"https://github.com/Ensembl.png","language":"Perl","funding_links":[],"categories":["Perl"],"sub_categories":[],"readme":"eHive\n=====\n\n[![Travis Build Status](https://travis-ci.org/Ensembl/ensembl-hive.svg?branch=version/2.6)](https://travis-ci.org/Ensembl/ensembl-hive)\n[![Coverage Status](https://coveralls.io/repos/Ensembl/ensembl-hive/badge.svg?branch=version/2.6\u0026service=github)](https://coveralls.io/github/Ensembl/ensembl-hive?branch=version/2.6)\n[![Documentation Status](https://readthedocs.org/projects/ensembl-hive/badge/?version=version-2.6)](http://ensembl-hive.readthedocs.io/en/version-2.6)\n[![codecov](https://codecov.io/gh/Ensembl/ensembl-hive/branch/version%2F2.6/graph/badge.svg)](https://codecov.io/gh/Ensembl/ensembl-hive/branch/version%2F2.6)\n[![Code 
Climate](https://codeclimate.com/github/Ensembl/ensembl-hive/badges/gpa.svg)](https://codeclimate.com/github/Ensembl/ensembl-hive)\n[![Docker Build Status](https://img.shields.io/docker/build/ensemblorg/ensembl-hive.svg)](https://hub.docker.com/r/ensemblorg/ensembl-hive)\n\neHive is a system for running computation pipelines on distributed computing resources - clusters, farms or grids.\n\nThe name comes from the way pipelines are processed by a swarm of autonomous agents.\n\nAvailable documentation\n-----------------------\n\nThe main entry point is available online in the [user\nmanual](https://ensembl-hive.readthedocs.io/en/version-2.6/), from where it can\nbe downloaded for offline access.\n\n\neHive in a nutshell\n-------------------\n\n### Blackboard, Jobs and Workers\n\nIn the centre of each running pipeline is a database that acts as a blackboard with individual tasks to be run.\nThese tasks (we call them Jobs) are claimed and processed by \"Worker bees\" or just Workers - autonomous processes\nthat are continuously running on the compute farm and connect to the pipeline database to report about the progress of Jobs\nor claim some more. 
When a Worker discovers that its predefined time is up or that there are no more Jobs to do,\nit claims no more Jobs and exits the compute farm, freeing the resources.\n\n### Beekeeper\n\nA separate Beekeeper process makes sure there are always enough Workers on the farm.\nIt regularly checks the states of both the blackboard and the farm and submits more Workers when needed.\nThere is no direct communication between the Beekeeper and the Workers, which makes the system rather fault-tolerant,\nbecause a crash of any single agent, for whatever reason, does not stop the rest of the system from running.\n\n### Analyses\n\nJobs that share the same code, common parameters and resource requirements are typically grouped into Analyses,\nand generally an Analysis can be viewed as a \"base class\" for the Jobs that belong to it.\nHowever, in some sense an Analysis also acts as a \"container\" for them.\n\nAn Analysis is implemented as a Runnable file, which is a Perl, Python or\nJava module conforming to a special interface. 
eHive provides some basic\nRunnables, especially one that allows running arbitrary commands (programs\nand scripts written in other languages).\n\n### PipeConfig file defines Analyses and dependency rules of the pipeline\n\neHive pipeline databases are molded according to PipeConfig files, which are Perl modules conforming to a special interface.\nA PipeConfig file defines the structure of the pipeline, which is a graph whose nodes are Analyses\n(with their code, parameters and resource requirements) and edges are various dependency rules:\n\n* Dataflow rules define how data that flows out of an Analysis can be used to trigger creation of Jobs in other Analyses\n* Control rules define dependencies between Analyses as Jobs' containers (\"Jobs of Analysis Y can only start when all Jobs of Analysis X are done\")\n* Semaphore rules define dependencies between individual Jobs on a more fine-grained level\n\nThere are also other parameters of Analyses that control, for example:\n\n* how many Workers can simultaneously work on a given Analysis,\n* how many times a Job should be tried until it is considered failed,\n* what should be automatically done with a Job if it needs more memory/time,\n  etc.\n\nGrid scheduler and Meadows\n--------------------------\n\neHive has a generic interface named _Meadow_ that describes how to interact with an underlying grid scheduler (submit jobs, query a job's status, etc.). eHive is compatible with\n[IBM Platform LSF](http://www-03.ibm.com/systems/spectrum-computing/products/lsf/),\nSun Grid Engine (now known as Oracle Grid Engine),\n[HTCondor](https://research.cs.wisc.edu/htcondor/),\n[PBS Pro](http://www.pbspro.org),\n[Docker Swarm](https://docs.docker.com/engine/swarm/) and possibly others. 
Read more about this in [the user manual](http://ensembl-hive.readthedocs.io/en/version-2.6/contrib/alternative_meadows.html).\n\nDocker image\n------------\n\nWe have a Docker image available on the [Docker\nHub](https://hub.docker.com/r/ensemblorg/ensembl-hive/). It can be used to\nshowcase eHive scripts (`init_pipeline.pl`, `beekeeper.pl`, `runWorker.pl`) in a\ncontainer.\n\n### Open a session in a new container (will run bash)\n\n```bash\ndocker run -it ensemblorg/ensembl-hive\n```\n\n### Initialize and run a pipeline\n\n```bash\ndocker run -it ensemblorg/ensembl-hive init_pipeline.pl Bio::EnsEMBL::Hive::Examples::LongMult::PipeConfig::LongMult_conf -pipeline_url $URL\ndocker run -it ensemblorg/ensembl-hive beekeeper.pl -url $URL -loop -sleep 0.2\ndocker run -it ensemblorg/ensembl-hive runWorker.pl -url $URL\n```\n\nDocker Swarm\n------------\n\nOnce packaged into Docker images, a pipeline can be run under the\nDocker Swarm orchestrator, and thus on any cloud infrastructure that supports\nit (e.g. 
[Amazon Web Services](https://docs.docker.com/docker-cloud/cloud-swarm/create-cloud-swarm-aws/),\n[Microsoft Azure](https://docs.docker.com/docker-cloud/cloud-swarm/create-cloud-swarm-azure/)).\n\nRead more about this in [the user manual](http://ensembl-hive.readthedocs.io/en/version-2.6/contrib/docker-swarm.html).\n\nContact us (mailing list)\n-------------------------\n\neHive was originally conceived and used within the EnsEMBL Compara group\nfor running Comparative Genomics pipelines, but since then it has been split out\ninto a standalone software tool and is used in many projects both on the Genome Campus, Cambridge and outside.\nThere is an eHive users' mailing list for questions, suggestions, discussions and announcements.\n\nTo subscribe to it, please visit \u003chttp://listserver.ebi.ac.uk/mailman/listinfo/ehive-users\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEnsembl%2Fensembl-hive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEnsembl%2Fensembl-hive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEnsembl%2Fensembl-hive/lists"}