{"id":18810370,"url":"https://github.com/absaoss/hyperdrive-trigger","last_synced_at":"2025-08-25T17:10:13.497Z","repository":{"id":37059346,"uuid":"206089891","full_name":"AbsaOSS/hyperdrive-trigger","owner":"AbsaOSS","description":"Event based workflow manager.","archived":false,"fork":false,"pushed_at":"2024-05-22T12:18:18.000Z","size":5887,"stargazers_count":7,"open_issues_count":31,"forks_count":5,"subscribers_count":9,"default_branch":"develop","last_synced_at":"2024-05-22T12:25:59.809Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AbsaOSS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-03T13:54:18.000Z","updated_at":"2024-05-22T15:20:10.182Z","dependencies_parsed_at":"2024-03-08T13:45:23.815Z","dependency_job_id":"ab763fca-d054-4e64-822c-3e2b0c85f2b6","html_url":"https://github.com/AbsaOSS/hyperdrive-trigger","commit_stats":null,"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbsaOSS%2Fhyperdrive-trigger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbsaOSS%2Fhyperdrive-trigger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbsaOSS%2Fhyperdrive-trigger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AbsaOSS%2Fhyperdrive-trigger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/AbsaOSS","download_url":"https://codeload.github.com/AbsaOSS/hyperdrive-trigger/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223603232,"owners_count":17172064,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T23:19:59.239Z","updated_at":"2024-11-07T23:19:59.749Z","avatar_url":"https://github.com/AbsaOSS.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\n  ~\n  ~ Copyright 2018 ABSA Group Limited\n  ~\n  ~  Licensed under the Apache License, Version 2.0 (the \"License\");\n  ~  you may not use this file except in compliance with the License.\n  ~  You may obtain a copy of the License at\n  ~\n  ~      http://www.apache.org/licenses/LICENSE-2.0\n  ~\n  ~  Unless required by applicable law or agreed to in writing, software\n  ~  distributed under the License is distributed on an \"AS IS\" BASIS,\n  ~  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n  ~  See the License for the specific language governing permissions and\n  ~  limitations under the License.\n  ~\n  --\u003e\n# Hyperdrive-trigger\n\n### \u003ca name=\"build_status\"/\u003eBuild Status\n| develop |\n| ------------- |\n| [![Build Status](https://opensource.bigusdatus.com/jenkins/buildStatus/icon?job=Absa-OSS-Projects%2Fhyperdrive-trigger%2Fdevelop)](https://opensource.bigusdatus.com/jenkins/job/Absa-OSS-Projects/job/hyperdrive-trigger/job/develop/) |\n___\n\n\u003c!-- toc --\u003e\n- [What is 
Hyperdrive-trigger?](#what-is-hyperdrive-trigger)\n- [Requirements](#requirements)\n- [How to build and run](#how-to-build-and-run)\n    - [Application properties](#application-properties)\n    - [Embedded Tomcat](#embedded-tomcat)\n    - [Docker image](#docker-image)\n    - [Web Application Archive](#web-application-archive)\n- [User Interface](#user-interface)\n- [Development](#development)\n- [How to contribute](#how-to-contribute)\n\u003c!-- tocstop --\u003e\n\n# What is Hyperdrive-trigger?\n**Hyperdrive-trigger** is an **event-based workflow manager and scheduler**.\n\nA workflow is defined via the graphical interface and consists of three parts: **details**, **sensor** and **jobs**:\n- **Details** - General workflow information (workflow name, project name and whether the workflow is active)\n- **Sensor** - Definition of the workflow trigger; describes when the workflow will be executed. Available sensors:\n  - *Kafka* - waits for a Kafka message with specific content\n  - *Time* - time-based trigger; a Quartz cron expression or a user-friendly recurring time definition can be used\n  - *Recurring* - the workflow is triggered whenever the previous execution has finished\n- **Jobs** - List of jobs. 
Supported job types: \n  - *Generic Spark Job* - Spark job deployed to Apache Hadoop YARN\n  - *Generic Shell Job* - job executing a user-defined shell script\n\nThe user interface provides a visualization of running workflows with the ability to monitor and troubleshoot executions.\n\n# Requirements\nTested with:\n| Component    | Version                |\n| ------------ | ---------------------- |\n| OpenJDK      | 1.8.0_25               |\n| Scala        | 2.11                   |\n| Maven        | 3.5.4                  |\n| PostgreSQL   | 12.2_1                 |\n| Spark        | 2.4.4                  |\n| Hadoop YARN  | 2.6.4.0                |\n| Tomcat       | 9.0.24                 |\n| Docker       | 2.3.0.5                |\n| NPM          | 6.14.4                 |\n| Angular CLI  | 9.0.3                  |\n\n\u003e Note: Docker is not required.\n\n# How to build and run\n## Application properties\nApplication properties adjusted to your environment have to be provided. An application properties template file can be found at `src/main/resources/application.properties`. Application properties that can be adjusted: \n\n```\n# Version of application. \nversion=@project.version@\n# Environment where application will be running\nenvironment=Local\n# Maximum number of workflows that can be triggered in bulk (Default value 10)\napplication.maximumNumberOfWorkflowsInBulkRun=10\n```\n```\n# Unique app id, for easier application identification\nappUniqueId=\n```\n```\n# Health check settings\nhealth.databaseConnection.timeoutMillis=120000\nhealth.yarnConnection.testEndpoint=/cluster/cluster\nhealth.yarnConnection.timeoutMillis=120000\n```\n```\n# How will users authenticate. 
Available options: inmemory, ldap\nauth.mechanism=inmemory\n# If set, users that do not have the admin role will not have access to admin-protected endpoints\nauth.admin.role=ROLE_ADMIN\n# INMEMORY authentication: username and password defined here will be used for authentication.\nauth.inmemory.user=hyperdriver-user\nauth.inmemory.password=hyperdriver-password\nauth.inmemory.admin.user=hyperdriver-admin-user\nauth.inmemory.admin.password=hyperdriver-admin-password\n# LDAP authentication: props template that has to be defined in case of LDAP authentication\nauth.ad.domain=\nauth.ad.server=\nauth.ldap.search.base=\nauth.ldap.search.filter=\n```\n```\n# Core properties.\n# How many threads to use for each part of the \"scheduler\".\n# Heart beat interval in milliseconds.\n# Lag threshold, before instance is deactivated by another instance.\nscheduler.autostart=true\nscheduler.thread.pool.size=10\nscheduler.sensors.thread.pool.size=20\nscheduler.sensors.changedSensorsChunkQuerySize=100\nscheduler.executors.thread.pool.size=30\nscheduler.jobs.parallel.number=100\nscheduler.heart.beat=5000\nscheduler.lag.threshold=20000\n```\n```\n# Properties used to send notifications to users.\nnotification.enabled=false\nnotification.sender.address=\nnotification.max.retries=5\nnotification.delay=0ms\nspring.mail.host=\nspring.mail.port=\n```\n```\n# Kafka Service properties. Used for per-workflow Kafka consumers\nkafka.consumers.cache.size=50\n```\n```\n#Kafka sensor properties. Not all are required. 
Adjust according to your use case.\nkafkaSource.group.id.prefix=hyper_drive_${appUniqueId}\nkafkaSource.poll.duration=500\nkafkaSource.always.catchup=true\nkafkaSource.properties.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer\nkafkaSource.properties.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer\nkafkaSource.properties.max.poll.records=3\nkafkaSource.properties.enable.auto.commit=false\nkafkaSource.properties.auto.offset.reset=latest\nkafkaSource.properties.security.protocol=\nkafkaSource.properties.ssl.truststore.location=\nkafkaSource.properties.ssl.truststore.password=\nkafkaSource.properties.ssl.keystore.location=\nkafkaSource.properties.ssl.keystore.password=\nkafkaSource.properties.ssl.key.password=\nkafkaSource.properties.sasl.kerberos.service.name=\nkafkaSource.properties.sasl.mechanism=\nkafkaSource.properties.sasl.jaas.config=\n```\n```\n# Recurring sensor properties.\nrecurringSensor.maxJobsPerDuration=8\nrecurringSensor.duration=1h\n```\n```\n#Spark properties. Properties used to deploy and run Spark job. Not all are required. Adjust according to your use case.\n#Where spark jobs will be executed. 
Available options: yarn, emr.\nspark.submitApi=yarn\n\n#Submit api = YARN\nsparkYarnSink.submitTimeout=160000\nsparkYarnSink.master=yarn\nsparkYarnSink.filesToDeploy=\nsparkYarnSink.additionalConfs.spark.ui.port=\nsparkYarnSink.additionalConfs.spark.executor.extraJavaOptions=\nsparkYarnSink.additionalConfs.spark.driver.extraJavaOptions=\nsparkYarnSink.additionalConfs.spark.driver.memory=2g\nsparkYarnSink.additionalConfs.spark.executor.instances=2\nsparkYarnSink.additionalConfs.spark.executor.cores=2\nsparkYarnSink.additionalConfs.spark.executor.memory=2g\nsparkYarnSink.additionalConfs.spark.yarn.keytab=\nsparkYarnSink.additionalConfs.spark.yarn.principal=\nsparkYarnSink.additionalConfs.spark.shuffle.service.enabled=true\nsparkYarnSink.additionalConfs.spark.dynamicAllocation.enabled=true\n\n#Submit api = EMR\nspark.emr.clusterId=\nspark.emr.filesToDeploy=\nspark.emr.additionalConfs=\n\n#Common properties for Submit api = YARN and EMR\nsparkYarnSink.hadoopResourceManagerUrlBase=\nsparkYarnSink.userUsedToKillJob=\nspark.submit.thread.pool.size=10\n```\n```\n#Postgresql properties for connection to trigger database\ndb.driver=org.postgresql.Driver\ndb.url=jdbc:postgresql://\ndb.user=\ndb.password=\ndb.keepAliveConnection=true\ndb.connectionPool=HikariCP\ndb.numThreads=4\n\ndb.skip.liquibase=false\nspring.liquibase.change-log=classpath:/db_scripts/liquibase/db.changelog.yml\n```\n\n## Tomcat configuration\nThe Hadoop configuration directory needs to be added as the environment variable `HADOOP_CONF_DIR` and it has to be added to the web application's classpath.\n\n- The environment variable can be added in `\u003ctomcat-root\u003e/bin/setenv.sh`, e.g. `HADOOP_CONF_DIR=/opt/hadoop`.\n- To add the Hadoop configuration directory to the application classpath, \nin the file `\u003ctomcat-base\u003e/conf/catalina.properties`, append to the key `shared.loader` the hadoop conf dir, e.g. 
`shared.loader=\"/opt/hadoop\"`.\n\n### Symbolic links on user-defined files\nWith [Feature #700: Skip dag instance creation if no new message is available in Kafka](https://github.com/AbsaOSS/hyperdrive-trigger/issues/700),\nthe application needs to access files that are defined in job templates for Hyperdrive jobs. In particular, it needs \nto access any files specified under `reader.option.kafka` to configure Kafka consumers, e.g. keystore, truststore\nand keytabs, under the same path as the Spark job would see them.\n\nFor example, a (resolved) job template may include\n- Additional files: `/etc/config/keystore.jks#keystore.jks`\n- App arguments: `reader.option.kafka.ssl.keystore.location=keystore.jks`\n\nIn this case, `/etc/config/keystore.jks` needs to exist to submit the job. Additionally, \n`\u003ctomcat-root-directory\u003e/keystore.jks` needs to exist so that the web application can access the file under the same path\nas the Spark job would, in order to create a Kafka consumer using the same configuration as the Spark job. This\ncan be achieved using symbolic links.\n\nFor access to HDFS, `spark.yarn.keytab` and `spark.yarn.principal` from the application properties are used for authentication.\nNo symbolic links are required.\n\n## Embedded Tomcat\n\nFor development purposes, hyperdrive-trigger can be executed as an application with an embedded Tomcat. 
Please check out branch **feature/embedded-tomcat-2** to use it.\n\nTo build an executable jar and execute it, use the following commands:\n- Package jar: `mvn clean package` or without tests `mvn clean package -DskipTests`\n- Execute it: `java -jar ./target/hyperdrive-trigger-\u003cVERSION\u003e.jar`\n\nAccess the application at \n```\nhttp://localhost:7123/#\n```\n\nFor local and iterative front end development, the UI can be run using a live development server with the following commands: \n- Install required packages: `cd ui \u0026\u0026 npm install`\n- Start front end application: `cd ui \u0026\u0026 ng serve` or `cd ui \u0026\u0026 npm start`\n\nAccess the application at \n```\nhttp://localhost:4200/#\n```\n \n## Docker image\n\nFrom the project root directory, run the Docker build command\n```\ndocker build -t {your-image-name:your-tag} .\n```\n\n### Building Docker with Maven\nThe Docker image can also be built using Maven, with the [Spotify Dockerfile Maven plugin](https://github.com/spotify/dockerfile-maven).\n\nFrom the project root directory, run the Maven install command with the docker profile enabled (see below):\n```\n# -D skipTests: skip unit and integration tests\n# -D docker: activate the \"docker\" profile\n# -D dockerfile.repositoryUrl: the name prefix of the final Docker image(s)\nmvn clean install \\\n  -D skipTests \\\n  -D docker \\\n  -D dockerfile.repositoryUrl=my\n```\n\nThis will create a Docker image with the name `my/hyperdrive-trigger`, tagged as `{project_version}-{commit-id}` and `latest`,\ne.g. `my/hyperdrive-trigger:0.5.3-SNAPSHOT-6514d3f22a4dcd73a734c614db96694e7ebc6efc` and `my/hyperdrive-trigger:latest`.\n\n## Web Application Archive\n \nHyperdrive-trigger can be packaged as a Web Application Archive and executed in a web server.  
\n\n- Without tests: `mvn clean package -DskipTests`\n- With unit tests: `mvn clean package`\n\nTo build Hyperdrive-Trigger without the hortonworks hadoop binaries, specify the property `exclude-hortonworks`, e.g.\n`mvn clean package -Dexclude-hortonworks`\n\n## Liquibase\nThe liquibase maven plugin may be used to issue liquibase commands. To use it, copy \n`/etc/liquibase/liquibase-maven-plugin.properties.template` to `/etc/liquibase/liquibase-maven-plugin.properties` and modify it as needed.\n\nThen, the liquibase maven plugin can be executed, e.g.\n- `mvn liquibase:status` to view the status\n- `mvn liquibase:dropAll` to drop all tables, views etc.\n- `mvn liquibase:update` to apply all pending changesets\n\n## License and formatting\n- `mvn apache-rat:check` to verify required copyright headers\n- `mvn scalafmt:format -Dformat.validateOnly=true` to validate scala files formatting\n- `mvn scalafmt:format` or `mvn scalafmt:format -Dformat.validateOnly=false` to apply correct scala file formatting\n\n# Database maintenance\n\n## Clean job instances table\nIf required, old rows from the `job_instance` table can be moved to the `archive_job_instance` table, to reduce the\ntable size of `job_instance` in order to speed up queries. The rows are first copied to the destination table and then \ndeleted from the source table. Along with job instances, referenced `dag_instance`s and `event`s are archived in \nrespective tables as well. 
Importantly, the `job_parameters` column of `job_instance` is not\narchived, but discarded.\n\nThe archival process can be executed with the following DB procedure\n```sql\nCALL archive_dag_instances(\ni_to_ts =\u003e (now() - interval '3 months')::timestamp,\ni_max_records =\u003e 200000,\ni_chunk_size =\u003e 10000\n);\n```\nThis would archive dag instances (and referenced job instances and events) that were created more than 3 months ago, matching the `i_to_ts` argument above.\nIt is advisable to run this query while the database is in maintenance mode, but it can also be run otherwise.\n\n# User Interface\n- **Workflows**: Overview of all workflows.\n![](/docs/img/all_workflows.png)\n\n- **Show workflow**\n![](/docs/img/create_workflow.png)\n\n- **Create workflow**\n![](/docs/img/create_workflow.png)\n\n- **Workflow history comparison**\n![](/docs/img/history_comparison.png)\n\n- **Runs**: Overview of all workflow runs.\n![](/docs/img/all_runs.png)\n\n- **Run Detail**\n![](/docs/img/run_detail.png)\n\n\n# How to contribute\nPlease see our [**Contribution Guidelines**](CONTRIBUTING.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabsaoss%2Fhyperdrive-trigger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabsaoss%2Fhyperdrive-trigger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabsaoss%2Fhyperdrive-trigger/lists"}