{"id":13798554,"url":"https://github.com/HiveRunner/HiveRunner","last_synced_at":"2025-05-13T05:32:23.581Z","repository":{"id":50925050,"uuid":"14613496","full_name":"HiveRunner/HiveRunner","owner":"HiveRunner","description":"An Open Source unit test framework for Hive queries based on JUnit 4 and 5","archived":false,"fork":false,"pushed_at":"2025-01-06T09:02:52.000Z","size":1402,"stargazers_count":257,"open_issues_count":1,"forks_count":78,"subscribers_count":33,"default_branch":"main","last_synced_at":"2025-05-08T03:04:11.983Z","etag":null,"topics":["hive","hive-sql","junit","klarna-featured","test-framework","testing"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HiveRunner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE-OF-CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null}},"created_at":"2013-11-22T09:19:09.000Z","updated_at":"2025-04-24T11:42:33.000Z","dependencies_parsed_at":"2023-02-13T01:45:17.892Z","dependency_job_id":null,"html_url":"https://github.com/HiveRunner/HiveRunner","commit_stats":null,"previous_names":["klarna/hiverunner"],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HiveRunner%2FHiveRunner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HiveRunner%2FHiveRunner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HiveRunner%2FHiveRunner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HiveRunner%2FHiveRunner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/HiveRunner","download_url":"https://codeload.github.com/HiveRunner/HiveRunner/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253883137,"owners_count":21978611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hive","hive-sql","junit","klarna-featured","test-framework","testing"],"created_at":"2024-08-04T00:00:45.855Z","updated_at":"2025-05-13T05:32:23.573Z","avatar_url":"https://github.com/HiveRunner.png","language":"Java","funding_links":[],"categories":["Tools","测试"],"sub_categories":["Testing"],"readme":"\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.github.hiverunner/hiverunner/badge.svg?subject=io.github.hiverunner:hiverunner)](https://maven-badges.herokuapp.com/maven-central/io.github.hiverunner/hiverunner) \n[![Build](https://github.com/HiveRunner/hiverunner/workflows/build/badge.svg)](https://github.com/HiveRunner/HiveRunner/actions?query=workflow:\"build\")\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\n![ScreenShot](/images/HiveRunnerSplash.png)\n\n# HiveRunner\n\nWelcome to HiveRunner: zero-installation open source unit testing of [Hive](https://hive.apache.org/) applications.\n\n[Watch the HiveRunner teaser on YouTube!](http://youtu.be/B7yEAHwgi2w)\n\nHiveRunner is an open source unit test framework based on JUnit (4 \u0026 5) that enables \ntest-driven development (TDD) of Hive SQL without the need for any installed dependencies. 
All you need to do is add HiveRunner to your \n`pom.xml` like any other library and you're good to go.\n\nHiveRunner is under constant development and is used extensively by many companies. Please feel free to suggest \nimprovements, either as pull requests or as written requests.\n\n\n## Overview\n\nHiveRunner enables you to write Hive SQL as releasable, tested artifacts. It requires you to parametrize and \nmodularize your Hive SQL in order to make it testable. The resulting pieces of code should then be wired together with an \norchestration/workflow/build tool of your choice (e.g. Oozie, Pentaho, Talend, Maven) to make them runnable in your \nenvironment.\n\nSo, even though your current Hive SQL probably won't run off the shelf within HiveRunner, we believe the enforced \ntestability and the TDD workflow it enables will do as much good for the scripting world of SQL as TDD has done for the Java \ncommunity.\n\n## Versions\n\nDifferent versions of HiveRunner target different versions of Hive as follows:\n\n| HiveRunner Version | Hive Version | Status                     | Source Code Branch                                     |\n|--------------------|--------------|----------------------------|--------------------------------------------------------|\n| 7.x                | 4.x          | New, active development    | https://github.com/HiveRunner/HiveRunner/tree/hive-4.x |\n| 6.x                | 3.x          | Stable, active development | https://github.com/HiveRunner/HiveRunner (i.e. `main`) |\n| 5.x                | 2.x          | Stable, bug fixes only     | https://github.com/HiveRunner/HiveRunner/tree/hive-2.x |\n\n\n# Cook Book\n\n## 1. Include HiveRunner\n\nHiveRunner is published to [Maven Central](https://search.maven.org/search?q=hiverunner). 
To start using it, add a HiveRunner dependency to your pom file:\n\n    \u003cdependency\u003e\n        \u003cgroupId\u003eio.github.hiverunner\u003c/groupId\u003e\n        \u003cartifactId\u003ehiverunner\u003c/artifactId\u003e\n        \u003cversion\u003e[HIVERUNNER VERSION]\u003c/version\u003e\n        \u003cscope\u003etest\u003c/scope\u003e\n    \u003c/dependency\u003e\n\nAlternatively, if you want to build from source, clone this repo and build with:\n\n    mvn install\n\nThen add the dependency as mentioned above.\n\nAlso explicitly add the Surefire plugin and configure `forkMode=always` to avoid `OutOfMemoryError`s when running large test suites.\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003cforkMode\u003ealways\u003c/forkMode\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\nIf this does not solve the OOM issues, try increasing the `-Xmx` and `-XX:MaxPermSize` settings (`-XX:MaxPermSize` only has an effect on Java 7 and earlier, since Java 8 replaced the permanent generation with Metaspace). For example:\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003cforkCount\u003e1\u003c/forkCount\u003e\n            \u003creuseForks\u003efalse\u003c/reuseForks\u003e\n            \u003cargLine\u003e-Xmx2048m -XX:MaxPermSize=512m\u003c/argLine\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\n(Note that the `forkMode` option is deprecated; use `forkCount` and `reuseForks` instead.)\n\nUsing `forkCount` and `reuseForks` can reduce test execution time drastically, depending on your hardware. 
A plugin configuration that uses one fork per CPU core and reuses forks would look like this:\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003cforkCount\u003e1C\u003c/forkCount\u003e\n            \u003creuseForks\u003etrue\u003c/reuseForks\u003e\n            \u003cargLine\u003e-Xmx2048m -XX:MaxPermSize=512m\u003c/argLine\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\nBy default, HiveRunner uses MapReduce (`mr`) as the execution engine for Hive. If you wish to run using Tez, set the \nsystem property `hiveconf_hive.execution.engine` to `tez`. (Any Hive configuration property can be overridden by prefixing it with `hiveconf_`.) For example:\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003csystemProperties\u003e\n                \u003chiveconf_hive.execution.engine\u003etez\u003c/hiveconf_hive.execution.engine\u003e\n                \u003chiveconf_hive.exec.counters.pull.interval\u003e1000\u003c/hiveconf_hive.exec.counters.pull.interval\u003e\n            \u003c/systemProperties\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\n### Timeout\nIt is possible to configure HiveRunner to time out tests after a given period and to retry them a number of times. This is only supported by `StandaloneHiveRunner`; it is not available in `HiveRunnerExtension` (from HiveRunner 5.x and up). 
This works around the bug [TEZ-2475](https://issues.apache.org/jira/browse/TEZ-2475), which at times causes test cases not to terminate due to a lost DAG reference.\nThe timeout feature is configured via the `enableTimeout`, `timeoutSeconds` and `timeoutRetries` system properties.\nA configuration that times tests out after 30 seconds and allows two retries would look like this:\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003csystemProperties\u003e\n                \u003cenableTimeout\u003etrue\u003c/enableTimeout\u003e\n                \u003ctimeoutSeconds\u003e30\u003c/timeoutSeconds\u003e\n                \u003ctimeoutRetries\u003e2\u003c/timeoutRetries\u003e\n            \u003c/systemProperties\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\n\n### Logging\n\nHiveRunner uses [SLF4J](https://www.slf4j.org/), so configure logging in your tests with any SLF4J-compatible logging framework.\n\n## 2. Look at the examples\n\nLook at the [com.klarna.hiverunner.examples.HelloHiveRunnerTest](/src/test/java/com/klarna/hiverunner/examples/HelloHiveRunnerTest.java) reference test case to get a feel for what a typical test case looks like in JUnit 5. For JUnit 4 versions of the examples, look at [com.klarna.hiverunner.examples.junit4.HelloHiveRunnerTest](/src/test/java/com/klarna/hiverunner/examples/junit4/HelloHiveRunnerTest.java).\n\nIf you're put off by the verbosity of the annotations, you can always use HiveShell in a more interactive mode. 
The [com.klarna.hiverunner.SerdeTest](/src/test/java/com/klarna/hiverunner/SerdeTest.java) test adds a resource (test data) interactively with HiveShell instead of using annotations.\n\nAnnotations and interactive mode can be mixed and matched; however, you'll always need to include the [com.klarna.hiverunner.annotations.HiveSQL](/src/main/java/com/klarna/hiverunner/annotations/HiveSQL.java) annotation, e.g.:\n\n    @HiveSQL(files = {\"serdeTest/create_table.sql\", \"serdeTest/hql_custom_serde.sql\"}, autoStart = false)\n    public HiveShell hiveShell;\n\nNote that `autoStart = false` is needed for interactive mode. It can be left out when using only annotations.\n\n### Sequence files\nIf you work with __sequence files__ (or anything other than regular text files), take a look at [ResourceOutputStreamTest](/src/test/java/com/klarna/hiverunner/ResourceOutputStreamTest.java) \nfor an example of how to use the [HiveShell](/src/main/java/com/klarna/hiverunner/HiveShell.java)\\#getResourceOutputStream method to manage test input data. \n\n### Programmatically create test input data\n\nTest data can be programmatically inserted into any Hive table using `HiveShell.insertInto(...)`. 
This seamlessly handles different storage formats and partitioning types, allowing you to focus on the data required by your test scenarios:\n\n    hiveShell.execute(\"create database test_db\");\n    hiveShell.execute(\"create table test_db.test_table (\"\n        + \"c1 string, \"\n        + \"c2 string, \"\n        + \"c3 string\"\n        + \") \"\n        + \"partitioned by (p1 string) \"\n        + \"stored as orc\");\n\n    // file and fileParser are defined elsewhere in the test\n    hiveShell.insertInto(\"test_db\", \"test_table\")\n        .withColumns(\"c1\", \"p1\").addRow(\"v1\", \"p1\")       // add { \"v1\", null, null, \"p1\" }\n        .withAllColumns().addRow(\"v1\", \"v2\", \"v3\", \"p1\")  // add { \"v1\", \"v2\", \"v3\", \"p1\" }\n        .copyRow().set(\"c1\", \"v4\")                        // add { \"v4\", \"v2\", \"v3\", \"p1\" }\n        .addRowsFromTsv(file)                             // parses TSV data out of a file resource\n        .addRowsFrom(file, fileParser)                    // parses custom data out of a file resource\n        .commit();\n\nSee [com.klarna.hiverunner.examples.InsertTestDataTest](/src/test/java/com/klarna/hiverunner/examples/InsertTestDataTest.java) for working examples.\n\n## 3. Understand the order of execution\n\nBy default, HiveRunner sets up and starts the HiveShell before the test method is invoked. If `autoStart` is set to false, the [HiveShell](/src/main/java/com/klarna/hiverunner/HiveShell.java) must be started manually from within the test method. Either way, HiveRunner performs the following steps when `start()` is invoked:\n\n1. Merge any [@HiveProperties](/src/main/java/com/klarna/hiverunner/annotations/HiveProperties.java) from the test case into the Hive configuration\n2. Start the HiveServer with the merged configuration\n3. Copy all [@HiveResource](/src/main/java/com/klarna/hiverunner/annotations/HiveResource.java) data into the temporary file area for the test\n4. 
Execute the scripts in all fields annotated with [@HiveSetupScript](/src/main/java/com/klarna/hiverunner/annotations/HiveSetupScript.java)\n5. Execute the script files given in the [@HiveSQL](/src/main/java/com/klarna/hiverunner/annotations/HiveSQL.java) annotation\n\nThe [HiveShell](/src/main/java/com/klarna/hiverunner/HiveShell.java) field annotated with [@HiveSQL](/src/main/java/com/klarna/hiverunner/annotations/HiveSQL.java) is always injected before the test method is invoked.\n\n\n# Hive version compatibility\n\n- This version of HiveRunner is built for Hive 3.1.2.\n- For Hive 2.x support, please use HiveRunner 5.x.\n- Command shell emulations are provided to closely match the behaviour of both the Hive CLI and Beeline interactive shells. The desired emulation can be specified in your `pom.xml` file like so: \n\n        \u003cplugin\u003e\n            \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n            \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n            \u003cversion\u003e2.21.0\u003c/version\u003e\n            \u003cconfiguration\u003e\n                \u003csystemProperties\u003e\n                    \u003c!-- Defaults to HIVE_CLI, other options include BEELINE and HIVE_CLI_PRE_V200 --\u003e\n                    \u003ccommandShellEmulator\u003eBEELINE\u003c/commandShellEmulator\u003e\n                \u003c/systemProperties\u003e\n            \u003c/configuration\u003e\n        \u003c/plugin\u003e\n\n  Alternatively, provide it on the command line as a system property:\n\n      mvn -DcommandShellEmulator=BEELINE test\n\n# Future work and Limitations\n\n* HiveRunner does not allow the `add jar` statement. Keeping environment-specific code together with the business logic that targets HiveRunner is considered bad practice. Keep environment-specific code in separate files and use your build/orchestration/workflow tool to run the right files in the right order in the right environment. 
When running HiveRunner, all SerDes on the classpath of the IDE/Maven build will be available.\n\n* HiveRunner runs Hive, which runs on top of Hadoop, and Hadoop has limited support for Windows machines. Installing [Cygwin](http://www.cygwin.com/ \"Cygwin\") might help.\n\n* Currently, the HiveServer is spun up and torn down for every test method. As a performance option, it should be possible to instead clean the HiveServer and metastore between test method invocations. This choice should probably be exposed to the test writer: by switching between strategies, side effects and leakage can be ruled out while debugging a test case. See [#69](https://github.com/HiveRunner/HiveRunner/issues/69).\n\n# Known Issues\n\n### UnknownHostException\nI've had issues with `UnknownHostException` on OS X after upgrading my system or running Docker. \nUsually restarting my machine solved it, but after some corporate \nsoftware was installed, restarts stopped working and I kept getting `UnknownHostException`s. \nFollowing this guide solved the problem:\nhttp://crunchify.com/getting-java-net-unknownhostexception-nodename-nor-servname-provided-or-not-known-error-on-mac-os-x-update-your-privateetchosts-file/\n\n### Tez queries do not terminate\nTez will at times lose track of the process ID of a random DAG, which causes the query never to terminate. 
To work around this, HiveRunner implements a timeout and retry mechanism (available only with `StandaloneHiveRunner`, see [Timeout](#timeout)):\n\n    \u003cplugin\u003e\n        \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n        \u003cartifactId\u003emaven-surefire-plugin\u003c/artifactId\u003e\n        \u003cversion\u003e2.21.0\u003c/version\u003e\n        \u003cconfiguration\u003e\n            \u003csystemProperties\u003e\n                \u003cenableTimeout\u003etrue\u003c/enableTimeout\u003e\n                \u003ctimeoutSeconds\u003e30\u003c/timeoutSeconds\u003e\n                \u003ctimeoutRetries\u003e2\u003c/timeoutRetries\u003e\n            \u003c/systemProperties\u003e\n        \u003c/configuration\u003e\n    \u003c/plugin\u003e\n\nMake sure to set `timeoutSeconds` to the duration of the slowest test in your suite, plus some padding.\n\n# Contact\n\n## Mailing List\nIf you would like to ask questions about or discuss HiveRunner, please join our mailing list:\n\n  [https://groups.google.com/forum/#!forum/hive-runner-user](https://groups.google.com/forum/#!forum/hive-runner-user)\n\n# History\nThis project was initially developed and maintained by [Klarna](https://klarna.github.io/) and then by [Expedia Group](https://expediagroup.github.io/) before moving to its own top-level organisation on GitHub.\n\n# Legal\nThis project is available under the [Apache 2.0 License](http://www.apache.org/licenses/LICENSE-2.0.html).\n\nCopyright 2021-2024 The HiveRunner Contributors.\n\nCopyright 2013-2021 Klarna AB.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHiveRunner%2FHiveRunner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHiveRunner%2FHiveRunner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHiveRunner%2FHiveRunner/lists"}