{"id":20038570,"url":"https://github.com/yahoo/hive-funnel-udf","last_synced_at":"2025-05-05T06:32:23.675Z","repository":{"id":66000694,"uuid":"51464880","full_name":"yahoo/hive-funnel-udf","owner":"yahoo","description":"Hive UDFs for funnel analysis","archived":false,"fork":false,"pushed_at":"2023-03-21T05:16:47.000Z","size":88,"stargazers_count":83,"open_issues_count":5,"forks_count":46,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-08T18:51:40.191Z","etag":null,"topics":["analytics","funnel","hadoop","hive","hive-udf","udf"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yahoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-10T19:10:42.000Z","updated_at":"2025-02-26T09:30:54.000Z","dependencies_parsed_at":"2023-05-22T00:00:13.287Z","dependency_job_id":null,"html_url":"https://github.com/yahoo/hive-funnel-udf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fhive-funnel-udf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fhive-funnel-udf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fhive-funnel-udf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fhive-funnel-udf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yahoo","download_url":"https://codeload.git
hub.com/yahoo/hive-funnel-udf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252451915,"owners_count":21750005,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","funnel","hadoop","hive","hive-udf","udf"],"created_at":"2024-11-13T10:30:03.560Z","updated_at":"2025-05-05T06:32:23.668Z","avatar_url":"https://github.com/yahoo.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hive Funnel Analysis UDFs\n\n[![Build Status](https://travis-ci.org/yahoo/hive-funnel-udf.svg?branch=master)](https://travis-ci.org/yahoo/hive-funnel-udf)\n[![Coverage Status](https://coveralls.io/repos/github/yahoo/hive-funnel-udf/badge.svg?branch=master)](https://coveralls.io/github/yahoo/hive-funnel-udf?branch=master)\n[![Apache License 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg?style=flat)](LICENSE)\n\n[Funnel analysis](https://en.wikipedia.org/wiki/Funnel_analysis) is a method for\ntracking user conversion rates across actions. 
This enables detection of actions\ncausing high user fallout.\n\nThese Hive UDFs enable funnel analysis to be performed simply and easily on any\nHive table.\n\n## Table of Contents\n\n  * [Requirements](#requirements)\n  * [How to build](#how-to-build)\n    * [Build JAR](#build-jar)\n    * [Register JAR with Hive](#register-jar-with-hive)\n  * [How to use](#how-to-use)\n    * [`funnel`](#funnel)\n    * [`funnel_merge`](#funnel_merge)\n    * [`funnel_conversion`](#funnel_conversion)\n    * [`funnel_fallout`](#funnel_fallout)\n  * [Security](#security)\n  * [Examples](#examples)\n    * [Simple funnel](#simple-funnel)\n    * [Simple funnel with conversion rate](#simple-funnel-with-conversion-rate)\n    * [Funnel with multiple groups](#funnel-with-multiple-groups)\n    * [Multiple parallel funnels](#multiple-parallel-funnels)\n  * [Contributors](#contributors)\n  * [License](#license)\n\n## Requirements\n\n[Maven](https://maven.apache.org/index.html) is required to build the funnel\nUDFs.\n\n## How to build\n\nThe provided `Makefile` contains all the build targets.\n\n### Build JAR\n\n```bash\nmake jar\n```\n\nThis creates a `funnel.jar` in the `target/` directory.\n\n### Register JAR with Hive\n\nTo use the funnel UDFs, you need to register them with Hive.\n\nWith temporary functions:\n\n```sql\nADD JAR funnel.jar;\nCREATE TEMPORARY FUNCTION funnel            AS 'com.yahoo.hive.udf.funnel.Funnel';\nCREATE TEMPORARY FUNCTION funnel_merge      AS 'com.yahoo.hive.udf.funnel.Merge';\nCREATE TEMPORARY FUNCTION funnel_conversion AS 'com.yahoo.hive.udf.funnel.Conversion';\nCREATE TEMPORARY FUNCTION funnel_fallout    AS 'com.yahoo.hive.udf.funnel.Fallout';\n```\n\nWith permanent functions, you need to put the JAR on HDFS, and the functions are registered with a database (replace `DATABASE` and `PATH_TO_JAR` with your values):\n\n```sql\nCREATE FUNCTION DATABASE.funnel            AS 'com.yahoo.hive.udf.funnel.Funnel'  USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';\nCREATE FUNCTION 
DATABASE.funnel_merge      AS 'com.yahoo.hive.udf.funnel.Merge'   USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';\nCREATE FUNCTION DATABASE.funnel_conversion AS 'com.yahoo.hive.udf.funnel.Conversion' USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';\nCREATE FUNCTION DATABASE.funnel_fallout    AS 'com.yahoo.hive.udf.funnel.Fallout' USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';\n```\n\n## How to use\n\nThere are four funnel UDFs provided: [`funnel`](#funnel),\n[`funnel_merge`](#funnel_merge), [`funnel_conversion`](#funnel_conversion),\n[`funnel_fallout`](#funnel_fallout).\n\nThe [`funnel`](#funnel) UDF outputs an array of longs with a raw count for each\nof the provided funnel steps.\n\nThe [`funnel_merge`](#funnel_merge) UDF merges multiple arrays of longs by\nadding them together element-wise.\n\nThe [`funnel_conversion`](#funnel_conversion) UDF takes a raw count funnel result and\nconverts it to the conversion rate.\n\nThe [`funnel_fallout`](#funnel_fallout) UDF takes a raw count funnel result and\nconverts it to the fallout rate.\n\nThere is no need to sort the data on timestamp; the UDF takes care of it. If\ntimestamps collide, it then sorts on the action column.\n\n### `funnel`\n`funnel(action_column, timestamp_column, array(funnel_1_a, funnel_1_b), array(funnel_2), ...)`\n  - Builds a funnel report applied to the `action_column`, sorted by the\n    `timestamp_column`.\n  - The funnel steps are arrays of the same type as the `action` column. This allows\n    for multiple matches to move to the next funnel.\n    - For example, funnel_1 could be `array('register_button',\n      'facebook_invite_register')`. 
The funnel will match the first occurrence\n      of either of these actions and proceed to the next funnel.\n    - Or, funnel_1 could just be `array('register_button')`.\n  - You can have an arbitrary number of funnels.\n  - The `timestamp_column` can be of any comparable type (Strings, Integers,\n    Dates, etc.).\n\n### `funnel_merge`\n`funnel_merge(funnel_column)`\n  - Merges funnels. Use with the `funnel` UDF.\n\n### `funnel_conversion`\n`funnel_conversion(funnel_column)`\n  - Converts the result of a [`funnel_merge`](#funnel_merge) to a conversion\n    rate.  Use with the `funnel` and `funnel_merge` UDFs.\n  - For example, a result from [`funnel_merge`](#funnel_merge) could look like\n    `[245, 110, 54, 13]`. This result is in raw counts. If we pass this\n    through [`funnel_conversion`](#funnel_conversion) then it would look like\n    `[1.0, 0.44, 0.49, 0.24]`. Each rate is relative to the previous step\n    (for example, 110/245 is roughly 0.44).\n\n### `funnel_fallout`\n`funnel_fallout(funnel_column)`\n  - Converts the result of a [`funnel_merge`](#funnel_merge) to a fallout rate.\n    Use with the `funnel` and `funnel_merge` UDFs.\n  - For example, a result from [`funnel_merge`](#funnel_merge) could look like\n    `[245, 110, 54, 13]`. This result is in raw counts. If we pass this\n    through [`funnel_fallout`](#funnel_fallout) then it would look like `[0.0,\n    0.55, 0.50, 0.75]`. Each rate is the share of users lost relative to the\n    previous step (for example, 135/245 is roughly 0.55).\n\n## Security\n\nOlder versions of Hive have known security issues. Keep the following issues in mind when deciding what Hive version to use when building the UDFs.  
Use the following steps to mitigate these issues, or update to Hive 2.3.4 to avoid all issues at once.\n\n### [CVE-2018-11777](https://nvd.nist.gov/vuln/detail/CVE-2018-11777)\n\n#### Description\n\nIn Apache Hive 2.3.3, 3.1.0 and earlier, local resources on HiveServer2 machines are not properly protected against malicious user if ranger, sentry or sql standard authorizer is not in use.\n\n#### Resolution\n\nUpdate pom.xml to use Hive 2.3.4.\n\n### [CVE-2018-1284](https://nvd.nist.gov/vuln/detail/CVE-2018-1284)\n\n#### Description\n\nIn Apache Hive 0.6.0 to 2.3.2, malicious user might use any xpath UDFs (xpath/xpath_string/xpath_boolean/xpath_number/xpath_double/xpath_float/xpath_long/xpath_int/xpath_short) to expose the content of a file on the machine running HiveServer2 owned by HiveServer2 user (usually hive) if hive.server2.enable.doAs=false.\n\n#### Resolution\n\nUpdate pom.xml to use Hive 2.3.3 or do not set `hive.server2.enable.doAs` to `false`.\n\n### [CVE-2015-7521](https://nvd.nist.gov/vuln/detail/CVE-2015-7521)\n\n#### Description\n\nThe authorization framework in Apache Hive 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0 and 1.2.1, on clusters protected by Ranger and SqlStdHiveAuthorization, allows attackers to bypass intended parent table access restrictions via unspecified partition-level operations.\n\n#### Resolution\n\nUpdate pom.xml to use Hive 1.2.2.\n\n## Examples\n\nAssume a table `user_data`:\n\n| action              | timestamp | user_id | gender |\n|---------------------|-----------|---------|--------|\n| signup_page         | 100       | 1       | f      |\n| confirm_button      | 200       | 1       | f      |\n| submit_button       | 300       | 1       | f      |\n| signup_page         | 200       | 2       | m      |\n| submit_button       | 400       | 2       | m      |\n| signup_page         | 100       | 3       | f      |\n| confirm_button      | 200       | 3       | f      |\n| decline             | 200       | 3       | f      |\n| ...      
           | ...       | ...     | ...    |\n\n### Simple funnel\n\n```sql\nSELECT funnel_merge(funnel)\nFROM (SELECT funnel(action, timestamp, array('signup_page', 'email_signup'),\n                                       array('confirm_button'),\n                                       array('submit_button')) AS funnel\n      FROM user_data\n      GROUP BY user_id) t1;\n```\n\nResult: `[3, 2, 1]`\n\n### Simple funnel with conversion rate\n\n```sql\nSELECT funnel_conversion(funnel_merge(funnel))\nFROM (SELECT funnel(action, timestamp, array('signup_page'),\n                                       array('confirm_button'),\n                                       array('submit_button')) AS funnel\n      FROM user_data\n      GROUP BY user_id) t1;\n```\n\nResult: `[1.0, 0.66, 0.5]`\n\n### Funnel with multiple groups\n\n```sql\nSELECT gender, funnel_merge(funnel)\nFROM (SELECT gender,\n             funnel(action, timestamp, array('signup_page'),\n                                       array('confirm_button'),\n                                       array('submit_button')) AS funnel\n      FROM user_data\n      GROUP BY user_id, gender) t1\nGROUP BY gender;\n```\n\nResult: `m: [1, 0, 0], f: [2, 2, 1]`\n\n### Multiple parallel funnels\n\n```sql\nSELECT funnel_merge(funnel1), funnel_merge(funnel2)\nFROM (SELECT funnel(action, timestamp, array('signup_page'),\n                                       array('confirm_button'),\n                                       array('submit_button')) AS funnel1,\n             funnel(action, timestamp, array('signup_page'),\n                                       array('decline')) AS funnel2\n      FROM user_data\n      GROUP BY user_id) t1;\n```\n\nResult: `[3, 2, 1] [3, 1]`\n\n## Contributors\n\nJosh Walters, [josh@joshwalters.com](mailto:josh@joshwalters.com)\n\n## License\n\n[Apache License, Version 
2.0](http://www.apache.org/licenses/LICENSE-2.0)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyahoo%2Fhive-funnel-udf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyahoo%2Fhive-funnel-udf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyahoo%2Fhive-funnel-udf/lists"}