{"id":15334229,"url":"https://github.com/yaooqinn/spark-authorizer","last_synced_at":"2025-04-13T09:37:47.095Z","repository":{"id":27519430,"uuid":"113544220","full_name":"yaooqinn/spark-authorizer","owner":"yaooqinn","description":"A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | 项目已迁移至 Apache Kyuubi","archived":false,"fork":false,"pushed_at":"2022-04-06T02:17:03.000Z","size":1686,"stargazers_count":173,"open_issues_count":20,"forks_count":80,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-03-27T01:09:52.899Z","etag":null,"topics":["acl","hive","ranger","ranger-hive-plugin","spark"],"latest_commit_sha":null,"homepage":"https://yaooqinn.github.io/spark-authorizer/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yaooqinn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-08T07:17:16.000Z","updated_at":"2025-02-12T06:36:34.000Z","dependencies_parsed_at":"2022-08-07T13:00:22.610Z","dependency_job_id":null,"html_url":"https://github.com/yaooqinn/spark-authorizer","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaooqinn%2Fspark-authorizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaooqinn%2Fspark-authorizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaooqinn%2Fspark-authorizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yaooqinn%2Fspark-authorizer/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yaooqinn","download_url":"https://codeload.github.com/yaooqinn/spark-authorizer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248691708,"owners_count":21146424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acl","hive","ranger","ranger-hive-plugin","spark"],"created_at":"2024-10-01T10:06:21.722Z","updated_at":"2025-04-13T09:37:47.074Z","avatar_url":"https://github.com/yaooqinn.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Spark Authorizer [![Build Status](https://travis-ci.org/yaooqinn/spark-authorizer.svg?branch=master)](https://travis-ci.org/yaooqinn/spark-authorizer) [![HitCount](http://hits.dwyl.io/yaooqinn/spark-authorizer.svg)](http://hits.dwyl.io/yaooqinn/spark-authorizer)\n\n**Spark Authorizer** provides *SQL Standard Based Authorization* for [Apache Spark™](http://spark.apache.org), \nin the same way as [SQL Standard Based Hive Authorization](https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization). \nWhen you use Spark SQL or the Dataset/DataFrame API to load data from tables backed by an [Apache Hive™](https://hive.apache.org) metastore, \nthis library provides row/column-level fine-grained access control through [Apache Ranger™](https://ranger.apache.org) or Hive SQL Standard Based Authorization.\n\nSecurity is one of the fundamental features for enterprise adoption. 
[Apache Ranger™](https://ranger.apache.org) offers security plugins for many Hadoop ecosystem components, \nsuch as HDFS, Hive, HBase, Solr and Sqoop2. However, [Apache Spark™](http://spark.apache.org) is not yet among them. \nWhen a secured HDFS cluster is used as a data warehouse accessed by various users and groups via different applications written with Spark and Hive, \nit is very difficult to guarantee consistent data management. Apache Spark users access the data warehouse only \nwith the storage-based access controls offered by HDFS. This library shares the [Ranger Hive plugin](https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation#ApacheRanger0.5.0Installation-InstallingApacheHive(1.2.0)) \nwith Hive to let Spark talk to Ranger Admin. \n\nPlease refer to [ACL Management for Spark SQL](https://yaooqinn.github.io/spark-authorizer/docs/spark_sql_authorization.html) to see what spark-authorizer supports.\n\n## Quick Start\n\n### Step 1. Install Spark Authorizer\n\nInclude this package in your Spark applications using:\n#### spark-shell, pyspark, or spark-submit\n```bash\n$SPARK_HOME/bin/spark-shell --packages yaooqinn:spark-authorizer:2.1.1\n```\n#### sbt\nIf you use the sbt-spark-package plugin, add the following to your sbt build file:\n```sbtshell\nspDependencies += \"yaooqinn/spark-authorizer:2.1.1\"\n```\nOtherwise,\n```sbtshell\nresolvers += \"Spark Packages Repo\" at \"http://dl.bintray.com/spark-packages/maven\"\n\nlibraryDependencies += \"yaooqinn\" % \"spark-authorizer\" % \"2.1.1\"\n```\n\n#### Maven\nIn your pom.xml, add:\n```xml\n\u003cdependencies\u003e\n  \u003c!-- list of dependencies --\u003e\n  \u003cdependency\u003e\n    \u003cgroupId\u003eyaooqinn\u003c/groupId\u003e\n    \u003cartifactId\u003espark-authorizer\u003c/artifactId\u003e\n    \u003cversion\u003e2.1.1\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n\u003crepositories\u003e\n  \u003c!-- list of other repositories --\u003e\n  
\u003crepository\u003e\n    \u003cid\u003eSparkPackagesRepo\u003c/id\u003e\n    \u003curl\u003ehttp://dl.bintray.com/spark-packages/maven\u003c/url\u003e\n  \u003c/repository\u003e\n\u003c/repositories\u003e\n```\n\n#### Manually\nIf you [build Spark Authorizer](https://yaooqinn.github.io/spark-authorizer/docs/building-spark-authorizer.html) manually, you can deploy it via:\n```bash\ncp target/spark-authorizer-\u003cversion\u003e.jar $SPARK_HOME/jars\n```\n\n### Step 2. Install \u0026 Configure Ranger Hive Plugin\n\nPlease refer to [Install Ranger Hive Plugin For Apache Spark](https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html) to learn how to deploy the plugin jars to Apache Spark and set Ranger/Hive configurations.\n\n### Step 3. Enable Spark Authorizer\n\nIn `$SPARK_HOME/conf/spark-defaults.conf`, add:\n\n```scala\nspark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension\n```\n**NOTE** `spark.sql.extensions` is only supported by Spark 2.2.x and later. For Spark 2.1.x, please use [Version: 1.1.3.spark2.1](https://github.com/yaooqinn/spark-authorizer/tree/78f7d818db773c3567c636575845a413ac560c90) and check the previous doc.\n\n## Interactive Spark Shell\n\nThe easiest way to start using Spark is through the Scala shell:\n\n```shell\nbin/spark-shell --master yarn --proxy-user hzyaoqin\n```\n\n## Suffer for the Authorization Pain \n\nWe create a Ranger policy as below:\n![ranger-policy-details](docs/img/ranger-prolcy-details.png)\n\nCheck privileges with some simple cases.\n\n#### Show databases\n\n```sql\nscala\u003e spark.sql(\"show databases\").show\n+--------------+\n|  databaseName|\n+--------------+\n|       default|\n| spark_test_db|\n| tpcds_10g_ext|\n+--------------+\n```\n\n#### Switch database\n\n```sql\nscala\u003e spark.sql(\"use spark_test_db\").show\n17/12/08 17:06:17 ERROR optimizer.Authorizer:\n+===============================+\n|Spark SQL Authorization 
Failure|\n|-------------------------------|\n|Permission denied: user [hzyaoqin] does not have [USE] privilege on [spark_test_db]\n|-------------------------------|\n|Spark SQL Authorization Failure|\n+===============================+\n```\nOops...\n\n\n```sql\nscala\u003e spark.sql(\"use tpcds_10g_ext\").show\n++\n||\n++\n++\n```\nLOL...\n\n\n### Select \n```sql\nscala\u003e spark.sql(\"select cp_type from catalog_page limit 1\").show\n17/12/08 17:09:58 ERROR optimizer.Authorizer:\n+===============================+\n|Spark SQL Authorization Failure|\n|-------------------------------|\n|Permission denied: user [hzyaoqin] does not have [SELECT] privilege on [tpcds_10g_ext/catalog_page/cp_type]\n|-------------------------------|\n|Spark SQL Authorization Failure|\n+===============================+\n```\nOops...\n\n```sql\nscala\u003e spark.sql(\"select * from call_center limit 1\").show\n+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+\n|cc_call_center_sk|cc_call_center_id|cc_rec_start_date|cc_rec_end_date|cc_closed_date_sk|cc_open_date_sk| cc_name|cc_class|cc_employees|cc_sq_ft|cc_hours| cc_manager|cc_mkt_id|        cc_mkt_class|         cc_mkt_desc|cc_market_manager|cc_division|cc_division_name|cc_company|cc_company_name|cc_street_number|cc_street_name|cc_street_type|cc_suite_number|cc_city|        cc_county|cc_state|cc_zip|   
cc_country|cc_gmt_offset|cc_tax_percentage|\n+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+\n|                1| AAAAAAAABAAAAAAA|       1998-01-01|           null|             null|        2450952|NY Metro|   large|           2|    1138| 8AM-4PM|Bob Belcher|        6|More than other a...|Shared others cou...|      Julius Tran|          3|             pri|         6|          cally|             730|      Ash Hill|     Boulevard|        Suite 0| Midway|Williamson County|      TN| 31904|United States|        -5.00|             0.11|\n+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+\n\n```\n\nLOL...\n\n### Dataset/DataFrame\n\n```scala\nscala\u003e spark.read.table(\"catalog_page\").limit(1).collect\n```\n```\n17/12/11 14:46:33 ERROR optimizer.Authorizer:\n+===============================+\n|Spark SQL Authorization Failure|\n|-------------------------------|\n|Permission denied: user [hzyaoqin] does not have [SELECT] privilege on [tpcds_10g_ext/catalog_page/cp_catalog_page_sk,cp_catalog_page_id,cp_promo_id,cp_start_date_sk,cp_end_date_sk,cp_department,cp_catalog_number,cp_catalog_page_number,cp_description,cp_type]\n|-------------------------------|\n|Spark SQL Authorization 
Failure|\n+===============================+\n```\nOops...\n\n```scala\nscala\u003e spark.read.table(\"call_center\").limit(1).collect\n```\n```\nres3: Array[org.apache.spark.sql.Row] = Array([1,AAAAAAAABAAAAAAA,1998-01-01,null,null,2450952,NY Metro,large,2,1138,8AM-4PM,Bob Belcher,6,More than other authori,Shared others could not count fully dollars. New members ca,Julius Tran,3,pri,6,cally,730,Ash Hill,Boulevard,Suite 0,Midway,Williamson County,TN,31904,United States,-5.00,0.11])\n```\nLOL...\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaooqinn%2Fspark-authorizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyaooqinn%2Fspark-authorizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaooqinn%2Fspark-authorizer/lists"}