{"id":16917462,"url":"https://github.com/tagomoris/shib","last_synced_at":"2025-03-17T07:31:18.453Z","repository":{"id":44428078,"uuid":"1580190","full_name":"tagomoris/shib","owner":"tagomoris","description":"WebUI for query engines: Hive and Presto","archived":false,"fork":false,"pushed_at":"2016-12-28T08:57:10.000Z","size":1158,"stargazers_count":198,"open_issues_count":8,"forks_count":56,"subscribers_count":28,"default_branch":"master","last_synced_at":"2024-04-28T15:49:24.550Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tagomoris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-07T01:45:44.000Z","updated_at":"2024-04-19T10:53:49.000Z","dependencies_parsed_at":"2022-09-21T09:53:28.429Z","dependency_job_id":null,"html_url":"https://github.com/tagomoris/shib","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tagomoris%2Fshib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tagomoris%2Fshib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tagomoris%2Fshib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tagomoris%2Fshib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tagomoris","download_url":"https://codeload.github.com/tagomoris/shib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243852438,"owners_count":20358267,"icon_u
rl":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T19:36:16.515Z","updated_at":"2025-03-17T07:31:18.133Z","avatar_url":"https://github.com/tagomoris.png","language":"JavaScript","readme":"# shib\n\n* http://github.com/tagomoris/shib\n* Blog entry [tagomoris.tumblr](http://tagomoris.tumblr.com/post/77890621867/shib-web-client-for-hive-now-supports-presto)\n\n## DESCRIPTION\n\nShib is web client application for SQL-like query engines, written in Node.js, supporting\n * Hive (hiveserver, hiveserver2)\n * Facebook Presto\n * Google BigQuery\n\nOnce configured, we can switch query engines per executions.\n\nSome extension features are supported:\n\n* Setup queries: options to specify queries executed before main query, like 'create functions ...'\n* Default Database: option to specify default database for Hive 0.6 or later\n\n### Versions\n\nLatest version of 'shib' is v1.0.2.\n\n'shib' versions are:\n\n* v1.0 series\n  * metadata of v1.0 is **NOT compatible with v0.3**, so migration required\n* v0.3 series\n  * use latest node (`~\u003e v0.10.26`)\n  * multi engines/databases support\n  * presto support\n  * storages of v0.3.x are compatible with v0.2\n  * engine-wide database/table access controls\n  * tagging for executed queries\n  * authentication / logging for query execution\n  * access controls based on authentication\n* v0.2 series\n  * current status of master branch\n  * uses local filesystem instead of KT, depends on latest node (v0.8.x, v0.10.x)\n  * higher performance, more safe Web UI and updated features\n  * storages of v0.2 are **NOT 
complatible with v0.1**\n* v0.1 series\n  * uses KT, depends on node v0.6.x\n  * see `v0.1` tag\n\n## INSTALL\n\n### Hive/Presto\n\nFor Hive queries, shib requires HiveServer or HiveServer2. Setup and run these.\n\n* For HiveServer2\n  * Configure `hive.server2.authentication` as `NOSASL`\n    * Strongly recommended to configure `hive.support.concurrency` as `false`\n\nFor Presto, shib is tested with Presto version 0.57.\n\n### Node.js\n\nTo run shib, you must install node.js (v0.10.x recommended), and export PATH for installed node.\n\n### shib\n\nClone shib code.\n\n    $ git clone git://github.com/tagomoris/shib.git\n\nInstall libraries, configure addresses of HiveServer (and other specifications).\n\n    $ npm install\n    $ vi config.js\n\nAnd run.\n\n    $ npm start\n\nShib listens on port 3000. see http://localhost:3000/\n\nTo switch environments for each shib instance, use `NODE_ENV` environment variable. (ex: `production.js` will be used with `NODE_ENV=production`)\n\n    $ NODE_ENV=production NODE_PATH=lib node app.js\n\n## Migrate metadata database from v0 to v1\n\nMigration operation required to execute shib v1, with data in v0 era.\n\n1. Stop shib process\n2. Update shib code to v1\n3. Execute `npm run migrate` (for `var/database.sqlite3` file)\n  * This operation requires 5 minutes 30 seconds for 220MB database\n  * Backup v0 database file is `var/database.sqlite3.v0`\n4. 
Start shib\n\n## Configuration\n\nShib can have 2 or more query executor engines.\n\n### HiveServer2\n\nBasic configuration with HiveServer2 in config.js (or production.js):\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,   // lines per fetch in shib\n  query_timeout: null, // shib waits for queries forever\n  setup_queries: [],\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'mycluster1',\n      executer: {\n        name: 'hiveserver2',\n        host: 'hs2.mycluster1.local',\n        port: 10000,\n        username: 'hive',\n        support_database: true\n      },\n      monitor: null\n    },\n  ],\n};\n```\n\n`username` should be the same as the user name that the hive job will be executed as. (`password` is not required for NOSASL transport.)\n\nFor UDFs, you can specify statements to run before query execution in `setup_queries`.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: null,\n  setup_queries: [\n    \"add jar /path/to/jarfile/foo.jar\",\n    \"create temporary function foofunc as 'package.of.udf.FooFunc'\",\n    \"create temporary function barfunc as 'package.of.udf.BarFunc'\"\n  ],\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'mycluster1',\n      executer: {\n        name: 'hiveserver2',\n        host: 'hs2.mycluster1.local',\n        port: 10000,\n        support_database: true\n      },\n      monitor: null\n    },\n  ],\n};\n```\n\n### HiveServer\n\nClassic HiveServer is available, with database support, if you want it instead of HiveServer2.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: null,\n  setup_queries: [],\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'mycluster1',\n      executer: {\n        name: 'hiveserver',  // HiveServer(1)\n        host: 'hs1.mycluster1.local',\n        port: 10000,\n        support_database: true,\n        default_database: 'mylogs1'\n      },\n      monitor: null\n    },\n  ],\n};\n```\n\n### Presto\n\nFor Presto, use the `presto` executer.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: 30, // 30 seconds for Presto query timeouts (the query will fail)\n  setup_queries: [],\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'prestocluster1',\n      executer: {\n        name: 'presto',\n        host: 'coordinator.mycluster2.local',\n        port: 8080,\n        user: 'shib',\n        catalog: 'hive',  // required configuration argument\n        support_database: true,\n        default_database: 'mylogs1'\n      },\n      monitor: null\n    },\n  ],\n};\n```\n\n### BigQuery\n\nFor BigQuery, use the `bigquery` executer.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: 30, // 30 seconds for BigQuery query timeouts (the query will fail)\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'bigquery',\n      executer: {\n        name: 'bigquery',\n        default_database: 'mylogs1',\n        project_id: 'gcp-project-id',\n        key_filename: '/path/to/keyfile.json'\n      },\n      monitor: null\n    }\n  ]\n};\n```\n\nFor more details about the `project_id` and `key_filename` config, see https://github.com/GoogleCloudPlatform/gcloud-node#authorization .\n\n### Multi clusters and engines\n\nShib supports 2 or more engines for a cluster, and 2 or more clusters with the same engines. These patterns are available:\n\n* HiveServer1, HiveServer2, and Presto for the same data source\n* 2 or more catalogs for the same Presto cluster\n* Many clusters, each with one of HiveServer, HiveServer2, or Presto\n\nYou can write configurations in `engines` however you want. `fetch_lines`, `query_timeout`, and `setup_queries` in each engine overwrite the global defaults of these configurations.\n\nFor example, consider this configuration:\n * ClusterA has HiveServer2\n   * listens on port 10000\n   * customized udfs in `foo.jar` are available\n * ClusterB has HiveServer\n   * listens on port 10001\n   * customized udfs in `foo.jar` are available\n * The Presto cluster is configured with the `hive` catalog and the `hive2` catalog\n   * udfs are not available\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: null,\n  setup_queries: [\n    \"add jar /path/to/jarfile/foo.jar\",\n    \"create temporary function foofunc as 'package.of.udf.FooFunc'\",\n    \"create temporary function barfunc as 'package.of.udf.BarFunc'\"\n  ],\n  storage: {\n    datadir: './var'\n  },\n  engines: [\n    { label: 'myclusterA',\n      executer: {\n        name: 'hiveserver2',\n        host: 'master.a.cluster.local',\n        port: 10000,\n        support_database: true\n      },\n      monitor: null\n    },\n    { label: 'myclusterB',\n      executer: {\n        name: 'hiveserver',\n        host: 'master.b.cluster.local',\n        port: 10001,\n        support_database: true,\n        default_database: 'mylogs1'\n      },\n      monitor: null\n    },\n    { label: 'prestocluster1',\n      executer: {\n        name: 'presto',\n        host: 'coordinator.p.cluster.local',\n        port: 8080,\n        user: 'shib',\n        catalog: 'hive',\n        support_database: true,\n        default_database: 'mylogs1',\n        query_timeout: 30,  // overwrite global config\n        setup_queries: []   // overwrite global config\n      },\n      monitor: null\n    },\n    { label: 'prestocluster2',\n      executer: {\n        name: 'presto',\n        host: 'coordinator.p.cluster.local',\n        port: 8080,\n        user: 'shib',\n        catalog: 'hive2',  // one engine config per catalog\n        support_database: true,\n        default_database: 'default',\n        query_timeout: 30,  // overwrite global config\n        setup_queries: []   // overwrite global config\n      },\n      monitor: null\n    }\n  ],\n};\n```\n\n### Access Control\n\nShib has an access control list for databases/tables. The default is 'allow' for all databases/tables.\n\nShib's access control rules are:\n  * configured per executer\n  * a database-level default of 'deny' without any optional rules makes that database invisible\n  * a database-level default plus an allow/deny table list makes its tables visible/invisible\n    * in this case, the 'database' itself is visible\n  * the default 'allow' or 'deny' decides the visibility of databases without any optional rules\n\n'Invisible' databases and tables:\n * are not shown in the tables/partitions list and schema list\n * cannot be queried by users (these queries always fail)\n\nAccess control options are written in the 'executer' like this:\n\n```js\nexecuter: {\n  name: 'presto',\n  host: 'coordinator.p.cluster.local',\n  port: 8080,\n  catalog: 'hive',\n  support_database: true,\n  default_database: 'default',\n  query_timeout: 30,\n  setup_queries: [],\n  access_control: {\n    databases: {\n      secret: { default: \"deny\" },\n      member: { default: \"deny\", allow: [\"users\"] },\n      test:   { default: \"allow\", deny: [\"secretData\", \"userMaster\"] },\n    },\n    default: \"allow\"\n  }\n},\n```\n\nFor more details, see [wiki: Access Control](https://github.com/tagomoris/shib/wiki/Access-Control).\n\n## Monitors\n\n`monitor` configurations are used to get query status and to kill queries.\n\n### JobTracker (MRv1)\n\nThe `jobtracker` monitor is available in MRv1 environments (with both hiveserver and hiveserver2).\n\n```js\nmonitor: {\n  name: 'jobtracker',\n  host: 'jobtracker.hostname.local',\n  port: 50030,\n  mapred: '/usr/bin/mapred' // 'mapred' in PATH by default\n}\n```\n\nFor this feature, shib should be executed by a user who can execute the command `mapred job -kill 
JOB_ID`.\n\n### YARN (MRv2)\n\nThe `yarn` monitor is available in MRv2 environments (with both hiveserver and hiveserver2).\n\n```js\nmonitor: {\n  name: 'yarn',\n  host: 'resourcemanager.hostname.local',\n  port: 8088\n}\n```\n\nIn this case, shib kills queries with the Resource Manager REST API.\n\nIf you specify the yarn command path, shib kills queries with `yarn application -kill APP_ID`.\n\n```js\nmonitor: {\n  name: 'yarn',\n  host: 'resourcemanager.hostname.local',\n  port: 8088,\n  yarn: '/usr/bin/yarn'\n}\n```\n\n### Huahin Manager (obsolete)\n\nFor monitoring in CDH4 + MRv1 environments, Huahin Manager is available.\n\nTo show map/reduce status, and/or to kill the actual map/reduce jobs behind a hive query, shib can use Huahin Manager. The current version supports 'Huahin Manager CDH4 + MRv1' only.\n\nhttp://huahinframework.org/huahin-manager/\n\nConfigure the `monitor` argument like below instead of `null`.\n\n```js\nmonitor: {\n  name : 'huahin_mrv1',\n  host: 'localhost',\n  port: 9010\n}\n```\n\n## Authentication\n\nShib has authentication to log who executes queries and to control access.\nThe setup_queries_auth option specifies queries executed before the main query when authentication is required.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,   // lines per fetch in shib\n  query_timeout: null, // shib waits for queries forever\n  setup_queries: [],\n  setup_queries_auth: [\"set hive.mapred.mode=strict\"],\n  storage: {\n    datadir: './var'\n  },\n  auth: {\n    type: 'http_basic_auth',\n    url: 'http://your.internal.protected.service.example.com/',\n    realm: '@your.service.example.com'\n  },\n  engines: [\n    { label: 'mycluster1',\n      executer: {\n        name: 'hiveserver2',\n        host: 'hs2.mycluster1.local',\n        port: 10000,\n        username: 'hive',\n        support_database: true\n      },\n      monitor: null\n    },\n  ],\n};\n```\n\nFor more details, see [wiki: Authentication](https://github.com/tagomoris/shib/wiki/Authentication).\n\n## Miscellaneous configurations\n\n### Disable \"history\" tab\n\nSpecify `disable_history: true` on `servers`.\n\n```js\nvar servers = exports.servers = {\n  listen: 3000,\n  fetch_lines: 1000,\n  query_timeout: null, // seconds. (null: shib will wait for the query response indefinitely).\n  setup_queries: [],\n  disable_history: true,\n  storage: {\n    datadir: './var'\n  },\n```\n\n## As HTTP Proxy for query engines\n\nPOST a query string to `/execute` with some parameters.\n\n```\ncurl -s -X POST -F 'querystring=SELECT COUNT(*) AS cnt FROM yourtable WHERE field=\"value\"' http://shib.server.local:3000/execute | jq .\n{\n  \"queryid\": \"69927e67c5b1d5f665697943cc4867ec\",\n  \"results\": [],\n  \"dbname\": \"default\",\n  \"engine\": \"hiveserver\",\n  \"querystring\": \"SELECT COUNT(*) AS cnt FROM yourtable WHERE field=\\\"value\\\"\"\n}\n```\n\nSpecify `engineLabel` and `dbname` for non-default query engines and databases:\n\n```\ncurl -s -X POST -F \"engineLabel=presto\" -F \"dbname=testing\" -F \"querystring=SELECT COUNT(*) AS cnt FROM yourtable WHERE field='value'\" http://shib.server.local:3000/execute\n```\n\nIf you do not want your query added to the history tab, specify 'scheduled':\n\n```\ncurl -s -X POST -F \"scheduled=true\" -F \"querystring=SELECT COUNT(*) AS cnt FROM yourtable WHERE field='value'\" http://shib.server.local:3000/execute\n```\n\nThen fetch the query's status whenever you want.\n\n```\ncurl -s http://shib.server.local:3000/status/69927e67c5b1d5f665697943cc4867ec\nexecuted\n```\n\nOr get the whole query object.\n\n```\ncurl -s http://shib.server.local:3000/query/69927e67c5b1d5f665697943cc4867ec | jq .\n{\n  \"queryid\": \"69927e67c5b1d5f665697943cc4867ec\",\n  \"results\": [\n    {\n      \"resultid\": \"969629614dff69411a2f4f1733c9616a\",\n      \"executed_at\": \"Wed Feb 26 2014 16:02:00 GMT+0900 (JST)\"\n    }\n  ],\n  \"dbname\": \"default\",\n  \"engine\": \"hiveserver\",\n  \"querystring\": \"SELECT COUNT(*) AS cnt FROM yourtable WHERE field=\\\"value\\\"\"\n}\n```\n\nIf this query object has `executed` status, or has a member in `results`, you can fetch its result by `resultid`.\n\n```\n# if you want elapsed time or bytes or lines or ....\ncurl -s http://shib.server.local:3000/result/969629614dff69411a2f4f1733c9616a | jq .\n{\n  \"schema\": [\n    {\n      \"type\": \"bigint\",\n      \"name\": \"cnt\"\n    }\n  ],\n  \"completed_msec\": 1393398893759,\n  \"completed_at\": \"Wed Feb 26 2014 16:14:53 GMT+0900 (JST)\",\n  \"completed_time\": null,\n  \"bytes\": 6,\n  \"queryid\": \"69927e67c5b1d5f665697943cc4867ec\",\n  \"executed_time\": null,\n  \"executed_at\": \"Wed Feb 26 2014 16:14:52 GMT+0900 (JST)\",\n  \"executed_msec\": 1393398892752,\n  \"resultid\": \"969629614dff69411a2f4f1733c9616a\",\n  \"state\": \"done\",\n  \"error\": \"\",\n  \"lines\": 2\n}\n# raw result data as TSV (fast)\ncurl -s http://shib.server.local:3000/download/tsv/969629614dff69411a2f4f1733c9616a\nCNT\n1234567\n# or CSV (slow)\ncurl -s http://shib.server.local:3000/download/csv/969629614dff69411a2f4f1733c9616a\n\"CNT\"\n\"1234567\"\n```\n\nThese HTTP requests/responses are the same as those the JavaScript in the browser makes.\n\n* * * * *\n\n## TODO\n\n* Patches are welcome!\n\n## License\n\nCopyright 2011- TAGOMORI Satoshi (tagomoris)\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","funding_links":[],"categories":["Misc."],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftagomoris%2Fshib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftagomoris%2Fshib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftagomoris%2Fshib/lists"}