{"id":13849472,"url":"https://github.com/cloudera/impyla","last_synced_at":"2025-05-13T16:06:27.832Z","repository":{"id":16037142,"uuid":"18780981","full_name":"cloudera/impyla","owner":"cloudera","description":"Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)","archived":false,"fork":false,"pushed_at":"2025-03-13T17:09:54.000Z","size":3338,"stargazers_count":738,"open_issues_count":172,"forks_count":251,"subscribers_count":50,"default_branch":"master","last_synced_at":"2025-05-04T08:37:16.365Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudera.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-04-14T23:52:07.000Z","updated_at":"2025-04-18T04:01:36.000Z","dependencies_parsed_at":"2024-01-18T09:04:51.186Z","dependency_job_id":"4387d1e9-d9ba-46d0-a2cf-7618d8c72062","html_url":"https://github.com/cloudera/impyla","commit_stats":{"total_commits":420,"total_committers":81,"mean_commits":5.185185185185185,"dds":0.4809523809523809,"last_synced_commit":"e4c76169f7e5765c09b11c92fceb862dbb9b72be"},"previous_names":[],"tags_count":55,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2Fimpyla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2Fimpyla/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2Fimpyla/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/cloudera%2Fimpyla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudera","download_url":"https://codeload.github.com/cloudera/impyla/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253547215,"owners_count":21925545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T19:01:19.178Z","updated_at":"2025-05-13T16:06:27.809Z","avatar_url":"https://github.com/cloudera.png","language":"Python","readme":"# impyla\n\nPython client for HiveServer2 implementations (e.g., Impala, Hive) for\ndistributed query engines.\n\nFor higher-level Impala functionality, including a Pandas-like interface over\ndistributed data sets, see the [Ibis project][ibis].\n\n### Features\n\n* HiveServer2 compliant; works with Impala and Hive, including nested data\n\n* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to\nsqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.\n\n* Works with Kerberos, LDAP, SSL\n\n* [SQLAlchemy][sqlalchemy] connector\n\n* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the\nPython data stack (including [scikit-learn][sklearn] and\n[matplotlib][matplotlib]); but see the [Ibis project][ibis] for a richer\nexperience\n\n### Dependencies\n\nRequired:\n\n* Python 2.7+ or 3.5+\n\n* `six`, `bitarray`\n\n* `thrift==0.16.0`\n\n* `thrift_sasl==0.4.3`\n\nOptional:\n\n* `kerberos\u003e=1.3.0` for Kerberos over HTTP support. 
This also requires Kerberos libraries\n   to be installed on your system - see [System Kerberos](#system-kerberos)\n\n* `pandas` for conversion to `DataFrame` objects; but see the [Ibis project][ibis] instead\n\n* `sqlalchemy` for the SQLAlchemy engine\n\n* `pytest` and `requests` for running tests; `unittest2` for testing on Python 2.6\n\n\n#### System Kerberos\n\nDifferent systems require different packages to be installed to enable Kerberos support in\nImpyla. Some examples of how to install the packages on different distributions follow.\n\nUbuntu:\n```bash\napt-get install libkrb5-dev krb5-user\n```\n\nRHEL/CentOS:\n```bash\nyum install krb5-libs krb5-devel krb5-server krb5-workstation\n```\n\n### Installation\n\nInstall the latest release with `pip`:\n\n```bash\npip install impyla\n```\n\nFor the latest (dev) version, install directly from the repo:\n\n```bash\npip install git+https://github.com/cloudera/impyla.git\n```\n\nor clone the repo:\n\n```bash\ngit clone https://github.com/cloudera/impyla.git\ncd impyla\npython setup.py install\n```\n\n#### Running the tests\n\nimpyla uses the [pytest][pytest] toolchain, and depends on the following\nenvironment variables:\n\n```bash\nexport IMPYLA_TEST_HOST=your.impalad.com\nexport IMPYLA_TEST_PORT=21050\nexport IMPYLA_TEST_AUTH_MECH=NOSASL\n```\n\nTo run the maximal set of tests, run\n\n```bash\ncd path/to/impyla\npy.test --connect impala\n```\n\nLeave out the `--connect` option to skip tests for DB API compliance.\n\nTo test impyla with different Python versions [tox] can be used.\nThe commands below will run all impyla tests with all supported and\ninstalled Python versions:\n```bash\ncd path/to/impyla\ntox\n```\nTo filter environments / tests use `-e` and [pytest] arguments after `--`:\n```bash\ntox -e py310 -- -ktest_utf8_strings\n```\n\n### Usage\n\nImpyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface\n(refer to it for API details):\n\n```python\nfrom impala.dbapi import 
connect\nconn = connect(host='my.host.com', port=21050)  # use auth_mechanism='PLAIN' for an unsecured Hive connection; see the function doc\ncursor = conn.cursor()\ncursor.execute('SELECT * FROM mytable LIMIT 100')\nprint(cursor.description)  # prints the result set's schema\nresults = cursor.fetchall()\n```\n\nThe `Cursor` object also exposes the iterator interface, which is buffered\n(controlled by `cursor.arraysize`):\n\n```python\ncursor.execute('SELECT * FROM mytable LIMIT 100')\nfor row in cursor:\n    print(row)\n```\n\nThe `Cursor` object also describes the columns returned by the query through\n`cursor.description`. This is useful for exporting your data as a CSV file:\n\n```python\nimport csv\n\ncursor.execute('SELECT * FROM mytable LIMIT 100')\ncolumns = [datum[0] for datum in cursor.description]\ntargetfile = '/tmp/foo.csv'\n\nwith open(targetfile, 'w', newline='') as outcsv:\n    writer = csv.writer(outcsv, delimiter=',', quotechar='\"', quoting=csv.QUOTE_ALL, lineterminator='\\n')\n    writer.writerow(columns)\n    for row in cursor:\n        writer.writerow(row)\n```\n\nYou can also get back a pandas `DataFrame` object:\n\n```python\nfrom impala.util import as_pandas\ndf = as_pandas(cursor)\n# carry df through scikit-learn, for example\n```\n\n\n[pep249]: http://legacy.python.org/dev/peps/pep-0249/\n[pandas]: http://pandas.pydata.org/\n[sklearn]: http://scikit-learn.org/\n[matplotlib]: http://matplotlib.org/\n[pytest]: http://pytest.org/latest/\n[sqlalchemy]: http://www.sqlalchemy.org/\n[ibis]: http://www.ibis-project.org/\n[tox]: http://tox.wiki/\n\n# How do I contribute code?\nYou need to first sign and return an\n[ICLA](https://github.com/cloudera/native-toolchain/blob/icla/Cloudera%20ICLA_25APR2018.pdf)\nand\n[CCLA](https://github.com/cloudera/native-toolchain/blob/icla/Cloudera%20CCLA_25APR2018.pdf)\nbefore we can accept and redistribute your contribution. Once these are submitted you are\nfree to start contributing to impyla. 
Submit these to CLA@cloudera.com.\n\n## Find\nWe use GitHub issues to track bugs for this project. Find an issue that you would like to\nwork on (or file one if you have discovered a new issue!). If no one is working on it,\nassign it to yourself only if you intend to work on it shortly.\n\nIt's a good idea to discuss your intended approach on the issue. You are much more\nlikely to have your patch reviewed and committed if you've already got buy-in from the\nimpyla community before you start.\n\n## Fix\nNow start coding! As you are writing your patch, please keep the following things in mind:\n\nFirst, please include tests with your patch. If your patch adds a feature or fixes a bug\nand does not include tests, it will generally not be accepted. If you are unsure how to\nwrite tests for a particular component, please ask on the issue for guidance.\n\nSecond, please keep your patch narrowly targeted to the problem described by the issue.\nIt's better for everyone if we maintain discipline about the scope of each patch. In\ngeneral, if you find a bug while working on a specific feature, file an issue for the bug,\ncheck if you can assign it to yourself, and fix it independently of the feature. This helps\nus to differentiate between bug fixes and features and allows us to build stable\nmaintenance releases.\n\nFinally, please write a good, clear commit message, with a short, descriptive title and\na message that is exactly long enough to explain what the problem was and how it was\nfixed.\n\nPlease create a pull request on GitHub with your patch.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudera%2Fimpyla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudera%2Fimpyla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudera%2Fimpyla/lists"}