{"id":28455711,"url":"https://github.com/questdb/py-questdb-query","last_synced_at":"2025-09-14T06:37:10.045Z","repository":{"id":176852783,"uuid":"659357967","full_name":"questdb/py-questdb-query","owner":"questdb","description":"Fast query over HTTP(S)/CSV for QuestDB","archived":false,"fork":false,"pushed_at":"2024-11-11T10:54:55.000Z","size":433,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-27T02:38:45.067Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/questdb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-27T16:54:09.000Z","updated_at":"2025-06-07T16:45:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"afb48266-0aa9-4702-9f88-2c1e2107e7cc","html_url":"https://github.com/questdb/py-questdb-query","commit_stats":null,"previous_names":["questdb/py-questdb-query"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/questdb/py-questdb-query","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fpy-questdb-query","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fpy-questdb-query/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fpy-questdb-query/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fpy-questdb-query/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/questdb","download_url":"https://codeload.github.com/questdb/py-questdb-query/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fpy-questdb-query/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275071546,"owners_count":25400398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-06T22:10:26.703Z","updated_at":"2025-09-14T06:37:10.018Z","avatar_url":"https://github.com/questdb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# py-questdb-query\nThis library allows you to perform fast queries over HTTP(S)/CSV for QuestDB, a high-performance time-series database.\n\nQuery results are obtained as either Pandas dataframes or dicts of numpy arrays. \n\n## Installation\n\nThe library can be installed using the following command:\n\n```shell\npython3 -m pip install -U git+https://github.com/questdb/py-questdb-query.git#questdb_query\n```\n\nTo uninstall the library, you can use the command:\n\n```shell\npython3 -m pip uninstall questdb_query\n```\n\n## Basic Usage, querying into Pandas\n\nOnce installed, you can use the library to query a QuestDB database. 
Here's an example that demonstrates how to query\nCPU utilization data using the library against a database running on `localhost` on the default HTTP port (9000).\n\n```python\nfrom questdb_query import pandas_query\n\ndf = pandas_query('select * from cpu limit 1000')\n```\n\nThis lets you work with the result like any other dataframe, for example to aggregate it by region:\n\n```python\n\u003e\u003e\u003e df = df[['region', 'usage_user', 'usage_nice']].groupby('region').mean()\n\u003e\u003e\u003e df\n                usage_user  usage_nice\nregion                                \nap-northeast-1    8.163766    6.492334\nap-southeast-1    6.511215    7.341863\nap-southeast-2    6.788770    6.257839\neu-central-1      7.392642    6.416479\neu-west-1         7.213417    7.185956\nsa-east-1         7.143568    5.925026\nus-east-1         7.620643    7.243553\nus-west-1         6.286770    6.531977\nus-west-2         6.228692    6.439672\n```\n\nYou can then switch over to numpy with a simple and fast conversion:\n\n```python\n\u003e\u003e\u003e from questdb_query import pandas_to_numpy\n\u003e\u003e\u003e np_arrs = pandas_to_numpy(df)\n\u003e\u003e\u003e np_arrs\n{'usage_user': array([8.16376556, 6.51121543, 6.78876964, 7.3926419 , 7.21341716,\n       7.14356839, 7.62064304, 6.28677006, 6.22869169]), 'usage_nice': array([6.49233392, 7.34186348, 6.25783903, 6.41647863, 7.18595643,\n       5.92502642, 7.24355328, 6.53197733, 6.43967247]), 'region': array(['ap-northeast-1', 'ap-southeast-1', 'ap-southeast-2',\n       'eu-central-1', 'eu-west-1', 'sa-east-1', 'us-east-1', 'us-west-1',\n       'us-west-2'], dtype=object)}\n```\n\n## Querying a remote database\n\nIf your database is running on a remote host, specify an endpoint:\n\n```python\n# Import numpy_query as well, since it is the function called below.\nfrom questdb_query import numpy_query, Endpoint\n\nendpoint = Endpoint(host='your.hostname.com', port=22453, https=True, username='user', password='pass')\n\nnp_arrs = numpy_query('select * from cpu limit 10', endpoint)\n```\n\nNote how the example above enables HTTPS and 
specifies a username and password for authentication.\n\nThe port is optional and defaults to 9000 for HTTP and 443 for HTTPS.\n\nAlternatively, if the server is set up with token-based authentication, you can use the `token` parameter:\n\n```python\nendpoint = Endpoint(host='your.hostname.com', https=True, token='your_token')\n```\n\n## Chunks: Query Parallelism\n\nYou can sometimes improve performance by splitting up a large query into smaller ones, running them in parallel,\nand joining the results together. This is especially useful if you have multiple CPUs available.\n\nThe `numpy_query` function can do this automatically for you when you specify the `chunks` parameter.\n\nThe example below splits the query into 6 parallel chunks.\n\n```python\nfrom questdb_query import numpy_query\n\nnp_arrs = numpy_query('select * from cpu', chunks=6)\n```\n\nThe speed-up from splitting a query into smaller ones is highly query-dependent, so we recommend you experiment and\nbenchmark. Mostly due to Python library limitations, not all parts of the query can be parallelized, so whilst you may\nsee great benefits in going from 1 chunk (the default) to 8, the improvement going from 8 to 16 might be marginal.\n\n_Benchmarking is covered in more detail later in this README page._\n\n\u003e :warning: The `chunks \u003e 1` parameter parallelizes queries. 
If the table(s) queried contain fast-moving data, the\n\u003e results may be inconsistent because each chunk's query starts at a slightly different time.\n\u003e\n\u003e To avoid consistency issues, formulate the query so that it only queries data that is not changing.\n\u003e You can do this, for example, by specifying a `timestamp` range in the `WHERE` clause.\n\n## Querying into Numpy\n\nYou can also query directly into a dictionary of Numpy arrays.\n\nNote that Numpy's datatypes are more limited than Pandas', specifically in the\nhandling of null values.\n\nThis is a simple shorthand for querying into Pandas and then converting to Numpy:\n\n```python\ndef numpy_query(query: str, endpoint: Endpoint = None,\n        chunks: int = 1, timeout: int = None) -\u003e dict[str, np.array]:\n    df = pandas_query(query, endpoint, chunks, timeout)\n    return pandas_to_numpy(df)\n```\n\nTo use it, pass the query string to the `numpy_query` function, along with the\nsame optional parameters as the `pandas_query` function.\n\n```python\nfrom questdb_query import numpy_query\n\nnp_arrs = numpy_query('''\n    select\n        timestamp, hostname, datacenter, usage_user, usage_nice\n    from\n        cpu\n    limit 10''')\n```\n\nThe `np_arrs` object is a Python `dict` which holds a numpy array per column, keyed by column name:\n```python\n\u003e\u003e\u003e np_arrs\n{'timestamp': array(['2016-01-01T00:00:00.000000000', '2016-01-01T00:00:10.000000000',\n       '2016-01-01T00:00:20.000000000', '2016-01-01T00:00:30.000000000',\n       '2016-01-01T00:00:40.000000000', '2016-01-01T00:00:50.000000000',\n       '2016-01-01T00:01:00.000000000', '2016-01-01T00:01:10.000000000',\n       '2016-01-01T00:01:20.000000000', '2016-01-01T00:01:30.000000000'],\n      dtype='datetime64[ns]'), 'hostname': array(['host_0', 'host_1', 'host_2', 'host_3', 'host_4', 'host_5',\n       'host_6', 'host_7', 'host_8', 'host_9'], dtype=object), 'datacenter': array(['ap-southeast-2b', 'eu-west-1b', 
'us-west-1b', 'us-west-2c',\n       'us-west-2b', 'eu-west-1b', 'eu-west-1b', 'us-west-1a',\n       'ap-southeast-2a', 'us-east-1a'], dtype=object), 'usage_user': array([1.39169048, 0.33846369, 0.        , 1.81511203, 0.84273104,\n       0.        , 0.        , 0.28085548, 0.        , 1.37192634]), 'usage_nice': array([0.30603088, 1.21496673, 0.        , 0.16688796, 0.        ,\n       2.77319521, 0.40332488, 1.81585253, 1.92844804, 2.12841919])}\n```\n\nTo calculate a (rather nonsensical) weighted average of `usage_user` and `usage_nice`, you can\naccess the `numpy` columns directly:\n\n```python\n\u003e\u003e\u003e np_arrs['usage_user'].dot(np_arrs['usage_nice'].T)\n4.5700692045031985\n```\n\n## Benchmarking\n\n### From code\n\nEach query result also contains a `Stats` object with a performance summary that you can print.\n\n```python\n\u003e\u003e\u003e from questdb_query import pandas_query\n\u003e\u003e\u003e df = pandas_query('select * from cpu', chunks=8)\n\u003e\u003e\u003e print(df.query_stats)\nDuration: 2.631s\nMillions of lines: 5.000\nMillions of lines/s: 1.901\nMiB: 1332.144\nMiB/s: 506.381\n```\n\nYou can also extract individual fields:\n\n```python\n\u003e\u003e\u003e df.query_stats\nStats(duration_s=2.630711865, line_count=5000000, byte_count=1396853875, throughput_mbs=506.3814407360216, throughput_mlps=1.900626239810569)\n\u003e\u003e\u003e df.query_stats.throughput_mlps\n1.900626239810569\n```\n\n### From the command line\n\nTo get the best performance, it may be useful to try queries with different hardware setups, chunk counts, etc.\n\nYou can run the benchmarking tool from the command line:\n\n```bash\n$ python3 -m questdb_query.tool --chunks 8 \"select * from cpu\"\n```\n```\n         hostname          region       datacenter  rack              os arch team  service  service_version service_environment  usage_user  usage_system  usage_idle  usage_nice  usage_iowait  usage_irq  usage_softirq  usage_steal  usage_guest  
usage_guest_nice           timestamp\n0          host_0  ap-southeast-2  ap-southeast-2b    96     Ubuntu16.10  x86  CHI       11                0                test    1.391690      0.000000    2.644812    0.306031      1.194629   0.000000       0.000000     0.726996     0.000000          0.000000 2016-01-01 00:00:00\n1          host_1       eu-west-1       eu-west-1b    52  Ubuntu16.04LTS  x64  NYC        7                0          production    0.338464      1.951409    2.455378    1.214967      2.037935   0.000000       1.136997     1.022753     1.711183          0.000000 2016-01-01 00:00:10\n2          host_2       us-west-1       us-west-1b    69  Ubuntu16.04LTS  x64  LON        8                1          production    0.000000      2.800873    2.296324    0.000000      1.754139   1.531160       0.662572     0.000000     0.472402          0.312164 2016-01-01 00:00:20\n3          host_3       us-west-2       us-west-2c     8  Ubuntu16.04LTS  x86  LON       11                0                test    1.815112      4.412385    2.056344    0.166888      3.507148   3.276577       0.000000     0.000000     0.000000          1.496152 2016-01-01 00:00:30\n4          host_4       us-west-2       us-west-2b    83  Ubuntu16.04LTS  x64  NYC        6                0                test    0.842731      3.141248    2.199520    0.000000      2.943054   5.032342       0.391105     1.375450     0.000000          1.236811 2016-01-01 00:00:40\n...           ...             ...              ...   ...             ...  ...  ...      ...              ...                 ...         ...           ...         ...         ...           ...        ...            ...          ...          ...               ...                 
...\n624995  host_3995  ap-southeast-2  ap-southeast-2a    30  Ubuntu16.04LTS  x86  CHI       19                1             staging   33.238309     82.647341   17.272531   52.707720     71.718564  45.605728     100.000000    22.907723    78.130846         15.652954 2017-08-01 16:52:30\n624996  host_3996       us-west-2       us-west-2a    67     Ubuntu15.10  x64  CHI        9                0          production   33.344070     81.922739   16.653731   52.107537     71.844945  45.880606      99.835977    23.045458    76.468930         17.091646 2017-08-01 16:52:40\n624997  host_3997       us-west-2       us-west-2b    63     Ubuntu15.10  x86   SF        8                0          production   32.932095     80.662915   14.708377   53.354277     72.265215  44.803275      99.013038    20.375169    78.043473         17.870002 2017-08-01 16:52:50\n624998  host_3998       eu-west-1       eu-west-1b    53  Ubuntu16.04LTS  x86  CHI       11                1             staging   31.199818     80.994859   15.051577   51.923123     74.169828  46.453950      99.107213    21.004499    78.341154         18.880808 2017-08-01 16:53:00\n624999  host_3999       us-east-1       us-east-1c    87     Ubuntu16.10  x64   SF        8                1          production   30.310735     81.727637   15.413537   51.417897     74.973555  44.882255      98.821672    19.055040    78.094993         19.263652 2017-08-01 16:53:10\n\n[5000000 rows x 21 columns]\n\nDuration: 2.547s\nMillions of lines: 5.000\nMillions of lines/s: 1.963\nMiB: 1332.144\nMiB/s: 522.962\n```\n\nThese are the complete command line arguments:\n\n```bash\n$ python3 -m questdb_query.tool --help\n```\n```\nusage: tool.py [-h] [--host HOST] [--port PORT] [--https] [--username USERNAME] [--password PASSWORD] [--chunks CHUNKS] query\n\npositional arguments:\n  query\n\noptional arguments:\n  -h, --help           show this help message and exit\n  --host HOST\n  --port PORT\n  --https\n  --username USERNAME\n  --password 
PASSWORD\n  --chunks CHUNKS\n```\n\n\n## Async operation\n\nThe `numpy_query` and `pandas_query` functions are actually wrappers around `async` variants.\n\nIf your application is already using `async`, call those directly, as this allows other parts of your application to\nperform work concurrently during the data download.\n\nThe functions take the same arguments as their synchronous counterparts.\n\n```python\nimport asyncio\nfrom questdb_query import Endpoint\nfrom questdb_query.asynchronous import numpy_query\n\n\n# main() must be a coroutine so that it can await the query.\nasync def main():\n    endpoint = Endpoint(host='your.hostname.com', https=True, username='user', password='pass')\n    np_arrs = await numpy_query('select * from cpu limit 10', endpoint)\n    print(np_arrs)\n\n\nif __name__ == '__main__':\n    asyncio.run(main())\n\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fpy-questdb-query","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquestdb%2Fpy-questdb-query","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fpy-questdb-query/lists"}