{"id":13467794,"url":"https://github.com/dcmoura/spyql","last_synced_at":"2025-10-21T19:41:52.221Z","repository":{"id":42048108,"uuid":"297863782","full_name":"dcmoura/spyql","owner":"dcmoura","description":"Query data on the command line with SQL-like SELECTs powered by Python expressions","archived":false,"fork":false,"pushed_at":"2022-12-04T00:18:19.000Z","size":1611,"stargazers_count":917,"open_issues_count":29,"forks_count":25,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-10-29T21:59:03.483Z","etag":null,"topics":["command-line","csv","data","json","python","sql","text"],"latest_commit_sha":null,"homepage":"https://spyql.readthedocs.io","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcmoura.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-23T05:35:45.000Z","updated_at":"2024-10-22T22:15:24.000Z","dependencies_parsed_at":"2023-01-23T23:03:51.790Z","dependency_job_id":null,"html_url":"https://github.com/dcmoura/spyql","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcmoura%2Fspyql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcmoura%2Fspyql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcmoura%2Fspyql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcmoura%2Fspyql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcmoura","download_url":"https://codeload.github.com/dcmoura/spyql/tar.
gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245584679,"owners_count":20639604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line","csv","data","json","python","sql","text"],"created_at":"2024-07-31T15:01:00.630Z","updated_at":"2025-10-21T19:41:47.168Z","avatar_url":"https://github.com/dcmoura.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"\nSPyQL\n=====\n\nSQL with Python in the middle\n\n\n.. image:: https://img.shields.io/pypi/v/spyql.svg\n   :target: https://pypi.org/project/spyql/\n   :alt: https://pypi.python.org/pypi/spyql\n\n\n.. image:: https://readthedocs.org/projects/spyql/badge/?version=latest\n   :target: https://spyql.readthedocs.io/en/latest/\n   :alt: https://spyql.readthedocs.io/en/latest/?version=latest\n\n\n.. image:: https://codecov.io/gh/dcmoura/spyql/branch/master/graph/badge.svg?token=5C7I7LG814\n   :target: https://codecov.io/gh/dcmoura/spyql\n   :alt: codecov\n\n\n.. image:: https://pepy.tech/badge/spyql\n   :target: https://pepy.tech/project/spyql\n   :alt: downloads\n\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n   :target: https://github.com/psf/black\n   :alt: code style: black\n\n\n.. image:: https://img.shields.io/badge/License-MIT-yellow.svg\n   :target: https://opensource.org/licenses/MIT\n   :alt: license: MIT\n\n\nAbout\n-----\n.. 
intro_start\n\nSPyQL is a query language that combines:\n\n\n* the simplicity and structure of SQL;\n* with the power and readability of Python.\n\n.. code-block:: sql\n\n   SELECT\n       date.fromtimestamp(.purchase_ts) AS purchase_date,\n       .price * .quantity AS total\n   FROM json\n   WHERE .department.upper() == 'IT'\n   ORDER BY 2 DESC\n   TO csv\n\nSQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.\n\nSPyQL is fast and memory efficient. Take a look at the `benchmarks with GB-size JSON data \u003chttps://colab.research.google.com/github/dcmoura/spyql/blob/master/notebooks/json_benchmark.ipynb\u003e`_.\n\n\n\nSPyQL CLI\n^^^^^^^^^\n\nSPyQL offers a command-line interface that allows running SPyQL queries on top of text data (e.g. CSV, JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). Moreover, data can be generated by a Python expression! And since SPyQL also writes to different formats, it makes it easy to convert between data formats.\n\nTake a look at the Command line examples to see how to query parquet, process API calls, traverse directories of zipped JSONs, convert CSV to JSON, and import JSON/CSV data into SQL databases, among many other things.\n\nSee also:\n\n* `Tutorial (v0.8) \u003chttps://danielcmoura.com/blog/2022/spyql-cell-towers/\u003e`_\n\n* `Demo video (v0.4) \u003chttps://vimeo.com/danielcmoura/spyqldemo\u003e`_\n\n\nSPyQL Module\n^^^^^^^^^^^^\n\nSPyQL is also available as a Python module. In addition to the CLI features, you can also:\n\n* query variables (e.g. 
lists of dicts);\n* get results into in-memory data structures.\n\n\nPrinciples\n^^^^^^^^^^\n\nWe aim for SPyQL to be:\n\n\n* **Simple**\\ : simple to use with a straightforward implementation;\n* **Familiar**\\ : you should feel at home if you are acquainted with SQL and Python;\n* **Light**\\ : small memory footprint that allows you to process large data that fit into your machine;\n* **Useful**\\ : it should make your life easier, filling a gap in the eco-system.\n\n.. intro_end\n\nDistinctive features of SPyQL\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Row order guarantee\n* Natural window for aggregations\n* No distinction between aggregate and window functions\n* IMPORT clause\n* Natural support for lists, sets, dictionaries, objects, etc\n* 1-liner by design\n* Multiple data formats supported\n\n\nTestimonials\n------------\n\n|\n\n   \"I'm very impressed - this is some very neat pragmatic software design.\"\n\nSimon Willison, Creator of Datasette, co-creator of Django\n\n|\n\n   \"I love this tool! I use it every day\"...\n\nAlin Panaitiu, Creator of Lunar\n\n|\n\n   \"Brilliant tool, thanks a lot for creating it and for the example here!\"\n\nGreg Sadetsky, Co-founder and CTO at Decibel Ads\n\n|\n\nDocumentation\n--------------\n\nThe official documentation of SPyQL can be found at: `\u003chttps://spyql.readthedocs.io/\u003e`_.\n\n\nInstallation\n------------\n\nThe easiest way to install SPyQL is from pip:\n\n.. code-block:: sh\n\n   pip install spyql\n\nHello world\n-----------\n\n.. hello_start\n\nTo test your installation run in the terminal:\n\n.. code-block:: sh\n\n   spyql \"SELECT 'Hello world' as Message TO pretty\"\n\nOutput:\n\n.. code-block::\n\n   Message\n   -----------\n   Hello world\n\nYou can try replacing the output format by JSON or CSV, and adding more columns. e.g. run in the terminal:\n\n.. code-block:: sh\n\n   spyql \"SELECT 'Hello world' as message, 1+2 as three TO json\"\n\nOutput:\n\n.. 
code-block:: json\n\n   {\"message\": \"Hello world\", \"three\": 3}\n\n\n.. hello_end\n\n.. recipes_start\n\nExample queries\n---------------\n\nYou can run the following example queries in the terminal:\n``spyql \"the_query\" \u003c a_data_file``\n\nExample data files are not provided in most cases.\n\nQuery a CSV (and print a pretty table)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT a_col_name, 'positive' if int(col2) \u003e= 0 else 'negative' AS sign\n   FROM csv\n   TO pretty\n\nConvert CSV to a flat JSON\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT * FROM csv TO json\n\nConvert from CSV to a hierarchical JSON\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT {'client': {'id': col1, 'name': col2}, 'price': 120.40} AS json\n   FROM csv TO json\n\nor\n\n.. code-block:: sql\n\n   SELECT {'id': col1, 'name': col2} AS client, 120.40 AS price\n   FROM csv TO json\n\nJSON to CSV, filtering out NULLs\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT .client.id AS id, .client.name AS name, .price\n   FROM json\n   WHERE .client.name is not NULL\n   TO csv\n\nExplode JSON to CSV\n^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT .invoice_num AS id, .items.name AS name, .items.price AS price\n   FROM json\n   EXPLODE .items\n   TO csv\n\nSample input:\n\n.. code-block:: json\n\n   {\"invoice_num\" : 1028, \"items\": [{\"name\": \"tomatoes\", \"price\": 1.5}, {\"name\": \"bananas\", \"price\": 2.0}]}\n   {\"invoice_num\" : 1029, \"items\": [{\"name\": \"peaches\", \"price\": 3.12}]}\n\nOutput:\n\n.. code-block::\n\n   id, name, price\n   1028, tomatoes, 1.5\n   1028, bananas, 2.0\n   1029, peaches, 3.12\n\nPython iterator/list/comprehension to JSON\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT 10 * cos(col1 * ((pi * 4) / 90))\n   FROM range(80)\n   TO json\n\nor\n\n.. 
code-block:: sql\n\n   SELECT col1\n   FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]\n   TO json\n\nImporting Python modules\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nHere we import ``hashlib`` to calculate an MD5 hash for each input line.\n``hashlib`` is part of the Python standard library, so no installation is needed; a third-party package would have to be installed first (e.g. with ``pip install``).\n\n.. code-block:: sql\n\n   IMPORT hashlib as hl\n   SELECT hl.md5(col1.encode('utf-8')).hexdigest()\n   FROM text\n\nGetting the top 5 records\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT int(score) AS score, player_name\n   FROM csv\n   ORDER BY 1 DESC NULLS LAST, score_date\n   LIMIT 5\n\nAggregations\n^^^^^^^^^^^^\n\nTotals by player, alphabetically ordered.\n\n.. code-block:: sql\n\n   SELECT .player_name, sum_agg(.score) AS total_score\n   FROM json\n   GROUP BY 1\n   ORDER BY 1\n\nPartial aggregations\n^^^^^^^^^^^^^^^^^^^^\n\nCalculating the cumulative sum of a variable using the ``PARTIALS`` modifier. This example also demonstrates the ``lag`` aggregator.\n\n.. code-block:: sql\n\n   SELECT PARTIALS\n       .new_entries,\n       sum_agg(.new_entries) AS cum_new_entries,\n       lag(.new_entries) AS prev_entries\n   FROM json\n   TO json\n\nSample input:\n\n.. code-block:: json\n\n   {\"new_entries\" : 10}\n   {\"new_entries\" : 5}\n   {\"new_entries\" : 25}\n   {\"new_entries\" : null}\n   {}\n   {\"new_entries\" : 100}\n\nOutput:\n\n.. 
code-block:: json\n\n   {\"new_entries\" : 10,   \"cum_new_entries\" : 10,  \"prev_entries\": null}\n   {\"new_entries\" : 5,    \"cum_new_entries\" : 15,  \"prev_entries\": 10}\n   {\"new_entries\" : 25,   \"cum_new_entries\" : 40,  \"prev_entries\": 5}\n   {\"new_entries\" : null, \"cum_new_entries\" : 40,  \"prev_entries\": 25}\n   {\"new_entries\" : null, \"cum_new_entries\" : 40,  \"prev_entries\": null}\n   {\"new_entries\" : 100,  \"cum_new_entries\" : 140, \"prev_entries\": null}\n\nIf ``PARTIALS`` were omitted, the result would be equivalent to the last output row.\n\nDistinct rows\n^^^^^^^^^^^^^\n\n.. code-block:: sql\n\n   SELECT DISTINCT *\n   FROM csv\n\nCommand line examples\n---------------------\n\nTo run the following examples, type ``Ctrl-x Ctrl-e`` in your terminal. This will open your default editor (emacs/vim). Paste the code of one of the examples, save and exit.\n\nQueries on Parquet with directories\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nHere, ``find`` traverses a directory and executes ``parquet-tools`` for each parquet file, dumping each file to json format. ``jq -c`` makes sure that the output has 1 json per line before handing over to spyql. This is far from being an efficient way to query parquet files, but it might be a handy option if you need to do a quick inspection.\n\n.. code-block:: sh\n\n   find /the/directory -name \"*.parquet\" -exec parquet-tools cat --json {} \\; |\n   jq -c |\n   spyql \"\n       SELECT .a_field, .a_num_field * 2 + 1\n       FROM json\n   \"\n\nQuerying multiple json.gz files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sh\n\n   gzcat *.json.gz |\n   jq -c |\n   spyql \"\n       SELECT .a_field, .a_num_field * 2 + 1\n       FROM json\n   \"\n\nQuerying YAML / XML / TOML files\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n`yq \u003chttps://kislyuk.github.io/yq/#\u003e`_ converts yaml, xml and toml files to json, making it easy to query any of these with spyql.\n\n.. 
code-block:: sh\n\n   cat file.yaml | yq -c | spyql \"SELECT .a_field FROM json\"\n\n.. code-block:: sh\n\n   cat file.xml | xq -c | spyql \"SELECT .a_field FROM json\"\n\n.. code-block:: sh\n\n   cat file.toml | tomlq -c | spyql \"SELECT .a_field FROM json\"\n\nKafka to PostgreSQL pipeline\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nRead data from a Kafka topic and write to a PostgreSQL table named ``customer``.\n\n.. code-block:: sh\n\n   kafkacat -b the.broker.com -t the.topic |\n   spyql -Otable=customer -Ochunk_size=1 --unbuffered \"\n       SELECT\n           .customer.id AS id,\n           .customer.name AS name\n       FROM json\n       TO sql\n   \" |\n   psql -U an_user_name -h a.host.com a_database_name\n\nMonitoring statistics in Kafka\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nRead data from a Kafka topic, continuously calculating statistics.\n\n.. code-block:: sh\n\n   kafkacat -b the.broker.com -t the.topic |\n   spyql --unbuffered \"\n       SELECT PARTIALS\n           count_agg(*) AS running_count,\n           sum_agg(value) AS running_sum,\n           min_agg(value) AS min_so_far,\n           value AS current_value\n       FROM json\n       TO csv\n   \"\n\nSub-queries (piping)\n^^^^^^^^^^^^^^^^^^^^\n\nA special file format (spy) is used to efficiently pipe data between queries.\n\n.. code-block:: sh\n\n   cat a_file.json |\n   spyql \"\n       SELECT ' '.join([.first_name, .middle_name, .last_name]) AS full_name\n       FROM json\n       TO spy\" |\n   spyql \"SELECT full_name, full_name.upper() FROM spy\"\n\n\n\n(Equi) Joins\n^^^^^^^^^^^^^\n\nIt is possible to make simple (LEFT) JOIN operations based on dictionary lookups.\n\nGiven ``numbers.json``:\n\n.. code-block:: json\n\n   {\n      \"1\": \"One\",\n      \"2\": \"Two\",\n      \"3\": \"Three\"\n   }\n\n\nQuery:\n\n.. code-block:: sh\n\n   spyql -Jnums=numbers.json \"\n       SELECT nums[col1] as res\n       FROM [3,4,1,1]\n       TO json\"\n\n\nOutput:\n\n.. 
code-block:: json\n\n   {\"res\": \"Three\"}\n   {\"res\": null}\n   {\"res\": \"One\"}\n   {\"res\": \"One\"}\n\n\nIf you want an INNER JOIN instead of a LEFT JOIN, you can add a criterion to the WHERE clause, e.g.:\n\n.. code-block:: sql\n\n   SELECT nums[col1] as res\n   FROM [3,4,1,1]\n   WHERE col1 in nums\n   TO json\n\n\nOutput:\n\n.. code-block:: json\n\n   {\"res\": \"Three\"}\n   {\"res\": \"One\"}\n   {\"res\": \"One\"}\n\n\nQueries over APIs\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: sh\n\n   curl \"https://reqres.in/api/users?page=2\" |\n   spyql \"\n       SELECT\n           .data.email AS email,\n           'Dear {}, thank you for being a great customer!'.format(.data.first_name) AS msg\n       FROM json\n       EXPLODE .data\n       TO json\n   \"\n\nPlotting to the terminal\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sh\n\n   spyql \"\n       SELECT col1\n       FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]\n       TO plot\n   \"\n\nPlotting with `matplotcli \u003chttps://github.com/dcmoura/matplotcli\u003e`_\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: sh\n\n   spyql \"\n      SELECT col1 AS y\n      FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]\n      TO json\n   \" | plt \"plot(y)\"\n\n\n.. image:: imgs/matplotcli_demo1.png\n  :width: 600\n  :alt: matplotcli demo\n\n\n.. recipes_end\n\n----\n\nThis package was created with `Cookiecutter \u003chttps://github.com/audreyr/cookiecutter\u003e`_ and the ``audreyr/cookiecutter-pypackage`` `project template \u003chttps://github.com/audreyr/cookiecutter-pypackage\u003e`_.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcmoura%2Fspyql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcmoura%2Fspyql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcmoura%2Fspyql/lists"}