{"id":15623172,"url":"https://github.com/macbre/sql-metadata","last_synced_at":"2025-05-13T20:17:13.585Z","repository":{"id":37405660,"uuid":"93537184","full_name":"macbre/sql-metadata","owner":"macbre","description":"Uses tokenized query returned by python-sqlparse and generates query metadata","archived":false,"fork":false,"pushed_at":"2025-05-12T02:26:14.000Z","size":901,"stargazers_count":857,"open_issues_count":85,"forks_count":129,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-12T03:31:57.157Z","etag":null,"topics":["database","hive","hiveql","metadata","mysql-query","parser","python-package","python3-library","sql","sql-parser","sqlparse"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/sql-metadata","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macbre.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-06-06T15:59:09.000Z","updated_at":"2025-05-12T02:26:17.000Z","dependencies_parsed_at":"2023-12-25T03:25:16.168Z","dependency_job_id":"c8dd119a-d260-4be3-96eb-dc9fe014f826","html_url":"https://github.com/macbre/sql-metadata","commit_stats":{"total_commits":565,"total_committers":21,"mean_commits":"26.904761904761905","dds":0.5911504424778762,"last_synced_commit":"18ec5ea21d8b98ebdcead8f417e6f41d02b34a10"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fsql-metadata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fsql-metadata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fsql-metadata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fsql-metadata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/macbre","download_url":"https://codeload.github.com/macbre/sql-metadata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672535,"owners_count":21945475,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","hive","hiveql","metadata","mysql-query","parser","python-package","python3-library","sql","sql-parser","sqlparse"],"created_at":"2024-10-03T09:56:38.165Z","updated_at":"2025-05-13T20:17:13.540Z","avatar_url":"https://github.com/macbre.png","language":"Python","readme":"# sql-metadata\n\n[![PyPI](https://img.shields.io/pypi/v/sql_metadata.svg)](https://pypi.python.org/pypi/sql_metadata)\n[![Tests](https://github.com/macbre/sql-metadata/actions/workflows/python-ci.yml/badge.svg)](https://github.com/macbre/sql-metadata/actions/workflows/python-ci.yml)\n[![Coverage Status](https://coveralls.io/repos/github/macbre/sql-metadata/badge.svg?branch=master\u00261)](https://coveralls.io/github/macbre/sql-metadata?branch=master)\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n[![Maintenance](https://img.shields.io/badge/maintained%3F-yes-green.svg)](https://github.com/macbre/sql-metadata/graphs/commit-activity)\n[![Downloads](https://pepy.tech/badge/sql-metadata/month)](https://pepy.tech/project/sql-metadata)\n\nUses tokenized query returned by [`python-sqlparse`](https://github.com/andialbrecht/sqlparse) and generates query metadata.\n\n**Extracts column names and tables** used by the query. \nAutomatically conduct **column alias resolution**, **sub queries aliases resolution** as well as **tables aliases resolving**.\n\nProvides also a helper for **normalization of SQL queries**.\n\nSupported queries syntax:\n\n* MySQL\n* PostgreSQL\n* Sqlite\n* MSSQL\n* [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)\n\n(note that listed backends can differ quite substantially but should work in regard of query types supported by `sql-metadata`)\n\nYou can test the capabilities of `sql-metadata` with an interactive demo: [https://sql-app.infocruncher.com/](https://sql-app.infocruncher.com/)\n\n## Usage\n\n```\npip install sql-metadata\n```\n\n### Extracting raw sql-metadata tokens\n\n```python\nfrom sql_metadata import Parser\n\n# extract raw sql-metadata tokens\nParser(\"SELECT * FROM foo\").tokens\n# ['SELECT', '*', 'FROM', 'foo']\n```\n\n### Extracting columns from query\n\n```python\nfrom sql_metadata import Parser\n\n# get columns from query - for more examples see `tests/test_getting_columns.py`\nParser(\"SELECT test, id FROM foo, bar\").columns\n# ['test', 'id']\n\nParser(\"INSERT /* VoteHelper::addVote xxx */  INTO `page_vote` (article_id,user_id,`time`) VALUES ('442001','27574631','20180228130846')\").columns\n# ['article_id', 'user_id', 'time']\n\nparser = Parser(\"SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address\")\n\n# note that aliases are auto-resolved\nparser.columns\n# ['product_a.*', 'product_a.users.ip_address', 'product_b.users.ip_address']\n\n# note that you can also extract columns with their place in the query\n# which will return dict with lists divided into select, where, order_by, group_by, join, insert and update\nparser.columns_dict\n# {'select': ['product_a.users.*'], 'join': ['product_a.users.ip_address', 'product_b.users.ip_address']}\n```\n\n### Extracting columns aliases from query\n\n```python\nfrom sql_metadata import Parser\nparser = Parser(\"SELECT a, (b + c - u) as alias1, custome_func(d) alias2 from aa, bb order by alias1\")\n\n# note that columns list do not contain aliases of the columns\nparser.columns\n# [\"a\", \"b\", \"c\", \"u\", \"d\"]\n\n# but you can still extract aliases names\nparser.columns_aliases_names\n# [\"alias1\", \"alias2\"]\n\n# aliases are resolved to the columns which they refer to\nparser.columns_aliases\n# {\"alias1\": [\"b\", \"c\", \"u\"], \"alias2\": \"d\"}\n\n# you can also extract aliases used by section of the query in which they are used\nparser.columns_aliases_dict\n# {\"order_by\": [\"alias1\"], \"select\": [\"alias1\", \"alias2\"]}\n\n# the same applies to aliases used in queries section when you extract columns_dict\n# here only the alias is used in order by but it's resolved to actual columns\nassert parser.columns_dict == {'order_by': ['b', 'c', 'u'],\n                               'select': ['a', 'b', 'c', 'u', 'd']}\n```\n\n### Extracting tables from query\n\n```python\nfrom sql_metadata import Parser\n\n# get tables from query - for more examples see `tests/test_getting_tables.py`\nParser(\"SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address\").tables\n# ['product_a.users', 'product_b.users']\n\nParser(\"SELECT test, id FROM foo, bar\").tables\n# ['foo', 'bar']\n\n# you can also extract aliases of the tables as a dictionary\nparser = Parser(\"SELECT f.test FROM foo AS f\")\n\n# get table aliases\nparser.tables_aliases\n# {'f': 'foo'}\n\n# note that aliases are auto-resolved for columns\nparser.columns\n# [\"foo.test\"]\n```\n\n### Extracting values from insert query\n```python\nfrom sql_metadata import Parser\n\nparser = Parser(\n    \"INSERT /* VoteHelper::addVote xxx */  INTO `page_vote` (article_id,user_id,`time`) \" \n    \"VALUES ('442001','27574631','20180228130846')\"\n)\n# extract values from query\nparser.values\n# [\"442001\", \"27574631\", \"20180228130846\"]\n\n# extract a dictionary with column-value pairs\nparser.values_dict\n#{\"article_id\": \"442001\", \"user_id\": \"27574631\", \"time\": \"20180228130846\"}\n\n# if column names are not set auto-add placeholders\nparser = Parser(\n    \"INSERT IGNORE INTO `table` VALUES (9, 2.15, '123', '2017-01-01');\"\n)\nparser.values\n# [9, 2.15, \"123\", \"2017-01-01\"]\n\nparser.values_dict\n#{\"column_1\": 9, \"column_2\": 2.15, \"column_3\": \"123\", \"column_4\": \"2017-01-01\"}\n```\n\n\n### Extracting limit and offset\n```python\nfrom sql_metadata import Parser\n\nParser('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000').limit_and_offset\n# (50, 1000)\n\nParser('SELECT foo_limit FROM bar_offset limit 2000,50').limit_and_offset\n# (50, 2000)\n```\n\n### Extracting with names\n\n```python\nfrom sql_metadata import Parser\n\nparser = Parser(\n    \"\"\"\nWITH\n    database1.tableFromWith AS (SELECT aa.* FROM table3 as aa \n                                left join table4 on aa.col1=table4.col2),\n    test as (SELECT * from table3)\nSELECT\n  \"xxxxx\"\nFROM\n  database1.tableFromWith alias\nLEFT JOIN database2.table2 ON (\"tt\".\"ttt\".\"fff\" = \"xx\".\"xxx\")\n\"\"\"\n)\n\n# get names/ aliases of with statements\nparser.with_names\n# [\"database1.tableFromWith\", \"test\"]\n\n# get definition of with queries\nparser.with_queries\n# {\"database1.tableFromWith\": \"SELECT aa.* FROM table3 as aa left join table4 on aa.col1=table4.col2\"\n#  \"test\": \"SELECT * from table3\"}\n\n# note that names of with statements do not appear in tables\nparser.tables\n# [\"table3\", \"table4\", \"database2.table2\"]\n```\n\n### Extracting sub-queries\n\n```python\nfrom sql_metadata import Parser\n\nparser = Parser(\n\"\"\"\nSELECT COUNT(1) FROM\n(SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1) a\nJOIN (SELECT st.task_id FROM some_task st WHERE task_type_id = 80) b\nON a.task_id = b.task_id;\n\"\"\"\n)\n\n# get sub-queries dictionary\nparser.subqueries\n# {\"a\": \"SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1\",\n#  \"b\": \"SELECT st.task_id FROM some_task st WHERE task_type_id = 80\"}\n\n\n# get names/ aliases of sub-queries / derived tables\nparser.subqueries_names\n# [\"a\", \"b\"]\n\n# note that columns coming from sub-queries are resolved to real columns\nparser.columns\n#[\"some_task_detail.task_id\", \"some_task_detail.STATUS\", \"some_task.task_id\", \n# \"task_type_id\"]\n\n# same applies for columns_dict, note the join columns are resolved\nparser.columns_dict\n#{'join': ['some_task_detail.task_id', 'some_task.task_id'],\n# 'select': ['some_task_detail.task_id', 'some_task.task_id'],\n# 'where': ['some_task_detail.STATUS', 'task_type_id']}\n\n```\n\nSee `tests` file for more examples of a bit more complex queries.\n\n### Queries normalization and comments extraction\n\n```python\nfrom sql_metadata import Parser\nparser = Parser('SELECT /* Test */ foo FROM bar WHERE id in (1, 2, 56)')\n\n# generalize query\nparser.generalize\n# 'SELECT foo FROM bar WHERE id in (XYZ)'\n\n# remove comments\nparser.without_comments\n# 'SELECT foo FROM bar WHERE id in (1, 2, 56)'\n\n# extract comments\nparser.comments\n# ['/* Test */']\n```\n\nSee `test/test_normalization.py` file for more examples of a bit more complex queries.\n\n## Migrating from `sql_metadata` 1.x\n\n`sql_metadata.compat` module has been implemented to make the introduction of sql-metadata v2.0 smoother.\n\nYou can use it by simply changing the imports in your code from:\n\n```python\nfrom sql_metadata import get_query_columns, get_query_tables\n```\n\ninto:\n\n```python\nfrom sql_metadata.compat import get_query_columns, get_query_tables\n```\n\nThe following functions from the old API are available in the `sql_metadata.compat` module:\n\n* `generalize_sql`\n* `get_query_columns` (since #131 columns aliases ARE NOT returned by this function)\n* `get_query_limit_and_offset`\n* `get_query_tables`\n* `get_query_tokens`\n* `preprocess_query`\n\n## Authors and contributors\n\nCreated and maintained by [@macbre](https://github.com/macbre) with a great contributions from [@collerek](https://github.com/collerek) and the others.\n\n* aborecki (https://github.com/aborecki)\n* collerek (https://github.com/collerek)\n* dylanhogg (https://github.com/dylanhogg)\n* macbre (https://github.com/macbre)\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/macbre/sql-metadata.svg)](https://starchart.cc/macbre/sql-metadata)\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacbre%2Fsql-metadata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacbre%2Fsql-metadata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacbre%2Fsql-metadata/lists"}