{"id":26261099,"url":"https://github.com/XGenerationLab/XiYan-DBDescGen","last_synced_at":"2025-03-14T00:01:35.709Z","repository":{"id":266794907,"uuid":"899377322","full_name":"XGenerationLab/XiYan-DBDescGen","owner":"XGenerationLab","description":"A method and corresponding code for automatic description generation for Text-to-SQL","archived":false,"fork":false,"pushed_at":"2025-03-04T12:09:41.000Z","size":817,"stargazers_count":12,"open_issues_count":2,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-04T12:23:15.008Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/XGenerationLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-06T06:24:08.000Z","updated_at":"2025-03-04T12:21:57.000Z","dependencies_parsed_at":"2024-12-06T10:53:58.972Z","dependency_job_id":null,"html_url":"https://github.com/XGenerationLab/XiYan-DBDescGen","commit_stats":null,"previous_names":["xgenerationlab/dbdescgen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XGenerationLab%2FXiYan-DBDescGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XGenerationLab%2FXiYan-DBDescGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XGenerationLab%2FXiYan-DBDescGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XGenerationLab%2FXiYan-DBDescGen/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/XGenerationLab","download_url":"https://codeload.github.com/XGenerationLab/XiYan-DBDescGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243500790,"owners_count":20300773,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-14T00:00:54.004Z","updated_at":"2025-03-14T00:01:35.696Z","avatar_url":"https://github.com/XGenerationLab.png","language":"Python","funding_links":[],"categories":["📰 Text-to-SQL Paper List"],"sub_categories":[],"readme":"# Automatic database description generation for Text-to-SQL\r\n\r\n## Important Links\r\n\r\n🤖[Arxiv](https://arxiv.org/abs/2502.20657) |\r\n📖[XiYan-SQL](https://github.com/XGenerationLab/XiYan-SQL) |\r\n\r\n\r\n## Introduction\r\nThis repository provides a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The proposed method employs a dual-process approach: a coarse-to-fine process, followed by a fine-to-coarse process. Experimental results on the Bird benchmark indicate that using descriptions generated by the proposed method improves SQL generation accuracy by 0.93% compared to not using descriptions, and achieves 37% of human-level performance. 
\r\nWe support three common database dialects: SQLite, MySQL and PostgreSQL.\r\n\r\nRead more: [Arxiv](https://arxiv.org/abs/2502.20657)\r\n\u003cp align=\"center\"\u003e\r\n  \u003cimg src=\"https://github.com/XGenerationLab/XiYan-DBDescGen/blob/main/description_generation.png\" alt=\"image\" width=\"1000\"/\u003e\r\n\u003c/p\u003e\r\n\r\n## Requirements\r\n+ python \u003e= 3.9\r\n\r\nYou can install the required packages with the following command:\r\n```shell\r\npip install -r requirements.txt\r\n```\r\n\r\n## Quick Start\r\n\r\n1. Create a database connection.\r\n\r\nConnect to SQLite:\r\n```python\r\nimport os\r\nfrom sqlalchemy import create_engine\r\n\r\ndb_path = \"path_to_sqlite\"\r\nabs_path = os.path.abspath(db_path)\r\ndb_engine = create_engine(f'sqlite:///{abs_path}')\r\n```\r\n\r\n2. Set llama-index LLM.\r\n\r\nTake dashscope as an example:\r\n```python\r\nfrom llama_index.llms.dashscope import DashScope, DashScopeGenerationModels\r\ndashscope_llm = DashScope(model_name=DashScopeGenerationModels.QWEN_PLUS, api_key='YOUR API KEY HERE.')\r\n```\r\n\r\n3. 
Generate the database description and build M-Schema.\r\n```python\r\nfrom schema_engine import SchemaEngine\r\n\r\ndb_name = 'your_db_name'\r\ncomment_mode = 'generation'\r\nschema_engine_instance = SchemaEngine(db_engine, llm=dashscope_llm, db_name=db_name,\r\n                                      comment_mode=comment_mode)\r\nschema_engine_instance.fields_category()\r\nschema_engine_instance.table_and_column_desc_generation()\r\nmschema = schema_engine_instance.mschema\r\nmschema.save(f'./{db_name}.json')\r\nmschema_str = mschema.to_mschema()\r\nprint(mschema_str)\r\n```\r\n\r\n## Citation\r\nIf you find our work helpful, please consider citing it.\r\n\r\n```bibtex\r\n@article{description_generation,\r\n      title={Automatic database description generation for Text-to-SQL}, \r\n      author={Yingqi Gao and Zhiling Luo},\r\n      year={2025},\r\n      eprint={2502.20657},\r\n      archivePrefix={arXiv},\r\n      primaryClass={cs.AI},\r\n      url={https://arxiv.org/abs/2502.20657}, \r\n}\r\n\r\n@article{xiyansql,\r\n      title={A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL}, \r\n      author={Yingqi Gao and Yifu Liu and Xiaoxia Li and Xiaorong Shi and Yin Zhu and Yiming Wang and Shiqi Li and Wei Li and Yuntao Hong and Zhiling Luo and Jinyang Gao and Liyu Mou and Yu Li},\r\n      year={2024},\r\n      journal={arXiv preprint arXiv:2411.08599},\r\n      url={https://arxiv.org/abs/2411.08599},\r\n      primaryClass={cs.AI}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXGenerationLab%2FXiYan-DBDescGen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FXGenerationLab%2FXiYan-DBDescGen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXGenerationLab%2FXiYan-DBDescGen/lists"}