{"id":22895142,"url":"https://github.com/ademakdogan/chatsql","last_synced_at":"2025-08-02T02:42:08.726Z","repository":{"id":161758321,"uuid":"634675704","full_name":"ademakdogan/ChatSQL","owner":"ademakdogan","description":"Convert the given plain text to MySQL query by ChatGPT","archived":false,"fork":false,"pushed_at":"2023-05-27T19:00:53.000Z","size":98,"stargazers_count":143,"open_issues_count":0,"forks_count":27,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-07T19:52:42.327Z","etag":null,"topics":["chatgpt","chatgpt-api","database","dataset","langchain","langchain-python","mysql","natural-language-processing","nlp","python","query-generator","sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ademakdogan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-30T22:06:46.000Z","updated_at":"2025-03-07T01:03:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"4bd05546-94d0-4495-adac-f5f3e3beea52","html_url":"https://github.com/ademakdogan/ChatSQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ademakdogan/ChatSQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ademakdogan%2FChatSQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ademakdogan%2FChatSQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ademakdogan%2FChatSQL/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/ademakdogan%2FChatSQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ademakdogan","download_url":"https://codeload.github.com/ademakdogan/ChatSQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ademakdogan%2FChatSQL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268329177,"owners_count":24232998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","chatgpt-api","database","dataset","langchain","langchain-python","mysql","natural-language-processing","nlp","python","query-generator","sql"],"created_at":"2024-12-13T23:28:23.001Z","updated_at":"2025-08-02T02:42:08.675Z","avatar_url":"https://github.com/ademakdogan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"180\" src=\"./images/logo.png\" alt=\"ChatSQL\"\u003e\n  \u003ch1 align=\"center\"\u003eChatSQL\u003c/h1\u003e\n\u003c/p\u003e\n\nThis project uses ChatGPT to convert plain-text requests from the user into MySQL queries. \nFor ChatGPT to understand your database, you first need to give it some information about the database. 
Use the [info.json](info.json) file for this, and describe the database in as much detail as possible. The more complex your database is, the more detail you should provide. Beyond a certain level of complexity, this information should instead be stored as vector embeddings, and the parts relevant to each incoming prompt should be retrieved automatically; that approach is more effective and more economical. For this reason, this project is best suited to small and medium-sized databases. If I have enough time in the future, I will start a new project for large databases. \n\nAdd your OpenAI API key and database connection details to the [conf.json](conf.json) file. If you want to try the project but don't have a sample dataset, you can use the [books.csv](data/books.csv) file for testing. \n\n## Installation of Packages\nInstall all required packages before starting, using either of the following commands (this project uses Python 3.8):\n\n```\nmake install\n```\nor\n\n```\npip3 install --default-timeout=900 -r requirements.txt\n```\n\n## Data Insertion into Database\nRun [sample_data_creator.py](src/sample_data_creator.py) from the `src` directory to insert the sample dataset into your own database (the default table name is \"bt\"):\n\n```\npython3 sample_data_creator.py\n```\n\n## Usage\n\nNow that the data is ready, we can start using it. \nThere are two ways to use the project. The first is to run the [chatsql.py](src/chatsql.py) file, passing the prompt with the `-p` flag. 
The second is to use it through a gRPC server.\n\n### 1- ChatSql\n\nA sample of the database is shown below.\n\n ```\n+-----+--------------------------------------------------------+------------------------+-------------------+--------+------------------+\n| ID  | Title                                                  | Author                 | Genre             | Height | Publisher        |\n+-----+--------------------------------------------------------+------------------------+-------------------+--------+------------------+\n|   1 | Fundamentals of Wavelets                               | Goswami, Jaideva       | signal_processing |    228 | Wiley            |\n|   2 | Data Smart                                             | Foreman, John          | data_science      |    235 | Wiley            |\n|   3 | God Created the Integers                               | Hawking, Stephen       | mathematics       |    197 | Penguin          |\n|   4 | Superfreakonomics                                      | Dubner, Stephen        | economics         |    179 | HarperCollins    |\n|   5 | Orientalism                                            | Said, Edward           | history           |    197 | Penguin          |\n|   6 | Nature of Statistical Learning Theory, The             | Vapnik, Vladimir       | data_science      |    230 | Springer         |\n|   7 | Integration of the Indian States                       | Menon, V P             | history           |    217 | Orient Blackswan |\n|   8 | Drunkard's Walk, The                                   | Mlodinow, Leonard      | science           |    197 | Penguin          |\n|   9 | Image Processing \u0026 Mathematical Morphology             | Shih, Frank            | signal_processing |    241 | CRC              |\n|  10 | How to Think Like Sherlock Holmes                      | Konnikova, Maria       | psychology        |    240 | Penguin          |\n|  11 | Data Scientists at Work                               
 | Sebastian Gutierrez    | data_science      |    230 | Apress           |\n|  12 | Slaughterhouse Five                                    | Vonnegut, Kurt         | fiction           |    198 | Random House     |\n|  13 | Birth of a Theorem                                     | Villani, Cedric        | mathematics       |    234 | Bodley Head      |\n|  14 | Structure \u0026 Interpretation of Computer Programs        | Sussman, Gerald        | computer_science  |    240 | MIT Press        |\n|  15 | Age of Wrath, The                                      | Eraly, Abraham         | history           |    238 | Penguin          |\n|  16 | Trial, The                                             | Kafka, Frank           | fiction           |    198 | Random House     |\n|  17 | Statistical Decision Theory'                           | Pratt, John            | data_science      |    236 | MIT Press        |\n|  18 | Data Mining Handbook                                   | Nisbet, Robert         | data_science      |    242 | Apress           |\n|  19 | New Machiavelli, The                                   | Wells, H. G.           | fiction           |    180 | Penguin          |\n|  20 | Physics \u0026 Philosophy                                   | Heisenberg, Werner     | science           |    197 | Penguin          |\n|  21 | Making Software                                        | Oram, Andy             | computer_science  |    232 | O'Reilly         |\n|  .  | .......                                                | .......                | ....              |    ... | ....             |\n|  .  | .......                                                | .......                | ....              |    ... | ....             |\n```\n\nHere is our sample prompt: \"Show me the book type fiction which they height bigger than 175 and smaller than 178. 
The author shouldn't be 'Doyle, Arthur Conan'.\"\n\nPass it to chatsql.py as follows:\n\n```\npython3 chatsql.py -p \"Show me the book type fiction which they height bigger than 175 and smaller than 178. The author shouldn't be 'Doyle, Arthur Conan'.\"\n```\nResult:\n```\nCHATGPT QUERY------------------:\nSELECT * FROM bt WHERE Genre = 'Fiction' AND Height \u003e 175 AND Height \u003c 178 AND Author != 'Doyle, Arthur Conan'\nRAW RESULT------------------:\n[(32, 'Pillars of the Earth, The', 'Follett, Ken', 'fiction', 176, 'Random House'), (37, 'Veteran, The', 'Forsyth, Frederick', 'fiction', 177, 'Transworld'), (38, 'False Impressions', 'Archer, Jeffery', 'fiction', 177, 'Pan'), (72, 'Prisoner of Birth, A', 'Archer, Jeffery', 'fiction', 176, 'Pan'), (87, 'City of Joy, The', 'Lapierre, Dominique', 'fiction', 177, 'vikas'), (128, 'Rosy is My Relative', 'Durrell, Gerald', 'fiction', 176, 'nan')]\nPROCESSED RESULT------------------ :\nThe books 'Pillars of the Earth, The' by Ken Follett, 'Veteran, The' by Frederick Forsyth, 'False Impressions' by Jeffery Archer, 'Prisoner of Birth, A' by Jeffery Archer, 'City of Joy, The' by Dominique Lapierre, and 'Rosy is My Relative' by Gerald Durrell are all fiction books with 176 or 177 pages published by Random House, Transworld, Pan, Vikas, and Nan, respectively.\n```\n\nAs shown above, the output has three parts. The first is the SQL query that ChatGPT generated from the prompt. The raw result is the data returned by the database for that query. Finally, the processed result is ChatGPT's plain-text interpretation of the SQL results. Keep in mind that this interpretation is generated text and may misread the data; here, for example, ChatGPT describes the values of the Height column as page counts.\n\n### 2- Using via gRPC\n\nStart the gRPC server:\n```\npython3 main.py -p 9001\n```\nOnce the gRPC server is running, you can connect to it with your own client and send a prompt. 
If you want to see an example, you can look at the [client.py](src/client.py) file.\n\n```\npython3 client.py\n```\nResult:\n```\n{'query': \"SELECT * from bt WHERE Genre = 'Fiction' AND Height \u003e 175 AND Height \u003c 178 AND Author != 'Doyle, Arthur Conan'\", 'raw_result': \"[(32, 'Pillars of the Earth, The', 'Follett, Ken', 'fiction', 176, 'Random House'), (37, 'Veteran, The', 'Forsyth, Frederick', 'fiction', 177, 'Transworld'), (38, 'False Impressions', 'Archer, Jeffery', 'fiction', 177, 'Pan'), (72, 'Prisoner of Birth, A', 'Archer, Jeffery', 'fiction', 176, 'Pan'), (87, 'City of Joy, The', 'Lapierre, Dominique', 'fiction', 177, 'vikas'), (128, 'Rosy is My Relative', 'Durrell, Gerald', 'fiction', 176, 'nan')]\", 'processed_result': \"\\n1. Ken Follett's 'Pillars of the Earth, The' is a fiction novel with 176 pages that was published by Random House.\\n2. Frederick Forsyth's 'Veteran, The' is a fiction novel with 177 pages that was published by Transworld.\\n3. Jeffery Archer's 'False Impressions' is a fiction novel with 177 pages that was published by Pan.\\n4. Jeffery Archer's 'Prisoner of Birth, A' is a fiction novel with 176 pages that was published by Pan.\\n5. Dominique Lapierre's 'City of Joy, The' is a fiction novel with 177 pages that was published by Vikas.\\n6. 
Gerald Durrell's 'Rosy is My Relative' is a fiction novel with 176 pages that was published by Nan.\"}\nTime: 10.407907724380493\n```\n\n### 3- Using via Docker - gRPC\n\nTo run the gRPC server in Docker (the default image name is `chatsql`), first build the image:\n```\nmake docker\n```\nThen run it:\n```\nmake docker_run p=9001\n```\nAfter that, run your client:\n```\npython3 client.py\n```\nResult:\n```\n{'query': \"SELECT * FROM bt WHERE Genre = 'Fiction' AND Height \u003e 175 AND Height \u003c 178 AND Author != 'Doyle, Arthur Conan'\", 'raw_result': \"[(32, 'Pillars of the Earth, The', 'Follett, Ken', 'fiction', 176, 'Random House'), (37, 'Veteran, The', 'Forsyth, Frederick', 'fiction', 177, 'Transworld'), (38, 'False Impressions', 'Archer, Jeffery', 'fiction', 177, 'Pan'), (72, 'Prisoner of Birth, A', 'Archer, Jeffery', 'fiction', 176, 'Pan'), (87, 'City of Joy, The', 'Lapierre, Dominique', 'fiction', 177, 'vikas'), (128, 'Rosy is My Relative', 'Durrell, Gerald', 'fiction', 176, 'nan')]\", 'processed_result': '\\nThe books \"Pillars of the Earth, The\" by Ken Follet, \"Veteran, The\" by Frederick Forsyth, \"False Impressions\" by Jeffery Archer, \"Prisoner of Birth, A\" by Jeffery Archer, \"City of Joy, The\" by Dominique Lapierre and \"Rosy is My Relative\" by Gerald Durrell are all fiction books with page count 176 or 177 and published by Random House, Transworld, Pan, Vikas or Nan.'}\nTime: 7.1615989208221436\n```\n**Be careful:** if you use Docker, you need to configure Docker networking so that the container can reach your database, because `localhost` inside a container refers to the container itself. For example, if you are on a Mac and connect to your MySQL database via localhost, set **\"host.docker.internal\"** instead of **\"localhost\"** in the [conf.json](conf.json) file (**\"HOST\": \"host.docker.internal\"**).\n\n### Extra Info\n\nIn the examples so far, the column names in the database were always meaningful, so ChatGPT could generate queries by understanding them. 
However, in some cases column names are meaningless, or ChatGPT may not understand them. If we add sufficiently detailed information about the database to the [info.json](info.json) file, we can still get the results we want. For example, let's rename the columns to aa, bb, cc, dd and ee. \n\n```\n+-----+--------------------------------------------------------+------------------------+-------------------+------+------------------+\n| ID  | aa                                                     | bb                     | cc                | dd   | ee               |\n+-----+--------------------------------------------------------+------------------------+-------------------+------+------------------+\n|   1 | Fundamentals of Wavelets                               | Goswami, Jaideva       | signal_processing |  228 | Wiley            |\n|   2 | Data Smart                                             | Foreman, John          | data_science      |  235 | Wiley            |\n|   3 | God Created the Integers                               | Hawking, Stephen       | mathematics       |  197 | Penguin          |\n|   4 | Superfreakonomics                                      | Dubner, Stephen        | economics         |  179 | HarperCollins    |\n|   5 | Orientalism                                            | Said, Edward           | history           |  197 | Penguin          |\n|  .  | .......                                                | .......                | ....              |  ... | ....             |\n|  .  | .......                                                | .......                | ....              |  ... | ....             |
\n```\n\nIf we describe these columns in detail in [info.json](info.json) and run client.py, we get the following result:\n```\n{'query': \"SELECT aa, bb, cc, dd FROM bt WHERE cc = 'fiction' AND dd \u003e 175 AND dd \u003c 178 AND bb != 'Doyle, Arthur Conan'\", 'raw_result': \"[('Pillars of the Earth, The', 'Follett, Ken', 'fiction', 176), ('Veteran, The', 'Forsyth, Frederick', 'fiction', 177), ('False Impressions', 'Archer, Jeffery', 'fiction', 177), ('Prisoner of Birth, A', 'Archer, Jeffery', 'fiction', 176), ('City of Joy, The', 'Lapierre, Dominique', 'fiction', 177), ('Rosy is My Relative', 'Durrell, Gerald', 'fiction', 176)]\", 'processed_result': '\\nThe books \"Pillars of the Earth, The\" by Ken Follett, \"Veteran, The\" by Frederick Forsyth, \"False Impressions\" by Jeffery Archer, \"Prisoner of Birth, A\" by Jeffery Archer, \"City of Joy, The\" by Dominique Lapierre and \"Rosy is My Relative\" by Gerald Durrell are all fiction and have page lengths of 176 or 177.'}\n```\n\nA possible next project is generating queries (MongoDB, SQL) from prompts with free models such as Llama.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fademakdogan%2Fchatsql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fademakdogan%2Fchatsql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fademakdogan%2Fchatsql/lists"}