# Transforming Natural Language to SQL Queries with Python and LangChain

LLMs are highly proficient at generating code, including SQL queries from natural language text. Today, we're going to experiment with this capability to see how effectively we can transform natural language instructions into SQL queries. The idea is to leverage the power of natural language processing to simplify the process of writing complex SQL statements.
For this experiment, I've downloaded a CSV file with IMDb data about people who work on movies: their names, birth and death years, professions, and the titles they are known for. By using this dataset, we can test the LLM's ability to generate accurate and efficient SQL queries from different natural language prompts. Here's an example of what the data looks like:

```csv
nconst,primaryname,birthyear,deathyear,primaryprofession,knownfortitles
nm0325022,Käthe Gold,1907,1997,"actress,archive_footage","tt0026069,tt0032498,tt0436641,tt0026066"
nm0325025,Lee Gold,1919,1985,writer,"tt0034433,tt0040392,tt0048226,tt0099219"
nm0325028,Louise Gold,1956,,"actress,miscellaneous,soundtrack","tt0074028,tt0104940,tt0083791,tt2281587"
...
```

Now, we will create a PostgreSQL database using Docker. Docker lets us quickly set up and manage containerized applications, which makes it a good fit for this purpose. Below is the Dockerfile we will use to set up our PostgreSQL database:

```dockerfile
FROM postgres:16.3-alpine
COPY actors.csv /docker-entrypoint-initdb.d/actors.csv
COPY init.sql /docker-entrypoint-initdb.d/
```

Next, we set up the database and import the CSV data into an `actors` table using the Docker entrypoint.
Below is the `init.sql` script. The official `postgres` image runs the `.sql` files in `/docker-entrypoint-initdb.d/` when it initializes an empty data directory, so this script creates the table and imports the CSV data on the first start:

```sql
CREATE TABLE actors (
    nconst TEXT PRIMARY KEY,
    primaryname TEXT,
    birthyear INTEGER,
    deathyear INTEGER,
    primaryprofession TEXT,
    knownfortitles TEXT
);

COPY actors FROM '/docker-entrypoint-initdb.d/actors.csv' CSV HEADER;
```

And here is the docker-compose file that builds and runs the PostgreSQL container:

```yaml
version: '3.6'

services:
  pg:
    build:
      context: .docker/pg
      dockerfile: Dockerfile
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_DB: ${POSTGRES_DB}
      PGDATA: /var/lib/postgresql/data/pgdata
```

Now we can start with the Python script. We're going to use the `click` library to build the CLI. The Python application executes SQL queries generated from user input. It first obtains a `MovieChain` object through the `get_chain` function, which takes an `llm` argument, and then calls the chain's `get_sql` method to generate an SQL query from the user's question `q`.
Finally, the script executes the SQL query against PostgreSQL and prints the results.

```python
import click
from dbutils import get_conn, Db, get_cursor
from lib.chains.movie import get_chain
from lib.llm.groq import llm
from settings import DSN


@click.command()
@click.option('--q', required=True, help='Question to ask')
def run(q):
    # Ask the LLM to translate the natural-language question into SQL
    chain = get_chain(llm)
    sql = chain.get_sql(q)
    click.echo(f"q: {q}")
    click.echo(sql)
    click.echo('')
    # get_sql returns None when the LLM call fails, so only run a query we actually got
    if sql:
        conn = get_conn(DSN, named=True, autocommit=True)
        db = Db(get_cursor(conn=conn))
        data = db.fetch_all(sql)
        for row in data:
            click.echo(row)
```

The `MovieChain` class interacts with the LLM (in this example, we're using Groq):

```python
import logging
from langchain_core.messages import SystemMessage, HumanMessage

from .prompts import PROMPT

logger = logging.getLogger(__name__)


class MovieChain:

    def __init__(self, llm):
        self.llm = llm
        # The system prompt carries the table schema and the output rules
        self.prompt = SystemMessage(content=PROMPT)

    def get_sql(self, q: str):
        # The user's natural-language question goes in as a human message
        user_message = HumanMessage(content=q)
        try:
            ai_msg = self.llm.invoke([self.prompt, user_message])
            # Chat models return an AIMessage; plain completion LLMs return a string
            output_message = ai_msg.content if not isinstance(ai_msg, str) else ai_msg

            return output_message
        except Exception as e:
            logger.error(f"Error during question processing: {e}")
            return None
```

The chain sends two messages to the LLM: a system prompt that gives it the context it needs to generate the SQL query, and a human message containing the user's question.
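`get_sql` returns the model's reply as-is, and chat models sometimes wrap their SQL in Markdown code fences even when instructed not to, which PostgreSQL would reject as a syntax error. Below is a minimal sketch of a hypothetical `clean_sql` helper (not part of the repository) that trims whitespace and strips a single enclosing fence before the query is executed:

```python
import re
from typing import Optional

# Matches a reply that is entirely one Markdown code fence, e.g. ```sql ... ```
FENCE = re.compile(r"^```[a-zA-Z]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def clean_sql(text: Optional[str]) -> Optional[str]:
    """Hypothetical post-processing step for MovieChain.get_sql output."""
    if not text:
        return None  # get_sql returns None when the LLM call fails
    text = text.strip()
    match = FENCE.match(text)
    if match:
        # Keep only the SQL between the opening and closing fences
        text = match.group(1).strip()
    return text or None
```

Inside `get_sql`, this could be applied as `return clean_sql(output_message)`. Replies that don't match the fence pattern come back unchanged apart from trimming.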
The system prompt includes the table's `CREATE TABLE` script so the model knows the schema:

```python
PROMPT = """
You are an expert in generating SQL queries based on user questions.
You have access to a database with the following table schema:

CREATE TABLE actors (
    nconst TEXT PRIMARY KEY,
    primaryname TEXT,
    birthyear INTEGER,
    deathyear INTEGER,
    primaryprofession TEXT,
    knownfortitles TEXT
);

Please generate an SQL query to answer the following user question.
Ensure the query is valid, secure, and tailored to the provided schema.
Return only the SQL query without additional explanations.
Don't use quotes around the query in any case.
"""
```

And that's all. Now we can ask questions about this dataset, and the LLM generates the SQL for us:

```commandline
python cli.py movie --q="List the living actors under 10 years old."

q: List the living actors under 10 years old.
SELECT * FROM actors WHERE deathyear IS NULL AND birthyear > (EXTRACT(YEAR FROM CURRENT_DATE) - 10);
...
```

```commandline
python cli.py movie --q="List the living actors who were born in the same year as Mel Gibson."

q: List the living actors who were born in the same year as Mel Gibson.
SELECT * FROM actors WHERE birthyear = (SELECT birthyear FROM actors WHERE primaryname = 'Mel Gibson') AND deathyear IS NULL;
...
```

```commandline
python cli.py movie --q="List the deceased actors who were born in the same year as Mel Gibson."

q: List the deceased actors who were born in the same year as Mel Gibson.
SELECT *
FROM actors
WHERE deathyear IS NOT NULL
AND birthyear = (SELECT birthyear
                 FROM actors
                 WHERE primaryname = 'Mel Gibson');
...
```

```commandline
python cli.py movie --q="What is the name, date of birth, and age of the oldest living actor born in the 70s?"

q: What is the name, date of birth, and age of the oldest living actor born in the 70s?
SELECT primaryname, birthyear, (2023 - birthyear) AS age
FROM actors
WHERE birthyear >= 1970 AND birthyear < 1980 AND deathyear IS NULL
ORDER BY birthyear ASC
LIMIT 1;

{'primaryname': 'Missy Gold', 'birthyear': 1970, 'age': 53}
```

Notice that in the last query the model hardcoded the year 2023 to compute the age instead of using `CURRENT_DATE` as it did in the first one, so generated SQL is worth reviewing before trusting its results.

With projects like these, where we execute "random" SQL generated by an LLM, it's crucial to manage database access carefully. A user's prompt, malicious or not, can lead the LLM to generate statements that modify or delete data, so the application should connect with a role that has only the privileges it needs, such as read-only access to the `actors` table. Restricting access this way mitigates this prompt-driven form of SQL injection.
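Database permissions are the real safeguard, but a cheap application-side check can reject obviously unwanted statements before they reach PostgreSQL. The following is a minimal sketch, not part of the repository: a hypothetical `is_read_only_select` function that accepts only a single `SELECT` (or `WITH ... SELECT`) statement and rejects anything containing data- or schema-modifying keywords. It is deliberately naive: a keyword inside a string literal causes a false rejection, and it does not know every PostgreSQL function with side effects, so it complements a read-only role rather than replacing one.

```python
import re

# Hypothetical deny-list of keywords that write data or change the schema
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy|into)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Naive check: accept one SELECT (or WITH ... SELECT) statement only."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False  # must start as a query
    # Also catches data-modifying CTEs and SELECT ... INTO
    return FORBIDDEN.search(statement) is None
```

In the CLI, the `if sql:` check could become `if sql and is_read_only_select(sql):`. On the database side, connecting as a dedicated role that has only been granted `SELECT` on `actors` gives a much stronger guarantee than any string inspection.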