{"id":30880082,"url":"https://github.com/cloud-shuttle/text2sql-blog","last_synced_at":"2025-09-08T06:12:00.389Z","repository":{"id":261195768,"uuid":"875572592","full_name":"cloud-shuttle/text2sql-blog","owner":"cloud-shuttle","description":"This is a quick text2sql demo for a bit of fun that I built in a few hours on a Sunday afternoon.","archived":false,"fork":false,"pushed_at":"2024-10-20T13:29:24.000Z","size":16,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-11-05T08:28:11.588Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloud-shuttle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-20T10:51:56.000Z","updated_at":"2024-10-21T00:07:46.000Z","dependencies_parsed_at":"2024-11-05T08:38:18.568Z","dependency_job_id":null,"html_url":"https://github.com/cloud-shuttle/text2sql-blog","commit_stats":null,"previous_names":["cloud-shuttle/text2sql-blog"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cloud-shuttle/text2sql-blog","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloud-shuttle%2Ftext2sql-blog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloud-shuttle%2Ftext2sql-blog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloud-shuttle%2Ftext2sql-blog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/c
loud-shuttle%2Ftext2sql-blog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloud-shuttle","download_url":"https://codeload.github.com/cloud-shuttle/text2sql-blog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloud-shuttle%2Ftext2sql-blog/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274141007,"owners_count":25229147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-08T06:11:58.920Z","updated_at":"2025-09-08T06:12:00.370Z","avatar_url":"https://github.com/cloud-shuttle.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\nHey folks, we are going to build out a text2sql solution. 
To do that, we first need to install Ollama and download the duckdb-nsql 7B-parameter LLM.\n\n[Watch the video](https://youtu.be/XDktyydC3hQ)\n\nGo to https://ollama.com/ and download the binary (I'm going to assume you're on a Mac for this blog), then run the model, which downloads it on first use:\n\n```bash\nollama run duckdb-nsql\n```\n\n## Part 2: Ask a question of duckdb-nsql LLM\n\nFirst up, we want to check the response from the duckdb-nsql LLM with no context:\n```bash\ncurl http://localhost:11434/api/generate -d '{\n  \"model\": \"duckdb-nsql\",\n  \"prompt\": \"The total number of sales for APAC this year\",\n  \"stream\": false\n}'\n```\n\nOkay, that returns a response. Let's use jq to make it legible:\n```bash\ncurl http://localhost:11434/api/generate -d '{\n  \"model\": \"duckdb-nsql\",\n  \"prompt\": \"The total number of sales for APAC this year\",\n  \"stream\": false\n}' | jq '.response'\n```\n\nOkay, that looks like SQL, but we haven't even got a database yet, so the LLM has basically guessed the answer. Let's get to work on spinning up a database.\n\n## Part 3: Spin up Northwind database locally\n\nYes, the blast from the past. Let's spin up Northwind, but this time let's do it using PostgreSQL. Luckily I found this repo on GitHub by a chap by the name of Pascal Thomas (please give him a star for his awesome work when you visit): https://github.com/pthom/northwind_psql\n\nThis has a Docker Compose file with a Postgres database, pgAdmin, and a bootstrap script to load in the Northwind database. 
Let's go ahead and clone it and spin it up:\n\n```bash\ngit clone git@github.com:pthom/northwind_psql.git\ncd northwind_psql\ndocker-compose up\n```\n\nNow that it has spun up, let's go to the browser at http://localhost:5050/, log in to pgAdmin, and connect to the database with the following settings (let's not do this in prod, folks!):\n- General Tab:\n    - Name: db\n- Connection Tab:\n    - Host name: db\n    - Username: postgres\n    - Password: postgres\n\n## Part 4: Take a look at the information schema\n\nIn there, let's take a look at a few tables and then look at the information schema with this simple query:\n```sql\nselect * from information_schema.columns\nwhere table_schema = 'public'\norder by table_name, ordinal_position;\n```\n\nAs you can see, we've got a bunch of information about the tables in Postgres loaded from the Northwind database. We can use this information as the basis of our prompt engineering, in other words, to ground our prompt with some context about the tables and columns available in the database.\n\n## Part 5: Spin up Meilisearch as our RAG\n\nOkay, so let's spin up Meilisearch, a Rust-based search engine that we can use to store the table and column information and use as a RAG for the LLM.\n\n```bash\ndocker pull getmeili/meilisearch:v1.10\n\ndocker run -it --rm \\\n  -p 7700:7700 \\\n  -v $(pwd)/meili_data:/meili_data \\\n  getmeili/meilisearch:v1.10\n```\n\nNow that this service is up, let's go ahead and get some data in there!\n\n## Part 6: Python scripts\n\nFirst up, we need to load some data from the PostgreSQL information schema into Meilisearch, so let's clone my repo with the code in it and set up a Python environment.\n\n```shell\ngit clone git@github.com:cloud-shuttle/text2sql-blog.git\ncd text2sql-blog\nuv venv\nsource .venv/bin/activate\nuv pip install psycopg2-binary python-dotenv meilisearch ollama\n```\n\nThen copy the below into your dot env (.env) file (again this is for local 
testing only and not for prod use cases)\n\n```bash\nPG_HOST=\"localhost\"\nPG_DATABASE=\"northwind\"\nPG_USER=\"postgres\"\nPG_PASSWORD=\"postgres\"\n\n# Meilisearch connection parameters\nMEILISEARCH_HOST=\"http://localhost:7700\"\nMEILISEARCH_API_KEY=\"ADSF\"\n```\n\nNow onto the fun part.\n\n### Hydrate Meilisearch\n\nNow we want to load the metadata from the information schema in PostgreSQL into Meilisearch:\n```bash\npython hydrate_search.py\nIndexed 14 tables in Meilisearch\n```\n\n### Query Meilisearch to sample the results\n\nNext, we want to query Meilisearch to sample the results:\n\n```bash\npython search_results.py orders\nCREATE TABLE orders (\n    order_id int2,\n    employee_id int2,\n    order_date date,\n    required_date date,\n    shipped_date date,\n    ship_via int2,\n    freight float4,\n    customer_id varchar,\n    ship_name varchar,\n    ship_address varchar,\n    ship_city varchar,\n    ship_region varchar,\n    ship_postal_code varchar,\n    ship_country varchar\n);\n```\n\n### Use the context to prompt Ollama\n\nNow that we have the context from our search database RAG, we can use it as part of our prompt to the duckdb-nsql model in Ollama to get our text2sql result.\n\n```bash\npython ollama_e2e.py orders \"the total number of orders from Belgium in 1996\"\nSELECT COUNT(*) FROM orders WHERE ship_country = 'Belgium' AND order_date BETWEEN '1996-01-01' AND '1996-12-31';\n```\n\nNow let's go ahead and run the generated query in pgAdmin to see if it works.\n\n```sql\nSELECT COUNT(*) FROM orders WHERE ship_country = 'Belgium' AND order_date BETWEEN '1996-01-01' AND '1996-12-31';\n--result 2\n```\n\nAwesome, so it works for this one use case. Obviously productionising this and working through the edge cases would take some work, but I thought, it's Sunday, let's have some 
fun.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloud-shuttle%2Ftext2sql-blog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloud-shuttle%2Ftext2sql-blog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloud-shuttle%2Ftext2sql-blog/lists"}