{"id":23337409,"url":"https://github.com/gkvoelkl/python-geochat","last_synced_at":"2026-05-09T04:32:10.069Z","repository":{"id":223311382,"uuid":"759379723","full_name":"gkvoelkl/python-geochat","owner":"gkvoelkl","description":"Talk with a Digital Twin In Natural Language","archived":false,"fork":false,"pushed_at":"2024-02-18T14:50:25.000Z","size":556,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-02-19T16:07:43.416Z","etag":null,"topics":["chromadb","digital-twins","llama-i","llm","ollama","openai-api","python","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gkvoelkl.png","metadata":{"files":{"readme":"README.ipynb","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-18T12:41:02.000Z","updated_at":"2024-04-15T15:05:13.033Z","dependencies_parsed_at":"2024-04-15T15:05:11.649Z","dependency_job_id":"f1fba7ee-7a00-449a-bfbc-4cf27e05c064","html_url":"https://github.com/gkvoelkl/python-geochat","commit_stats":null,"previous_names":["gkvoelkl/python-geochat"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gkvoelkl/python-geochat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkvoelkl%2Fpython-geochat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkvoelkl%2Fpython-geochat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkvoelkl%2Fpython-geochat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkvoelkl%2Fpython-ge
ochat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gkvoelkl","download_url":"https://codeload.github.com/gkvoelkl/python-geochat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkvoelkl%2Fpython-geochat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32807185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromadb","digital-twins","llama-i","llm","ollama","openai-api","python","streamlit"],"created_at":"2024-12-21T02:17:17.953Z","updated_at":"2026-05-09T04:32:10.054Z","avatar_url":"https://github.com/gkvoelkl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"76f070e8-ad39-41d2-bee4-b6c9f079183c\",\n   \"metadata\": {},\n   \"source\": [\n    \"# GeoChat - Talk with a Digital Twin In Natural Language\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"662b4ab6-c8e9-4263-85c1-b3090de9fc3a\",\n   \"metadata\": {},\n   \"source\": [\n    \"gkvoelkl@nelson-games.de\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f3582a82-826c-479a-b01b-ad98a233682b\",\n   \"metadata\": {},\n   \"source\": [\n    \"\u003cimg 
src=\\\"img/start.jpg\\\" width=\\\"320\\\" align=\\\"left\\\"\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"1e01f39f-5039-40c3-9c74-347995f24b3c\",\n   \"metadata\": {},\n   \"source\": [\n    \"Normally, the data that makes up a **digital twin** is displayed in **3D**.\\n\",\n    \"This looks good but is not easy to use and understand.\\n\",\n    \"\\n\",\n    \"Programs like **ChatGPT** and other **LLMs** have shown how to **easily bring huge amounts of information to people**.\\n\",\n    \"\\n\",\n    \"With technologies such as\\n\",\n    \"\\n\",\n    \"* LlamaIndex,\\n\",\n    \"* ChromaDB,\\n\",\n    \"* OpenAI API,\\n\",\n    \"* Ollama or\\n\",\n    \"* Streamlit,\\n\",\n    \"\\n\",\n    \"it is relatively easy to create a **natural language interface** for a digital twin and its geoinformation.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"cf63ff61-467d-4f8d-bc31-b5036701de72\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Part 1: How many Buildings?\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b60fb318-3d3e-4325-b9d6-e1b2be45fbaa\",\n   \"metadata\": {},\n   \"source\": [\n    \"My **first digital twin** consists of the **buildings of a large city** that I got **from OpenStreetMap**. 
These are stored in a **relational database (PostgreSQL)**.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5c7f616c-4a15-447e-98b1-915ff079d443\",\n   \"metadata\": {},\n   \"source\": [\n    \"\u003cimg src=\\\"img/db1.png\\\" width=\\\"320\\\" align=\\\"left\\\"\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"934a3721-fbee-4622-ae3b-04173b31aff6\",\n   \"metadata\": {},\n   \"source\": [\n    \"An **LLM** turns the **user's question** into a suitable **query to the database**.\\n\",\n    \"It then composes a suitable answer from the data received.\\n\",\n    \"\\n\",\n    \"Some examples:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"355b13cd-de29-423d-86c6-7976348c4666\",\n   \"metadata\": {},\n   \"source\": [\n    \"🤖 Chat: Ask me a question about the Database!\\n\",\n    \"\\n\",\n    \"👤 User: How many buildings?\\n\",\n    \"\\n\",\n    \"🤖 Chat: There are a total of 931,866 buildings in the database.\\n\",\n    \"\\n\",\n    \"👤 User: I stand in Baker Street. Where is the next bank?\\n\",\n    \"\\n\",\n    \"🤖 Chatbot: The nearest bank to Baker Street is TSB located in London with the postcode W1U 7DL. 
It is right on Baker Street itself.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"76a6159b-98f9-40ef-a5cf-621a8c1be5ca\",\n   \"metadata\": {},\n   \"source\": [\n    \"\u003cimg src=\\\"img/chat1.png\\\" width=\\\"320\\\" align=\\\"left\\\"\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"id\": \"7edd7d77-35c3-4b54-8f5a-e81cffab71fc\",\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Overwriting geochat.py\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"%%writefile geochat.py\\n\",\n    \"import streamlit as st\\n\",\n    \"\\n\",\n    \"from sqlalchemy import create_engine, MetaData\\n\",\n    \"from geoalchemy2 import Geometry\\n\",\n    \"\\n\",\n    \"from llama_index.core import SQLDatabase, Settings\\n\",\n    \"from llama_index.llms.openai import OpenAI\\n\",\n    \"from llama_index.llms.ollama import Ollama\\n\",\n    \"from llama_index.core.query_engine import NLSQLTableQueryEngine\\n\",\n    \"       \\n\",\n    \"import pandas\\n\",\n    \"from pprint import pprint\\n\",\n    \"\\n\",\n    \"USE_OPENAI = True\\n\",\n    \"\\n\",\n    \"if USE_OPENAI:\\n\",\n    \"    # -- connect to openai\\n\",\n    \"    import openai\\n\",\n    \"    openai.api_key = st.secrets.openai_key\\n\",\n    \"\\n\",\n    \"# -- include tables\\n\",\n    \"include_tables = [\\\"osm_buildings\\\"]\\n\",\n    \"\\n\",\n    \"# -- page config\\n\",\n    \"title = \\\"GeoChat - Talk with your Data 💬 📚\\\"\\n\",\n    \"\\n\",\n    \"st.set_page_config(\\n\",\n    \"    page_title=title,\\n\",\n    \"    layout=\\\"centered\\\",\\n\",\n    \"    initial_sidebar_state=\\\"auto\\\",\\n\",\n    \"    menu_items=None,\\n\",\n    \")\\n\",\n    \"\\n\",\n    
\"st.header(title)\\n\",\n    \"\\n\",\n    \"# -- init message history\\n\",\n    \"if \\\"messages\\\" not in st.session_state.keys():\\n\",\n    \"    st.session_state.messages = [\\n\",\n    \"        {\\\"role\\\": \\\"assistant\\\", \\n\",\n    \"         \\\"content\\\": \\\"Ask me a question about the Database!\\\"}\\n\",\n    \"    ]\\n\",\n    \"\\n\",\n    \"# -- prepare data\\n\",\n    \"@st.cache_resource(show_spinner=False)\\n\",\n    \"def load_data():\\n\",\n    \"    with st.spinner(text=\\\"Initializing Data – hang tight! This should take 1-2 minutes.\\\"):\\n\",\n    \"        url = 'postgresql+psycopg2://postgres:mysecretpassword@localhost:5432/postgres'\\n\",\n    \"        engine = create_engine(url)\\n\",\n    \"    \\n\",\n    \"        custom_table_info = {\\n\",\n    \"            \\\"osm_buildings\\\": \\\"stores all the buildings of a great city\\\"\\n\",\n    \"        }\\n\",\n    \"\\n\",\n    \"        if USE_OPENAI:\\n\",\n    \"            Settings.llm = OpenAI(\\n\",\n    \"                temperature=0.1,\\n\",\n    \"                model=\\\"gpt-3.5-turbo\\\"\\n\",\n    \"            )\\n\",\n    \"        else:\\n\",\n    \"            Settings.llm = Ollama(\\n\",\n    \"                model=\\\"llama2\\\", \\n\",\n    \"                request_timeout=120.0\\n\",\n    \"            )\\n\",\n    \"            \\n\",\n    \"        sql_database = SQLDatabase(\\n\",\n    \"            engine, \\n\",\n    \"            include_tables=include_tables,\\n\",\n    \"            custom_table_info = custom_table_info\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"    return sql_database, engine\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"sql_database, engine = load_data()\\n\",\n    \"\\n\",\n    \"# -- Sidebar\\n\",\n    \"def sidebar_infos(engine):\\n\",\n    \"    st.sidebar.image(\\\"./img/logo.png\\\",\\n\",\n    \"                     width = 50,\\n\",\n    \"                     use_column_width=None)\\n\",\n    \"    \\n\",\n   
 \"    st.sidebar.markdown(\\\"## Database\\\")\\n\",\n    \"\\n\",\n    \"    metadata = MetaData()\\n\",\n    \"    metadata.reflect(bind=engine)\\n\",\n    \"\\n\",\n    \"    table_names = include_tables # metadata.tables.keys()\\n\",\n    \"    selected_table = st.sidebar.selectbox(\\\"Select a Table\\\", table_names)\\n\",\n    \"        \\n\",\n    \"    if selected_table:\\n\",\n    \"        table = metadata.tables[selected_table]\\n\",\n    \"        columns_info = [{'Column': column.name, 'Type': str(column.type)} for column in table.columns]\\n\",\n    \"        df = pandas.DataFrame(columns_info, index=None)\\n\",\n    \"        st.sidebar.dataframe(df)\\n\",\n    \"                \\n\",\n    \"    # Sidebar Intro\\n\",\n    \"    st.sidebar.markdown('## Created By')\\n\",\n    \"    st.sidebar.markdown(\\\"gkvoelkl@nelson-games.de\\\")\\n\",\n    \"    \\n\",\n    \"    st.sidebar.markdown('## Disclaimer')\\n\",\n    \"    st.sidebar.markdown(\\\"This application is only for demonstration purposes.\\\")\\n\",\n    \"\\n\",\n    \"st.sidebar.header(\\\"GeoChat\\\")\\n\",\n    \"info_on = st.sidebar.toggle('Activate info')\\n\",\n    \"sidebar_infos(engine)\\n\",\n    \"\\n\",\n    \"# -- create engine\\n\",\n    \"if \\\"query_engine\\\" not in st.session_state:\\n\",\n    \"    st.session_state[\\\"query_engine\\\"] = NLSQLTableQueryEngine(\\n\",\n    \"        sql_database = sql_database,\\n\",\n    \"        #streaming=True\\n\",\n    \"    )    \\n\",\n    \"\\n\",\n    \"# -- ask user\\n\",\n    \"if prompt := st.chat_input(\\\"Your question\\\"):\\n\",\n    \"    st.session_state.messages.append( # save prompt\\n\",\n    \"        {\\\"role\\\": \\\"user\\\", \\n\",\n    \"         \\\"content\\\": prompt}\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"for message in st.session_state.messages: # Display the prior chat messages\\n\",\n    \"    with st.chat_message(message[\\\"role\\\"]):\\n\",\n    \"        
st.write(message[\\\"content\\\"])\\n\",\n    \"\\n\",\n    \"# -- get answer\\n\",\n    \"if st.session_state.messages[-1][\\\"role\\\"] != \\\"assistant\\\":\\n\",\n    \"    with st.chat_message(\\\"assistant\\\"):\\n\",\n    \"        with st.spinner(\\\"Thinking...\\\"):\\n\",\n    \"            response = st.session_state[\\\"query_engine\\\"].query(\\\"User Question:\\\"+prompt+\\\". \\\")\\n\",\n    \"            if info_on:\\n\",\n    \"                st.info(f\\\"sql {response.metadata['sql_query']}\\\",icon=\\\"ℹ️\\\")\\n\",\n    \"            st.write(response.response)\\n\",\n    \"            message = {\\\"role\\\": \\\"assistant\\\", \\\"content\\\": response.response}\\n\",\n    \"            st.session_state.messages.append(message) # Add response to message history\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"401bebf5-9246-4b66-8bb8-d879a56b2b1e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!streamlit run geochat.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"34e289f1-bd13-4123-b6ee-cc51c1337b7b\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Part 2: There is no Bakerstreet 🔍 in London! 
Challenges 🧩\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0367af26-06b1-4e0b-9d42-53c0b01e29aa\",\n   \"metadata\": {},\n   \"source\": [\n    \"When you try to talk to your **digital twin**, you **don't** always get the **answers you expect**.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"04223fa4-f359-48bd-919c-786e931c9bc2\",\n   \"metadata\": {},\n   \"source\": [\n    \"🤖 Chat: Ask me a question about the Database!\\n\",\n    \"\\n\",\n    \"👤 User: How many buildings are in Bakerstreet?\\n\",\n    \"\\n\",\n    \"🤖 Chat: There are no buildings listed in the database for Bakerstreet.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5c8d776a-7fa9-4687-82f1-606564819e5f\",\n   \"metadata\": {},\n   \"source\": [\n    \"After many questions, it turned out that there are three basic challenges.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f2dc5d80-9a3b-4413-abc5-235a65033d61\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 🧩 Challenge 1️⃣: Values in Columns that are Difficult to Understand\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"722bcbe1-1db5-4dfa-85a0-642dda682170\",\n   \"metadata\": {},\n   \"source\": [\n    \"The LLM cannot find \\\"Bakerstreet\\\" because the database spells the name differently (\\\"Baker Street\\\") - **different spellings**.\\n\",\n    \"\\n\",\n    \"The value range of an attribute is important information for the LLM. Example: in a column, the value “1” stands for \\\"true\\\" and “0” for \\\"false\\\". The LLM needs to know what such values mean in order to give meaningful answers. 
- **different meanings**\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f6e77272-babc-4d59-945c-788c5253ffb1\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 🧩 Challenge 2️⃣: Spatial Relationships and Connectivity between Real Things\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2cfffcfe-4a3a-47ce-9c18-a504d57ab672\",\n   \"metadata\": {},\n   \"source\": [\n    \"Geo databases have special features that are used by digital twins: **spatial queries and spatial joins**.\\n\",\n    \"\\n\",\n    \"A **spatial query** uses topological relationships between objects, for example: \\\"Which buildings touch the building at 221b Baker Street?\\\"\\n\",\n    \"\\n\",\n    \"A **spatial join** combines two datasets, matching rows on a desired topological relationship rather than on stored values as in a normal table join in a relational database, for example:\\n\",\n    \" \\\"How many buildings are within the boundary of Westminster?\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b1178ec4-aa6f-4f09-8b54-b74bdb1d6f2e\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 🧩 Challenge 3️⃣: Databases with Many Tables\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e50af83e-03ac-4213-b22b-9f90e38a550a\",\n   \"metadata\": {},\n   \"source\": [\n    \"The **database of a digital twin** usually consists of **many tables**. Often there are **hundreds** or **thousands**. \\n\",\n    \"\\n\",\n    \"Since the **prompt** sent to the LLM is **limited in size**, it is not possible to include a description of every table. 
The **LLM** therefore does not know **which tables are available**.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"f0dc3f16-e162-44da-89f5-7e19bae5345f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgkvoelkl%2Fpython-geochat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgkvoelkl%2Fpython-geochat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgkvoelkl%2Fpython-geochat/lists"}