Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gkvoelkl/python-geochat
Talk with a Digital Twin In Natural Language
https://github.com/gkvoelkl/python-geochat
chromadb digital-twins llama-i llm ollama openai-api python streamlit
Last synced: about 1 month ago
JSON representation
Talk with a Digital Twin In Natural Language
- Host: GitHub
- URL: https://github.com/gkvoelkl/python-geochat
- Owner: gkvoelkl
- Created: 2024-02-18T12:41:02.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-02-18T14:50:25.000Z (11 months ago)
- Last Synced: 2024-02-19T16:07:43.416Z (11 months ago)
- Topics: chromadb, digital-twins, llama-i, llm, ollama, openai-api, python, streamlit
- Language: Python
- Homepage:
- Size: 543 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.ipynb
Awesome Lists containing this project
README
{
"cells": [
{
"cell_type": "markdown",
"id": "76f070e8-ad39-41d2-bee4-b6c9f079183c",
"metadata": {},
"source": [
"# GeoChat - Talk with a Digital Twin In Natural Language"
]
},
{
"cell_type": "markdown",
"id": "662b4ab6-c8e9-4263-85c1-b3090de9fc3a",
"metadata": {},
"source": [
"[email protected]"
]
},
{
"cell_type": "markdown",
"id": "f3582a82-826c-479a-b01b-ad98a233682b",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"id": "1e01f39f-5039-40c3-9c74-347995f24b3c",
"metadata": {},
"source": [
"Normally, the data that makes up a **digital twin** is displayed in **3D**.\n",
"This looks good but is not easy to use and understand.\n",
"\n",
"Programs like **ChatGPT** and other **LLMs** have shown how to **easily bring huge amounts of information to people**.\n",
"\n",
"With technologies such as\n",
"\n",
"* LlamaIndex,\n",
"* ChromaDB,\n",
"* OpenAI API,\n",
"* Ollama or\n",
"* Streamlit,\n",
"\n",
"it is relatively easy to create a **natural language interface** for a digital twin and its geoinformation.\n"
]
},
{
"cell_type": "markdown",
"id": "cf63ff61-467d-4f8d-bc31-b5036701de72",
"metadata": {},
"source": [
"# Part 1: How many Buildings?"
]
},
{
"cell_type": "markdown",
"id": "b60fb318-3d3e-4325-b9d6-e1b2be45fbaa",
"metadata": {},
"source": [
"My **first digital twin** consists of the **buildings of a large city** that I got **from OpenStreetMap**. These are stored in a **relational database (postgresql)**."
]
},
{
"cell_type": "markdown",
"id": "5c7f616c-4a15-447e-98b1-915ff079d443",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"id": "934a3721-fbee-4622-ae3b-04173b31aff6",
"metadata": {},
"source": [
"An **LLM** turns the **user's question** into a suitable **query to the database**.\n",
"And a suitable answer from the data received.\n",
"\n",
"Some examples:"
]
},
{
"cell_type": "markdown",
"id": "355b13cd-de29-423d-86c6-7976348c4666",
"metadata": {},
"source": [
"đ¤ Chat: Ask me a question about the Database!\n",
"\n",
"đ¤ User: How many buildings?\n",
"\n",
"đ¤ Chat:There are a total of 931,866 buildings in the database.\n",
"\n",
"đ¤ User: I stand in Baker Street. Where is the next bank?\n",
"\n",
"đ¤ Chatbot: The nearest bank to Baker Street is TSB located in London with the postcode W1U 7DL. It is right on Baker Street itself."
]
},
{
"cell_type": "markdown",
"id": "76a6159b-98f9-40ef-a5cf-621a8c1be5ca",
"metadata": {},
"source": [
"
"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7edd7d77-35c3-4b54-8f5a-e81cffab71fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting geochat.py\n"
]
}
],
"source": [
"%%writefile geochat.py\n",
"import streamlit as st\n",
"\n",
"from sqlalchemy import create_engine, MetaData\n",
"from geoalchemy2 import Geometry\n",
"\n",
"from llama_index.core import SQLDatabase, Settings\n",
"from llama_index.llms.openai import OpenAI\n",
"from llama_index.llms.ollama import Ollama\n",
"from llama_index.core.query_engine import NLSQLTableQueryEngine\n",
" \n",
"import pandas\n",
"from pprint import pprint\n",
"\n",
"USE_OPENAI = True\n",
"\n",
"if USE_OPENAI:\n",
" # -- connect to openai\n",
" import openai\n",
" openai.api_key = st.secrets.openai_key\n",
"\n",
"# -- include tables\n",
"include_tables = [\"osm_buildings\"]\n",
"\n",
"# -- page config\n",
"title = \"GeoChat - Talk with your Data đŦ đ\"\n",
"\n",
"st.set_page_config(\n",
" page_title=title,\n",
" layout=\"centered\",\n",
" initial_sidebar_state=\"auto\",\n",
" menu_items=None,\n",
")\n",
"\n",
"st.header(title)\n",
"\n",
"# -- init message history\n",
"if \"messages\" not in st.session_state.keys():\n",
" st.session_state.messages = [\n",
" {\"role\": \"assistant\", \n",
" \"content\": \"Ask me a question about the Database!\"}\n",
" ]\n",
"\n",
"# -- prepare data\n",
"@st.cache_resource(show_spinner=False)\n",
"def load_data():\n",
" with st.spinner(text=\"Initalizing Data â hang tight! This should take 1-2 minutes.\"):\n",
" url = 'postgresql+psycopg2://postgres:mysecretpassword@localhost:5432/postgres'\n",
" engine = create_engine(url)\n",
" \n",
" custom_table_info = {\n",
" \"osm_buildings\": \"stores all the buildings of a great city\"\n",
" }\n",
"\n",
" if USE_OPENAI:\n",
" Settings.llm = OpenAI(\n",
" temperature=0.1,\n",
" model=\"gpt-3.5-turbo\"\n",
" )\n",
" else:\n",
" Settings.llm = Ollama(\n",
" model=\"llama2\", \n",
" request_timeout=120.0\n",
" )\n",
" \n",
" sql_database = SQLDatabase(\n",
" engine, \n",
" include_tables=include_tables,\n",
" custom_table_info = custom_table_info\n",
" )\n",
"\n",
" return sql_database, engine\n",
"\n",
"\n",
"sql_database, engine = load_data()\n",
"\n",
"# -- Sidebar\n",
"def sidebar_infos(engine):\n",
" st.sidebar.image(\"./img/logo.png\",\n",
" width = 50,\n",
" use_column_width=None)\n",
" \n",
" st.sidebar.markdown(\"## Database\")\n",
"\n",
" metadata = MetaData()\n",
" metadata.reflect(bind=engine)\n",
"\n",
" table_names = include_tables # metadata.tables.keys()\n",
" selected_table = st.sidebar.selectbox(\"Select a Table\", table_names)\n",
" \n",
" if selected_table:\n",
" table = metadata.tables[selected_table]\n",
" columns_info = [{'Column': column.name, 'Type': str(column.type)} for column in table.columns]\n",
" df = pandas.DataFrame(columns_info, index=None)\n",
" st.sidebar.dataframe(df)\n",
" \n",
" # Sidebar Intro\n",
" st.sidebar.markdown('## Created By')\n",
" st.sidebar.markdown(\"[email protected]\")\n",
" \n",
" st.sidebar.markdown('## Disclaimer')\n",
" st.sidebar.markdown(\"This application is only for demonstration purposes.\")\n",
"\n",
"st.sidebar.header(\"GeoChat\")\n",
"info_on = st.sidebar.toggle('Activate info')\n",
"sidebar_infos(engine)\n",
"\n",
"# -- create engine\n",
"if \"query_engine\" not in st.session_state:\n",
" st.session_state[\"query_engine\"] = NLSQLTableQueryEngine(\n",
" sql_database = sql_database,\n",
" #streaming=True\n",
" ) \n",
"\n",
"# -- ask user\n",
"if prompt := st.chat_input(\"Your question\"):\n",
" st.session_state.messages.append( # save prompt\n",
" {\"role\": \"user\", \n",
" \"content\": prompt}\n",
" )\n",
"\n",
"for message in st.session_state.messages: # Display the prior chat messages\n",
" with st.chat_message(message[\"role\"]):\n",
" st.write(message[\"content\"])\n",
"\n",
"# -- get answer\n",
"if st.session_state.messages[-1][\"role\"] != \"assistant\":\n",
" with st.chat_message(\"assistant\"):\n",
" with st.spinner(\"Thinking...\"):\n",
" response = st.session_state[\"query_engine\"].query(\"User Question:\"+prompt+\". \")\n",
" if info_on:\n",
" st.info(f\"sql {response.metadata['sql_query']}\",icon=\"âšī¸\")\n",
" st.write(response.response)\n",
" message = {\"role\": \"assistant\", \"content\": response.response}\n",
" st.session_state.messages.append(message) # Add response to message history\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "401bebf5-9246-4b66-8bb8-d879a56b2b1e",
"metadata": {},
"outputs": [],
"source": [
"!streamlit run geochat.py"
]
},
{
"cell_type": "markdown",
"id": "34e289f1-bd13-4123-b6ee-cc51c1337b7b",
"metadata": {},
"source": [
"# Part 2: There is no Bakerstreet đ in London! Challenges đ§Š"
]
},
{
"cell_type": "markdown",
"id": "0367af26-06b1-4e0b-9d42-53c0b01e29aa",
"metadata": {},
"source": [
"When you try to talk to your **digital twin**, you **don't** always get the **answers you expect**."
]
},
{
"cell_type": "markdown",
"id": "04223fa4-f359-48bd-919c-786e931c9bc2",
"metadata": {},
"source": [
"đ¤ Chat: Ask me a question about the Database!\n",
"\n",
"đ¤ User: How many buildings are in Bakerstreet?\n",
"\n",
"đ¤ Chat: There are no buildings listed in the database for Bakerstreet."
]
},
{
"cell_type": "markdown",
"id": "5c8d776a-7fa9-4687-82f1-606564819e5f",
"metadata": {},
"source": [
"After many questions, it turned out that there are three basic challenges."
]
},
{
"cell_type": "markdown",
"id": "f2dc5d80-9a3b-4413-abc5-235a65033d61",
"metadata": {},
"source": [
"### 𧊠Challenge 1ī¸âŖ: Values in Columns that are Difficult to Understand "
]
},
{
"cell_type": "markdown",
"id": "722bcbe1-1db5-4dfa-85a0-642dda682170",
"metadata": {},
"source": [
"The LLM cannot find \"Bakerstreet\" because it is not written the way it is written in the database (\"Baker Street\") - **different spellings**.\n",
"\n",
"The value range of an attribute is an important information for the LLM. Example: In a column the value â1â stands for \"true\" and â0â for \"false\". The meaning of the content is important for meaningful answers. - **different meanings**"
]
},
{
"cell_type": "markdown",
"id": "f6e77272-babc-4d59-945c-788c5253ffb1",
"metadata": {},
"source": [
"### 𧊠Challenge 2ī¸âŖ: Spatial Relationships and Connectivity between Real Things"
]
},
{
"cell_type": "markdown",
"id": "2cfffcfe-4a3a-47ce-9c18-a504d57ab672",
"metadata": {},
"source": [
"Geo databases have special features that are used by Digital Twins: **spatial query and spatial join**\n",
"\n",
"**spatial query** uses topological relationships betweeen objects \"Which buildings touches the building in Baker Street 221b?\"\n",
"\n",
"**spatial join** combines two datasets with rows being matched based on a desired topological relationship, rather than using a stored values as in a normal table join in a relational database.\n",
" \"How many buildings are in the boundary of Westminster?\""
]
},
{
"cell_type": "markdown",
"id": "b1178ec4-aa6f-4f09-8b54-b74bdb1d6f2e",
"metadata": {},
"source": [
"### 𧊠Challenge 3ī¸âŖ: Databases with Many Tables"
]
},
{
"cell_type": "markdown",
"id": "e50af83e-03ac-4213-b22b-9f90e38a550a",
"metadata": {},
"source": [
"The **database of a digital twin** usually consists of **many tables**. Often there are **hundreds** or **thousands**. \n",
"\n",
"Since the **query** to the LLM is **limited in size**, it is not possible to provide a description of all tables. The **LLM** therefore does not know **which tables are available**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0dc3f16-e162-44da-89f5-7e19bae5345f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}