{"id":13593310,"url":"https://github.com/hardbyte/qabot","last_synced_at":"2025-04-13T04:59:36.661Z","repository":{"id":107154222,"uuid":"609319752","full_name":"hardbyte/qabot","owner":"hardbyte","description":"CLI based natural language queries on local or remote data","archived":false,"fork":false,"pushed_at":"2025-03-05T21:10:51.000Z","size":3930,"stargazers_count":242,"open_issues_count":2,"forks_count":20,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-13T04:59:27.016Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hardbyte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"Security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-03T21:18:39.000Z","updated_at":"2025-04-09T21:06:24.000Z","dependencies_parsed_at":"2024-01-07T21:45:24.843Z","dependency_job_id":"836cd372-6fd1-42ff-b8a9-f5c3ec6f6e1f","html_url":"https://github.com/hardbyte/qabot","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardbyte%2Fqabot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardbyte%2Fqabot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardbyte%2Fqabot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hardbyte%2Fqabot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hardbyte","download_url":"https://codeload.github.com/hardbyte/qa
bot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248665756,"owners_count":21142123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:19.042Z","updated_at":"2025-04-13T04:59:36.619Z","avatar_url":"https://github.com/hardbyte.png","language":"Python","funding_links":[],"categories":["others","Python","开源项目","Open Source Projects"],"sub_categories":["其他聊天机器人","Other / Chatbots"],"readme":"# qabot\n\nQuery local or remote files with natural language queries powered by\nOpenAI's `gpt` and `duckdb` 🦆.\n\nCan query local and remote files (CSV, parquet)\n\n## Installation\n\nInstall with `uv`, `pipx`, `pip` etc:\n\n```\nuv tool install qabot\n```\n\n## Features\n\nWorks on local CSV, sqlite and Excel files:\n\n![](.github/local_csv_query.png)\n\nremote CSV files:\n\n```\n$ qabot -f https://duckdb.org/data/holdings.csv -q \"Tell me how many Apple holdings I currently have\"\n 🦆 Creating local DuckDB database...\n 🦆 Loading data...\ncreate view 'holdings' as select * from 'https://duckdb.org/data/holdings.csv';\n 🚀 Sending query to LLM\n 🧑 Tell me how many Apple holdings I currently have\n 🤖 You currently have 32.23 shares of Apple.\n\n\nThis information was obtained by summing up all the Apple ('APPL') shares in the holdings table.\n\nSELECT SUM(shares) as total_shares FROM holdings WHERE ticker = 'APPL'\n```\n\nEven on (public) data stored in S3:\n\n```\n$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q \"how many confirmed cases of covid 
are there by month?\" -v\n\n🤖 Monthly confirmed cases from January to May 2020: ranging from 7 in January, 24 in February, 188,123 in March, 1,069,172 in April and 1,745,582 in May.\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eExtra Details (from qabot)\u003c/summary\u003e\n  \n  The above figures were computed by aggregating the dataset on a per-entity basis (using a unique identifier `uid`), selecting the last available (maximum) date in each month, and summing the confirmed case counts. Here is the SQL query that was used:\n  \n  ```sql\n  WITH monthly_data AS (\n      SELECT uid, strftime('%Y-%m', date) AS month, MAX(date) AS max_date\n      FROM memory.main.jhu_csse_covid_19_timeseries_merged\n      GROUP BY uid, strftime('%Y-%m', date)\n  )\n  SELECT m.month, SUM(j.confirmed) AS confirmed\n  FROM monthly_data m\n  JOIN memory.main.jhu_csse_covid_19_timeseries_merged j\n    ON m.uid = j.uid AND m.max_date = j.date\n  GROUP BY m.month\n  ORDER BY m.month;\n  ```\n\n  This method ensures that for each month, the cumulative confirmed case count is captured at the end of the month based on the latest data available for each entity (uid).\n\u003c/details\u003e\n\n\n### Load data within a session\n\nYou can even load data from disk/URL via the natural language query:\n\n\u003e Load the file 'data/titanic.csv' into a table called 'raw_passengers'. \n\u003e Create a view of the raw passengers table for just the male passengers. What \n\u003e was the average fare for surviving male passengers?\n\n```\n 🦆 Creating local DuckDB database...\n 🚀 Sending query to LLM\n 🤖 The average fare for surviving male passengers is approximately $40.82.\n\n\nI created a table called `raw_passengers` from the Titanic dataset loaded from 'data/titanic.csv'. Then, I created a view called `male_passengers` that\nincludes only male passengers. 
Finally, I calculated the average fare for surviving male passengers, which is approximately $40.82.\n\nSELECT AVG(Fare) AS average_fare_surviving_male FROM male_passengers WHERE Survived = 1;\n\n```\n\n## Quickstart\n\nYou need to set the `OPENAI_API_KEY` environment variable to your OpenAI API key, \nwhich you can get from [here](https://platform.openai.com/account/api-keys). Other OpenAI-compatible\nAPIs can also be used by setting `OPENAI_BASE_URL`.\n\nInstall the `qabot` command line tool using uv/pip/pipx:\n\n\n```bash\n$ uv tool install qabot\n```\n\nThen run the `qabot` command with optional files (`-f my-file.csv`) and an initial query `-q \"How many...\"`.\n\nSee all options with `qabot --help`\n\n## Security Risks\n\nThis program gives an LLM access to your local and network-accessible files and allows it to execute arbitrary SQL \nqueries in a DuckDB database. See [Security](Security.md) for more information.\n\n\n## LLM Providers\n\nqabot works with any OpenAI-compatible API, including Ollama and DeepSeek. 
Simply set the base URL:\n```\nexport OPENAI_BASE_URL=https://api.deepseek.com\n```\n\nOr Ollama:\n```\nOPENAI_BASE_URL=http://localhost:11434/v1/ \nQABOT_MODEL_NAME=qwen2.5-coder:7b \nQABOT_PLANNING_MODEL_NAME=deepseek-r1:14b \n```\n\n## Python API\n\n```python\nfrom qabot import ask_wikidata, ask_file, ask_database\n\nprint(ask_wikidata(\"How many hospitals are there in New Zealand?\"))\nprint(ask_file(\"How many men were aboard the titanic?\", 'data/titanic.csv'))\nprint(ask_database(\"How many product images are there?\", 'postgresql://user:password@localhost:5432/dbname'))\n```\n\nOutput:\n```text\nThere are 54 hospitals in New Zealand.\nThere were 577 male passengers on the Titanic.\nThere are 6,225 product images.\n```\n\n\n## Examples\n\n### Local CSV file/s\n\n```bash\n$ qabot -q \"Show the survival rate by gender, and ticket class shown as an ASCII graph\" -f data/titanic.csv\n🦆 Loading data from files...\nLoading data/titanic.csv into table titanic...\n\nHere’s the survival count represented as a horizontal bar graph grouped by ticket class and gender:\n\nClass 1:\nFemales  | ██████████████████████████████████████████ (91)\nMales    | ██████████████ (45)\n\nClass 2:\nFemales  | ██████████████████████████ (70)\nMales    | ██████████ (17)\n\nClass 3:\nFemales  | ██████████████████████████████ (72)\nMales    | ██████████████ (47)\n\n\nThis representation allows us to observe that in all classes, a greater number of female passengers survived compared to male passengers, and also highlights the number of survivors is notably higher in the first class compared to the other classes.\n```\n\n\n## Query WikiData\n\nUse the `-w` flag to query wikidata.\n\n```bash\n$ qabot -w -q \"How many Hospitals are there located in Beijing\"\n```\n\n## Intermediate steps and database queries\n\nUse the `-v` flag to see the intermediate steps and database queries.\nSometimes it takes a long route to get to the answer, but it's often interesting to see how it gets there.\n\n## 
Data accessed via http/s3\n\nUse the `-f \u003curl\u003e` flag to load data from a URL, e.g. a CSV file on S3:\n\n```bash\n$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q \"how many confirmed cases of covid are there?\" -v\n🦆 Loading data from files...\ncreate table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';\n\nResult:\n264308334 confirmed cases\n```\n\n## Docker Usage\n\nYou can run `qabot` via Docker:\n\n```bash\ndocker run --rm \\\n  -e OPENAI_API_KEY=\u003cyour_openai_api_key\u003e \\\n  -v ./data:/opt \\\n  ghcr.io/hardbyte/qabot -f /opt/titanic.csv -q \"What ratio of passengers were under 30?\"\n```\n\nReplace the mount path with the path to your actual data, and replace `your_openai_api_key` with your OpenAI API key.\n\n## Ideas\n- G-Sheets via https://github.com/evidence-dev/duckdb_gsheets\n- Streaming mode to output results as they come in\n- token limits and better reporting of costs\n- Supervisor agent - assess whether a query is \"safe\" to run, could ask for user confirmation to run anything that gets flagged.\n- Often we can zero-shot the question and get a single query out - perhaps we try this before the MKL chain\n- test each zeroshot agent individually\n- Generate and pass back assumptions made to the user\n- Add an optional \"clarify\" tool to the chain that asks the user to clarify the question\n- Create a query checker tool that checks if the query looks valid and/or safe\n- Inject AWS credentials into duckdb for access to private resources in S3\n- Automatic publishing to pypi e.g. 
using [trusted publishers](https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhardbyte%2Fqabot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhardbyte%2Fqabot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhardbyte%2Fqabot/lists"}