{"id":15628364,"url":"https://github.com/abhijithneilabraham/tableqa","last_synced_at":"2025-04-04T16:13:13.011Z","repository":{"id":41973924,"uuid":"284113747","full_name":"abhijithneilabraham/tableQA","owner":"abhijithneilabraham","description":"AI Tool for querying natural language on tabular data.","archived":false,"fork":false,"pushed_at":"2023-11-29T00:20:34.000Z","size":29537,"stargazers_count":307,"open_issues_count":27,"forks_count":47,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T15:04:47.111Z","etag":null,"topics":["ai","csv","database","machine-learning","nl2sql","nlp","qa","querying-natural-language","question-answering","sql","sql-generation","sql-query","table-qa","tableqa","tabular-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhijithneilabraham.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-07-31T19:26:44.000Z","updated_at":"2025-02-10T11:41:34.000Z","dependencies_parsed_at":"2024-01-13T21:26:01.409Z","dependency_job_id":"2b5a1358-6bb7-4961-afd3-2b759550448a","html_url":"https://github.com/abhijithneilabraham/tableQA","commit_stats":{"total_commits":109,"total_committers":9,"mean_commits":12.11111111111111,"dds":"0.11009174311926606","last_synced_commit":"70313e73a195084bf0bd80f2e57a5b3df1dfa470"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijithneilabraham%2FtableQA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a
bhijithneilabraham%2FtableQA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijithneilabraham%2FtableQA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijithneilabraham%2FtableQA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhijithneilabraham","download_url":"https://codeload.github.com/abhijithneilabraham/tableQA/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247208139,"owners_count":20901570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","csv","database","machine-learning","nl2sql","nlp","qa","querying-natural-language","question-answering","sql","sql-generation","sql-query","table-qa","tableqa","tabular-data"],"created_at":"2024-10-03T10:22:12.329Z","updated_at":"2025-04-04T16:13:12.991Z","avatar_url":"https://github.com/abhijithneilabraham.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tableQA\nAI Tool for querying natural language on tabular data. Built using QA models from [transformers](https://huggingface.co/transformers/model_doc/bert.html#tfbertforquestionanswering).\n\nThis work is described in the following paper:   \n[TableQuery: Querying tabular data with natural language, by Abhijith Neil Abraham, Fariz Rahman and Damanpreet Kaur](https://arxiv.org/abs/2202.00454).   \nIf you use TableQA, please cite the paper. 
\n\n\nHere is a detailed [blog](https://dev.to/abhijithneilabraham/tableqa-query-your-tabular-data-with-natural-language-39o) explaining how this works.   \n\nTabular data can be:\n\n- Dataframes\n- CSV files\n\n[![Build Status](https://travis-ci.com/abhijithneilabraham/tableQA.svg?branch=master)](https://travis-ci.com/abhijithneilabraham/tableQA).  \n[![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/abhijithneilabraham/tableQA/blob/master/examples/sample.ipynb).  \n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Bgd3L-839NVZiP3QqWfpkYIufQIm4Rar?usp=sharing)\n\n\n\n#### Features    \n* Supports detection from multiple CSVs (CSVs can also be read from Amazon S3).\n* Supports FuzzyString implementation, i.e., incomplete column values in a query are automatically detected and filled in.\n* Supports databases: SQLite, PostgreSQL, MySQL, Amazon RDS (PostgreSQL, MySQL).\n* Open-domain, no training required.\n* Add a manual schema for a customized experience.\n* Auto-generates a schema when none is provided.\n* Data visualisations.  
\n\n#### Supported operations.\n- [X] SELECT\n\t- [X] one column\n\t- [X] multiple columns\n\t- [X] all columns\n\t- [X] aggregate functions\n    - [X] distinct select\n\t\t- [X] count-select\n\t\t- [X] sum-select\n\t\t- [X] avg-select\n\t\t- [X] min-select\n\t\t- [X] max-select\n- [X] WHERE\n\t- [X] one condition\n\t- [X] multiple conditions\n\t- [X] operators\n\t\t- [X] equal operator\n\t\t- [X] greater-than operator\n\t\t- [X] less-than operator\n\t\t- [X] between operator \n\n\n### Configuration:\n\n##### install via pip:   \n\n```pip install tableqa```\n\n##### installing from source:   \n\n```git clone https://github.com/abhijithneilabraham/tableQA ```  \n\n```cd tableQA```\n\n```python setup.py install```\n\n\n## Quickstart\n\n\n#### Run a sample query\n\n```\nfrom tableqa.agent import Agent\nagent=Agent(df) #input your dataframe\nresponse=agent.query_db(\"Your question here\")\nprint(response)\n```\n\n#### Get an SQL query from the question\n```\nsql=agent.get_query(\"Your question here\")  \nprint(sql) #returns an sql query\n```\n\n\n#### Adding Manual schema\n\n\n\n##### Schema Format:\n```\n{\n    \"name\": DATABASE NAME,\n    \"keywords\":[DATABASE KEYWORDS],\n    \"columns\":\n    [\n        {\n        \"name\": COLUMN 1 NAME,\n        \"mapping\":{\n            CATEGORY 1: [CATEGORY 1 KEYWORDS],\n            CATEGORY 2: [CATEGORY 2 KEYWORDS]\n        }\n\n        },\n        {\n        \"name\": COLUMN 2 NAME,\n        \"keywords\": [COLUMN 2 KEYWORDS]\n        },\n        {\n        \"name\": \"COLUMN 3 NAME\",\n        \"keywords\": [COLUMN 3 KEYWORDS],\n        \"summable\":\"True\"\n        }\n    ]\n}\n\n```\n* Mappings are for columns whose values fall into only a few distinct classes.\n* Include only the columns that need manual keywords or mappings. The rest will be autogenerated.\n* ```summable``` is included for numeric columns whose values already 
represent a count, e.g. ```Death Count```, ```Cases```.\n\n\n\nExample (with manual schema):    \n\n\n##### Database query\n\n* Default Database - SQLite (File-based database, does not require creation of a separate connection.)\n```\nfrom tableqa.agent import Agent\nagent=Agent(df,schema) #pass the dataframe and schema objects\nresponse=agent.query_db(\"how many people died of stomach cancer in 2011\")\nprint(response)\n#Response =[(22,)]\n```\n\n* To use PostgreSQL, you must have a PostgreSQL server installed and running on your local machine. To download PostgreSQL, visit the [page](https://www.postgresql.org).\n```\nfrom tableqa.agent import Agent\nagent = Agent(df, schema_file, 'postgres', username='username', password='password', database='DBname', host='localhost', port=5432, aws_db=False)\nresponse=agent.query_db(\"how many people died of stomach cancer in 2011\")\nprint(response)\n#Response =[(22,)]\n```\n\n* To use MySQL, you must have a MySQL server installed and running on your local machine. To download MySQL, visit the [page](https://www.mysql.com/downloads/).\n```\nfrom tableqa.agent import Agent\nagent = Agent(df, schema_file, 'mysql', username='username', password='password', database='DBname', host='localhost', port=3306, aws_db=False) #3306 is the MySQL default port; use your server's port\nresponse=agent.query_db(\"how many people died of stomach cancer in 2011\")\nprint(response)\n#Response =[(22,)]\n\n```\n\n* To use PostgreSQL or MySQL on Amazon RDS, you must create a database on Amazon RDS. The RDS instance must be in a public subnet with security groups that allow connections from outside AWS. \n\nRefer to step 1 in the [document](https://aws.amazon.com/getting-started/hands-on/create-mysql-db/) to create a MySQL DB instance on Amazon RDS. The same steps can be followed to create a PostgreSQL DB instance by selecting PostgreSQL in the Engine tab. 
Obtain the username, password, database, endpoint, and port from your database connection details on Amazon RDS.\n```\nfrom tableqa.agent import Agent\nagent = Agent(df, schema_file, 'postgres', username='Master username', password='Master password', database='DB name', host='Endpoint', port='Port', aws_db=True)\nresponse=agent.query_db(\"how many people died of stomach cancer in 2011\")\nprint(response)\n#Response =[(22,)]\n\n```\n\n##### SQL query\n```\nsql=agent.get_query(\"How many people died of stomach cancer in 2011\")\nprint(sql)\n#sql query: SELECT SUM(Death_Count) FROM cancer_death WHERE Cancer_site = \"Stomach\" AND Year = \"2011\"\n```\n\n#### Multiple CSVs\n\n* Pass the absolute paths of the directories containing the CSVs and schemas respectively. Refer to [cleaned_data](tableqa/cleaned_data) and [schema](tableqa/schema) for examples.\n\n##### Example \n* Read CSV and schema files from the local machine:\n```\ncsv_path=\"/content/tableQA/tableqa/cleaned_data\"\nschema_path=\"/content/tableQA/tableqa/schema\"\nagent=Agent(csv_path,schema_path)\n\n```\n\n* Read CSV and schema files from Amazon S3:\n1) [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) on Amazon S3. 
\n2) [Upload objects](https://docs.aws.amazon.com/AmazonS3/latest/gsg/PuttingAnObjectInABucket.html) to the bucket.\n3) [Create an IAM user](https://www.atensoftware.com/p90.php?q=309) and give it read access to files in Amazon S3 storage.\n4) Obtain the access key and secret access key for the user and pass them as arguments to the agent.\n\n```\ncsv_path=\"s3://{bucket}/cleaned_data\"\nschema_path=\"s3://{bucket}/schema\"\nagent = Agent(csv_path, schema_path, aws_s3=True, access_key_id=access_key_id, secret_access_key=secret_access_key)\n\n```\n\n#### Join us\n\nJoin our workspace: [Slack](https://join.slack.com/t/newworkspace-ehh1873/shared_invite/zt-hp3i6ic7-exMal1I4ZmFMWaHAwXk8HA)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhijithneilabraham%2Ftableqa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhijithneilabraham%2Ftableqa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhijithneilabraham%2Ftableqa/lists"}