{"id":23932869,"url":"https://github.com/ivankqw/sg-data-analyst","last_synced_at":"2025-09-11T15:32:47.376Z","repository":{"id":188925480,"uuid":"679636494","full_name":"ivankqw/sg-data-analyst","owner":"ivankqw","description":"LLMs as Data Analysts over Singapore Datasets 🤖","archived":false,"fork":false,"pushed_at":"2023-08-22T10:32:09.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-08-22T12:49:12.442Z","etag":null,"topics":["faiss","langchain-python","openai-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ivankqw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-17T09:27:45.000Z","updated_at":"2023-08-22T12:49:12.443Z","dependencies_parsed_at":null,"dependency_job_id":"7a369af1-e58f-44e1-95a9-2baa01b50892","html_url":"https://github.com/ivankqw/sg-data-analyst","commit_stats":null,"previous_names":["ivankqw/sg-data-analyst"],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivankqw%2Fsg-data-analyst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivankqw%2Fsg-data-analyst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivankqw%2Fsg-data-analyst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivankqw%2Fsg-data-analyst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ivankqw","download_url":"https://codeload.github.com/ivankqw/sg-data-analyst/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232657705,"owners_count":18556887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faiss","langchain-python","openai-api"],"created_at":"2025-01-06T00:29:24.390Z","updated_at":"2025-01-06T00:29:49.475Z","avatar_url":"https://github.com/ivankqw.png","language":"Python","funding_links":[],"categories":["Building"],"sub_categories":["Datasets"],"readme":"# sg-data-analyst\n\n[Medium Article](https://ivankqw.medium.com/sg-data-analyst-question-answering-over-a-dataset-repository-with-reasoning-agents-dab7f05ed8bf)\n\n## Explore Singapore Datasets with SG Data Analyst 🤖\n\nBased on your query, SG Data Analyst finds the most relevant datasets from [data.gov.sg](https://data.gov.sg/) and [data.gov.sg's API](https://developers.data.gov.sg/).\n\nUsing LangChain's pandas DataFrame agent, SG Data Analyst then performs exploratory data analysis on those datasets and gathers insights to answer your query.\n\n## Quickstart\n\n1. Install the requirements\n```\npip install -r requirements.txt\n```\n2. Add your OpenAI API key to the environment\n\n- It's best to have access to GPT-4 (`gpt-4-0613` in particular), since we rely on the [OpenAI functions API](https://openai.com/blog/function-calling-and-other-api-updates)\n- `gpt-3.5-turbo-0613` or `gpt-3.5-turbo-16k-0613` should also work, though not as well as GPT-4\n- The model choice can be changed in `config.py`\n```\nexport OPENAI_API_KEY=\u003cyour-api-key\u003e\n```\nAlternatively, you can add your API key to a `.env` file in the root directory of this project. Refer to `dev.env` for the format.\n\n3. Run the Streamlit app\n```\nstreamlit run app.py\n```\n4. Try it out!\n\n\u003e😃: What are some observations in university education?\n\n\u003e👨‍🔬: Some observations in university education based on the data include:\n\u003e1. The data covers a wide range of fields of study, including 'Accountancy', 'Architecture, Building \u0026 Real Estate', 'Business \u0026 Administration', 'Dentistry', 'Education', 'Engineering Sciences', 'Fine \u0026 Applied Arts', 'Health Sciences', 'Humanities \u0026 Social Sciences', 'Information Technology', 'Law', 'Mass Communication', 'Medicine', 'Natural, Physical \u0026 Mathematical Sciences', 'Services', and 'Natural \u0026 Mathematical Sciences'.\n\u003e2. The data includes information on both male and female students, as well as a combined category for both sexes.\n\u003e3. The data covers a wide range of years, from 1982 to 2021.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivankqw%2Fsg-data-analyst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fivankqw%2Fsg-data-analyst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivankqw%2Fsg-data-analyst/lists"}