{"id":31654699,"url":"https://github.com/4n33sh/oceanquery","last_synced_at":"2025-10-07T12:15:08.912Z","repository":{"id":317580137,"uuid":"1060262785","full_name":"4n33sh/OceanQuery","owner":"4n33sh","description":"Prototype ARGO ChatBot (FloatChat) for Smart India Hackathon 2025 by Team Ocean Eyes.","archived":false,"fork":false,"pushed_at":"2025-10-01T18:48:01.000Z","size":102,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-01T20:37:53.067Z","etag":null,"topics":["bert-encoder","chromadb","flask","langchain","oceangpt","rag-chatbot","spacy"],"latest_commit_sha":null,"homepage":"https://youtu.be/HLMw5QCKNGE","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4n33sh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-19T16:16:32.000Z","updated_at":"2025-10-01T19:33:54.000Z","dependencies_parsed_at":"2025-10-01T20:37:55.349Z","dependency_job_id":null,"html_url":"https://github.com/4n33sh/OceanQuery","commit_stats":null,"previous_names":["4n33sh/oceanquery"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/4n33sh/OceanQuery","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4n33sh%2FOceanQuery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4n33sh%2FOceanQuery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/4n33sh%2FOceanQuery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4n33sh%2FOceanQuery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/4n33sh","download_url":"https://codeload.github.com/4n33sh/OceanQuery/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4n33sh%2FOceanQuery/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278772678,"owners_count":26043221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-encoder","chromadb","flask","langchain","oceangpt","rag-chatbot","spacy"],"created_at":"2025-10-07T12:15:07.509Z","updated_at":"2025-10-07T12:15:08.905Z","avatar_url":"https://github.com/4n33sh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# OceanQuery 🌊 \n## AI-Powered ChatBot for ARGO Ocean Data Discovery and Visualization\n\n\u003cimg src=\"https://img.shields.io/badge/License_-GPL%203.0-orange\"\u003e \n\u003cimg src=\"https://img.shields.io/badge/python_-\u003e=%203.1-blue\"\u003e \n\u003cimg src=\"https://img.shields.io/badge/Version-v0.7.3-yellow\"\u003e\n\n---\n\n**OceanQuery is an AI-powered chatbot that helps you in finding out the 
optimal solution for your oceanographic needs: it takes your query as input, processes it through our RAG model, and generates an answer that matches it.**\n\n### [Video Demo](https://youtu.be/HLMw5QCKNGE) | [Source Code](https://github.com/4n33sh/OceanQuery/blob/main/main.py)\n\n\u003c/div\u003e\n\n---\n\n# Our Tech Approach towards building the Tool\n\nDuring initial conception, we set some fundamental **constraints** before proceeding:\n\n1. The tool must be able to take in and process the user's query as **plain text, netCDF file(s), or both**.\n\n2. Use **RAG** (paired with a netCDF parser and BERT) alongside an **LLM** (we chose to implement either **LLAmarine or OceanGPT**, since these models were trained specifically for oceanographic/marine purposes) to properly **map individual queries to their respective vectorstore DB** (with MCP). This lets us combine the strengths of each part of the tech stack.\n\n3. Display the **generated output** (alongside its **references/proofs**) for the user to review. 
If the user isn't satisfied with the output, we take another **feedback query** and **repeat the process until they are satisfied**.\n\nBased on the above constraints, the following technical approach was devised.\n\n\u003cimg width=\"1814\" height=\"1109\" alt=\"final\" src=\"https://github.com/user-attachments/assets/0e6f8217-b5c4-49c8-99b8-40fd02427a3e\" /\u003e\n\n**Entity recognition** is achieved through **spaCy's NLP** (using its large English model), and ML training based on **BERT** was implemented with **TensorFlow and HF transformers**.\n\n---\n\n# Installation \u0026 Running\n\n* (optional) **Create \u0026 activate** a new Python **virtual (.venv) environment** and upgrade pip, setuptools, and wheel :  ``` python3 -m venv ~/your/preferred/path \u0026\u0026 source ~/your/preferred/path/bin/activate \u0026\u0026 pip install --upgrade pip setuptools wheel ```\n\n* Install the required **external packages/modules** (~250-300 MB) : ``` pip install chromadb pgvector langchain flask ```\n\n* **Clone** the repo into your preferred directory : ``` git clone https://github.com/4n33sh/OceanQuery.git ```\n\n* Change directory **(cd)** into OceanQuery and install the requirements : ``` cd OceanQuery \u0026\u0026 pip install -r requirements.txt ```\n\n* Install the spaCy NLP (large) **model** (~540 MB) : ``` python3 -m spacy download en_core_web_lg ```\n\n* Make the **main.py** file **executable** and **run** it : ``` chmod u+x main.py \u0026\u0026 python3 main.py ```\n\n---\n\n# Update Log\n\n- **[19-09-2025]** : Implemented a crude version of the RAG model. A combination of LLAmarine + Claude (token-based retrieval) was used as the base LLM, with spaCy as the NLP layer for tokenizing the duplicate query. Basic understanding formed; encoder work in progress.\n \n- **[24-09-2025]** : Vectorstore implemented with Chroma (PostgreSQL used to store the duplicate query). 
Encoder set up and feedback loop properly implemented.\n\n- **[29-09-2025]** : UI/UX work in progress but usable for the final demo (some issues still persist with the encoder). Multilingual support also added with (m)BERT (enables tokenization in ~100 languages, but no final context-based textual translation).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4n33sh%2Foceanquery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4n33sh%2Foceanquery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4n33sh%2Foceanquery/lists"}