{"id":26333045,"url":"https://github.com/astrabert/phiqwenstem","last_synced_at":"2025-10-25T18:05:21.528Z","repository":{"id":277062269,"uuid":"923548222","full_name":"AstraBert/PhiQwenSTEM","owner":"AstraBert","description":"A reasoning assistant for your STEM education","archived":false,"fork":false,"pushed_at":"2025-02-11T22:40:23.000Z","size":1374,"stargazers_count":20,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-10T15:58:01.312Z","etag":null,"topics":["ai","datasets","huggingface","llm","phi-3-5","qdrant","qwen","react","reasoning","research","scientific","stem","vite","websockets"],"latest_commit_sha":null,"homepage":"https://pqstem.org","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AstraBert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-28T12:57:09.000Z","updated_at":"2025-02-19T14:07:00.000Z","dependencies_parsed_at":"2025-02-11T23:34:40.779Z","dependency_job_id":"e7658a2a-4b28-42ed-adf3-ebd922c6f27b","html_url":"https://github.com/AstraBert/PhiQwenSTEM","commit_stats":null,"previous_names":["astrabert/phiqwenstem"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FPhiQwenSTEM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FPhiQwenSTEM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FPhiQwenSTEM/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FPhiQwenSTEM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AstraBert","download_url":"https://codeload.github.com/AstraBert/PhiQwenSTEM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243806048,"owners_count":20350773,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","datasets","huggingface","llm","phi-3-5","qdrant","qwen","react","reasoning","research","scientific","stem","vite","websockets"],"created_at":"2025-03-15T23:37:44.572Z","updated_at":"2025-10-25T18:05:21.437Z","avatar_url":"https://github.com/AstraBert.png","language":"TypeScript","funding_links":["https://github.com/sponsors/AstraBert"],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003ePhiQwenSTEM\u003c/h1\u003e\r\n\u003ch2 align=\"center\"\u003eA reasoning assistant for your STEM education\u003c/h2\u003e\r\n\u003cdiv align=\"center\"\u003e\r\n    \u003cimg src=\"./phiQwenSTEM.png\" alt=\"PhiQwenSTEM Logo\"\u003e\r\n\u003c/div\u003e\r\n\r\n\u003e [!IMPORTANT]\r\n\u003e _The Proof Of Concept live at: https://pqstem.org is **no longer available**_\r\n\r\n**PhiQwenSTEM** is an assistant aimed at helping you solve complex STEM questions through reasoning. 
It is based on **Phi-3.5** by Microsoft and **QwQ-32B-preview** by Qwen, both served through the [HuggingFace](https://huggingface.co) Inference API, and has a large knowledge base (more than 15,000 STEM questions) managed via [Qdrant](https://qdrant.tech).\r\n\r\n## Workflow\r\n\r\n\u003cdiv\u003e\r\n    \u003cimg src=\"PhiQwenSTEM_workflow.png\" width=700 height=600\u003e\r\n\u003c/div\u003e\r\n\r\n## How PhiQwenSTEM works\r\n\r\nPhiQwenSTEM operates through three main components:\r\n\r\n- **Front-end**: Uses Vite to render a landing page and a ChatGPT-like chat interface.\r\n- **Back-end**: Employs a Python-based websocket to process messages from the front-end and send responses.\r\n- **Database**: Uses a vector database built on [Qdrant](https://qdrant.tech) to store data for retrieval-augmented generation and semantic caching.\r\n\r\nOnce you launch the application, the vector database will ingest more than 15,000 STEM-related questions. Each entry contains:\r\n- The question itself\r\n- The `QwQ-32B-preview` reasoning about the question\r\n\r\nThe questions span the following domains of science:\r\n\r\n- Chemistry (General, Organic, and Biochemistry)\r\n- Physics\r\n- Physical Chemistry\r\n- Quantum Mechanics\r\n- Differential Equations\r\n- Linear Algebra\r\n- Electromagnetism\r\n- Mathematics\r\n- Engineering\r\n- Classical Mechanics\r\n\r\nThe data comes from the HuggingFace dataset [EricLu/SCP-116K](https://huggingface.co/datasets/EricLu/SCP-116K), which is made up of more than 116,000 STEM-related questions, each accompanied by the ground-truth answer, the [`QwQ-32B-preview`](https://huggingface.co/Qwen/QwQ-32B-Preview) reasoning and solution, and the `o1` reasoning and solution. We selected questions (from the most represented domains in the dataset) for which the `QwQ-32B-preview` reasoning produced the correct answer. 
\r\n\r\nDense embeddings are obtained using the static text encoder [`tomaarsen/static-retrieval-mrl-en-v1`](https://huggingface.co/tomaarsen/static-retrieval-mrl-en-v1) (embeddings are truncated to 384 dimensions), while sparse embeddings are generated with [`Qdrant/bm25`](https://huggingface.co/Qdrant/bm25). To speed up retrieval, the vector database uses [binary quantization](https://qdrant.tech/articles/binary-quantization/).\r\n\r\nWhen a user asks a question:\r\n1. The backend first checks for similar questions in the semantic cache using [`modernbert-embed-base`](https://huggingface.co/nomic-ai/modernbert-embed-base). If a match is found, the corresponding answer is returned.\r\n2. If no significant match is found, it prompts [`Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) (served on the HuggingFace Inference API) to produce a question optimized for searching the vector database.\r\n3. The optimized question is used for a hybrid search of the vector database. The top 2 matches for each of the sparse and dense vectors are retrieved and re-scored by `modernbert-embed-base`.\r\n4. The top-ranking match after re-scoring is retained, and the reasoning generated by `QwQ-32B-preview` (stored in the match payload) is passed on as \"reasoning\" context.\r\n5. `QwQ-32B-preview` is prompted to produce an answer based on the reasoning.\r\n\r\n\u003e [!NOTE]\r\n\u003e `QwQ-32B-preview` is instructed to assess whether the reasoning and the answer provided are valid, relevant to the user's question, and correct. It is also instructed to output an \"I don't know\" answer when the question is ambiguous and the solution is not completely clear.\r\n\r\n\r\n## Installation and usage\r\n\r\n### 1. 
Docker\r\n\r\n\u003e _Required: [Docker](https://docs.docker.com/desktop/) and [docker compose](https://docs.docker.com/compose/)_\r\n\r\n- Clone this repository\r\n\r\n```bash\r\ngit clone https://github.com/AstraBert/PhiQwenSTEM.git\r\ncd PhiQwenSTEM/docker-workflow/\r\n```\r\n\r\n- Add the `hf_token` secret to the [`.env.example`](./docker/.env.example) file and rename the file to `.env`. You can get your HuggingFace token by [registering](https://huggingface.co/join) on HuggingFace and creating a [fine-grained token](https://huggingface.co/settings/tokens) that has access to the Inference API.\r\n\r\n```bash\r\n# Set your access token in the file first, e.g. hf_token=\"hf_abcdefg1234567\"\r\nmv .env.example .env\r\n```\r\n\r\n- Launch the docker application:\r\n\r\n```bash\r\n# If you are on Linux/macOS\r\nbash start_services.sh\r\n# If you are on Windows\r\n.\\start_services.ps1\r\n```\r\n\r\nThe application will be served at http://localhost:8501, but it becomes usable only once the backend has finished setting up (follow the logs to see its progress). 
Depending on your connection and hardware, setup can take up to 30 minutes.\r\n\r\n### 2. Local\r\n\r\n\u003e _Required: [Docker](https://docs.docker.com/desktop/), [docker compose](https://docs.docker.com/compose/) and [conda](https://anaconda.org/anaconda/conda)_\r\n\r\n- Clone this repository\r\n\r\n```bash\r\ngit clone https://github.com/AstraBert/PhiQwenSTEM.git\r\ncd PhiQwenSTEM/local\r\n```\r\n\r\n- If you are on macOS/Linux, you can run:\r\n\r\n```bash\r\nbash local_setup.sh\r\n```\r\n\r\n- If you are on Windows, it is best to run the commands one at a time:\r\n\r\n```bash\r\n# Launch Qdrant\r\ndocker compose up -d\r\n\r\n# Create conda environment for the backend\r\nconda env create -f ./backend/environment.yml\r\nconda activate backend\r\n\r\n# Ingest data\r\npython3 data/toDatabase.py\r\n\r\n# Create a semantic cache\r\npython3 data/createCache.py\r\n\r\nconda deactivate\r\n\r\n# Install necessary dependencies for the UI\r\ncd chatbot-ui/\r\nnpm install\r\n\r\n# Back to the local folder\r\ncd ..\r\n```\r\n\r\n- Once you are done with the set-up, launch the UI:\r\n\r\n```bash\r\ncd chatbot-ui/\r\nnpm run dev\r\n```\r\n\r\n- Then, in a separate terminal window, launch the backend:\r\n\r\n```bash\r\nconda activate backend\r\ncd backend/\r\npython3 backend.py\r\n```\r\n\r\nHead over to http://localhost:8501 and you should see PhiQwenSTEM up and running in less than one minute!\r\n\r\n## Contributions\r\n\r\nContributions are more than welcome! 
See the [contribution guidelines](./CONTRIBUTING.md) for more information :)\r\n\r\n## Funding\r\n\r\nIf you find this project useful, please consider [funding it](https://github.com/sponsors/AstraBert) to help it grow: let's support open source together! 😊\r\n\r\n## License and rights of usage\r\n\r\nThe software is provided under an MIT license and is free to use.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fphiqwenstem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrabert%2Fphiqwenstem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fphiqwenstem/lists"}