{"id":21228969,"url":"https://github.com/quantum-software-development/q-star","last_synced_at":"2025-04-13T23:44:59.235Z","repository":{"id":216995235,"uuid":"742894293","full_name":"Quantum-Software-Development/Q-Star","owner":"Quantum-Software-Development","description":"QMaths","archived":false,"fork":false,"pushed_at":"2025-03-07T11:50:22.000Z","size":1775,"stargazers_count":4,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T13:51:18.425Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Quantum-Software-Development.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"Quantum-Software-Developmen","Custom":"https://github.com/sponsors/Quantum-Software-Development/card"}},"created_at":"2024-01-13T17:29:07.000Z","updated_at":"2025-03-07T11:50:25.000Z","dependencies_parsed_at":"2025-01-21T17:45:14.341Z","dependency_job_id":"99a16983-1a4d-4e10-93a9-f97206a5422d","html_url":"https://github.com/Quantum-Software-Development/Q-Star","commit_stats":null,"previous_names":["quantum-software-development/qmaths","quantum-software-development/q-star"],"tags_count":0,"template":false,"template_full_name":"Quantum-Software-Development/.github","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2FQ-Star","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F
Q-Star/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2FQ-Star/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2FQ-Star/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Quantum-Software-Development","download_url":"https://codeload.github.com/Quantum-Software-Development/Q-Star/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248799672,"owners_count":21163398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T23:23:43.024Z","updated_at":"2025-04-13T23:44:59.222Z","avatar_url":"https://github.com/Quantum-Software-Development.png","language":"Jupyter Notebook","readme":"\n\u003c!-- header --\u003e\n\n\u003c!--\n\u003cdiv style=\"text-align:center;\"\u003e\n    \u003cspan style=\"display: block; background-color: black; padding: 20px;\"\u003e\n        \u003cimg src=\"https://via.placeholder.com/1200x200/000000/FFFFFF?text=%20%20%20%20%20%20%20%20%20%20%20%20Q%2A\" alt=\"Q*\" style=\"font-size: 50px; font-weight: bold;\"\u003e\n    \u003c/span\u003e\n\u003c/div\u003e\n\n\u003c!-- end header --\u003e\n\n\n\n# Q-Star [Q*]() in Reinforcement Learning\n\n*Author: Miquel Noguer i Alonso - Founder at AI Finance Institute*  \n*Date: November 23, 2023*\n\n#\n\n## [Q*]() is the currently accepted notation for the Optimal Action Value Function in RL. 
\n\nThe Q* RL algorithm might use AI-generated data (logic + maths) to teach an LLM to solve multi-step logic problems. Q* might be applied to GPT-5, giving it excellent reasoning and retrieval skills.\n\n#\n\n## Reasoning\n\nThe biggest gains in reasoning come from strong reward models, as opposed to more SFT data or tools.\n\nMuch (unpublished) research is now focused on finding a general planning algorithm for LLMs, i.e. some equivalent of the dlPFC (dorsolateral prefrontal cortex). So PLANNING is the name of the game.\n\n#\n\n## Maths\n\nIn the literature, we have seen different approaches to teaching math to AI models, such as Transformers with beam search, or large language models that solve tasks requiring complex multistep reasoning by generating solutions in a step-by-step chain-of-thought format.\n\nOne effective method for the latter involves training reward models to discriminate between desirable and undesirable outputs.\n\n\n#\n\n## Abstract\n[Access this document](https://github.com/Quantum-Software-Development/Q-Star/blob/1e3dfd901f7ae1e9830f96f7e8c830cecbd5e804/Bellman%20Q*/Q*%20Bellman%20Doc.pdf) for a comprehensive overview of the Q-Star (Q*) concept in reinforcement learning, covering its mathematical formulation, its significance, and the methods used to approximate it in learning algorithms.\n\n#\n\nQ* [Bellman Equality]()\n\n![Q* Bellman Equality](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/91c383e8-5c31-4695-8236-b56e58b2a59a)\n\n\n#\n\nIn the literature we see two distinct methods for training reward models: outcome supervision and process supervision.\n\n\n\n[Hodge-Riemann Cohomology Classes]()\n\n![Hodge-Riemann Cohomology Classes](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/2aacaba9-dcc7-4a60-be18-9d2e4885b7a3)\n\n\n\n\u003c!-- 
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/about-writing-and-formatting-on-github\n\n\nhttps://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/about-writing-and-formatting-on-github\n\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables\n\n\n -https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-collapsed-sections \n\n -https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/autolinked-references-and-urls\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-a-permanent-link-to-a-code-snippet\n\nhttps://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests\n\nhttps://docs.github.com/en/get-started/using-git/about-git\n\nhttps://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository --\u003e\n\n\n#\n\n\u003c!-- footer \n\n\u003cdiv align=\"center\"\u003e\n  \u003cp\u003e \u003cem\u003emade with vibe, frequency \u0026 joy\u003c/em\u003e \u003c/p\u003e --\u003e\n\n### \u003cp align=\"center\"\u003e   [![Sponsor Quantum Software 
Development](https://img.shields.io/badge/Sponsor-Quantum%20Software%20Development-brightgreen?logo=GitHub)](https://github.com/sponsors/Quantum-Software-Development)\n\n\u003c/div\u003e\n\n\u003c!-- end footer --\u003e\n\n#\n\n######  \u003cp align=\"center\"\u003e [Copyright 2024 Quantum-Software-Development. Code released under the MIT license.](https://github.com/Quantum-Software-Development/Q-Star/blob/f5115a1a073bdb3fa68c51bb3b3414c8e0b0270e/LICENSE)\n\n","funding_links":["https://github.com/sponsors/Quantum-Software-Developmen","https://github.com/sponsors/Quantum-Software-Development/card","https://github.com/sponsors/Quantum-Software-Development"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2Fq-star","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantum-software-development%2Fq-star","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2Fq-star/lists"}