https://github.com/quantum-software-development/q-star
QMaths
- Host: GitHub
- URL: https://github.com/quantum-software-development/q-star
- Owner: Quantum-Software-Development
- License: mit
- Created: 2024-01-13T17:29:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-07T11:50:22.000Z (3 months ago)
- Last Synced: 2025-03-27T13:51:18.425Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 1.69 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Q-Star (Q*) in Reinforcement Learning
*Author: Miquel Noguer i Alonso - Founder at AI Finance Institute*
*Date: November 23, 2023*
#
## Q* is the standard notation for the optimal action-value function in RL
A Q*-based RL algorithm might use AI-generated data (logic and maths) to teach an LLM to solve multi-step logic problems. Q* might be applied to GPT-5, giving it strong reasoning and retrieval skills.
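For reference, the standard textbook definition: $Q^*(s, a)$ is the largest expected discounted return obtainable by taking action $a$ in state $s$ and following the best possible policy afterwards, and an optimal policy acts greedily with respect to it ($\gamma \in [0, 1)$ is the discount factor):

```math
Q^*(s, a) = \max_{\pi} \, \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s,\ A_t = a\right],
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```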
#
## Reasoning
The biggest gains in reasoning come from strong reward models, as opposed to more supervised fine-tuning (SFT) data or tools.
Much (unpublished) research is now focused on finding a general planning algorithm for LLMs, i.e. some equivalent of the dlPFC (dorsolateral prefrontal cortex). So **planning** is the name of the game.
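One simple way a reward model can translate into reasoning gains without more SFT data is best-of-N reranking: sample several candidate solutions and keep the one the reward model scores highest. A minimal sketch, assuming the reward model is exposed as a plain Python callable; `best_of_n`, `toy_reward`, and the candidate strings are hypothetical, and the string check only stands in for a trained model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_model: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (on ties, the first one wins)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward_model)


# Hypothetical stand-in for a learned reward model: prefers answers to "3 * 4"
# that end with the correct result.
def toy_reward(solution: str) -> float:
    return 1.0 if solution.strip().endswith("12") else 0.0


samples = ["3 * 4 = 7", "3 * 4 = 12", "3 * 4 = 34"]
print(best_of_n(samples, toy_reward))  # -> 3 * 4 = 12
```

The quality of the selected answer is bounded by how well the reward model separates good from bad candidates, which is why a strong reward model matters here.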
#
## Maths
In the literature, we have seen different approaches to teaching math to AI models, such as Transformers combined with beam search, or large language models that solve tasks requiring complex multi-step reasoning by generating solutions in a step-by-step chain-of-thought format.
One effective method for the latter involves training reward models to discriminate between desirable and undesirable outputs.
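One common way to train such a discriminator (an assumption here; the README does not specify the objective) is a pairwise, Bradley-Terry style loss: the reward model scores a desirable and an undesirable output for the same prompt, and the loss pushes the desirable score above the other. A minimal sketch; `pairwise_reward_loss` and the example scores are hypothetical, and in practice the scores would come from the reward model being trained.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss -log(sigmoid(r_chosen - r_rejected)).

    Near zero when the reward model scores the desirable output well above the
    undesirable one; large when it ranks the pair the wrong way round.
    """
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward-model scores for (desirable, undesirable) output pairs.
pairs = [(2.0, -1.0), (0.5, 0.7)]
batch_loss = sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)
print(round(batch_loss, 4))
```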
#
## Abstract
[Access this document](https://github.com/Quantum-Software-Development/Q-Star/blob/1e3dfd901f7ae1e9830f96f7e8c830cecbd5e804/Bellman%20Q*/Q*%20Bellman%20Doc.pdf) for a comprehensive overview of the Q-Star (Q*) concept in reinforcement learning, which delves into its mathematical formulation, significance, and the methods employed for approximation in learning algorithms.
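A classic example of such an approximation method is tabular Q-learning, which estimates $Q^*$ from sampled transitions $(s, a, r, s')$ by nudging $Q(s, a)$ toward the target $r + \gamma \max_{a'} Q(s', a')$ with step size $\alpha$. A minimal sketch; the function name, the dictionary-based table, and the default `alpha`/`gamma` values are illustrative assumptions, and whether the linked document presents Q-learning in exactly this form is not assumed.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning step and return the updated Q(state, action).

    q maps (state, action) -> estimate; the sampled Bellman target is
    reward + gamma * max_a' Q(next_state, a'), or just reward at a terminal state.
    """
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ("stay", "move")
print(q_learning_update(q, 0, "move", 1.0, 1, actions, alpha=0.5))  # -> 0.5
```

Repeating this update over many sampled transitions, with every state-action pair visited often enough and a suitably decaying step size, is what drives the tabular estimates toward $Q^*$.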
Q* Bellman Equality

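The Bellman optimality equation expresses $Q^*$ recursively: $Q^*(s, a) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \mid S_t = s, A_t = a\right]$. Because the right-hand side is a $\gamma$-contraction for $\gamma < 1$, applying it repeatedly as an update (Q-value iteration) converges to $Q^*$ when the model is known. A minimal sketch on a hypothetical two-state MDP with deterministic transitions; the states, actions, rewards, and $\gamma = 0.9$ are illustrative assumptions, not taken from the linked document.

```python
# Hypothetical deterministic MDP: TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {"stay": (0, 0.0), "move": (1, 1.0)},
    1: {"stay": (1, 2.0), "move": (0, 0.0)},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value_iteration(transitions, gamma, tol=1e-10, max_iters=10_000):
    """Repeatedly apply the Bellman optimality backup until Q stops changing."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(max_iters):
        new_q = {
            s: {
                a: r + gamma * max(q[s_next].values())
                for a, (s_next, r) in acts.items()
            }
            for s, acts in transitions.items()
        }
        delta = max(abs(new_q[s][a] - q[s][a]) for s in q for a in q[s])
        q = new_q
        if delta < tol:
            break
    return q


def greedy_policy(q):
    """pi*(s) = argmax_a Q*(s, a)."""
    return {s: max(action_values, key=action_values.get) for s, action_values in q.items()}


q_star = q_value_iteration(TRANSITIONS, GAMMA)
print(greedy_policy(q_star))  # -> {0: 'move', 1: 'stay'}
```

Here the converged table gives $Q^*(1, \text{stay}) = 2 / (1 - 0.9) = 20$ and $Q^*(0, \text{move}) = 1 + 0.9 \cdot 20 = 19$. With stochastic transitions, the single `r + gamma * max(...)` term would be replaced by an expectation over next states and rewards.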
#
In the literature, we see two distinct methods for training reward models: outcome supervision and process supervision.
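Outcome supervision gives the reward model a single label per solution, typically whether the final answer is correct; process supervision labels each intermediate reasoning step, so a solution that reaches the right answer through a faulty step can still be penalised. A minimal sketch of how the training targets differ, plus one way to collapse per-step scores into a solution score (the product of step-correctness probabilities; taking the minimum is another common choice). Function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def outcome_label(final_answer_correct: bool) -> int:
    """Outcome supervision: a single label for the whole solution."""
    return int(final_answer_correct)


def process_labels(step_correct: Sequence[bool]) -> List[int]:
    """Process supervision: one label per intermediate reasoning step."""
    return [int(ok) for ok in step_correct]


def prm_solution_score(step_probs: Sequence[float]) -> float:
    """Combine per-step correctness probabilities from a process-supervised
    reward model into one solution-level score (here: their product)."""
    return math.prod(step_probs)


# Hypothetical solution: final answer right, but step 2 was judged wrong.
print(outcome_label(True))                   # -> 1
print(process_labels([True, False, True]))   # -> [1, 0, 1]
print(prm_solution_score([0.9, 0.2, 0.95]))  # low, because of the weak step 2
```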
Hodge-Riemann Cohomology Classes

#
###
#
######
[Copyright 2024 Quantum-Software-Development. Code released under the MIT license.](https://github.com/Quantum-Software-Development/Q-Star/blob/f5115a1a073bdb3fa68c51bb3b3414c8e0b0270e/LICENSE)