https://github.com/quantum-software-development/q-star

QMaths

Host: GitHub
URL: https://github.com/quantum-software-development/q-star
Owner: Quantum-Software-Development
License: mit
Created: 2024-01-13T17:29:07.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-07T11:50:22.000Z (4 months ago)
Last Synced: 2025-03-27T13:51:18.425Z (4 months ago)
Language: Jupyter Notebook
Homepage:
Size: 1.69 MB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

# Q-Star (Q*) in Reinforcement Learning

*Author: Miquel Noguer i Alonso - Founder at AI Finance Institute*

*Date: November 23, 2023*

## Q* is the standard notation for the optimal action-value function in RL

A Q* RL algorithm might use AI-generated data (logic and maths) to teach an LLM to solve multi-step logic problems. Q* might also be applied to GPT-5, giving it excellent reasoning and retrieval skills.
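For reference, Q* has a standard textbook definition. Assuming a discounted Markov decision process with discount factor $\gamma \in [0, 1)$ and per-step rewards $r_{t+1}$ (symbols not introduced elsewhere in this README), Q* is the best expected return achievable over all policies $\pi$ after taking action $a$ in state $s$, and acting greedily with respect to it yields an optimal policy $\pi^*$:

```math
Q^*(s,a) = \max_{\pi} \, \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a\right],
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s,a)
```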

## Reasoning

The biggest gains in reasoning come from strong reward models rather than from additional supervised fine-tuning (SFT) data or tools.

Much of the current (unpublished) research focuses on finding a general planning algorithm for LLMs, i.e., some equivalent of the dorsolateral prefrontal cortex (dlPFC). Planning is the name of the game.

## Maths

The literature describes different approaches to teaching math to AI models, such as Transformers combined with beam search, or large language models that solve tasks requiring complex multi-step reasoning by generating solutions in a step-by-step chain-of-thought format.

One effective method for the second approach is to train reward models that discriminate between desirable and undesirable outputs.
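A minimal Python sketch of what "training a reward model to discriminate" can look like, assuming a pairwise setup: the model sees a preferred and a rejected output and is trained with the Bradley-Terry style loss $-\log \sigma(r(\text{chosen}) - r(\text{rejected}))$, a common choice in RLHF-style reward modeling. The linear model, the two hand-made features, and the `PAIRS` data are hypothetical stand-ins for a neural reward model and real preference data; this is not the project's own method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical features per output: [fraction of steps judged valid, length / 1000].
# Each pair is (chosen_output_features, rejected_output_features).
PAIRS: List[Tuple[Vector, Vector]] = [
    ([1.0, 0.2], [0.0, 0.3]),
    ([0.9, 0.5], [0.1, 0.4]),
]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def reward(w: Vector, x: Vector) -> float:
    # Linear reward model r(x) = w . x (hypothetical stand-in for a neural network).
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(w: Vector, chosen: Vector, rejected: Vector) -> float:
    # -log sigmoid(margin) == softplus(-margin)
    return softplus(-(reward(w, chosen) - reward(w, rejected)))


def train_reward_model(pairs: List[Tuple[Vector, Vector]], dim: int,
                       lr: float = 0.5, epochs: int = 200) -> List[float]:
    # Plain stochastic gradient descent on the pairwise loss.
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d(loss)/d(margin) = -sigmoid(-margin); d(margin)/d(w_i) = chosen_i - rejected_i
            grad_margin = -sigmoid(-margin)
            for i in range(dim):
                w[i] -= lr * grad_margin * (chosen[i] - rejected[i])
    return w


if __name__ == "__main__":
    w = train_reward_model(PAIRS, dim=2)
    for chosen, rejected in PAIRS:
        print(reward(w, chosen), reward(w, rejected), pairwise_loss(w, chosen, rejected))
```

Before training every pair has loss $\log 2$ (the model cannot tell the outputs apart); after training, each chosen output should receive a higher reward than its rejected counterpart.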

## Abstract
See [this document](https://github.com/Quantum-Software-Development/Q-Star/blob/1e3dfd901f7ae1e9830f96f7e8c830cecbd5e804/Bellman%20Q*/Q*%20Bellman%20Doc.pdf) for an overview of the Q-Star (Q*) concept in reinforcement learning, covering its mathematical formulation, its significance, and the methods used to approximate it in learning algorithms.

Q* Bellman Equality

![Q* Bellman Equality](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/91c383e8-5c31-4695-8236-b56e58b2a59a)
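The Bellman optimality equation says that Q* is the fixed point of the backup $Q(s,a) \leftarrow \mathbb{E}\left[r + \gamma \max_{a'} Q(s',a')\right]$. The Python sketch below illustrates this on a hypothetical three-state chain (the states, rewards, and `GAMMA = 0.9` are invented for illustration): `q_value_iteration` applies the backup with a known model until the values stop changing, while `q_learning` approximates the same Q* from sampled transitions using the standard tabular Q-learning update. It is a minimal sketch under these assumptions, not the method from the linked PDF.

```python
import random

# Hypothetical toy MDP: a 3-state chain; state 2 is terminal.
N_STATES = 3
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (illustrative choice)


def step(s, a):
    """Deterministic dynamics: returns (next_state, reward, done)."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def bellman_target(q, r, s_next, done):
    # r + gamma * max_a' Q(s', a'); no bootstrapping from a terminal state.
    return r if done else r + GAMMA * max(q[s_next])


def q_value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup with the known model until convergence."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps Q = 0
            for a in ACTIONS:
                s_next, r, done = step(s, a)
                target = bellman_target(q, r, s_next, done)
                delta = max(delta, abs(target - q[s][a]))
                q[s][a] = target
        if delta < tol:
            return q


def q_learning(episodes=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning: approximate Q* from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy exploration (ties broken toward action 0).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[s][b])
            s_next, r, done = step(s, a)
            # Move Q(s, a) a fraction alpha toward the Bellman target.
            q[s][a] += alpha * (bellman_target(q, r, s_next, done) - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    print(q_value_iteration())
    print(q_learning())
```

With these toy dynamics, value iteration converges to approximately `[[0.81, 0.9], [0.81, 1.0], [0.0, 0.0]]` (moving right is optimal in every non-terminal state), and the Q-learning estimate should agree with it to within a small tolerance.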

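In the literature, we see two distinct methods for training reward models: outcome supervision, which provides feedback only on the final result, and process supervision, which provides feedback on each intermediate reasoning step.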
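To make the distinction concrete, here is a minimal Python sketch of how each kind of reward model might be used to rank candidate solutions (best-of-N selection). The candidates, their probabilities, and the function names are hypothetical; aggregating per-step scores by their product is one common choice (taking the minimum is another) and is an assumption here, not necessarily what this project uses. `math.prod` requires Python 3.8 or later.

```python
import math
from typing import Callable, Dict, List

# Hypothetical candidate solutions with made-up reward-model outputs.
CANDIDATES: List[Dict] = [
    # Confident final answer, but the second reasoning step looks wrong.
    {"name": "A", "final_correct_prob": 0.95, "step_correct_probs": [0.9, 0.2, 0.95]},
    # Slightly less confident final answer, every step looks sound.
    {"name": "B", "final_correct_prob": 0.90, "step_correct_probs": [0.9, 0.9, 0.9]},
]


def outcome_score(candidate: Dict) -> float:
    # Outcome-supervised reward model: judges only the final result.
    return candidate["final_correct_prob"]


def process_score(candidate: Dict) -> float:
    # Process-supervised reward model: one score per step; the solution score is
    # the probability that every step is correct (product of per-step scores).
    return math.prod(candidate["step_correct_probs"])


def best_of_n(candidates: List[Dict], scorer: Callable[[Dict], float]) -> Dict:
    # Pick the highest-scoring candidate under the given reward model.
    return max(candidates, key=scorer)


if __name__ == "__main__":
    print("outcome supervision picks", best_of_n(CANDIDATES, outcome_score)["name"])
    print("process supervision picks", best_of_n(CANDIDATES, process_score)["name"])
```

In this made-up example, the outcome-based score prefers candidate A because of its confident final answer, while the process-based score penalizes A's flawed intermediate step (0.9 × 0.2 × 0.95 = 0.171 versus 0.9³ = 0.729 for B) and prefers B.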

Hodge-Riemann Cohomology Classes

![Hodge-Riemann Cohomology Classes](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/2aacaba9-dcc7-4a60-be18-9d2e4885b7a3)

[![Sponsor Quantum Software Development](https://img.shields.io/badge/Sponsor-Quantum%20Software%20Development-brightgreen?logo=GitHub)](https://github.com/sponsors/Quantum-Software-Development)
