https://github.com/quantum-software-development/q-star


# Q-Star (Q*) in Reinforcement Learning

*Author: Miquel Noguer i Alonso - Founder at AI Finance Institute*
*Date: November 23, 2023*

#

## Q* is the currently accepted notation for the optimal action-value function in RL

A Q* RL algorithm might use AI-generated data (logic and mathematics) to teach an LLM to solve multi-step logic problems. Q* might be applied to GPT-5, which could give it excellent reasoning and retrieval skills.
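For reference, the definition behind this notation can be written as follows. The symbols are standard RL notation assumed here rather than taken from the linked PDF: $Q^{\pi}(s, a)$ is the expected discounted return from taking action $a$ in state $s$ and then following policy $\pi$, with discount factor $0 \le \gamma < 1$; $Q^{*}$ is its maximum over all policies, and acting greedily with respect to $Q^{*}$ yields an optimal policy $\pi^{*}$.

```math
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a\right],
\qquad
Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a),
\qquad
\pi^{*}(s) \in \arg\max_{a} Q^{*}(s, a)
```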

#

## Reasoning

The biggest gains in reasoning come from strong reward models, as opposed to more supervised fine-tuning (SFT) data or tools.

Much of the (unpublished) research now focuses on finding a general planning algorithm for LLMs, i.e., some equivalent of the dorsolateral prefrontal cortex (dlPFC). So **planning** is the name of the game.

#

## Maths

The literature describes different approaches to teaching mathematics to AI models, such as Transformers combined with beam search, or large language models (LLMs) that can solve tasks requiring complex multi-step reasoning by generating solutions step by step in a chain-of-thought format.
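The beam-search side of that comparison can be sketched in a few lines. This is a generic beam search over reasoning steps, not the method of any particular paper: `expand` is a hypothetical stand-in for a model that proposes next steps with probabilities, and `TOY_MODEL` is made-up data used only to make the example runnable.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

# A scored partial solution: (cumulative log-probability, sequence of steps)
Beam = Tuple[float, Tuple[str, ...]]


def beam_search(
    expand: Callable[[Tuple[str, ...]], Sequence[Tuple[str, float]]],
    beam_width: int,
    max_steps: int,
    end_token: str = "<end>",
) -> List[Beam]:
    """Keep only the `beam_width` highest-scoring partial solutions at each step.

    `expand(seq)` (hypothetical) returns (next_step, probability) pairs with
    probability > 0 for extending the partial solution `seq`.
    """
    beams: List[Beam] = [(0.0, ())]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for score, seq in beams:
            if seq and seq[-1] == end_token:
                candidates.append((score, seq))  # finished beams carry over unchanged
                continue
            for step, prob in expand(seq):
                candidates.append((score + math.log(prob), seq + (step,)))
        # Prune to the best candidates by cumulative log-probability (sorted best first)
        beams = heapq.nlargest(beam_width, candidates, key=lambda b: b[0])
        if all(seq and seq[-1] == end_token for _, seq in beams):
            break
    return beams


# Hypothetical toy "model": maps a partial solution to possible next steps
TOY_MODEL = {
    (): [("2+3", 0.6), ("2*3", 0.4)],
    ("2+3",): [("=5", 0.9), ("=6", 0.1)],
    ("2*3",): [("=6", 0.95), ("=5", 0.05)],
}


def toy_expand(seq: Tuple[str, ...]) -> Sequence[Tuple[str, float]]:
    return TOY_MODEL.get(seq, [("<end>", 1.0)])


best = beam_search(toy_expand, beam_width=2, max_steps=4)
print(best[0][1])  # ('2+3', '=5', '<end>')
```

Working in log-probabilities turns products of step probabilities into sums and avoids numerical underflow on long chains of steps.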

One effective method for the second approach is to train reward models that discriminate between desirable and undesirable outputs.
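A minimal sketch of such a discriminator follows, assuming hand-made feature vectors instead of text. `LinearRewardModel` is a hypothetical logistic-regression stand-in for a neural reward model, trained with stochastic gradient descent on binary cross-entropy (label 1 = desirable, 0 = undesirable). `best_of_n` illustrates one common way a trained reward model is used: score several candidate solutions and keep the highest-scoring one. The training data is made up.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearRewardModel:
    """Hypothetical toy reward model: logistic regression over feature vectors."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def score(self, x: Sequence[float]) -> float:
        # Estimated probability that the output is desirable
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            lr: float = 0.5, epochs: int = 500) -> None:
        # Stochastic gradient descent on binary cross-entropy
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.score(x) - y  # gradient of BCE w.r.t. the logit
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


def best_of_n(model: LinearRewardModel, candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate the reward model rates highest."""
    return max(range(len(candidates)), key=lambda i: model.score(candidates[i]))


# Made-up features: [passes a final-answer check, number of detected errors]
xs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
ys = [1, 1, 0, 0]
model = LinearRewardModel(n_features=2)
model.fit(xs, ys)
print(best_of_n(model, [[0.0, 1.0], [1.0, 0.0], [0.0, 2.0]]))  # 1
```

In practice such a reward model is often a fine-tuned language model with a scalar output head; the linear model here only keeps the example self-contained.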

#

## Abstract
[Access this document](https://github.com/Quantum-Software-Development/Q-Star/blob/1e3dfd901f7ae1e9830f96f7e8c830cecbd5e804/Bellman%20Q*/Q*%20Bellman%20Doc.pdf) for a comprehensive overview of the Q-Star (Q*) concept in reinforcement learning, covering its mathematical formulation, its significance, and the methods used to approximate it in learning algorithms.
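One classic way to approximate Q* from sampled experience is tabular Q-learning (Watkins). The sketch below is not taken from the linked document: it assumes a hypothetical four-state corridor environment (`step`, `N_STATES` and `ACTIONS` are illustrative names) and uses epsilon-greedy exploration with random tie-breaking.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a 1-D corridor with states 0..3.
# State 3 is terminal; reaching it yields reward 1.0, every other step yields 0.
N_STATES = 4
ACTIONS = (-1, 1)  # move left or right


def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], all estimates start at 0.0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                # Exploit, breaking ties at random so early episodes still wander
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            # Terminal transitions have no future value to bootstrap from
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # greedy policy: move right (+1) from every non-terminal state
```

For this corridor the true optimal values along the path are Q*(2, +1) = 1, Q*(1, +1) = 0.9 and Q*(0, +1) = 0.81; because all estimates start at 0 and rewards are non-negative, the learned table approaches them from below.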

#

Q* Bellman Equality

![Q* Bellman Equality](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/91c383e8-5c31-4695-8236-b56e58b2a59a)
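In its standard form, the Bellman optimality equation for Q* reads $Q^{*}(s,a) = \mathbb{E}\left[r + \gamma \max_{a'} Q^{*}(s',a') \mid s, a\right]$. For a small MDP whose transitions are fully known, Q-value iteration finds Q* by applying this backup repeatedly until it reaches its fixed point. The two-state `MDP` below is hypothetical and chosen so the answer can be checked by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state MDP: MDP[state][action] -> list of (probability, next_state, reward)
MDP: Dict[str, Dict[str, List[Tuple[float, str, float]]]] = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "move": [(0.8, "B", 0.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 1.0)],
          "move": [(1.0, "A", 0.0)]},
}


def q_value_iteration(mdp, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Apply the Bellman optimality backup until Q stops changing."""
    q = {(s, a): 0.0 for s in mdp for a in mdp[s]}
    for _ in range(max_iters):
        new_q = {}
        for s in mdp:
            for a, outcomes in mdp[s].items():
                # Expected reward plus discounted value of the best next action
                new_q[(s, a)] = sum(
                    p * (r + gamma * max(q[(s2, b)] for b in mdp[s2]))
                    for p, s2, r in outcomes
                )
        delta = max(abs(new_q[k] - q[k]) for k in q)
        q = new_q
        if delta < tol:  # the backup is a gamma-contraction, so this converges
            break
    return q


q_star = q_value_iteration(MDP)
print({k: round(v, 3) for k, v in q_star.items()})
```

For this toy MDP, staying in B forever is worth 1 / (1 - 0.9) = 10, and Q*(A, move) = 0.8 * 0.9 * 10 + 0.2 * 0.9 * Q*(A, move), which gives 7.2 / 0.82 ≈ 8.78.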

#

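The literature describes two distinct methods for training reward models: outcome supervision and process supervision. Outcome supervision provides feedback only on the final result, while process supervision provides feedback on each intermediate reasoning step.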
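The two methods differ in the granularity of their labels. The sketch below is illustrative only: the function names are hypothetical, and taking the product of per-step correctness probabilities (the probability that every step is correct) is one common way, not the only one, to turn a process reward model's step scores into a single solution score.

```python
from math import prod
from typing import List, Sequence


def outcome_label(final_answer: str, reference: str) -> int:
    """Outcome supervision: one label for the whole solution, from the final answer only."""
    return int(final_answer.strip() == reference.strip())


def process_labels(step_is_correct: Sequence[bool]) -> List[int]:
    """Process supervision: one label per intermediate reasoning step.

    In practice these per-step judgements come from human or model annotators.
    """
    return [int(ok) for ok in step_is_correct]


def prm_solution_score(step_probs: Sequence[float]) -> float:
    """Aggregate per-step correctness probabilities into one solution score."""
    return prod(step_probs)


# Hypothetical solution whose final answer is right but whose second step is wrong
print(outcome_label("6", "6"))               # 1: outcome supervision accepts it
print(process_labels([True, False, True]))  # [1, 0, 1]: process supervision flags step 2
print(prm_solution_score([0.9, 0.2, 0.95]))
```

Because one low step score pulls the product down, a process-supervised score penalizes a solution that reaches the right answer through a flawed step, which outcome supervision alone cannot detect.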

Hodge-Riemann Cohomology Classes

![Hodge-Riemann Cohomology Classes](https://github.com/Quantum-Software-Development/Q-Star/assets/113218619/2aacaba9-dcc7-4a60-be18-9d2e4885b7a3)

#

###

[![Sponsor Quantum Software Development](https://img.shields.io/badge/Sponsor-Quantum%20Software%20Development-brightgreen?logo=GitHub)](https://github.com/sponsors/Quantum-Software-Development)

#

######

[Copyright 2024 Quantum-Software-Development. Code released under the MIT license.](https://github.com/Quantum-Software-Development/Q-Star/blob/f5115a1a073bdb3fa68c51bb3b3414c8e0b0270e/LICENSE)