https://github.com/pixelcaliber/q-learning

Q-learning agent: a type of reinforcement learning algorithm where an agent learns to take actions in an environment by maximizing a cumulative reward.
https://github.com/pixelcaliber/q-learning

gametheory machine-learning python qlearning-algorithm reinforcement-learning

Last synced: 12 days ago
JSON representation

Q-learning agent: a type of reinforcement learning algorithm where an agent learns to take actions in an environment by maximizing a cumulative reward.

Host: GitHub
URL: https://github.com/pixelcaliber/q-learning
Owner: pixelcaliber
Created: 2025-02-21T18:10:29.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2025-02-23T18:06:34.000Z (about 1 year ago)
Last Synced: 2025-10-30T00:36:55.476Z (6 months ago)
Topics: gametheory, machine-learning, python, qlearning-algorithm, reinforcement-learning
Language: Python
Homepage: https://t3-ai.vercel.app/
Size: 55.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

## Q-Learning Agent

### Concept and Theory

Q-learning is a type of reinforcement learning algorithm where an agent learns to take actions in an environment by maximizing a cumulative reward. Key concepts include:

- **State:**
The current configuration of the Tic Tac Toe board.
- **Action:**
A move (placing an 'X' or 'O') in an available cell.
- **Reward:**
Feedback received after each move (e.g., winning gives a positive reward, losing gives a negative reward).
- **Q-Value:**
The expected future reward for taking a certain action from a given state. The agent updates these Q-values over time based on its experience.
- **Exploration vs. Exploitation:**
The agent uses an epsilon-greedy strategy to balance between exploring new moves and exploiting known moves with high Q-values.

### Agent Training and Decision Making

1. **Initialization:**
The agent loads a pre-saved model (if available) at the start of each request.
2. **Choosing an Action:**
The agent examines the board state, evaluates available moves, and selects a move by balancing between exploration (random moves) and exploitation (best-known move).
3. **Learning from Experience:**
After each game, the game logger stores the moves and results. The agent uses this log to update its Q-values through the learning process.
4. **Model Persistence:**
The updated model is saved to disk so that the agent can retain its learning across sessions.

## API Endpoints

- **GET /health:**
Checks the health of the application.
- **GET /game?session_id=YOUR_SESSION_ID:**
Returns the current game state, including the board, result, and scoreboard.
- **POST /move:**
Submits a move. Requires a JSON payload with `move` and `session_id`.
- **GET /reset?session_id=YOUR_SESSION_ID:**
Resets the game board (preserving the scoreboard).
- **GET /delete_session?session_id=YOUR_SESSION_ID:**
Deletes the session data when the user closes the tab.

> **Rate Limiting:**
> All endpoints are rate-limited using Flask-Limiter to prevent abuse. For example, the `/move` endpoint is limited to 60 requests per minute.

## Running the Project

- **Backend Setup:**
- Install dependencies: `pip install -r requirements.txt`
- Set up configuration (e.g., `Config.RATE_LIMIT_STORAGE_URL`, `MODEL_SAVE_PATH`).
- Run the Flask app: `flask run`
- **Frontend**: https://github.com/pixelcaliber/t3-ai/blob/master/README.md

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/pixelcaliber/q-learning

Awesome Lists containing this project

README