https://github.com/pixelcaliber/q-learning
Q-learning agent: a type of reinforcement learning algorithm where an agent learns to take actions in an environment by maximizing a cumulative reward.
gametheory machine-learning python qlearning-algorithm reinforcement-learning
- Host: GitHub
- URL: https://github.com/pixelcaliber/q-learning
- Owner: pixelcaliber
- Created: 2025-02-21T18:10:29.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-02-23T18:06:34.000Z (about 1 year ago)
- Last Synced: 2025-10-30T00:36:55.476Z (6 months ago)
- Topics: gametheory, machine-learning, python, qlearning-algorithm, reinforcement-learning
- Language: Python
- Homepage: https://t3-ai.vercel.app/
- Size: 55.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Q-Learning Agent
### Concept and Theory
Q-learning is a model-free reinforcement learning algorithm in which an agent learns which actions to take in an environment so as to maximize its cumulative reward. Here the environment is a game of Tic Tac Toe. Key concepts include:
- **State:**
The current configuration of the Tic Tac Toe board.
- **Action:**
A move (placing an 'X' or 'O') in an available cell.
- **Reward:**
Feedback received after each move (e.g., winning gives a positive reward, losing gives a negative reward).
- **Q-Value:**
The agent's estimate of the expected cumulative future reward for taking a given action in a given state. The agent refines these Q-values over time based on its experience.
- **Exploration vs. Exploitation:**
The agent uses an epsilon-greedy strategy: with a small probability (epsilon) it explores by playing a random move, and otherwise it exploits what it has learned by playing the move with the highest Q-value.
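
The concepts above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the board encoding (a 9-character string with spaces for empty cells), the function names, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters; the project's actual values are not stated.
ALPHA = 0.5    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # probability of exploring with a random move

# Q-table mapping (state, action) to an estimated value, defaulting to 0.0.
# Assumed encoding: state is a 9-character string, action is a cell index 0-8.
Q = defaultdict(float)


def available_actions(state):
    """Return the indices of empty cells (marked with a space)."""
    return [i for i, cell in enumerate(state) if cell == " "]


def choose_action(state, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    actions = available_actions(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def update(state, action, reward, next_state, done):
    """Apply one standard Q-learning update for a single transition."""
    if done:
        best_next = 0.0  # no future reward after the game ends
    else:
        best_next = max(
            (Q[(next_state, a)] for a in available_actions(next_state)),
            default=0.0,
        )
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

In a two-player game, `next_state` would be the board the agent sees on its next turn, that is, after the opponent has replied.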
### Agent Training and Decision Making
1. **Initialization:**
The agent loads a pre-saved model (if available) at the start of each request.
2. **Choosing an Action:**
The agent examines the board state, evaluates the available moves, and selects one epsilon-greedily: occasionally a random move (exploration), otherwise the best-known move (exploitation).
3. **Learning from Experience:**
After each game, the game logger stores the moves and the result. The agent then replays this log to update its Q-values.
4. **Model Persistence:**
The updated model is saved to disk so that the agent can retain its learning across sessions.
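
Steps 3 and 4 might look roughly like the sketch below. It assumes a log of the agent's own `(state, action)` pairs, a reward given only at the end of the game, and a Q-table pickled to disk. The class name `QAgent`, its methods, the board encoding, and the hyperparameter defaults are hypothetical and are not taken from the repository.

```python
import os
import pickle


def empty_cells(state):
    """Indices of empty cells in a 9-character board string (assumed encoding)."""
    return [i for i, cell in enumerate(state) if cell == " "]


class QAgent:
    """Minimal tabular agent; names and hyperparameters are illustrative."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha
        self.gamma = gamma
        self.q = {}  # (state, action) -> estimated value

    def learn_from_game(self, moves, final_reward):
        """Update Q-values from one logged game.

        `moves` holds the agent's own (state, action) pairs in play order.
        The last move receives the final reward; earlier moves bootstrap
        from the best Q-value available in the state that followed them.
        """
        # Walk backwards so the final reward propagates in a single pass.
        for i in reversed(range(len(moves))):
            state, action = moves[i]
            if i == len(moves) - 1:
                target = final_reward
            else:
                next_state = moves[i + 1][0]
                best_next = max(
                    self.q.get((next_state, a), 0.0)
                    for a in empty_cells(next_state)
                )
                target = self.gamma * best_next
            old = self.q.get((state, action), 0.0)
            self.q[(state, action)] = old + self.alpha * (target - old)

    def save(self, path):
        """Persist the Q-table so learning survives across sessions."""
        with open(path, "wb") as f:
            pickle.dump(self.q, f)

    def load(self, path):
        """Load a saved Q-table if one exists; return True on success."""
        if not os.path.exists(path):
            return False
        with open(path, "rb") as f:
            self.q = pickle.load(f)
        return True
```

A request handler following this pattern would call `load` first, play or learn, and then call `save` with the configured model path.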
## API Endpoints
- **GET /health:**
Checks the health of the application.
- **GET /game?session_id=YOUR_SESSION_ID:**
Returns the current game state, including the board, result, and scoreboard.
- **POST /move:**
Submits a move. Requires a JSON payload with `move` and `session_id`.
- **GET /reset?session_id=YOUR_SESSION_ID:**
Resets the game board (preserving the scoreboard).
- **GET /delete_session?session_id=YOUR_SESSION_ID:**
Deletes the session data; intended to be called when the user closes the tab.
> **Rate Limiting:**
> All endpoints are rate-limited using Flask-Limiter to prevent abuse. For example, the `/move` endpoint is limited to 60 requests per minute.
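
As a rough client-side sketch, the requests for `/game` and `/move` could be built with the Python standard library as shown below. The base URL (Flask's default development address), the integer cell index used for `move`, and the assumption that responses are JSON are not confirmed by this README; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # assumption: Flask's default dev address


def game_request(session_id, base_url=BASE_URL):
    """Build GET /game?session_id=... for the current game state."""
    query = urlencode({"session_id": session_id})
    return Request(f"{base_url}/game?{query}", method="GET")


def move_request(session_id, move, base_url=BASE_URL):
    """Build POST /move with the JSON payload described above.

    `move` is assumed here to be a cell index; the real format may differ.
    """
    body = json.dumps({"move": move, "session_id": session_id}).encode("utf-8")
    return Request(
        f"{base_url}/move",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send a prepared request and decode the response, assumed to be JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `send(move_request("my-session", 4))` would submit a move. Flask-Limiter rejects requests over the limit with HTTP 429 by default, which `urlopen` surfaces as `urllib.error.HTTPError`.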
## Running the Project
- **Backend Setup:**
  - Install dependencies: `pip install -r requirements.txt`
  - Set up configuration (e.g., `Config.RATE_LIMIT_STORAGE_URL`, `MODEL_SAVE_PATH`).
  - Run the Flask app: `flask run`
- **Frontend:** https://github.com/pixelcaliber/t3-ai/blob/master/README.md
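
For the backend configuration step, one possible shape for the settings is sketched below. Only the names `Config`, `RATE_LIMIT_STORAGE_URL`, and `MODEL_SAVE_PATH` come from this README; reading them from environment variables, placing both on `Config`, and the default values are assumptions.

```python
import os


class Config:
    """Illustrative settings object; the defaults below are assumptions."""

    # "memory://" selects Flask-Limiter's in-process storage, which suits
    # local development but is not shared between worker processes.
    RATE_LIMIT_STORAGE_URL = os.environ.get("RATE_LIMIT_STORAGE_URL", "memory://")

    # Hypothetical default filename for the persisted Q-table.
    MODEL_SAVE_PATH = os.environ.get("MODEL_SAVE_PATH", "q_table.pkl")
```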