https://github.com/melvinmo/genetic-programming-with-indexed-memory-for-cartpole
An implementation of Genetic Programming using Indexed Memory to solve the Partially Observable CartPole problem
https://github.com/melvinmo/genetic-programming-with-indexed-memory-for-cartpole
cartpole deap genetic-programming memory-allocation partially-observable-environment reinforcement-learning
Last synced: 3 months ago
JSON representation
An implementation of Genetic Programming using Indexed Memory to solve the Partially Observable CartPole problem
- Host: GitHub
- URL: https://github.com/melvinmo/genetic-programming-with-indexed-memory-for-cartpole
- Owner: MelvinMo
- License: mit
- Created: 2025-05-31T20:13:13.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-05-31T20:15:37.000Z (4 months ago)
- Last Synced: 2025-06-01T08:09:59.527Z (4 months ago)
- Topics: cartpole, deap, genetic-programming, memory-allocation, partially-observable-environment, reinforcement-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 609 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Genetic Programming with Indexed Memory for Partially Observable CartPole
This repository contains an implementation of Genetic Programming (GP) augmented with indexed memory to solve a partially observable version of the CartPole environment from Gymnasium.
## Overview
The goal is to evolve policies using GP that leverage external memory to compensate for missing velocity observations in the CartPole environment, where only cart position and pole angle are observable. The system is implemented using the DEAP library and interfaces with Gymnasium.
## Key Features
- **Partially Observable Environment**: A custom `PartialObservationWrapper` modifies the CartPole-v1 environment to provide only `[cart_position, pole_angle]`, omitting velocities.
- **Indexed Memory**: A fixed-size memory vector (size=4) stores floating-point values, reset to zeros per episode, and persists across time steps within an episode.
- **GP Primitives**:
- **Memory Operations**:
- `mem_read(index)`: Reads from memory at `int(index) % memory_size`.
- `mem_write(index, value)`: Writes value to memory at `int(index) % memory_size` and returns value.
- **Arithmetic**: `add`, `sub`, `mul`, `protectedDiv`.
- **Trigonometric**: `sin`, `cos`.
- **Conditional**: `if_then_else(condition, out1, out2)` returns `out1` if `condition > 0`, else `out2`.
- **Ephemeral Constants**: Random floats in `[-1, 1]`.
- **Action Selection**: GP tree outputs a float, thresholded at 0 to select discrete actions (0: left, 1: right).
- **Fitness**: Average reward over 20 episodes (max 500 steps per episode).## Implementation Details
- **DEAP Setup**:
- Population: 100 individuals.
- Generations: 50.
- Crossover Probability: 0.8.
- Mutation Probability: 0.2.
- Selection: Tournament (size=3).
- Tree Constraints: Max height of 17.- **Memory Handling**:
- Memory is a global list within `evalRL`, ensuring each multiprocessing worker has its own instance.
- No autoregressive state is hardcoded, adhering to assignment constraints.## Experiments
- **With Memory**: Full primitive set including `mem_read` and `mem_write`.
- **Without Memory**: Excludes memory primitives for baseline comparison.
- **Fully Observable**: Uses all four observation variables (baseline from `genetic_programming_1_400611523.ipynb`).## Results
- **Training Curves**: Plotted in `gp_indexed_memory_400611523.ipynb`:
- Indexed Memory: Converges to ~500 fitness by generation 20, matching the fully observable policy.
- No Memory: Stagnates at low fitness (~50-100), unable to compensate for missing velocities.
- Fully Observable: Reaches ~500 fitness, serving as an upper bound.- **Best Tree**: Visualized using PyGraphviz, showing complex use of memory and conditionals (saved as `best_tree.png`).
## Dependencies
- Python 3.10
- deap==1.4.1
- gymnasium[classic-control]==1.0.0
- numpy==1.26.4
- matplotlib
- pygraphviz
- pygame==2.6.1## Running the Code
1. Install dependencies: `pip install -r requirements.txt`.
2. Run the notebook: `jupyter notebook gp_indexed_memory_400611523.ipynb`.
- Executes both experiments and generates plots.## Approach
- **Environment**: Wrapped CartPole-v1 to limit observations.
- **GP System**: Extended DEAP’s primitive set with memory primitives inspired by Teller’s "Learning Mental Models".
- **Evaluation**: Parallelized using multiprocessing, with memory isolation per process.