https://github.com/morvanzhou/mevo
Evolutionary algorithms, alternative to Reinforcement Learning
- Host: GitHub
- URL: https://github.com/morvanzhou/mevo
- Owner: MorvanZhou
- License: mit
- Created: 2023-02-02T01:57:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-14T02:58:51.000Z (over 2 years ago)
- Last Synced: 2025-03-24T12:55:32.462Z (7 months ago)
- Topics: evolution-strategies, evolutionary-algorithms, genetic-algorithm, neural-evolution, reinforcement-learning
- Language: Python
- Homepage:
- Size: 1.64 MB
- Stars: 38
- Watchers: 6
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MEvo
---
[](https://github.com/MorvanZhou/mevo/actions/workflows/python-package.yml)
[](https://github.com/MorvanZhou/mevo/blob/master/LICENSE)

MEvo is an evolutionary algorithm package that implements Genetic Algorithm and Evolution Strategy algorithms.
It can be used to form an agent's strategy or to optimize math problems.

# Quick look
Training a CartPole policy using a Genetic Algorithm:
```python
import mevo
import gymnasium
import numpy as np

# define a fitness function to get fitness for every individual
def fitness_fn(ind: mevo.individuals.Individual, conf: dict) -> float:
ep_r = 0
env = gymnasium.make('CartPole-v1')
for _ in range(2):
s, _ = env.reset()
for _ in range(500): # in one episode
logits = ind.predict(s)
a = np.argmax(logits)
s, r, done, _, _ = env.step(a)
ep_r += r
if done:
break
    return ep_r

# training
with mevo.GeneticAlgoNet(max_size=20, layer_size=[4, 8, 2], drop_rate=0.7, mutate_rate=0.5) as pop:
for generation in range(40):
pop.evolve(fitness_fn=fitness_fn)
print(f"generation={generation}, top_fitness={pop.top.fitness:.f2}")
```

After only 40 generations with a population of 20 individuals, it gives a great result.
Use the following code to visualize the learned policy.

```python
# deploy the best individual
env = gymnasium.make('CartPole-v1', render_mode="human")
while True:
s, _ = env.reset()
while True: # in one episode
logits = pop.top.predict(s)
a = np.argmax(logits)
s, _, done = env.step(a)[:3]
if done:
break
```
# What is MEvo
In MEvo, the smallest data segment is a Gene. A set of genes can be packed into a Chromosome.
Chromosomes then form an Individual, and a Population consists of many Individuals.
This differs from classical Genetic Algorithms, whose chromosome consists only of a 1D array:
a chromosome in MEvo is an n-D array, and different chromosomes can have different shapes.

**Why make the data shapes varied?**

Because we want to be able to evolve a neural network whose layers come in various shapes.
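To make this concrete, here is a minimal plain-NumPy sketch of how a `[4, 8, 2]` network's parameters map onto chromosomes of different shapes (the names below are illustrative, not MEvo's actual API):

```python
import numpy as np

# A [4, 8, 2] network's parameters viewed as MEvo-style chromosomes:
# each weight matrix or bias vector is one n-D "chromosome",
# and each float inside it is one "gene".
chromosomes = [
    np.random.randn(4, 8),  # layer-1 weights, shape (4, 8)
    np.random.randn(8),     # layer-1 bias,    shape (8,)
    np.random.randn(8, 2),  # layer-2 weights, shape (8, 2)
    np.random.randn(2),     # layer-2 bias,    shape (2,)
]

def predict(chromosomes, s):
    # forward pass through the evolved parameters
    w1, b1, w2, b2 = chromosomes
    h = np.tanh(s @ w1 + b1)
    return h @ w2 + b2  # logits
```

Evolving such an individual means recombining or perturbing these arrays directly, so no two chromosomes need to share a shape.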
MEvo has two different algorithm families.
- Genetic Algorithm family
- Evolution Strategy family

The Genetic Algorithm basically has two steps:
1. crossover the chromosomes in a population
2. mutate genes in the new generation

The following image shows how crossover combines two parent chromosomes,
and how the new chromosome is then mutated.
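As a rough illustration, the two steps might look like this in plain NumPy (uniform crossover and Gaussian mutation are shown as one common variant; MEvo's exact operators may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    # uniform crossover: pick each gene from one of the two parents at random
    mask = rng.random(p1.shape) < 0.5
    return np.where(mask, p1, p2)

def mutate(child: np.ndarray, mutate_rate: float) -> np.ndarray:
    # perturb a random subset of genes with Gaussian noise
    mask = rng.random(child.shape) < mutate_rate
    return np.where(mask, child + rng.normal(size=child.shape), child)

parent1 = rng.normal(size=(4, 8))
parent2 = rng.normal(size=(4, 8))
child = mutate(crossover(parent1, parent2), mutate_rate=0.5)
```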
MEvo's Evolution Strategy takes two different steps:
1. mutate genes (with normally distributed noise) from the same chromosome
2. update this chromosome based on the performance of all mutated children
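A minimal sketch of this mutate-then-update loop, written in the style of natural evolution strategies (illustrative only, not MEvo's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(theta, fitness_fn, n_children=15, sigma=0.05, lr=0.1):
    # 1. mutate: sample normally distributed perturbations of the same chromosome
    noise = rng.normal(size=(n_children,) + theta.shape)
    fitness = np.array([fitness_fn(theta + sigma * eps) for eps in noise])
    # 2. update: move the chromosome toward the better-performing children
    weights = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    grad = np.tensordot(weights, noise, axes=1) / (n_children * sigma)
    return theta + lr * grad

# demo: maximize -||theta - 3||^2, so theta should drift toward 3
theta = np.zeros(5)
for _ in range(200):
    theta = es_step(theta, lambda t: -np.square(t - 3.0).sum())
```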
# Parallel training
MEvo supports parallel training. Simply set `n_worker` > 1 to unlock your machine's power and train on multiple cores.
Setting `n_worker=-1` uses all of your cores. Simply add `n_worker` to the `pop` definition from the previous code.
```python
# parallel training
if __name__ == "__main__":
with mevo.GeneticAlgoNet(max_size=20, layer_size=[4, 8, 2], n_worker=-1, drop_rate=0.7, mutate_rate=0.5) as pop:
for generation in range(40):
pop.evolve(fitness_fn=fitness_fn)
print(f"generation={generation}, top_fitness={pop.top.fitness:.2f}")
```

Note that the parallel code must be run inside the `if __name__ == "__main__":` scope,
otherwise a Python multiprocessing error will occur.

# Install
Use pip to install MEvo:
```shell
pip3 install mevo
```

# Methods and Parameters
MEvo supports the following populations:
- mevo.GeneticAlgoInt()
- mevo.GeneticAlgoFloat()
- mevo.GeneticAlgoOrder()
- mevo.GeneticAlgoNet()
- mevo.EvolutionStrategyNet()

## Classical genetic algorithm problems

Problems such as the Travelling Salesman Problem (TSP) can be solved by the following classical populations:
- mevo.GeneticAlgoInt()
- mevo.GeneticAlgoFloat()
- mevo.GeneticAlgoOrder()

### A Travelling Salesman Problem example using `pop = mevo.GeneticAlgoOrder()`:
```python
import mevo
import numpy as np

# 20 random city positions on a 2D plane
positions = [np.random.rand(2) for _ in range(20)]

def distance_fitness_fn(ind: mevo.individuals.Individual, conf: dict) -> float:
order = [c.data[0] for c in ind.chromosomes]
cost = 0
for i in range(len(order) - 1):
p1, p2 = positions[order[i]], positions[order[i + 1]]
cost += np.square(p1 - p2).sum()
fitness = -cost
    return fitness

pop = mevo.GeneticAlgoOrder(
max_size=50,
chromo_size=len(positions),
drop_rate=0.3,
mutate_rate=0.01,
)
pop.run(fitness_fn=distance_fitness_fn, step=30)
```

### An optimization problem example using `pop = mevo.GeneticAlgoInt()`:
```python
import mevo
import numpy as np

def wave_fitness_fn(ind: mevo.individuals.Individual, conf: dict) -> float:
binary = [c.data for c in ind.chromosomes]
c = np.concatenate(binary, axis=0)
a = 2 ** np.arange(len(c))[::-1]
    decimal = c.dot(a)
    x = decimal / float(2 ** len(c) - 1) * 5  # map the decoded integer into [0, 5]
o = np.sin(10 * x) * x + np.cos(2 * x) * x
    return o

pop = mevo.GeneticAlgoInt(
max_size=20,
chromo_size=10,
drop_rate=0.3,
chromo_initializer=mevo.chromosomes.initializers.RandomInt(0, 2),
mutate_rate=0.01,
)
pop.run(step=20, fitness_fn=wave_fitness_fn)
```
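For intuition, the decoding step in `wave_fitness_fn` above turns the individual's binary genes into a real number in `[0, 5]`; here is the same arithmetic worked through with a 3-bit chromosome:

```python
import numpy as np

c = np.array([1, 0, 1])                   # a 3-bit chromosome
a = 2 ** np.arange(len(c))[::-1]          # place values [4, 2, 1]
decimal = c.dot(a)                        # 4 + 0 + 1 = 5
x = decimal / float(2 ** len(c) - 1) * 5  # 5 / 7 * 5 ≈ 3.57, scaled into [0, 5]
```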
## Deep Net Evolution

For policy learning, or as an alternative to Reinforcement Learning, the following two methods each have their advantages.
- mevo.GeneticAlgoNet()
- mevo.EvolutionStrategyNet()

|              | Reinforcement Learning                        | MEvo                                                                                                |
|--------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------|
| Training     | Has forward and backward propagation          | Only has forward propagation, plus crossover or mutation operations (lighter than backpropagation)   |
| Exploration  | Needs a carefully designed exploration policy | Diverse children automatically ensure exploration                                                    |
| Memory needs | Keeps only one set of parameters              | Must hold all children's parameters in each generation (parallel computing saves time)               |
| Network size | Generally a large, deep net                   | With large-scale exploration, a relatively small net can do a good job                               |

### A `mevo.GeneticAlgoNet()` example:
```python
import mevo
import gymnasium
import numpy as np

def fitness_fn(ind: mevo.individuals.Individual, conf: dict) -> float:
ep_r = 0
env = gymnasium.make('Pendulum-v1')
env.reset(seed=conf["seed"])
for _ in range(2):
s, _ = env.reset()
for _ in range(150): # in one episode
logits = ind.predict(s)
a = np.tanh(logits) * 2
s, r, _, _, _ = env.step(a)
ep_r += r
    return ep_r

def train():
with mevo.GeneticAlgoNet(
max_size=30,
layer_size=[3, 32, 1],
drop_rate=0.7,
mutate_rate=0.5,
n_worker=-1,
) as pop:
for ep in range(700):
pop.evolve(
fitness_fn=fitness_fn,
)
print(ep, pop.top.fitness)
    return pop.top

def show(top):
env = gymnasium.make('Pendulum-v1', render_mode="human")
while True:
s, _ = env.reset()
for _ in range(200): # in one episode
logits = top.predict(s)
a = np.tanh(logits) * 2
            s, _, _, _, _ = env.step(a)

if __name__ == "__main__":
top = train()
show(top)
```
### A `mevo.EvolutionStrategyNet()` example:
```python
import mevo
import gymnasium
import numpy as np

def fitness_fn(ind: mevo.individuals.EvolutionStrategyDense, conf: dict) -> float:
ep_r = 0
seed = conf["seed"]
index = conf["index"]env = gymnasium.make('Pendulum-v1')
# ! must set seed and clone when using mevo.EvolutionStrategyNet()
env.reset(seed=conf["seed"])
c_ind = ind.clone_with_mutate(index, seed)
    # ###############

    for _ in range(2):
s, _ = env.reset()
for _ in range(100):
logits = c_ind.predict(s)
a = np.tanh(logits) * 2
s, r, _, _, _ = env.step(a)
ep_r += r
    return ep_r

def train():
with mevo.EvolutionStrategyNet(
max_size=15,
layer_size=[3, 32, 1],
mutate_strength=0.05,
learning_rate=0.1,
n_worker=-1,
seed=2
) as pop:
for ep in range(700):
pop.evolve(fitness_fn=fitness_fn)
print(ep, pop.top.fitness)
    return pop.top

def show(top):
env = gymnasium.make(
'Pendulum-v1',
render_mode="human"
)
while True:
s, _ = env.reset()
for _ in range(200): # in one episode
logits = top.predict(s)
a = np.tanh(logits) * 2
            s, _, _, _, _ = env.step(a)

if __name__ == "__main__":
top = train()
show(top)
```