
https://github.com/pykong/borg-dqn

A Stream-Fueled Hive Mind for Reinforcement Learning.

assignment deep-q-learning elk-stack gym iubh kafka pytorch redis reinforcement-learning


Host: GitHub
URL: https://github.com/pykong/borg-dqn
Owner: pykong
License: mit
Created: 2023-11-14T16:28:24.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-11-24T09:17:04.000Z (about 1 year ago)
Last Synced: 2024-10-17T04:44:36.534Z (3 months ago)
Topics: assignment, deep-q-learning, elk-stack, gym, iubh, kafka, pytorch, redis, reinforcement-learning
Language: Python
Homepage:
Size: 4.39 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Borg-DQN

**A Stream-Fueled Hive Mind for Reinforcement Learning.**

This project originated as the portfolio assignment for the data engineering module DLMDSEDE02 at the International University of Applied Sciences. It demonstrates how to build a streaming, data-intensive application with a machine-learning focus.

Borg-DQN presents a distributed approach to reinforcement learning centered around a **shared replay
memory**. Echoing the collective intelligence of the [Borg](https://memory-alpha.fandom.com/wiki/Borg_Collective)
from the Star Trek universe, the system enables individual agents to tap into a hive-mind-like pool of communal
experiences to enhance learning efficiency and robustness.

This system adopts a containerized microservices architecture enhanced with real-time streaming capabilities.
Agents employ Deep Q-Networks (DQN) within game containers for training on the Atari Pong environment
from OpenAI Gym. The replay memory resides in a separate container running Redis, which serves as a queue
that agents write to and sample from via Protocol Buffer messages.

The architecture continuously streams agents' learning progress and replay memory metrics to Kafka,
enabling instant analysis and visualization of learning trajectories and memory growth on a Kibana
dashboard.

## Getting Started

### Requirements

Running Borg-DQN requires a working installation of `Docker`, as well as the `nvidia-container-toolkit` to pass CUDA acceleration through to the game container instances. Refer to the respective documentation for installation instructions:

- [Install Docker Engine](https://docs.docker.com/engine/install/)
- [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

Developing the game and monitor containers additionally requires a working Python 3.11 interpreter and `poetry` for dependency management:

- [Python Releases](https://www.python.org/downloads/)
- [Poetry installation](https://python-poetry.org/docs/#installation)

### Starting Up

To start the application, run from the root directory:

```sh
docker compose up
```

Observe the learning progress and memory growth on the [live dashboard](http://localhost:5601/app/dashboards#/view/6c58f7d0-71c5-11ee-bccb-318d0f7f71cb?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))).

To start the application with multiple game containers, run:

```sh
docker compose up --scale game=3
```

The [Elasticsearch indices](http://localhost:9200/_cat/indices?pretty) can also be inspected directly.

#### Persistence Features

Upon startup, game containers load the most recent model checkpoint from the model store location, and the replay memory is prefilled with persisted transitions.
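The README does not describe the layout of the model store. As a minimal sketch, assuming checkpoints are files with a `.pth` suffix in a single directory, "the most recent checkpoint" could be selected by modification time; the function name, suffix and directory layout are all hypothetical, and the returned path would then be handed to something like `torch.load`.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory, pattern: str = "*.pth") -> Optional[Path]:
    """Return the most recently modified checkpoint file in `directory`, or None.

    Hypothetical helper: assumes one file per checkpoint matching `pattern`.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return None  # no model store yet, so start training from scratch
    # Modification time stands in for "most recent"; a timestamp in the
    # file name would be an equally valid ordering key.
    return max(directory.glob(pattern), key=lambda p: p.stat().st_mtime, default=None)
```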

## Architecture

The application follows an infrastructure-as-code (`IaC`) approach, wherein individual services run inside Docker containers, whose configuration and interconnectivity are defined in a [`compose.yaml`](https://github.com/pykong/Borg-DQN/blob/readme/compose.yaml) in the repository's root directory.

The following sections give a short overview of each component of the application.

### Game Container

The game container encapsulates an Atari Pong environment (OpenAI Gym) and a double deep Q-network agent (built with PyTorch). The code is adapted from [MERLIn](https://github.com/pykong/merlin), an earlier reinforcement learning project by [pykong](https://github.com/pykong).
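The README does not show the agent's update rule. To illustrate what "double" refers to, the plain-Python sketch below computes the standard double DQN target: the online network selects the best next action and the target network evaluates it, which reduces the overestimation caused by taking the maximum over a single network. The function name and list-based inputs are assumptions for readability; the project itself works with PyTorch tensors.

```python
def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute double DQN targets for a minibatch (hypothetical helper).

    q_online_next[i] and q_target_next[i] hold the Q-values of the online and
    target networks for next_state i, one value per action.
    """
    targets = []
    for reward, done, online_row, target_row in zip(rewards, dones, q_online_next, q_target_next):
        # Action selection uses the online network ...
        best_action = max(range(len(online_row)), key=online_row.__getitem__)
        # ... while action evaluation uses the target network.
        bootstrap = 0.0 if done else gamma * target_row[best_action]
        targets.append(reward + bootstrap)
    return targets
```

In the first row of the test below, vanilla DQN would bootstrap from the target network's maximum (30.0), whereas double DQN uses the target value of the action the online network prefers (20.0).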

#### Configuration

The game container instances can be configured via environment variables. The easiest way is to place a `.env` file at the project's root; keys must carry the prefix `CONFIG_`. For example, `CONFIG_alpha=1e-2` sets the learning rate. For a complete list of configuration parameters, consult [config.py](https://github.com/pykong/Borg-DQN/blob/main/game/src/config/config.py).
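How `config.py` applies the prefix is not shown here, so the following is only a minimal stdlib sketch of the idea, assuming a flat dataclass of typed defaults where each field `name` can be overridden by an environment variable `CONFIG_name`. Apart from `alpha`, the field names are hypothetical, and all defaults are invented for illustration.

```python
import os
from dataclasses import dataclass, fields

PREFIX = "CONFIG_"


@dataclass
class Config:
    # Hypothetical parameters; the real list lives in config.py.
    alpha: float = 1e-3      # learning rate
    gamma: float = 0.99      # discount factor
    batch_size: int = 32
    double_dqn: bool = True


def _coerce(raw: str, typ):
    """Convert an environment string to the field's declared type."""
    if typ is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return typ(raw)


def load_config(environ=None) -> Config:
    """Build a Config, overriding defaults with CONFIG_-prefixed variables."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(Config):
        key = PREFIX + field.name
        if key in environ:
            overrides[field.name] = _coerce(environ[key], field.type)
    return Config(**overrides)
```

Variables without the prefix are ignored, so unrelated settings in the same `.env` file cannot clash with agent parameters.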

#### Serializing Game Transitions

The game container puts each game transition into the shared replay memory and samples minibatches from that memory. Serialization uses [Protocol Buffers](https://protobuf.dev/) (**protobuf** for short), which is fast and carries raw bytes unchanged, allowing the NumPy arrays of the game states to be transferred efficiently.

This approach, however, requires the definition and maintenance of a [`.proto`](https://github.com/pykong/Borg-DQN/blob/main/game/src/transition/proto/transition.proto) schema file, from which native Python code is derived:

```proto
syntax = "proto3";

package transition.proto;

message Transition {
  bytes state = 1;
  uint32 action = 2;
  float reward = 3;
  bytes next_state = 4;
  bool done = 5;
  // ...
}
```

### Replay Memory

The shared replay memory employs [Redis](https://redis.io/) to hold game transitions. Redis is performant, and because its values are binary-safe, transitions can be stored directly as serialized **protobuf** messages.

Redis, however, does not natively offer a bounded (fixed-capacity) queue, as the use case demands. The workaround emulates one on the client side: after pushing to a Redis list, the client executes the [`LTRIM`](https://redis.io/commands/ltrim/) command to discard the oldest entries beyond the capacity.
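A minimal sketch of such a capped queue, assuming a client object that exposes the redis-py list methods `lpush`, `ltrim`, `llen` and `lindex`. The class name, key and capacity are hypothetical, and the small in-memory stand-in only exists so the sketch can be tried without a Redis server.

```python
import random


class SharedReplayMemory:
    """Fixed-capacity FIFO replay memory on a Redis list (hypothetical class)."""

    def __init__(self, client, key: str = "replay_memory", capacity: int = 100_000):
        self.client = client  # e.g. redis.Redis(host="redis"); the host name is an assumption
        self.key = key
        self.capacity = capacity

    def push(self, transition: bytes) -> None:
        # LPUSH inserts the newest transition at the head of the list ...
        self.client.lpush(self.key, transition)
        # ... and LTRIM keeps indices 0..capacity-1, dropping the oldest at the tail.
        self.client.ltrim(self.key, 0, self.capacity - 1)

    def __len__(self) -> int:
        return self.client.llen(self.key)

    def sample(self, batch_size: int) -> list[bytes]:
        size = len(self)
        indices = random.sample(range(size), min(batch_size, size))
        # LINDEX is O(N) in the number of elements traversed to reach the index.
        items = (self.client.lindex(self.key, i) for i in indices)
        # Another agent may trim the list concurrently, so drop vanished entries.
        return [item for item in items if item is not None]


class InMemoryList:
    """Stand-in mimicking the four Redis list commands used above."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        lst = self.lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)
        return len(lst)

    def ltrim(self, name, start, end):
        lst = self.lists.get(name, [])
        n = len(lst)
        start = max(start + n if start < 0 else start, 0)
        end = end + n if end < 0 else end
        self.lists[name] = lst[start:end + 1]
        return True

    def llen(self, name):
        return len(self.lists.get(name, []))

    def lindex(self, name, index):
        lst = self.lists.get(name, [])
        return lst[index] if -len(lst) <= index < len(lst) else None
```

With a real server, redis-py's `pipeline()` could batch the `LPUSH`/`LTRIM` pair and the `LINDEX` calls into single round trips.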

### Memory Monitor

The memory monitor is a Python microservice that periodically polls the Redis shared memory for transition count and memory usage statistics and publishes them under a dedicated Kafka topic.
Although ready-made monitoring solutions exist, such as a Kibana integration, the memory monitor demonstrates using Kafka with multiple topics, the other topic carrying the training logs.
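A minimal sketch of that polling loop, assuming a redis-py style client (`llen`, `info("memory")`) and any `send(topic, value)` callable such as kafka-python's `KafkaProducer.send`. The list key, payload field names and poll interval are hypothetical; only the `memory_monitoring` topic name comes from this README.

```python
import json
import time


def collect_metrics(client, key: str = "replay_memory") -> dict:
    """Take one snapshot of the replay memory (field names are hypothetical)."""
    return {
        "timestamp": time.time(),
        "transition_count": client.llen(key),
        # INFO memory reports the server's total allocation in bytes as "used_memory".
        "used_memory_bytes": client.info("memory")["used_memory"],
    }


def run_monitor(client, send, key="replay_memory", topic="memory_monitoring",
                interval_s=5.0, iterations=None):
    """Poll Redis and pass each JSON-encoded snapshot to send(topic, value).

    Runs forever when iterations is None; returns the number of messages sent.
    """
    sent = 0
    while iterations is None or sent < iterations:
        payload = json.dumps(collect_metrics(client, key)).encode("utf-8")
        send(topic, payload)
        sent += 1
        if iterations is None or sent < iterations:
            time.sleep(interval_s)
    return sent


# Possible wiring (host names are assumptions):
# import redis
# from kafka import KafkaProducer
# run_monitor(redis.Redis(host="redis"), KafkaProducer(bootstrap_servers="kafka:9092").send)
```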

### Kafka

[Apache Kafka](https://kafka.apache.org/) is a distributed streaming platform that excels in handling high-throughput, fault-tolerant messaging. In Borg-DQN, Kafka serves as the middleware that decouples the data-producing game environments from the consuming analytics pipeline, allowing for robust scalability and the flexibility to introduce additional consumers without architectural changes. Specifically, Kafka carries two distinct topics, `training_log` and `memory_monitoring`, both serialized as JSON, so that any downstream system receives structured, accessible data.
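A minimal sketch of the JSON serialization on both sides of the topics, written so it can be plugged into kafka-python's `value_serializer` / `value_deserializer` options. The function names, broker address, consumer group and record fields are assumptions for illustration.

```python
import json

# Topic names as listed in this README.
TOPICS = ("training_log", "memory_monitoring")


def serialize(record: dict) -> bytes:
    """Encode a log record as compact UTF-8 JSON for a Kafka message value."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def deserialize(raw: bytes) -> dict:
    """Decode a Kafka message value back into a dict."""
    return json.loads(raw.decode("utf-8"))


# Usage with kafka-python (needs a running broker; the address is an assumption):
#
# from kafka import KafkaConsumer, KafkaProducer
# producer = KafkaProducer(bootstrap_servers="kafka:9092", value_serializer=serialize)
# producer.send("training_log", {"episode": 1, "reward": -21.0})  # hypothetical fields
#
# A consumer in its own consumer group receives every message on both topics
# without affecting existing consumers such as Logstash.
# consumer = KafkaConsumer(*TOPICS, bootstrap_servers="kafka:9092",
#                          group_id="my-analytics", value_deserializer=deserialize)
# for message in consumer:
#     print(message.topic, message.value)
```

Because each consumer group tracks its own offsets, adding a new analytics consumer is a matter of picking a fresh `group_id`; neither the producers nor the existing pipeline need to change.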

### ELK Stack

The [ELK stack](https://www.elastic.co/en/elastic-stack), comprising `Elasticsearch`, `Logstash`, and `Kibana`, serves as a battle-tested trio for managing, processing, and visualizing data in real time, making it well suited for observing training progress and replay memory growth in Borg-DQN. **Elasticsearch** is a search and analytics engine with robust database characteristics, allowing for quick retrieval and analysis of large datasets. **Logstash** seamlessly ingests data from Kafka through a declarative pipeline configuration, eliminating the need for custom code. **Kibana** builds on this data to provide a user-customizable dashboard. Because all three components come from Elastic, they are designed to work together, ensuring compatibility and stability.

### Development

## Plans

- [ ] Create external documentation, preferably using [MkDocs](https://www.mkdocs.org/)
- [ ] Allow game container instances to be individually configured (e.g., different epsilon values to address the exploration-exploitation trade-off)
- [ ] Upgrade the replay memory to one that prioritizes transitions

## Contributions Welcome

If you like Borg-DQN and want to develop it further, feel free to fork the repository and open a pull request. 🤓

## Links

1. [Borg Collective](https://memory-alpha.fandom.com/wiki/Borg_Collective)
2. [Docker Engine](https://docs.docker.com/engine/)
3. [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/)
4. [Poetry Docs](https://python-poetry.org/docs/)
5. [Redis Docs](https://redis.io/docs/)
6. [Apache Kafka](https://kafka.apache.org/)
7. [ELK Stack](https://www.elastic.co/en/elastic-stack)
8. [Protocol Buffers](https://protobuf.dev/)
9. [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1507.04296.pdf)
   - a more intricate architecture than Borg-DQN, also featuring a shared replay memory