{"id":17370001,"url":"https://github.com/pykong/borg-dqn","last_synced_at":"2026-05-03T10:33:35.266Z","repository":{"id":208973042,"uuid":"718707060","full_name":"pykong/Borg-DQN","owner":"pykong","description":"A Stream-Fueled Hive Mind for Reinforcement Learning.","archived":false,"fork":false,"pushed_at":"2023-11-24T09:17:04.000Z","size":4601,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-13T08:49:48.337Z","etag":null,"topics":["assignment","deep-q-learning","elk-stack","gym","iubh","kafka","pytorch","redis","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pykong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-11-14T16:28:24.000Z","updated_at":"2023-11-24T08:57:09.000Z","dependencies_parsed_at":"2023-12-04T15:16:43.820Z","dependency_job_id":null,"html_url":"https://github.com/pykong/Borg-DQN","commit_stats":{"total_commits":14,"total_committers":2,"mean_commits":7.0,"dds":0.2857142857142857,"last_synced_commit":"8534bce5db95ff617b1f0ab35d56ecb66fb64c7f"},"previous_names":["pykong/borg-dqn"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/pykong/Borg-DQN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FBorg-DQN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FBorg-DQN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FBorg-DQN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/pykong%2FBorg-DQN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pykong","download_url":"https://codeload.github.com/pykong/Borg-DQN/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pykong%2FBorg-DQN/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32566444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assignment","deep-q-learning","elk-stack","gym","iubh","kafka","pytorch","redis","reinforcement-learning"],"created_at":"2024-10-16T00:23:04.348Z","updated_at":"2026-05-03T10:33:35.233Z","avatar_url":"https://github.com/pykong.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- markdownlint-disable MD041 --\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/PyVersion/3.11/purple\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Code-Quality/A+/green\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" 
src=\"https://badgen.net/static/Black/OK/green\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Coverage/99.0/green\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/MyPy/78.0/blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Docs/42.0/blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/pykong/Borg-DQN/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://badgen.net/static/license/MIT/blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/Build/1.0.0/pink\"\u003e\u003c/a\u003e\n    \u003ca href=\"#readme\"\u003e\u003cimg alt=\"PlaceholderBadge\" src=\"https://badgen.net/static/stars/★★★★★/yellow\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"Title picture\" src=\"https://raw.githubusercontent.com/pykong/Borg-DQN/main/docs/img/title_picture.png\"\u003e\n        \u003c!-- Title picture credits: Benjamin Felder --\u003e\n        \u003c!-- Title picture created using DALL-E --\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n# Borg-DQN\n\n**A Stream-Fueled Hive Mind for Reinforcement Learning.**\n\nThis project originated as implementing the portfolio assignment for the data engineering module DLMDSEDE02 at the International University of Applied Sciences. It demonstrates how to build a streaming data-intensive application with a machine-learning focus.\n\nBorg-DQN presents a distributed approach to reinforcement learning centered around a **shared replay\nmemory**. 
Echoing the collective intelligence of the [Borg](https://memory-alpha.fandom.com/wiki/Borg_Collective)\nfrom the Star Trek universe, the system enables individual agents to tap into a hive-mind-like pool of communal\nexperiences to enhance learning efficiency and robustness.\n\nThis system adopts a containerized microservices architecture enhanced with real-time streaming capabilities.\nAgents employ Deep Q-Networks (DQN) within game containers for training on the Atari Pong environment\nfrom OpenAI Gym. The replay memory resides in a separate container running a Redis queue, with which\nagents interface via protocol buffer messages.\n\nThe architecture continuously streams agents' learning progress and replay memory metrics to Kafka,\nenabling instant analysis and visualization of learning trajectories and memory growth on a Kibana\ndashboard.\n\n## Getting Started\n\n### Requirements\n\nRunning Borg-DQN requires a working installation of `Docker`, as well as the `nvidia-container-toolkit` to pass through CUDA acceleration to the game container instances. 
Refer to the respective documentation for installation instructions:\n\n- [Install Docker Engine](https://docs.docker.com/engine/install/)\n- [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)\n\nDeveloping the game and monitor containers additionally requires a working Python 3.11 interpreter and `poetry` for dependency management:\n\n- [Python Releases](https://www.python.org/downloads/)\n- [Poetry installation](https://python-poetry.org/docs/#installation)\n\n### Starting Up\n\nTo start the application, run from the root directory:\n\n```sh\ndocker compose up\n```\n\nObserve the learning progress and memory growth on the [live dashboard](http://localhost:5601/app/dashboards#/view/6c58f7d0-71c5-11ee-bccb-318d0f7f71cb?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))).\n\nTo start the application with multiple game containers, run:\n\n```sh\ndocker compose up --scale game=3\n```\n\nThe [Elasticsearch indices](http://localhost:9200/_cat/indices?pretty) can also be inspected directly.\n\n#### Persistence Features\n\nUpon startup, game containers load the most recent model checkpoint from the model store location, while the replay memory is prefilled with persisted transitions.\n\n## Architecture\n\nThe application follows an infrastructure-as-code (`IaC`) approach, wherein individual services run inside Docker containers, whose configuration and interconnectivity are defined in a [`compose.yaml`](https://github.com/pykong/Borg-DQN/blob/readme/compose.yaml) at its root directory.\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"Architecture diagram\" src=\"https://raw.githubusercontent.com/pykong/Borg-DQN/main/docs/img/architecture.svg\"\u003e\n        \u003c!-- Architecture diagram credits: Benjamin Felder --\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nIn the following, there is a short overview of each 
component of the application.\n\n### Game Container\n\nThe game container encapsulates an Atari Pong environment (OpenAI Gym) and a double deep Q-network agent (using PyTorch). The code is adapted from [MERLIn](https://github.com/pykong/merlin), an earlier reinforcement learning project by [pykong](https://github.com/pykong).\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"Pong screenshot\" src=\"https://raw.githubusercontent.com/pykong/Borg-DQN/main/docs/img/pong.png\" width=\"200px\" align=\"right\"\u003e\n        \u003c!-- Pong screenshot credits: Benjamin Felder --\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n#### Configuration\n\nThe game container instances can be configured via environment variables. The easiest way is to place a `.env` file at the project's root; keys must bear the prefix `CONFIG_`. For example, `CONFIG_alpha=1e-2` would configure the learning rate. For a complete list of configuration parameters, consult [config.py](https://github.com/pykong/Borg-DQN/blob/main/game/src/config/config.py).\n\n#### Serializing Game Transitions\n\nThe game container puts each game transition into the shared replay memory and samples minibatches from that memory. [Protocol Buffers](https://protobuf.dev/) (**protobuf** for short) are used for serialization; the format is fast and byte-safe, allowing for efficient transformation of the NumPy arrays of the game states.\n\nThis approach, however, requires the definition and maintenance of a [`.proto`](https://github.com/pykong/Borg-DQN/blob/main/game/src/transition/proto/transition.proto) schema file, from which native Python code is derived:\n\n```.proto\nsyntax = \"proto3\";\n\npackage transition.proto;\n\nmessage Transition {\n    bytes state = 1;\n    uint32 action = 2;\n    float reward = 3;\n    bytes next_state = 4;\n    bool done = 5;\n    ...\n}\n```\n\n### Replay Memory\n\nThe shared replay memory employs [Redis](https://redis.io/) to hold game transitions. 
Redis is performant and allows storing the transitions as serialized **protobuf** messages due to its byte-safe characteristics.\n\nRedis, however, does not natively provide the queue semantics demanded by the use case. The workaround is to emulate queue behavior by issuing the [`LTRIM`](https://redis.io/commands/ltrim/) command from the client side.\n\n### Memory Monitor\n\nThe memory monitor is a Python microservice that periodically polls the Redis shared memory for transition count and memory usage statistics and publishes those under a dedicated Kafka topic.\nWhile ready-made monitoring solutions, like a Kibana integration, exist, the memory monitor demonstrates using Kafka with multiple topics, the other topic carrying the training logs.\n\n### Kafka\n\n[Apache Kafka](https://kafka.apache.org/) is a distributed streaming platform that excels in handling high-throughput, fault-tolerant messaging. In Borg-DQN, Kafka serves as the middleware that decouples the data-producing game environments from the consuming analytics pipeline, allowing for robust scalability and the flexibility to introduce additional consumers without architectural changes. Specifically, Kafka channels logs into two distinct topics, 'training_log' and 'memory_monitoring', both serialized as JSON, ensuring structured and accessible data for any downstream systems.\n\n### ELK Stack\n\nThe [ELK stack](https://www.elastic.co/en/elastic-stack), comprising `Elasticsearch`, `Logstash`, and `Kibana`, serves as a battle-tested trio for managing, processing, and visualizing data in real time, making it ideal for observing training progress and replay memory growth in Borg-DQN. **Elasticsearch** is a search and analytics engine with robust database characteristics, allowing for quick retrieval and analysis of large datasets. **Logstash** seamlessly ingests data from Kafka through a declarative pipeline configuration, eliminating the need for custom code. 
**Kibana** leverages this integration to provide a user-customizable dashboard, all components being from Elastic, ensuring compatibility and stability.\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"#readme\"\u003e\n        \u003cimg alt=\"Kibana screenshot\" src=\"https://raw.githubusercontent.com/pykong/Borg-DQN/main/docs/img/kibana.png\"\u003e\n        \u003c!-- Kibana screenshot credits: Benjamin Felder --\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n### Development\n\n\u003c!-- [multi-stage builds](https://docs.docker.com/build/building/multi-stage/) --\u003e\n\n## Plans\n\n- [ ] Create external documentation, preferably using [MkDocs](https://www.mkdocs.org/)\n- [ ] Allow game container instances to be individually configured (e.g., different epsilon values to address the exploitation-exploration tradeoff)\n- [ ] Upgrade the replay memory to one featuring prioritization of transitions.\n\n## Contributions Welcome\n\nIf you like Borg-DQN and want to develop it further, feel free to fork and open any pull request. 🤓\n\n## Links\n\n1. [Borg Collective](https://memory-alpha.fandom.com/wiki/Borg_Collective)\n2. [Docker Engine](https://docs.docker.com/engine/)\n3. [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/)\n4. [Poetry Docs](https://python-poetry.org/docs/)\n5. [Redis Docs](https://redis.io/docs/)\n6. [Apache Kafka](https://kafka.apache.org/)\n7. [ELK Stack](https://www.elastic.co/en/elastic-stack)\n8. [Protocol Buffers](https://protobuf.dev/)\n9. [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1507.04296.pdf)\n   - a more intricate architecture than Borg-DQN, also featuring a shared replay memory\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpykong%2Fborg-dqn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpykong%2Fborg-dqn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpykong%2Fborg-dqn/lists"}