{"id":16487363,"url":"https://github.com/andremaz/transformer-pointer-critic","last_synced_at":"2025-10-27T20:30:43.389Z","repository":{"id":45469227,"uuid":"279578441","full_name":"AndreMaz/transformer-pointer-critic","owner":"AndreMaz","description":"Implementation of Transformer Pointer-Critic Deep Reinforcement Learning Algorithm","archived":false,"fork":false,"pushed_at":"2022-10-12T17:30:45.000Z","size":10522,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-01T07:41:14.841Z","etag":null,"topics":["actor-critic","deep-reinforcement-learning","pointer-networks","ptr-net","reinforcement-learning","tensorflow2","tf2","transformer","transformer-architecture","transformer-network","transformer-tensorflow2"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AndreMaz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-14T12:25:50.000Z","updated_at":"2024-01-17T09:47:35.000Z","dependencies_parsed_at":"2023-01-19T23:03:47.946Z","dependency_job_id":null,"html_url":"https://github.com/AndreMaz/transformer-pointer-critic","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Ftransformer-pointer-critic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Ftransformer-pointer-critic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Ftransformer-pointer-critic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Ftransformer-pointer-critic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AndreMaz","download_url":"https://codeload.github.com/AndreMaz/transformer-pointer-critic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238552940,"owners_count":19491345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","deep-reinforcement-learning","pointer-networks","ptr-net","reinforcement-learning","tensorflow2","tf2","transformer","transformer-architecture","transformer-network","transformer-tensorflow2"],"created_at":"2024-10-11T13:33:54.255Z","updated_at":"2025-10-27T20:30:42.050Z","avatar_url":"https://github.com/AndreMaz.png","language":"Python","readme":"# Transformer Pointer-Critic\n\nThis is a repo with the source code for the [Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks](https://doi.org/10.1016/j.iot.2022.100563). 
If this code is useful for your work, please cite our paper:

```
@article{MAZAYEV2022100563,
	title = {Attention-based model and deep reinforcement learning for distribution of event processing tasks},
	journal = {Internet of Things},
	pages = {100563},
	year = {2022},
	issn = {2542-6605},
	doi = {https://doi.org/10.1016/j.iot.2022.100563},
	url = {https://www.sciencedirect.com/science/article/pii/S2542660522000580},
	author = {Andriy Mazayev and Faroq Al-Tam and Noélia Correia}
}
```

## Contents

- [Problem Statement](#problem-statement)
- [Installation](#installation)
- [Repository Structure](#repo-structure)
- [Agent, Env, Training and Testing configuration](#configuration)
- [Training and Testing](#training-and-testing)
- [Results](#results)
- [Potential Improvements and Interesting ToDos](#potential-improvements-and-interesting-todos)
- [References and Useful Links](#useful-links)

<!-- **For the rationale behind the architecture please check [Architecture Rationale](./Arch_Rationale.md)** -->

## Installation

```bash
python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate
pip install --upgrade pip

# Install the actual deps
pip install -r requirements.txt
```

For more info check TensorFlow's [installation guide](https://www.tensorflow.org/install/pip).

## Architecture

**Simple Overview**
![simple_arch](./media/paper_arch.png)

## Problem Statement

### Goal

Given a set of tasks (a.k.a. `Rules` over web `Resources`), decide on the best `Rule` distribution across a set of devices (a.k.a. `Nodes`), each having a random amount of CPU, RAM and storage resources, while taking the QoS into account.

Three QoS objectives are considered:

- **Greedy Distribution** - Place as many `Rules` as possible
- **Fair Distribution** - Place as many `Rules` as possible while ensuring that all `Nodes` receive a fair share of `Rules` to process
- **Cost Distribution** - Place as many `Rules` as possible while minimizing the number of `Nodes` used

### Input Representation

The input has two parts: `Nodes` and `Rules`.
Each entry in the `Nodes` part describes the amount of available resources in the node, while each entry in the `Rules` part describes the demanded resources.
The `Nodes` part has a **_dummy_** node that receives rejected `Rules`.

**Input example with 2 Nodes and 2 `Rules` to distribute**

```python
array([
    [ 0.00, 0.00, 0.00],  # Dummy node. Rejected `Rules` will be "placed" here
    [ 0.70, 0.80, 0.40],  # Node 1. Available CPU: 0.70 | Available RAM: 0.80 | Available Storage: 0.40
    [ 0.50, 0.40, 0.20],  # Node 2. Available CPU: 0.50 | Available RAM: 0.40 | Available Storage: 0.20
    [ 0.10, 0.12, 0.17],  # Rule 1. Required CPU: 0.10 | Required RAM: 0.12 | Required Storage: 0.17
    [ 0.18, 0.32, 0.16],  # Rule 2. Required CPU: 0.18 | Required RAM: 0.32 | Required Storage: 0.16
    ],
    dtype=float32, shape=(5, 3))
```
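To make the representation concrete, here is a minimal sketch, not taken from the repo (the actual logic lives in `environment/custom/resource_v3/env.py`), of how a placement decision could be checked and applied against this state matrix. The `place` helper and `NUM_NODES` constant are hypothetical names used only for illustration:

```python
import numpy as np

state = np.array([
    [0.00, 0.00, 0.00],  # dummy node (index 0): rejected `Rules` land here
    [0.70, 0.80, 0.40],  # Node 1
    [0.50, 0.40, 0.20],  # Node 2
    [0.10, 0.12, 0.17],  # Rule 1
    [0.18, 0.32, 0.16],  # Rule 2
], dtype=np.float32)

NUM_NODES = 3  # dummy node + 2 real nodes

def place(state, node_idx, rule_idx):
    """Place a rule on a node if every resource demand fits;
    otherwise reject it to the dummy node (index 0)."""
    demand = state[NUM_NODES + rule_idx]
    if node_idx != 0 and np.all(state[node_idx] >= demand):
        state[node_idx] -= demand  # consume CPU, RAM and storage
        return node_idx
    return 0  # rejected: "placed" on the dummy node

print(place(state, 1, 0))  # -> 1: Rule 1 fits on Node 1
print(state[1])            # -> [0.6  0.68 0.23], the node's remaining resources
```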
### Repo structure

```
.
├── agents
│   ├── agent.py - Agent Class implementation
│   ├── models
│   │   └── transformer - Contains the actor and the critic models
│   ├── plotter.py - Plots the losses and the stats
│   └── trainer.py - Training function
├── configs - Centralized location for configuring the Agent, Env., training and testing functions
│   ├── configs.py - Loader helper method
│   └── ResourceV3.json - Actual configs file
├── environment
│   ├── env_factory.py - Helper method to init and load the environment
│   ├── custom
│   │   ├── resource_v3
│   │   │   ├── attention_plotter.py - Plots attention (for potential policy analysis)
│   │   │   ├── env.py - Environment Class implementation
│   │   │   ├── heuristic - Contains implementations of baseline heuristics
│   │   │   ├── misc - Contains helper functions
│   │   │   ├── node.py - Node Class implementation
│   │   │   ├── resource.py - Task/`Rule`/`Resource` Class implementation
│   │   │   ├── reward.py - Contains the implementations of different rewards
│   │   │   └── tester.py - Testing function
├── main.py
├── requirements.txt
├── results
├── test_agent.py
└── tests - Unit and integration tests
    ├── runner.py
    └── unit
```

### Configuration

The configuration of the Env., Agent, Training and Testing functions is centralized in `configs/ResourceV3.json`:

```js
{
    "trainer_config": {
        "description": "Trainer function configs.",
        "n_iterations": 100000,
        "n_steps_to_update": 30,

        "export_stats": {
            "export_stats": true,
            "folder": "training"
        },

        "store_model_weights": {
            "export_weights": true,
            "folder": "model",
            "filename": "actor"
        }
    },

    "tester_config": {
        "description": "Testing function configs.",

        "add_brakes": false,

        "show_per_test_stats": true,
        "show_inference_progress": true,
        "show_solutions": false,
        "show_detailed_solutions": false,
        "plot_attentions": false,
        "batch_size": 1,

        "testbed" : {
            "num_tests": 100,
            "node_sample_configs": {
                "min": 5,
                "max": 50,
                "step": 5
            },
            "node_available_resources": {
                "min": 0,
                "max": 100,
                "step": 100
            },
            "request_sample_configs": {
                "min": 10,
                "max": 100,
                "step": 10
            }
        },

        "heuristic": {
            "dominant_resource": {
                "generate_params_combos": true,
                "resource_sort_descending": true,
                "node_sort_descending": true
            },
            "random": {},
            "cplex_greedy_and_critical": {
                // Disabled by default. If enabled, the docplex lib must be installed
                // More info: https://ibmdecisionoptimization.github.io/docplex-doc/
                "use": false,
                "greedy_with_critical_resource": false,
                "time_limit_ms": 60000,
                "num_threads": 8
            },
            "cplex_node_reduction": {
                // Disabled by default. If enabled, the docplex lib must be installed
                // More info: https://ibmdecisionoptimization.github.io/docplex-doc/
                "use": false,
                "time_limit_ms": 60000,
                "num_threads": 8
            }
        },

        "export_stats": {
            "global_stats": {
                "export_stats": true,
                "folder": "tests",
                "filename": "test"
            },
            "per_problem_stats": {
                "export_stats": false,
                "folder": "tests/per_instance"
            }
        }
    },

    "env_config": {
        "description": "Environment configs.",

        "batch_size": 128,
        "mask_nodes_in_mha": true,
        "generate_request_on_the_fly": false,

        "seed_value": 1235,

        "normalization_factor": 100,
        "decimal_precision": 2,

        "num_features": 3,
        "num_profiles": 1000,

        "profiles_sample_size": 20,
        "node_sample_size": 10,

        "EOS_CODE": -2,
        "req_min_val": 1,
        "req_max_val": 30,

        "node_min_val": 0,
        "node_max_val": 100,

        "reward": {
            "type": "greedy",
            "greedy": {},
            "single_node_dominant": {
                "rejection_penalty": -2
            },
            "global_dominant": {
                "rejection_penalty": -2
            },
            "reduced_node_usage": {
                "rejection_penalty": -2,
                "use_new_node_penalty": -1
            }
        }
    },

    "tpc": {
        "description": "Transformer Pointer Critic Agent configs.",
        "agent_config": {

            "gamma": 0.99,
            "values_loss_coefficient": 1.0,
            "entropy_coefficient": 0.01,
            "stochastic_action_selection": true,

            "actor": {
                "use_default_initializer": true,
                "num_layers": 1,
                "dim_model": 128,
                "num_heads": 8,
                "inner_layer_dim": 128,
                "encoder_embedding_time_distributed": true,
                "attention_dense_units": 128,
                "logit_clipping_C": 10.0,

                "learning_rate": 0.0001,
                "clipnorm": 1.0
            },

            "critic": {
                "use_default_initializer": true,
                "num_layers": 3,
                "dim_model": 128,
                "num_heads": 8,
                "inner_layer_dim": 512,
                "encoder_embedding_time_distributed": true,
                "last_layer_units": 128,
                "last_layer_activation": "linear",

                "learning_rate": 0.0005,
                "clipnorm": 1.0
            }
        }
    }
}
```
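The file is loaded through the helper in `configs/configs.py`. As a rough illustration only (the helper's actual name and behavior may differ; `load_configs` is a hypothetical name), loading a commented JSON file like this one could look as follows. Note that the `//` comments must be stripped before standard JSON parsing:

```python
import json
import re

def load_configs(path="configs/ResourceV3.json"):
    """Hypothetical loader sketch: strip the full-line // comments used in
    the config file, then parse the remainder as regular JSON."""
    with open(path) as f:
        raw = f.read()
    uncommented = re.sub(r"^\s*//.*$", "", raw, flags=re.MULTILINE)
    return json.loads(uncommented)

configs = load_configs()
print(configs["tpc"]["agent_config"]["gamma"])  # -> 0.99
```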
### Training and Testing

After configuring (see [Configuration](#configuration)), run `main.py`.
`main.py` will train and then test the agent. If configured, it will also solve the problem instances with "classic" heuristics and store the overall results in the `results` folder. On completion you will see an `End... Goodbye!` message.
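For example (assuming the virtual environment from [Installation](#installation) was created):

```bash
source ./venv/bin/activate  # if not already active
python main.py              # trains, tests and, if enabled, runs the heuristic baselines
```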
### Results

The images below show the performance of the agent. CPLEX (with a time limit of 60 seconds) and several simple heuristics are used as baselines.
Green highlighted areas in the images below show the configurations where CPLEX was able to obtain optimal solutions.

**Greedy Results**
![greedy_results](./media/results/greedy.png)

**Critical-Aware Results**
![critical_results](./media/results/critical.png)

**Cost-Aware Results**
![cost_results](./media/results/cost.png)

## Useful Links

- [Deep Reinforcement Learning: Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/)
- [Deriving Policy Gradients and Implementing REINFORCE](https://medium.com/@thechrisyoon/deriving-policy-gradients-and-implementing-reinforce-f887949bd63)
- [Understanding Actor Critic Methods and A2C](https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f)
- [Beam Search](https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/)

### Pointer Critic

- [Neural Combinatorial Optimization with Reinforcement Learning](https://arxiv.org/pdf/1611.09940.pdf)
- [Presentation Video - Neural Combinatorial Optimization with Reinforcement Learning](https://www.youtube.com/watch?v=mxCVgVrUw50)
- [Reviews - Neural Combinatorial Optimization with Reinforcement Learning](https://openreview.net/forum?id=rJY3vK9eg)
- [Reinforcement Learning for Solving the Vehicle Routing Problem](https://arxiv.org/pdf/1802.04240.pdf)
- [Order Matters: Sequence to sequence for sets](https://arxiv.org/pdf/1511.06391.pdf)
- [Attention, Learn to Solve Routing Problems!](https://arxiv.org/abs/1803.08475)

### Unit Tests and Coverage

```bash
python tests/runner.py
```

or, to generate an HTML-based coverage report:

```
coverage run tests/runner.py && coverage html --omit=*/venv/*,*/usr/*,*/lib/*,*/tests/* -i
```

## Potential Improvements and Interesting ToDos

### Implement Self-Critic

Instead of using a dedicated network (the `Critic`) to estimate the state values, which are used as a baseline, use a [greedy rollout baseline](https://arxiv.org/abs/1612.00563). The greedy rollout baseline in [Attention, Learn to Solve Routing Problems!](https://arxiv.org/abs/1803.08475) shows promising results.

#### How to do it

The easiest (though not the cleanest) way to implement it is to create an `agents/baseline_trainer.py` file with two environment instances (`env` and `env_baseline`) and two agents (`agent` and `agent_baseline`).

Then:

- When we sample a state from `env`, copy its state into `env_baseline`.
- Delete the `critic` model from `agent` and `agent_baseline`, as it is no longer necessary.
- Copy the network weights from the `agent` actor into the `agent_baseline` actor.
- Set `agent_baseline.stochastic_action_selection` to `False`, so that the baseline agent selects actions greedily.
- Let `agent` gather rewards from `env` while `agent_baseline` does the same with `env_baseline`; the difference between the two rewards then serves as the advantage (see the sketch after this list).
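A minimal sketch of that loop, under assumed interfaces: `rollout`, `reset`, `step`, `act`, `copy_state_from` and `update_actor` are illustrative names rather than the repo's API; only `stochastic_action_selection` and the Keras-style `get_weights`/`set_weights` calls mirror things mentioned above:

```python
# Hypothetical sketch of a greedy-rollout (self-critic) update step.
# All interfaces below are illustrative; the repo's actual Agent/Env APIs differ.

def rollout(env, agent, state):
    """Run one episode and return the total reward collected."""
    total, done = 0.0, False
    while not done:
        action = agent.act(state)        # stochastic for `agent`, greedy for the baseline
        state, reward, done = env.step(action)
        total += reward
    return total

def self_critic_update(env, env_baseline, agent, agent_baseline):
    # Keep the frozen baseline actor in sync with the learning actor
    agent_baseline.actor.set_weights(agent.actor.get_weights())
    agent_baseline.stochastic_action_selection = False  # greedy action selection

    state = env.reset()
    env_baseline.copy_state_from(env)    # both solve the same problem instance

    sampled_reward = rollout(env, agent, state)
    greedy_reward = rollout(env_baseline, agent_baseline, state)

    # The greedy rollout replaces the critic's value estimate:
    # a positive advantage means sampling beat the greedy policy.
    advantage = sampled_reward - greedy_reward
    agent.update_actor(advantage)
```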
### Implement Vehicle Routing Problem environment

It would be interesting to see how the network performs on the VRP.

#### How to do it

- Look at the `KnapsackV2` and `ResourceV3` environments in `environment/custom` and adapt them to the VRP
- Add the VRP env to `environment/env_factory.py` (a hypothetical wiring sketch follows this list)
- Add a `JSON` config file to the `configs` folder
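As a rough, hypothetical sketch of the wiring (class and function names are assumptions; the real factory in `environment/env_factory.py` may look quite different):

```python
# Hypothetical sketch only: the actual factory API in
# environment/env_factory.py may be structured differently.

class VRPEnvironment:
    """Skeleton VRP environment; state layout, stepping and rewards
    would be adapted from the existing ResourceV3 implementation."""

    def __init__(self, name, config):
        self.name = name
        self.config = config

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError

# Registry-style lookup, mirroring how KnapsackV2/ResourceV3 could be selected
ENVIRONMENTS = {
    "VRP": VRPEnvironment,
}

def env_factory(name, config):
    return ENVIRONMENTS[name](name, config)
```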