AI-Toolbox
==========

[![AI-Toolbox](https://github.com/Svalorzen/AI-Toolbox/actions/workflows/build_cmake.yml/badge.svg)](https://github.com/Svalorzen/AI-Toolbox/actions/workflows/build_cmake.yml)

[![Library overview video](https://user-images.githubusercontent.com/1609228/99919181-3404dc00-2d1c-11eb-8593-0bf6af44cef8.png)](https://www.youtube.com/watch?v=qjSo41DVSXg)

This C++ toolbox is aimed at representing and solving common AI problems,
implementing an easy-to-use interface that is hopefully extensible to many
problems, while keeping the code readable.

Current development includes MDPs, POMDPs and related algorithms. This toolbox
was originally developed taking inspiration from the Matlab `MDPToolbox`, which
you can find [here](https://miat.inrae.fr/MDPtoolbox/), and from the
`pomdp-solve` software written by A. R. Cassandra, which you can find
[here](http://www.pomdp.org/code/index.shtml).

If you are new to the field of reinforcement learning, we have a few [simple
tutorials](http://svalorzen.github.io/AI-Toolbox/tutorials.html) that can help
you get started. An excellent, more in-depth introduction to the basics of
reinforcement learning can be found freely online in [this
book](http://incompleteideas.net/book/ebook/the-book.html).

If you use this toolbox for research, please consider citing our [JMLR
article](https://www.jmlr.org/papers/volume21/18-402/18-402.pdf):

```
@article{JMLR:v21:18-402,
  author  = {Eugenio Bargiacchi and Diederik M. Roijers and Ann Now\'{e}},
  title   = {AI-Toolbox: A C++ library for Reinforcement Learning and Planning (with Python Bindings)},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {102},
  pages   = {1-12},
  url     = {http://jmlr.org/papers/v21/18-402.html}
}
```

Example
=======

```cpp
// The model can be any custom class that respects a 10-method interface.
auto model = makeTigerProblem();
unsigned horizon = 10; // The horizon of the solution.

// The 0.0 is the convergence parameter. It gives a way to stop the
// computation if the policy has converged before the horizon.
AIToolbox::POMDP::IncrementalPruning solver(horizon, 0.0);

// Solve the model and obtain the optimal value function.
auto [bound, valueFunction] = solver(model);

// We create a policy from the solution to compute the agent's actions.
// The parameters are the size of the model (SxAxO), and the value function.
AIToolbox::POMDP::Policy policy(2, 3, 2, valueFunction);

// We begin a simulation with a uniform belief. We sample from the belief
// in order to get a "real" state for the world, since this code has to
// both emulate the environment and control the agent.
AIToolbox::POMDP::Belief b(2); b << 0.5, 0.5;
// `rand` is a random engine (e.g. std::mt19937) defined elsewhere.
auto s = AIToolbox::sampleProbability(b.size(), b, rand);

// We sample the first action. The id is to follow the policy tree later.
auto [a, id] = policy.sampleAction(b, horizon);

double totalReward = 0.0; // As an example, we store the overall reward.
for (int t = horizon - 1; t >= 0; --t) {
    // We advance the world one step.
    auto [s1, o, r] = model.sampleSOR(s, a);
    totalReward += r;

    // We select our next action from the observation we got.
    std::tie(a, id) = policy.sampleAction(id, o, t);

    s = s1; // Finally we update the world for the next timestep.
}
```

Documentation
=============

The latest documentation is available [here](http://svalorzen.github.io/AI-Toolbox/).
We have a few [tutorials](http://svalorzen.github.io/AI-Toolbox/tutorials.html)
that can help you get started with the toolbox. The tutorials are in C++, but
the `examples` folder contains equivalent Python code which you can follow
along just as well.

The Python documentation is available by typing `help(AIToolbox)` in the
interpreter. It shows the exported API for each class, along with any
differences in input/output.

Features
========

### Cassandra POMDP Format Parsing ###

Cassandra's POMDP format is a type of text file that contains a definition of an
MDP or POMDP model. You can find some examples
[here](http://pomdp.org/examples/). While it is absolutely not necessary to use
this format, and you can define models via code, we do parse a reasonable subset
of Cassandra's POMDP format, which lets you reuse already-defined problems with
this library. [Here's the docs on that](http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1CassandraParser.html).

### Python 2 and 3 Bindings! ###

The library's interface in Python closely mirrors the one you would get by
using C++ directly. See the `examples` folder to see just how much Python and
C++ code resemble each other. Since Python does not support templates, the
classes are bound with as many instantiations as possible.

Additionally, the library allows the usage of native Python generative models
(where you don't need to specify the transition and reward functions; you only
sample next state and reward). This makes it possible, for example, to use
OpenAI Gym environments directly with minimal glue code.

That said, if you need to customize a specific implementation to make it perform
better on your specific use cases, or if you want to try something completely
new, you will have to use C++.

### Utilities ###

The library has an extensive set of utilities which would be too long to
enumerate here. In particular, we have utilities for [combinatorics][comb],
[polytopes][poly], [linear programming][lipo], [sampling and distributions][dist],
[automated statistics][stat], [belief updating][belu], [many][trie] [data][fgra] [structures][fmat],
[logging][logg], [seeding][seed] and much more.

[comb]: http://svalorzen.github.io/AI-Toolbox/Combinatorics_8hpp.html
[poly]: http://svalorzen.github.io/AI-Toolbox/Polytope_8hpp.html
[lipo]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1LP.html
[dist]: http://svalorzen.github.io/AI-Toolbox/Probability_8hpp.html
[stat]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Statistics.html
[belu]: http://svalorzen.github.io/AI-Toolbox/include_2AIToolbox_2POMDP_2Utils_8hpp.html
[trie]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Trie.html
[fgra]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1FactorGraph.html
[fmat]: http://svalorzen.github.io/AI-Toolbox/FactoredMatrix_8hpp.html
[logg]: http://svalorzen.github.io/AI-Toolbox/logging.html
[seed]: http://svalorzen.github.io/AI-Toolbox/Seeder_8hpp.html

### Bandit/Normal Games: ###

|                                                            | **Models**                                         |                                     |
| :--------------------------------------------------------: | :------------------------------------------------: | :---------------------------------: |
| [Basic Model][bmod]                                        |                                                    |                                     |
|                                                            | **Policies**                                       |                                     |
| [Exploring Selfish Reinforcement Learning (ESRL)][esrl]    | [Q-Greedy Policy][bqgr]                            | [Softmax Policy][bsof]              |
| [Linear Reward Penalty][lrpe]                              | [Thompson Sampling (Student-t distribution)][btho] | [Random Policy][brnd]               |
| [Top-Two Thompson Sampling (Student-t distribution)][ttho] | [Successive Rejects][sure]                         | [T3C (Normal distribution)][t3cp]   |

[bmod]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1Model.html "Reinforcement Learning: An Introduction, Ch 2.1, Sutton & Barto"
[esrl]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1ESRLPolicy.html "Exploring selfish reinforcement learning in repeated games with stochastic rewards, Verbeeck et al."
[bqgr]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1QGreedyPolicy.html "A Tutorial on Thompson Sampling, Russo et al."
[bsof]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1QSoftmaxPolicy.html "Reinforcement Learning: An Introduction, Ch 2.3, Sutton & Barto"
[lrpe]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1LRPPolicy.html "Self-organization in large populations of mobile robots, Ch 3: Stochastic Learning Automata, Unsal"
[btho]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1ThompsonSamplingPolicy.html "Thompson Sampling for 1-Dimensional Exponential Family Bandits, Korda et al."
[brnd]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1RandomPolicy.html
[ttho]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1TopTwoThompsonSamplingPolicy.html "Simple Bayesian Algorithms for Best Arm Identification, Russo"
[sure]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1SuccessiveRejectsPolicy.html "Best Arm Identification in Multi-Armed Bandits, Audibert et al."
[t3cp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Bandit_1_1T3CPolicy.html "Fixed-confidence guarantees for Bayesian best-arm identification, Shang et al."

### Single Agent MDP/Stochastic Games: ###

|                                         | **Models**                                                   |                                                   |
| :-------------------------------------: | :----------------------------------------------------------: | :-----------------------------------------------: |
| [Basic Model][mmod]                     | [Sparse Model][msmo]                                         | [Maximum Likelihood Model][mmlm]                  |
| [Sparse Maximum Likelihood Model][msml] | [Thompson Model (Dirichlet + Student-t distributions)][mtmo] |                                                   |
|                                         | **Algorithms**                                               |                                                   |
| [Dyna-Q][dynq]                          | [Dyna2][dyn2]                                                | [Expected SARSA][esar]                            |
| [Hysteretic Q-Learning][hqle]           | [Importance Sampling][imsa]                                  | [Linear Programming][m-lp]                        |
| [Monte Carlo Tree Search (MCTS)][mcts]  | [Policy Evaluation][mpoe]                                    | [Policy Iteration][mpoi]                          |
| [Prioritized Sweeping][mprs]            | [Q-Learning][qlea]                                           | [Double Q-Learning][dqle]                         |
| [Q(λ)][qlam]                            | [R-Learning][rlea]                                           | [SARSA(λ)][sarl]                                  |
| [SARSA][sars]                           | [Retrace(λ)][retl]                                           | [Tree Backup(λ)][trel]                            |
| [Value Iteration][vait]                 |                                                              |                                                   |
|                                         | **Policies**                                                 |                                                   |
| [Basic Policy][mpol]                    | [Epsilon-Greedy Policy][megr]                                | [Softmax Policy][msof]                            |
| [Q-Greedy Policy][mqgr]                 | [PGA-APP][pgaa]                                              | [Win or Learn Fast Policy Iteration (WoLF)][wolf] |

[mmod]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1Model.html
[msmo]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1SparseModel.html
[mmlm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1MaximumLikelihoodModel.html
[msml]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1SparseMaximumLikelihoodModel.html
[mtmo]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1ThompsonModel.html

[dynq]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1DynaQ.html "Reinforcement Learning: An Introduction, Ch 9.2, Sutton & Barto"
[dyn2]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1Dyna2.html "Sample-Based Learning and Search with Permanent and Transient Memories, Silver et al."
[esar]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1ExpectedSARSA.html "A Theoretical and Empirical Analysis of Expected Sarsa, van Seijen et al."
[hqle]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1HystereticQLearning.html "Hysteretic Q-Learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams, Matignon et al."
[imsa]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1ImportanceSampling.html "Eligibility Traces for Off-Policy Policy Evaluation, Precup"
[m-lp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1LinearProgramming.html
[mcts]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1MCTS.html "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom"
[mpoe]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1PolicyEvaluation.html "Reinforcement Learning: An Introduction, Ch 4.1, Sutton & Barto"
[mpoi]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1PolicyIteration.html "Reinforcement Learning: An Introduction, Ch 4.3, Sutton & Barto"
[mprs]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1PrioritizedSweeping.html "Reinforcement Learning: An Introduction, Ch 9.4, Sutton & Barto"
[qlea]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1QLearning.html "Reinforcement Learning: An Introduction, Ch 6.5, Sutton & Barto"
[dqle]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1DoubleQLearning.html "Double Q-learning, van Hasselt"
[qlam]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1QL.html "Q(λ) with Off-Policy Corrections, Harutyunyan et al."
[rlea]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1RLearning.html "A Reinforcement Learning Method for Maximizing Undiscounted Rewards, Schwartz"
[sarl]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1SARSAL.html "Reinforcement Learning: An Introduction, Ch 7.5, Sutton & Barto"
[sars]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1SARSA.html "Reinforcement Learning: An Introduction, Ch 6.4, Sutton & Barto"
[retl]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1RetraceL.html "Safe and efficient off-policy reinforcement learning, Munos et al."
[trel]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1TreeBackupL.html "Eligibility Traces for Off-Policy Policy Evaluation, Precup"
[vait]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1ValueIteration.html "Reinforcement Learning: An Introduction, Ch 4.4, Sutton & Barto"

[mpol]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1Policy.html
[megr]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1EpsilonPolicy.html
[msof]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1QSoftmaxPolicy.html
[mqgr]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1QGreedyPolicy.html
[pgaa]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1PGAAPPPolicy.html "Multi-Agent Learning with Policy Prediction, Zhang et al."
[wolf]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1MDP_1_1WoLFPolicy.html "Rational and Convergent Learning in Stochastic Games, Bowling et al."

### Single Agent POMDP: ###

|                              | **Models**                                    |                                            |
| :--------------------------: | :-------------------------------------------: | :----------------------------------------: |
| [Basic Model][pmod]          | [Sparse Model][pmsm]                          |                                            |
|                              | **Algorithms**                                |                                            |
| [Augmented MDP (AMDP)][amdp] | [Blind Strategies][blin]                      | [Fast Informed Bound][faib]                |
| [GapMin][gapm]               | [Incremental Pruning][incp]                   | [Linear Support][lisu]                     |
| [PERSEUS][pers]              | [POMCP with UCB1][pomc]                       | [Point Based Value Iteration (PBVI)][pbvi] |
| [QMDP][qmdp]                 | [Real-Time Belief State Search (RTBSS)][rtbs] | [SARSOP][ssop]                             |
| [Witness][witn]              | [rPOMCP][rpom]                                |                                            |
|                              | **Policies**                                  |                                            |
| [Basic Policy][ppol]         |                                               |                                            |

[pmod]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1Model.html
[pmsm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1SparseModel.html

[amdp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1AMDP.html "Probabilistic robotics, Ch 16: Approximate POMDP Techniques, Thrun"
[blin]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1BlindStrategies.html "Incremental methods for computing bounds in partially observable Markov decision processes, Hauskrecht"
[faib]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1FastInformedBound.html "Value-Function Approximations for Partially Observable Markov Decision Processes, Hauskrecht"
[gapm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1GapMin.html "Closing the Gap: Improved Bounds on Optimal POMDP Solutions, Poupart et al."
[incp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1IncrementalPruning.html "Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, Cassandra et al."
[lisu]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1LinearSupport.html "Algorithms for Partially Observable Markov Decision Processes, Phd Thesis, Cheng"
[pers]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1PERSEUS.html "Perseus: Randomized Point-based Value Iteration for POMDPs, Spaan et al."
[pomc]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1POMCP.html "Monte-Carlo Planning in Large POMDPs, Silver et al."
[pbvi]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1PBVI.html "Point-based value iteration: An anytime algorithm for POMDPs, Pineau et al."
[qmdp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1QMDP.html "Probabilistic robotics, Ch 16: Approximate POMDP Techniques, Thrun"
[rtbs]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1RTBSS.html "Real-Time Decision Making for Large POMDPs, Paquet et al."
[ssop]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1SARSOP.html "SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, Kurniawati et al."
[witn]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1Witness.html "Planning and acting in partially observable stochastic domains, Kaelbling et al."
[rpom]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1rPOMCP.html "Dynamic Resource Allocation for Multi-Camera Systems, Master Thesis, Bargiacchi"

[ppol]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1POMDP_1_1Policy.html

### Factored/Joint Multi-Agent: ###

#### Bandits: ####

Not in Python yet.

|                                                          | **Models**                                                     |                                                      |
| :------------------------------------------------------: | :------------------------------------------------------------: | :--------------------------------------------------: |
| [Basic Model][fbmo]                                      | [Flattened Model][fbfm]                                        |                                                      |
|                                                          | **Algorithms**                                                 |                                                      |
| [Max-Plus][mplu]                                         | [Multi-Objective Variable Elimination (MOVE)][move]            | [Upper Confidence Variable Elimination (UCVE)][ucve] |
| [Variable Elimination][vael]                             | [Local Search][lose]                                           | [Reusing Iterative Local Search][rils]               |
|                                                          | **Policies**                                                   |                                                      |
| [Q-Greedy Policy][fbqg]                                  | [Random Policy][fbra]                                          | [Learning with Linear Rewards (LLR)][llre]           |
| [Multi-Agent Upper Confidence Exploration (MAUCE)][mauc] | [Multi-Agent Thompson-Sampling (Student-t distribution)][mats] | [Multi-Agent RMax (MARMax)][mmax]                    |
| [Single-Action Policy][fbsa]                             |                                                                |                                                      |

[fbmo]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1Model.html "Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems, Bargiacchi et al."
[fbfm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1FlattenedModel.html "Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems, Bargiacchi et al."

[mplu]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1MaxPlus.html "Collaborative Multiagent Reinforcement Learning by Payoff Propagation, Kok et al."
[move]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1MultiObjectiveVariableElimination.html "Multi-Objective Variable Elimination for Collaborative Graphical Games, Roijers et al."
[ucve]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1UCVE.html "Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems, Bargiacchi et al."
[vael]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1VariableElimination.html "Multiagent Planning with Factored MDPs, Guestrin et al."
[lose]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1LocalSearch.html "Heuristic Coordination in Cooperative Multi-Agent Reinforcement Learning, Petri et al."
[rils]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1ReusingIterativeLocalSearch.html "Heuristic Coordination in Cooperative Multi-Agent Reinforcement Learning, Petri et al."

[fbqg]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1QGreedyPolicy.html
[fbra]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1RandomPolicy.html
[llre]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1LLRPolicy.html "Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards, Gai et al."
[mauc]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1MAUCEPolicy.html "Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems, Bargiacchi et al."
[mats]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1ThompsonSamplingPolicy.html "Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures, Verstraeten et al."
[mmax]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1MARMaxPolicy.html "Multi-agent RMax for Multi-Agent Multi-Armed Bandits, Bargiacchi et al."
[fbsa]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1Bandit_1_1SingleActionPolicy.html

#### MDP: ####

Not in Python yet.

|                                       | **Models**                                   |                                                                          |
| :-----------------------------------: | :------------------------------------------: | :----------------------------------------------------------------------: |
| [Cooperative Basic Model][fmcm]       | [Cooperative Maximum Likelihood Model][fmml] | [Cooperative Thompson Model (Dirichlet + Student-t distributions)][fmtm] |
|                                       | **Algorithms**                               |                                                                          |
| [FactoredLP][falp]                    | [Multi Agent Linear Programming][malp]       | [Joint Action Learners][jale]                                            |
| [Sparse Cooperative Q-Learning][scql] | [Cooperative Prioritized Sweeping][cops]     |                                                                          |
|                                       | **Policies**                                 |                                                                          |
| [All Bandit Policies][fmbp]           | [Epsilon-Greedy Policy][fmeg]                | [Q-Greedy Policy][fmqg]                                                  |

[fmcm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1CooperativeModel.html
[fmml]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1CooperativeMaximumLikelihoodModel.html
[fmtm]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1CooperativeThompsonModel.html

[falp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1FactoredLP.html "Max-norm Projections for Factored MDPs, Guestrin et al."
[malp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1LinearProgramming.html "Multiagent Planning with Factored MDPs, Guestrin et al."
[jale]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1JointActionLearner.html "The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, Claus et al."
[scql]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1SparseCooperativeQLearning.html "Sparse Cooperative Q-learning, Kok et al."
[cops]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1CooperativePrioritizedSweeping.html "Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping, Bargiacchi et al."

[fmbp]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1BanditPolicyAdaptor.html
[fmeg]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1EpsilonPolicy.html
[fmqg]: http://svalorzen.github.io/AI-Toolbox/classAIToolbox_1_1Factored_1_1MDP_1_1QGreedyPolicy.html

Build Instructions
==================

Dependencies
------------

To build the library you need:

- [cmake](http://www.cmake.org/) >= 3.12
- the [boost library](http://www.boost.org/) >= 1.67
- the [Eigen 3.4 library](http://eigen.tuxfamily.org/index.php?title=Main_Page).
- the [lp\_solve library](http://lpsolve.sourceforge.net/5.5/) (a shared library
  must be available to compile the Python wrapper).

In addition, C++20 support is now required (**this means at least g++-10**).

On an Ubuntu system, you can install these dependencies with the following
command:

```bash
sudo apt install g++-10 cmake libboost1.71-all-dev liblpsolve55-dev lp-solve libeigen3-dev
```

Building
--------

Once you have all required dependencies, you can simply execute the following
commands from the project's main folder:

```bash
mkdir build
cd build/
cmake ..
make
```

`cmake` can be called with a series of flags in order to customize the output,
if building everything is not desirable. The following flags are available:

```bash
CMAKE_BUILD_TYPE   # Defines the build type
MAKE_ALL           # Builds everything in the project except the Python bindings
MAKE_LIB           # Builds the whole core C++ libraries (MDP, POMDP, etc..)
MAKE_MDP           # Builds only the core C++ MDP library
MAKE_FMDP          # Builds only the core C++ Factored/Multi-Agent and MDP libraries
MAKE_POMDP         # Builds only the core C++ POMDP and MDP libraries
MAKE_TESTS         # Builds the library's tests for the compiled core libraries
MAKE_EXAMPLES      # Builds the library's examples using the compiled core libraries
MAKE_PYTHON        # Builds Python bindings for the compiled core libraries
AI_PYTHON_VERSION  # Selects the Python version you want (2 or 3). If not
                   #   specified, we try to guess based on your default interpreter.
AI_LOGGING_ENABLED # Whether the library logging code is enabled at runtime.
```

These flags can be combined as needed. For example:

```bash
# Will build MDP and MDP Python 3 bindings
cmake -DCMAKE_BUILD_TYPE=Debug -DMAKE_MDP=1 -DMAKE_PYTHON=1 -DAI_PYTHON_VERSION=3 ..
```

The default flags when nothing is specified are `MAKE_ALL` and
`CMAKE_BUILD_TYPE=Release`.

Note that by default `MAKE_ALL` does not build the Python bindings, as they
impose a minor performance hit on the C++ static libraries. You can easily
enable them by using the flag `MAKE_PYTHON`.

The static library files will be available directly in the build directory.
Three separate libraries are built: `AIToolboxMDP`, `AIToolboxPOMDP` and
`AIToolboxFMDP`. In case you want to link against either the POMDP library or
the Factored MDP library, you will also need to link against the MDP one, since
both of them use MDP functionality.

A number of small tests are included which you can find in the `test/` folder.
You can execute them after building the project using the following command
directly from the `build` directory, just after you finish `make`:

```bash
ctest
```

The tests also offer a brief introduction to the framework, pending a more
complete descriptive write-up. Only the tests for the parts of the library
that you compiled will be built.

To compile the library's documentation you need
[Doxygen](http://www.doxygen.nl/). Simply execute the following command from
the project's root folder:

```bash
doxygen
```

After that the documentation will be generated into an `html` folder in the
main directory.

Compiling a Program
===================

For an extensive pre-made setup of a C++/CMake project using AI-Toolbox *on
Linux*, please check out [this
repository](https://github.com/Svalorzen/AI-Toolbox-Experiments). It contains
the setup I personally use when working with AI-Toolbox. It also comes with many
additional tools you might need, which are nevertheless all optional.

Alternatively, to compile a program that uses this library, simply link it
against the compiled libraries you need, and possibly against the `lp_solve`
libraries (if using POMDP or FMDP).

Please note that since both POMDP and FMDP libraries rely on the MDP code, you
__MUST__ specify those libraries *before* the MDP library when linking,
otherwise it may result in `undefined reference` errors. The POMDP and Factored
MDP libraries are not currently dependent on each other, so their order does not
matter.

For Python, you just need to import the `AIToolbox.so` module, and you'll be
able to use the classes as exported to Python. All classes are documented, and
you can run in the Python CLI

    help(AIToolbox.MDP)
    help(AIToolbox.POMDP)

to see the documentation for each specific class.