# Important update from the author
I will try to update this repo over the winter break, as I want to give the game meta some time to stabilize first. Stay tuned!


# dota2-predictor

1. [Overview](#overview)
2. [Requirements](#requirements)
3. [Project structure](#structure)
4. [Basic usage](#usage)
    * [Mining data](#mining)
    * [Loading a dataset](#dataset)
    * [Training and evaluating a model](#training)
    * [Querying a model](#querying)
    * [Visualizing data](#visualizing)
5. [Author's note](#author)
6. [FAQ](#faq)

## Overview  <a name="overview"></a>
dota2-predictor is a tool that uses Machine Learning to predict the outcome of a Dota 2 game and suggest the best last pick. It also helps users visualize various statistics of the datasets (e.g. hero maps, synergies, counters). The project achieves a ROC AUC score of roughly 0.65 using Logistic Regression and Neural Networks. The code is mainly used to create the models behind [dotadamus.com](http://www.dotadamus.com).

## Requirements <a name="requirements"></a>
The project requires a handful of Python 2.7 packages. Install them using:
```bash
pip install -r requirements.txt
```

## Project structure  <a name="structure"></a>
    .
    ├── preprocessing               # Dataset preprocessing
    ├── pretrained                  # Pretrained models for each MMR category and precomputed synergies
    ├── tools                       # Auxiliary tools, e.g. metadata parser and miner
    ├── training                    # Model training and evaluation
    ├── visualizing                 # Plotting tools
    ├── 706e.zip                    # Mined dataset example
    ├── api_examples.py             # Examples of API usage
    └── metadata.json               # Heroes and patches database


## Basic usage  <a name="usage"></a>
You can find examples of all the functions this tool provides in [api_examples.py](https://github.com/andreiapostoae/dota2-predictor/blob/master/api_examples.py).

### Mining data  <a name="mining"></a>
Mining the first 1000 games between two valid match IDs and saving them to a file is as simple as:

```python
from tools.miner import mine_data

mine_data(file_name='mine_example.csv',
          first_match_id=3492535023,
          last_match_id=3498023575,
          stop_at=1000)
```

The result will be a CSV file with the following format:

| match_id   | radiant_win | radiant_team       | dire_team      | avg_mmr | num_mmr | game_mode | lobby_type |
|------------|-------------|--------------------|----------------|---------|---------|-----------|------------|
| 3315687002 | True        | "105,74,14,41,113" | "7,63,93,4,36" | 3649    | 2       | 22        | 7          |
| 3380355408 | False       | "97,62,18,50,35"   | "7,22,32,1,36" | 3186    | 7       | 22        | 7          |
| 3324424409 | False       | "96,74,36,15,94"   | "39,1,2,86,27" | 2648    | 2       | 22        | 0          |

  * match_id: the official match ID stored on Valve's servers
  * radiant_win: True if Radiant won, False if Dire won
  * radiant_team: string containing the 5 hero IDs corresponding to the heroes on the Radiant team
  * dire_team: same, but for Dire
  * avg_mmr: average MMR of the players with public MMR in the game
  * num_mmr: number of players with public MMR in the game
  * game_mode: 22 - All Pick
  * lobby_type: 7 - Ranked, 0 - Normal


### Loading a dataset  <a name="dataset"></a>
When a dataset is loaded, it goes through additional preprocessing (feature engineering). We define as **advantages** two matrices of shape (number_of_heroes, number_of_heroes) that score the synergy and the counter relationship between every pair of heroes:
  * A score of 0.54 at position [i, j] in the synergy matrix means hero<sub>i</sub> and hero<sub>j</sub> have a 54% win rate when playing on the same team (the matrix is symmetric).
  * A score of 0.33 at position [i, j] in the counter matrix means hero<sub>i</sub> has a 33% win rate against hero<sub>j</sub> (the matrix is not symmetric).

For the entire dataset, these advantages are already precomputed, but they can be recomputed and overwritten through the `read_dataset` function:

```python
from preprocessing.dataset import read_dataset

dataset, advantages_computer = read_dataset('706e_train_dataset.csv',
                                            low_mmr=2000,
                                            high_mmr=2500,
                                            advantages=True)
```

This example recomputes the advantages and automatically uses them as features in the dataset, while keeping only games between 2000 and 2500 MMR.


### Training and evaluating a model  <a name="training"></a>
The model is trained with Logistic Regression and evaluated through cross-validation. The trained model can be saved to a pickle file for later use.
Cross-validation is performed on the training dataset, while the final scores (ROC AUC and raw accuracy) are computed on the test dataset.

```python
from preprocessing.dataset import read_dataset
from training.cross_validation import evaluate

dataset_train, _ = read_dataset('706e_train_dataset.csv', low_mmr=4500)
dataset_test, _ = read_dataset('706e_test_dataset.csv', low_mmr=4500)

evaluate(dataset_train, dataset_test, cv=7, save_model='test.pkl')
```
Output:
```
INFO:preprocessing.dataset:The train dataset contains 112057 games
INFO:preprocessing.dataset:The test dataset contains 8390 games
INFO:training.cross_validation:Cross validation scores over the training set (7 folds): 0.630 +/- 0.004
INFO:training.cross_validation:Test ROC AUC: 0.628
INFO:training.cross_validation:Test accuracy score: 0.592
```

### Querying a model  <a name="querying"></a>
There are two types of queries:
  * full query: provide all 10 heroes in a game and predict the winner
  * partial query: provide 9 heroes and get suggestions for the last pick that maximize the 10<sup>th</sup> player's chance of winning

```python
from training.query import query

# Full query: all 10 heroes are known
full_result = query(3000,
                    [59, 56, 54, 48, 31],
                    [40, 41, 52, 68, 61])

# Partial query: only 9 heroes are known, so the last pick is suggested
partial_result = query(3000,
                       [59, 56, 54, 48, 31],
                       [40, 41, 52, 68])
```

The first query returns a simple result (e.g. Radiant has a 53.42% chance to win). The second query has a more complex output: tuples of the form (hero, (win chance, similarity inside team)).
We define **similarity** as a score that represents how close two different heroes are role-wise. We use this metric to filter out heroes that do not fit the team composition (e.g. you do not want a team of five carries, even if each of them has a very high win rate on its own). The similarities can be explored in the hero map shown in the next section.

### Visualizing data  <a name="visualizing"></a>
To better understand the data, the following plots can be created with ease:
  * learning curve (matplotlib, plotly)
  * win rate statistics (plotly)
  * pick statistics (plotly)
  * MMR distribution (plotly)
  * hero synergies (plotly)
  * hero counters (plotly)
  * hero map using similarities (plotly)

Here are some examples; more can be found [here](https://plot.ly/~apostoae.andrei), where they are also interactive:

![learning curve](https://i.imgur.com/lESHAv1.png)

![hero map](https://i.imgur.com/kfpYYlT.png)

## Author's note  <a name="author"></a>
This is a hobby project, started with the goal of achieving the highest possible accuracy given only the picks in a game. After a great response from the community, I decided to look for a collaborator to help me build a web app for this tool (web development is not really my thing), since it was previously hard to use.

I was lucky to find [Michael](https://github.com/michael-pacheco), a talented student who shares my passion for gaming and programming. He created the [website](http://www.dotadamus.com) from scratch using what I had implemented and set everything up for deployment.

The website is not intended to make money and I cover all the expenses, but as long as people visit it and the servers can handle the traffic, I am the happiest person in the world.

This tool is up to date with the current patches and does a decent job of predicting your best possible last pick in a given situation, giving you that little extra chance that turns the tide in your favor.

Good luck in your matches and game on!


## FAQ <a name="faq"></a>
  1. Only 60% accuracy? That is not much better than predicting that Radiant always wins.
        * Yes. After a lot of feature engineering and algorithm searching, this is the best I could come up with. Over the course of my experiments, I tried a variety of classification algorithms, including Neural Networks. Even with a lot of tuning, the NNs performed at best as well as Logistic Regression. The human factor also strongly influences the outcome of a game, so there is no way to predict every game with close-to-perfect accuracy.

  2. How did you generate similarities and plot the hero map?
        * I completely ignored the outcomes of the games and focused on better understanding what a team composition is. I applied word2vec to the data, treating each hero as a word and each team of 5 heroes as a sentence. Amazingly, the algorithm was able to find structure in those team compositions and successfully modelled the similarities. I then applied t-SNE to reduce the dimensionality and k-means to cluster the resulting points, and voilà.

  3. Why don't you train your model only on 6k+ MMR games and then get rich by betting on pro games?
        * Sadly, it does not work like that. The pro scene is totally different from pubs: in pubs there is some MMR balance, while for pro games the dataset would be filled with noise. Also, very few games are played at high MMR.

  4. Why did you not use other statistics, such as XPM, GPM or itemization?
        * The purpose of this tool is to suggest the best possible pick before the game starts. Those statistics change throughout the game and are not available at pick time, so they cannot help a pre-game prediction.

  5. How many games do I need to achieve the best accuracy possible?
        * Experimentally, I'd say at least 150k. If you look at the learning curves, you can see the score plateau after about 200k games.

  6. Can I contribute to the project?
        * Yes, and I would be glad! I try to be as active as possible, so feel free to open issues and pull requests, or even contact me directly to discuss. Periodically, we ([Michael](https://github.com/michael-pacheco) and I) update the [website](http://www.dotadamus.com) with changes from this repo.
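The synergy and counter **advantages** described under Loading a dataset can be illustrated with a small, self-contained sketch. This is not the repository's `read_dataset` code: `parse_team`, `win_rates` and `compute_advantages` are hypothetical helpers written for this example. The sketch assumes that the mined CSV has a header row with the column names listed under Mining data, that `radiant_win` is stored as the string `True`/`False`, that hero IDs can be used directly as matrix indices, and that hero pairs never observed together default to a neutral 0.5.

```python
import csv
from itertools import combinations

# Three rows copied from the example table under "Mining data".
# Assumption: the real file has a header row with these column names.
SAMPLE_CSV = """match_id,radiant_win,radiant_team,dire_team,avg_mmr,num_mmr,game_mode,lobby_type
3315687002,True,"105,74,14,41,113","7,63,93,4,36",3649,2,22,7
3380355408,False,"97,62,18,50,35","7,22,32,1,36",3186,7,22,7
3324424409,False,"96,74,36,15,94","39,1,2,86,27",2648,2,22,0"""


def parse_team(field):
    """Turn a team string such as '105,74,14,41,113' into a list of hero IDs."""
    return [int(hero_id) for hero_id in field.split(',')]


def win_rates(wins, games, default=0.5):
    """Element-wise wins / games. Assumption: unobserved pairs get a neutral default."""
    return [[float(w) / g if g else default for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(wins, games)]


def compute_advantages(rows, num_heroes):
    """Build (synergy, counter) win-rate matrices indexed directly by hero ID.

    synergy[i][j]: win rate of hero i's team in games where i and j were teammates.
    counter[i][j]: win rate of hero i in games where it played against hero j.
    """
    def zeros():
        return [[0] * num_heroes for _ in range(num_heroes)]

    syn_wins, syn_games, cnt_wins, cnt_games = zeros(), zeros(), zeros(), zeros()

    for row in rows:
        radiant = parse_team(row['radiant_team'])
        dire = parse_team(row['dire_team'])
        radiant_win = row['radiant_win'] == 'True'

        # Synergy: every unordered pair of teammates, stored in both orders
        # so that the resulting matrix is symmetric.
        for team, won in ((radiant, radiant_win), (dire, not radiant_win)):
            for i, j in combinations(team, 2):
                for a, b in ((i, j), (j, i)):
                    syn_games[a][b] += 1
                    syn_wins[a][b] += int(won)

        # Counter: every (Radiant hero, Dire hero) matchup, counted from both
        # sides, so counter[i][j] + counter[j][i] == 1 for every observed pair.
        for i in radiant:
            for j in dire:
                cnt_games[i][j] += 1
                cnt_wins[i][j] += int(radiant_win)
                cnt_games[j][i] += 1
                cnt_wins[j][i] += int(not radiant_win)

    return win_rates(syn_wins, syn_games), win_rates(cnt_wins, cnt_games)


rows = list(csv.DictReader(SAMPLE_CSV.splitlines()))
synergy, counter = compute_advantages(rows, num_heroes=114)
print(synergy[105][74])  # 1.0: teammates in one game, which they won
print(counter[36][1])    # 0.0: hero 36 lost its only game against hero 1
```

With a real dataset, the rows would be read from the mined CSV file instead of the inline sample, and `num_heroes` must be larger than the highest hero ID. The 0.5 default for unobserved pairs is only a placeholder chosen for this sketch.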