{"id":22180531,"url":"https://github.com/s1m0n38/cr-analysis","last_synced_at":"2025-07-08T00:18:05.977Z","repository":{"id":159825713,"uuid":"559230019","full_name":"S1M0N38/cr-analysis","owner":"S1M0N38","description":"An exercise in data collection/analysis","archived":false,"fork":false,"pushed_at":"2023-12-18T23:05:34.000Z","size":15353,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-29T23:29:48.715Z","etag":null,"topics":["clash-royale","data-analysis","data-collection","data-science"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/S1M0N38.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-29T13:19:19.000Z","updated_at":"2022-12-05T13:54:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"d4095259-0141-48f5-9d74-ab45c13f1209","html_url":"https://github.com/S1M0N38/cr-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S1M0N38%2Fcr-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S1M0N38%2Fcr-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S1M0N38%2Fcr-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S1M0N38%2Fcr-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/S1M0N38","download
_url":"https://codeload.github.com/S1M0N38/cr-analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245331933,"owners_count":20598075,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clash-royale","data-analysis","data-collection","data-science"],"created_at":"2024-12-02T09:18:35.153Z","updated_at":"2025-03-24T18:45:59.085Z","avatar_url":"https://github.com/S1M0N38.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Clash Royale Analysis \n\n\u003e This content is not affiliated with, endorsed, sponsored, or specifically\n\u003e approved by Supercell and Supercell is not responsible for it. 
For more\n\u003e information see Supercell’s Fan Content Policy:\n\u003e www.supercell.com/fan-content-policy.\n\nAn exercise in data collection/analysis, juggling with millions of records.\n\n\u003cdiv align=\"center\"\u003e\u003cp\u003e\n    \u003ca href=\"https://github.com/S1M0N38/cr-analysis/pulse\"\u003e\n      \u003cimg alt=\"Last commit\" src=\"https://img.shields.io/github/last-commit/S1M0N38/cr-analysis?style=for-the-badge\u0026color=8bd5ca\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/S1M0N38/cr-analysis/blob/main/LICENSE\"\u003e\n      \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/S1M0N38/cr-analysis?style=for-the-badge\u0026color=ee999f\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/S1M0N38/cr-analysis\"\u003e\n      \u003cimg alt=\"Repo Size\" src=\"https://img.shields.io/github/repo-size/S1M0N38/cr-analysis?color=DDB6F2\u0026label=SIZE\u0026style=for-the-badge\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://discord.com/users/S1M0N38#0317\"\u003e\n      \u003cimg alt=\"Discord\" src=\"https://img.shields.io/static/v1?label=DISCORD\u0026message=DM\u0026color=a3b7ff\u0026style=for-the-badge\" /\u003e\n    \u003c/a\u003e\n\u003c/div\u003e\n\n-------------------------------------------------------------------------------\n\n## Data Collection\n\n\u003e [Clash Royale Games dataset](https://www.kaggle.com/datasets/s1m0n38/clash-royale-games)\n\u003e publicly available\n\nFirst we need data. Clash Royale data can be retrieved using the [Clash Royale\nAPI](https://developer.clashroyale.com/#/). The data we are interested in are\nbattles between players, specifically 1v1 battles between top players.\n\n### How Data Collection works\n\nClash Royale's matchmaking algorithm is based on players' strength, i.e. you\nbattle against other players stronger or weaker than you. 
So, to\ncollect battles played between top players, you can start by requesting battles\nfrom a bunch of top players (we call them `root_players`). Root players'\nopponents are also top players, so you can request their battles too, and the\nprocess goes on.\n\nSo, to collect data, you have to constantly send HTTP requests to the API and use the\nresponses to generate further HTTP requests. `data/collect.py` is a Python script\ndesigned to do exactly that: request battles, save battles into a CSV file, and\ngenerate new requests based on collected battles.\n\n`data/collect.py` makes use of `data/crawler.py`, which implements an asynchronous\niterator class capable of scheduling parallel HTTP requests based on previous\nAPI responses.\n\nData is stored in compressed CSV files (suffix `.csv.gz`). We chose this\nformat because it's easy to work with and widely used/supported.\n\n\n### How Data Collection is performed\n\nAt the time of writing we use a Raspberry Pi 3 Model B\n([configuration](https://github.com/S1M0N38/dots/tree/rpi)) to collect data by\nrunning `data/collect.sh`. `data/collect.sh` is a bash script that wraps\n`data/collect.py` by specifying its command line arguments (e.g. root players, number\nof parallel requests, name of the output file, etc.). `data/collect.sh` is\nscheduled to run hourly by [crontab](https://en.wikipedia.org/wiki/Cron) and it\ntakes about 50 minutes to complete on the Raspberry Pi with our connection. Moreover,\n`data/collect.sh` is responsible for sorting and compressing the CSV file that\n`data/collect.py` has produced. Data generated by `data/collect.py` are stored\nin the `db/hours` folder.\n\n`data/join.sh` is the bash script that is run daily to join data from the previous\nday while eliminating redundant data. Results of `data/join.sh` are stored in\n`db/days`.\n\n\n### Set up Data Collection\n\nHere's how to set up the data collection pipeline on your own machine. Assuming you\nhave access to a command line and [git](https://git-scm.com/) installed ... 
\n\n1. Clone this repo in your $HOME directory\n```bash \ngit clone https://github.com/S1M0N38/cr-analysis.git $HOME/cr-analysis\n```\n\n2. Navigate to the `data` subdirectory\n```bash\ncd $HOME/cr-analysis/data\n```\nEvery **data collection** script MUST be run from this directory\n(`cr-analysis/data`).\n\n3. Ensure `python 3.9` or greater is available.\n``` bash\npython -V\n```\nThis project is developed using Python 3.9, hence compatibility with earlier versions is not\nensured.\n\n4. Upgrade `pip` and install requirements\n```bash\npython -m pip install --upgrade pip\npython -m pip install -r requirements.txt\n```\n\n5. Create a developer account [here](https://developer.clashroyale.com/#/) and\n   add credentials as environment variables. For example, if you use bash as your\n   shell, add\n```bash\nexport API_CLASH_ROYALE_EMAIL=\"example@mail.com\"\nexport API_CLASH_ROYALE_PASSWORD=\"MY_P4S5W0RD\"\n```\nto your `.bashrc`. Then restart the terminal and navigate again to\n`cr-analysis/data`.\n\n6. Run the test script to ensure that the data collection scripts run flawlessly.\n```bash\n./test.sh\n```\n\n7. If no errors arose, you can schedule `collect.sh` to run hourly and `join.sh`\n   to run daily.\n```bash \ncrontab -e\n```\nThis command will open a text file with your favorite editor (choose `nano` if\nin doubt). Every line in this file is a scheduled task. Lines that start with\n`#` are just comments. 
Add the following lines at the end of this file.\n```crontab\n# Clash Royale API credentials\n# (you have to add credentials to this file as well)\nAPI_CLASH_ROYALE_EMAIL=\"example@mail.com\"\nAPI_CLASH_ROYALE_PASSWORD=\"MY_P4S5W0RD\"\n\n  0  *  *  *  *  cd $HOME/cr-analysis/data \u0026\u0026 ./collect.sh 100000\n  0 12  *  *  *  cd $HOME/cr-analysis/data \u0026\u0026 ./join.sh\n# |  |  |  |  |  |\n# |  |  |  |  |  +--- command\n# |  |  |  |  +----- day of week (0 - 6) (Sunday=0)\n# |  |  |  +------- month (1 - 12)\n# |  |  +--------- day of month (1 - 31)\n# |  +----------- hour (0 - 23)\n# +------------- min (0 - 59)\n```\nThat's all. Scripts scheduled with `crontab` will be run in the background.\nUse a process viewer (e.g. [htop](https://htop.dev/)) to check if they are\nrunning. Alternatively, take a look at the log files contained in `db/hours` and\nin `db/days`.\n\n### Further Data Manipulation\n\nLet's pretend that three days have passed since you started the collection pipeline on\nyour machine. You should have a directory structure like this\n```\ncr-analysis\n├── db\n│  ├── days\n│  │  ├── 20221114.csv.gz\n│  │  ├── 20221115.csv.gz\n│  │  └── 20221116.csv.gz\n│  ├── hours\n│  │  └── ...\n│  ├── test\n│  │  └── ...\n│  └── ...\n├── ...\n└── README.md\n```\nFiles in `cr-analysis/db/days` are compressed CSV files containing battles\ncollected on that date (this does not mean that battles were played on that day,\nbut that they were played on that day or days before). Now you can copy\n`db/days` from your \"scraping machine\" (in our case Raspberry Pi) to your\n\"analysis machine\" (in our case a laptop).\n\nClone this very repo on your \"analysis machine\" and navigate to\n`cr-analysis/db`. Here you can find some empty directories and some scripts for\ndata manipulation. 
Those empty folders will be populated with data coming from the\n\"scraping machine\".\n\nWe can connect to the Raspberry Pi with ssh, so we can pull data using `rsync` from the\n\"analysis machine\":\n```bash\nrsync -aPv pi@192.168.178.101:cr-analysis/db/days .\n```\nwhere:\n- `pi` is the user on RPI\n- `192.168.178.101` is the local IP of RPI\n- `cr-analysis/db/days` is the path to data folder on RPI\n- `.` is where we want to save data on the \"analysis machine\"\n\nWith the CSV files on our \"analysis machine\" we can proceed with further data\nmanipulation. For example, we like to rearrange battles in such a way that the CSV\nfile name refers to the day battles were played instead of the day battle data\nwere collected. This can be achieved using the `season.sh` script. For example\n```bash\n./season.sh 20221107 20221205\n```\nwill create the folder `db/20221107-20221205` with one CSV file per day from\nthe 7th of Nov 2022 to the 5th of Dec 2022.\n```\ncr-analysis\n├── db \n│  ├── 20221107-20221205\n│  │  ├── 20221107.csv.gz\n│  │  ├── 20221108.csv.gz\n│  │  ├── ...\n│  │  ├── 20221204.csv.gz\n│  │  └── 20221205.csv.gz\n│  ├── days\n│  │  └── ...\n│  ├── hours\n│  │  └── ...\n│  ├── test\n│  │  └── ...\n│  └── ...\n├── ...\n└── README.md\n```\nThis means that the file `db/20221107-20221205/20221107.csv.gz` will contain battles played on 7th\nof Nov 2022 sorted by time. This way of storing data has a couple of\nadvantages:\n\n- File sizes stay relatively small, so they are easy to handle compared to a\n  huge CSV file.\n- `gzip` compression allows concatenating them without decompression (e.g. `cat\n  20221107.csv.gz 20221108.csv.gz 20221109.csv.gz` will stream to stdout all\n  battles played between 20221107 and 20221109 sorted by time).\n\nAnother way to organize data is to store them in a proper database. The script\n`db/sqlite.sh` takes the `.csv.gz` files from `db/days` and concatenates them in a\nsingle `db/db.sqlite` file.\n```\ncr-analysis\n├── db\n│  ├── days\n│  │  └── ... 
\n│  └── db.sqlite\n└── ...\n```\n\nFor file manipulation, `data/join.sh`, `db/season.sh` and `db/sqlite.sh`\nleverage the power of command line programs pre-installed on many Unix-like OSes.\nTake a look at them if you want to manipulate compressed CSV files on your own.\n\n-------------------------------------------------------------------------------\n\n## Data Analysis\n\nData Analysis is still in an early stage, but here is a simple example if you'd\nlike to experiment on your own. This example uses\n[pandas](https://pandas.pydata.org/) as data analysis tool to analyse battles\nplayed on different days.\n\n### Install\n\nAssuming you have already cloned this repository ...\n\n1. Navigate to the `analysis` subdirectory\n```bash\ncd $HOME/cr-analysis/analysis\n```\nEvery **data analysis** command/script MUST be run from this directory\n(`cr-analysis/analysis`).\n\n2. Ensure `python 3.9` or greater is available.\n``` bash\npython -V\n```\nThis project is developed using Python 3.9, so compatibility with earlier versions is not\nensured.\n\n3. 
Upgrade `pip` and install requirements\n```bash\npython -m pip install --upgrade pip\npython -m pip install -r requirements.txt\n```\nThis installs the data analysis dependencies.\n\n### Preprocessing\n\nFirst, it's better to convert data files from the well-known CSV format to the\nlesser-known *parquet* format, which gives us finer-grained control while\nworking with pandas ([this video](https://youtu.be/9LYYOdIwQXg) explains why).\nSuppose we obtained `db/20221107-20221205/20221107.csv.gz` from the data\ncollection pipeline\n```\ncr-analysis\n├── db \n│  ├── 20221107-20221205\n│  │  ├── 20221107.csv.gz\n│  │  └── ...\n│  └── ...\n└── ...\n```\nYou can convert it to parquet format by using `analysis/parquet.py`\n```bash\npython parquet.py -i ../db/20221107-20221205/20221107.csv.gz\n```\nThis will create `20221107.parquet` in the same directory as the input file\n```\ncr-analysis\n├── db \n│  ├── 20221107-20221205\n│  │  ├── 20221107.csv.gz\n│  │  ├── 20221107.parquet\n│  │  └── ...\n│  └── ...\n└── ...\n```\nThe bash script `analysis/parquet.sh` converts all CSV files stored in `db` into\nparquet files.\n\n### Simple Example\n\n1. Start the JupyterLab server with `jupyter-lab`\n\n2. Take a look at `analysis/analysis.ipynb`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs1m0n38%2Fcr-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fs1m0n38%2Fcr-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs1m0n38%2Fcr-analysis/lists"}