{"id":19085048,"url":"https://github.com/jaywalnut310/rl-atari-skiing","last_synced_at":"2025-04-30T09:26:00.309Z","repository":{"id":41105711,"uuid":"166159374","full_name":"jaywalnut310/rl-atari-skiing","owner":"jaywalnut310","description":" Solve Skiing-v0 by Using Deep Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2019-01-17T05:57:28.000Z","size":433,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-30T14:51:11.854Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaywalnut310.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-17T04:20:19.000Z","updated_at":"2023-11-30T12:28:57.000Z","dependencies_parsed_at":"2022-08-28T23:31:45.876Z","dependency_job_id":null,"html_url":"https://github.com/jaywalnut310/rl-atari-skiing","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaywalnut310%2Frl-atari-skiing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaywalnut310%2Frl-atari-skiing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaywalnut310%2Frl-atari-skiing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaywalnut310%2Frl-atari-skiing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaywalnut310","download_url":"https://codeload.github.com/jaywalnut310/rl-atari-skiing/tar.gz/refs/heads
/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251675746,"owners_count":21625881,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T02:53:37.673Z","updated_at":"2025-04-30T09:26:00.249Z","avatar_url":"https://github.com/jaywalnut310.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# rl-atari-skiing\nSolve [Skiing-v0](https://gym.openai.com/envs/Skiing-v0/) using deep reinforcement learning.\nThe main objective is to solve the skiing game in several different ways.\nIt is an interesting and hard task because almost all of the reward is given at the end of the game.\n\n- [Human Demonstrations](#human-demonstrations)\n- [Heuristic Markovian Agent](#heuristic-markovian-agent)\n- [DAgger](#dagger)\n\n\n\n## Human Demonstrations\nHuman demonstrations can be used for imitation learning, for [guiding an agent](https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration/), and so on.\n\nRun the script below to record a dictionary of data useful for training (scores, observations, actions and snapshots).\n\nThe output dictionary is saved under the name 'skiing_records'.\n\nControl the agent with the arrow keys.\n```\npython make_human_data.py\n```\n\nThen, restore any recorded state and use it in your own way!\n```python\nimport gym\nimport numpy as np\nimport pickle\n\n# load human data and select one timestep\ninterval = 34\nn_replay = 18\nwith open('./skiing_records', 'rb') as f:\n  records = pickle.load(f)\nlen_records = len(records['observations'])\nreplay_ids = [0] + [len_records - 
(i+1) * interval for i in reversed(range(int(np.ceil(len_records/interval))-2))]\nreplay_id = replay_ids[n_replay]\nprint(records.keys(), '\\n', len(replay_ids), replay_ids)\n\n# make environment\nenv_name = \"Skiing-v0\"\nenv = gym.make(env_name)\n\n# restore state\nobserve = env.reset()\nenv.env.restore_full_state(records['snapshots'][replay_id])\nobserve = records['observations'][replay_id]\nscore = records['scores'][replay_id]\n```\n\n## Heuristic Markovian Agent\nBecause creating human demonstrations is expensive, a heuristic agent that plays Skiing reasonably well can be useful for testing or guiding RL agents. The agent estimates the player's velocity and the locations of the flags and the player, and uses them to decide which direction to steer so that it passes between the flags. Because it looks only at the previous and current frames, it is a 2nd-order Markovian agent.\n\n![play-heuristic-markovian-agent](resources/heuristic_markovian_agent.gif)\n\nFor more information, please check [the Jupyter notebook](heuristic_markovian_agent.ipynb).\n\n## DAgger\nI implement the [DAgger algorithm](https://www.cs.cmu.edu/~sross1/publications/Ross-AIStats11-NoRegret.pdf) in a simplified form. To keep things simple, I get labels from [the heuristic agent](#heuristic-markovian-agent) instead of asking humans to label observations.\n\n![play-dagger](resources/dagger.gif)\n\nFor more information, please check [the Jupyter notebook](dagger.ipynb).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaywalnut310%2Frl-atari-skiing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaywalnut310%2Frl-atari-skiing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaywalnut310%2Frl-atari-skiing/lists"}