{"id":15138134,"url":"https://github.com/qdsang/rocket_landing_simulation","last_synced_at":"2025-04-06T06:44:40.355Z","repository":{"id":246063679,"uuid":"819990290","full_name":"qdsang/rocket_landing_simulation","owner":"qdsang","description":"Rocket landing simulation using RL PPO","archived":false,"fork":false,"pushed_at":"2024-06-25T15:48:10.000Z","size":4812,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-12T12:21:44.936Z","etag":null,"topics":["landing","ppo","rl","rocket","spacex"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qdsang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-25T15:21:40.000Z","updated_at":"2024-07-09T05:52:43.000Z","dependencies_parsed_at":"2024-06-25T17:17:25.713Z","dependency_job_id":"275371ad-47ed-4a9e-a9e8-9ceb1d0add0e","html_url":"https://github.com/qdsang/rocket_landing_simulation","commit_stats":null,"previous_names":["qdsang/rocket_landing_simulation"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdsang%2Frocket_landing_simulation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdsang%2Frocket_landing_simulation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdsang%2Frocket_landing_simulation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdsang%2Frocket_landing_si
mulation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qdsang","download_url":"https://codeload.github.com/qdsang/rocket_landing_simulation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247445651,"owners_count":20939953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["landing","ppo","rl","rocket","spacex"],"created_at":"2024-09-26T07:20:50.455Z","updated_at":"2025-04-06T06:44:40.337Z","avatar_url":"https://github.com/qdsang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rocket landing based on RL PPO implementation\nRocket landing control based on PPO reinforcement learning\n\n- reinforcement learning PPO\n- stable_baselines3\n\n\n## Preview\n\n\u003ctable\u003e\n\u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"./docs/preview.gif\" title=\"Rocket Landing Preview\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"./docs/preview2.gif\" title=\"Rocket Landing Preview\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"./docs/preview3.gif\" title=\"Rocket Landing Preview\" /\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Installation\n\n\n```bash\n\n# mac\nbrew install swig\n\n# windows\n# download swig: https://sourceforge.net/projects/swig/files/swigwin/\n# add the swig directory to the PATH environment variable\n# install vs .... 
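😌\n\n# python env\npython3.9 -m venv .venv\nsource ./.venv/bin/activate\n\n# python requirements\npip install -r requirements.txt\n\npython main.py test\n\n```\n\n\n\n## TF logs\n\n```\ntensorboard --logdir ./log\n```\n\n## stable_baselines3 verbose\n\n### Rollout (episode data)\n- `ep_len_mean`: Mean episode length in steps, currently 479. It gives a rough picture of how the agent behaves within an episode.\n- `ep_rew_mean`: Mean reward per episode, currently -3.43. A negative value usually means the agent has not yet reached the desired performance.\n\nKey points:\n- `ep_len_mean` and `ep_rew_mean` are the main indicators for monitoring how well the model is learning.\n\n### Time (timing data)\n- `fps`: Environment steps processed per second, currently 1223. It reflects how fast the algorithm runs and does not need much attention.\n- `iterations`: Number of training iterations so far, currently 28. It reflects training progress.\n- `time_elapsed`: Training time elapsed, in seconds, currently 375.\n- `total_timesteps`: Cumulative number of timesteps, currently 458752. It indicates the scale of training so far.\n\n### Train (training data)\n- `approx_kl`: Approximate KL divergence between the old and new policies; 0.009 indicates a small change per update.\n- `clip_fraction`: Fraction of samples that were clipped during the policy update, here 0.102.\n- `clip_range`: Clipping range (default 0.2).\n- `entropy_loss`: Entropy loss, the negative of the mean policy entropy, which reflects how random the policy is; -3.77 corresponds to an entropy of about 3.77, and the value rises as the policy becomes less random.\n- `explained_variance`: Explained variance; 0.981 is very close to 1, which indicates that the value function fits the returns well.\n- `learning_rate`: Learning rate; 0.0003 is the default and controls how fast the model is updated.\n- `loss`: Total loss; 0.009 indicates a small loss.\n- `n_updates`: Number of gradient updates so far, currently 270. It tracks how far training has progressed.\n- `policy_gradient_loss`: Policy gradient loss; -0.00523 is negative, which usually lowers the total loss.\n- `std`: Standard deviation of the policy's action distribution; the value of 0.854 should be read alongside the other metrics.\n- `value_loss`: Value function loss; 0.000236 indicates that the value function (the function that predicts the expected return) fits well.\n\nKey points:\n- `approx_kl`, `entropy_loss`, `explained_variance`, and `value_loss` are the main indicators for monitoring the policy gradient and the value function.\n- `clip_fraction` and `loss` also deserve attention; unusually high or low values reflect the health of the update process and the effectiveness of training.\n\n### Summary\nThe metrics that deserve the most attention are `ep_rew_mean`, `explained_variance`, and `value_loss`, because they directly show the agent's performance and how well the model fits. The current values suggest that the model performs acceptably but has room to improve: the negative mean reward, read together with the total number of timesteps, shows the scale and effect of training so far. Whether the training time is being used efficiently has to be judged against the performance of the hardware.\n\n\n\n\n```bash\n# Convert a screen-recording MP4 to a GIF file\nffmpeg -i ./xx.mp4 -r 10 -pix_fmt rgb24 output.gif\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdsang%2Frocket_landing_simulation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqdsang%2Frocket_landing_simulation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdsang%2Frocket_landing_simulation/lists"}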