{"id":13460800,"url":"https://github.com/wangshub/RL-Stock","last_synced_at":"2025-03-24T19:33:00.851Z","repository":{"id":37343495,"uuid":"249599851","full_name":"wangshub/RL-Stock","owner":"wangshub","description":"📈 如何用深度强化学习自动炒股","archived":false,"fork":false,"pushed_at":"2022-11-22T05:26:28.000Z","size":4488,"stargazers_count":2948,"open_issues_count":35,"forks_count":703,"subscribers_count":87,"default_branch":"master","last_synced_at":"2024-08-01T10:21:47.533Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wangshub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-24T03:04:23.000Z","updated_at":"2024-08-01T03:40:04.000Z","dependencies_parsed_at":"2022-07-12T12:31:45.961Z","dependency_job_id":null,"html_url":"https://github.com/wangshub/RL-Stock","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangshub%2FRL-Stock","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangshub%2FRL-Stock/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangshub%2FRL-Stock/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangshub%2FRL-Stock/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wangshub","download_url":"https://codeload.github.com/wangshub/RL-Stock/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222004307,"owners_count":1691
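4876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T10:00:49.088Z","updated_at":"2024-10-29T06:30:53.503Z","avatar_url":"https://github.com/wangshub.png","language":"Jupyter Notebook","funding_links":[],"categories":["Strategies \u0026 Research","Jupyter Notebook","金融股票"],"sub_categories":["Time Series Data","网络服务_其他"],"readme":"# 📈 How to Trade Stocks Automatically with Deep Reinforcement Learning\n\n## 💡 Motivation\n\nRecently, with the COVID-19 pandemic dragging the stock market down day after day, I, a complete beginner and small retail investor, got the bold idea of buying the dip and went all in with the little private savings I had left.\n\nDay two: the market plunged, and I bought more.\n\nDay three: it fell again, and I bought more.\n\nDay four: it fell again, and I bought even more...\n\n\u003cimg src=\"img/2020-03-27-10-45-59.png\" alt=\"drawing\" width=\"50%\"/\u003e\n\nAfter this string of mistakes the result was painful to look at: my first attempt at buying stocks got me beaten up by the market and mercilessly teased by my wife. Having learned my lesson, I decided to try a different approach: **can deep reinforcement learning be used to simulate stock trading automatically?** This project is an experiment to check whether it can turn a profit.\n\n## 📖 Supervised Learning vs. Reinforcement Learning\n\nSupervised learning (for example, an LSTM) can use historical data to predict future stock prices, judge whether a stock will rise or fall, and help a person make decisions.\n\n\u003cimg src=\"img/2020-03-25-18-55-13.png\" alt=\"drawing\" width=\"50%\"/\u003e\n\nReinforcement learning is another branch of machine learning: at each decision point it takes an action chosen to maximize the final reward. Unlike supervised learning, which predicts future values, reinforcement learning takes a state as input (for example, the day's opening and closing prices) and outputs a sequence of actions (for example, buy, hold, sell) that maximizes the final return, which enables automated trading.\n\n\u003cimg src=\"img/2020-03-25-18-19-03.png\" alt=\"drawing\" width=\"50%\"/\u003e\n\n## 🤖 OpenAI Gym Stock Trading Environment\n\n### Observation\n\nThe policy network observes the parameters of a single stock, such as the opening price, closing price, and trading volume. Some of these values can be very large (turnover amount or volume may be in the millions, tens of millions, or more), so to help the network converge during training, the observed state data must be normalized into the range `[-1, 1]` 
before it is fed to the network.\n\n|Parameter|Description|Notes|\n|---|---|---|\n|date|Exchange trading date|Format: YYYY-MM-DD|\n|code|Security code|Format: sh.600000. sh: Shanghai, sz: Shenzhen|\n|open|Opening price|Precision: 4 decimal places; unit: CNY|\n|high|Highest price|Precision: 4 decimal places; unit: CNY|\n|low|Lowest price|Precision: 4 decimal places; unit: CNY|\n|close|Closing price|Precision: 4 decimal places; unit: CNY|\n|preclose|Previous day's closing price|Precision: 4 decimal places; unit: CNY|\n|volume|Trading volume|Unit: shares|\n|amount|Turnover amount|Precision: 4 decimal places; unit: CNY|\n|adjustflag|Price adjustment status|Unadjusted, forward-adjusted, or backward-adjusted|\n|turn|Turnover rate|Precision: 6 decimal places; unit: %|\n|tradestatus|Trading status|1: trading normally; 0: suspended|\n|pctChg|Price change (percent)|Precision: 6 decimal places|\n|peTTM|Trailing price-to-earnings ratio (TTM)|Precision: 6 decimal places|\n|psTTM|Trailing price-to-sales ratio (TTM)|Precision: 6 decimal places|\n|pcfNcfTTM|Trailing price-to-cash-flow ratio (TTM)|Precision: 6 decimal places|\n|pbMRQ|Price-to-book ratio (MRQ)|Precision: 6 decimal places|\n\n### Action\n\nAssume there are 3 trading operations: **buy**, **sell**, and **hold**. The action (`action`) is defined as an array of length 2:\n\n- `action[0]` is the operation type;\n- `action[1]` is the percentage to buy or sell.\n\n| Action type `action[0]` | Meaning |\n|---|---|\n| 1 | Buy `action[1]`|\n| 2 | Sell `action[1]`|\n| 3 | Hold |\n\nNote that when `action[0] = 3`, the agent neither buys nor sells, so the value of `action[1]` has no effect. The agent gradually learns this during training.\n\n### Reward\n\nThe design of the reward function is critical to what a reinforcement learning agent ends up optimizing. In a stock trading environment, what matters most is the current profit, so the current profit is used as the reward: `current balance + stock value - initial capital = profit`.\n\n```python\n# profit relative to the starting balance\nreward = self.net_worth - INITIAL_ACCOUNT_BALANCE\n# +1 for any profit, a large penalty otherwise\nreward = 1 if reward \u003e 0 else -100\n```\n\nTo help the network learn a profitable strategy faster, it receives a large penalty (`-100`) whenever the profit is not positive.\n\n### Policy Gradient\n\nBecause the action output is continuous, a policy-gradient-based optimization algorithm is used. One of the best known is the [PPO algorithm](https://arxiv.org/abs/1707.06347), which OpenAI and many papers treat as a go-to algorithm for reinforcement learning research. For a Python implementation of PPO, see [stable-baselines](https://stable-baselines.readthedocs.io/en/master/modules/ppo2.html).\n\n## 🕵️‍♀️ Simulation Experiments\n\n### Environment Setup\n\n```sh\n# virtual environment\nvirtualenv -p python3.6 venv\nsource ./venv/bin/activate\n# install dependencies\npip install -r requirements.txt\n```\n\n### Getting Stock Data\n\nThe stock data comes from [baostock](http://baostock.com/baostock/index.php/%E9%A6%96%E9%A1%B5), a free, open-source securities data platform that provides a Python API.\n\n```bash\npip install baostock -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn\n```\n\nFor the data download code, see [get_stock_data.py](https://github.com/wangshub/RL-Stock/blob/master/get_data.py)\n\n```sh\npython get_stock_data.py\n```\n\nMore than 20 years of historical stock data are used as the training set, and the final 1 
month of data as the test set, to check whether the reinforcement learning strategy works. The split is as follows:\n\n| `1990-01-01` ~ `2019-11-29` | `2019-12-01` ~ `2019-12-31` |\n|---|---|\n| Training set | Test set |\n\n### Results\n\n**Single stock**\n\n- Initial capital: `10000`\n- Stock code: `sh.600036` (China Merchants Bank)\n- Training set: `stockdata/train/sh.600036.招商银行.csv`\n- Test set: `stockdata/test/sh.600036.招商银行.csv`\n- Simulated `20` days of trading, with a final profit of about `400`\n\n\u003cimg src=\"img/sh.600036.png\" alt=\"drawing\" width=\"70%\"/\u003e\n\n**Multiple stocks**\n\n`1002` stocks were selected and trained on. Overall:\n\n- Profit: `44.5%`\n- Break-even: `46.5%`\n- Loss: `9.0%`\n\n\u003cimg src=\"img/pie.png\" alt=\"drawing\" width=\"50%\"/\u003e\n\n\u003cimg src=\"img/hist.png\" alt=\"drawing\" width=\"50%\"/\u003e\n\n## 👻 Final Notes\n\n- The stock Gym environment is mainly based on [Stock-Trading-Environment](https://github.com/notadamking/Stock-Trading-Environment), with changes to the observation state, the reward function, and the training set.\n- I am a complete novice at stocks, so mistakes are inevitable; corrections are welcome!\n- The data and methods all come from the internet and their validity cannot be guaranteed. **Just For Fun**!\n\n## 📚 References\n\n- Y. Deng, F. Bao, Y. Kong, Z. Ren and Q. Dai, \"Deep Direct Reinforcement Learning for Financial Signal Representation and Trading,\" in IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 653-664, March 2017.\n- [Yuqin Dai, Chris Wang, Iris Wang, Yilun Xu, \"Reinforcement Learning for FX trading\"](http://stanford.edu/class/msande448/2019/Final_reports/gr2.pdf)\n- Chien Yi Huang. Financial trading as a game: A deep reinforcement learning approach. arXiv preprint arXiv:1807.02787, 2018.\n- [Create custom gym environments from scratch — A stock market example](https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e)\n- [notadamking/Stock-Trading-Environment](https://github.com/notadamking/Stock-Trading-Environment)\n- [Welcome to Stable Baselines docs! 
- RL Baselines Made Easy](https://stable-baselines.readthedocs.io/en/master)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangshub%2FRL-Stock","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwangshub%2FRL-Stock","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangshub%2FRL-Stock/lists"}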