https://github.com/RL-MLDM/alphagen

Generating sets of formulaic alpha (predictive) stock factors via reinforcement learning.

quantitative-trading reinforcement-learning symbolic-regression

Host: GitHub
URL: https://github.com/RL-MLDM/alphagen
Owner: RL-MLDM
Created: 2022-07-05T05:35:50.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2023-11-15T05:39:59.000Z (over 1 year ago)
Last Synced: 2024-09-18T10:09:32.192Z (5 months ago)
Topics: quantitative-trading, reinforcement-learning, symbolic-regression
Language: Python
Homepage:
Size: 388 KB
Stars: 478
Watchers: 7
Forks: 159
Open Issues: 15
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-systematic-trading - AlphaGen - Automatic formulaic alpha generation with reinforcement learning. (Alpha Collections / Expression based alpha)

README

# AlphaGen

Automatic formulaic alpha generation with reinforcement learning.

Paper *Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning* accepted by [KDD 2023](https://kdd.org/kdd2023/), Applied Data Science (ADS) track.

Paper available on [ACM DL](https://dl.acm.org/doi/10.1145/3580305.3599831) or [arXiv](https://arxiv.org/abs/2306.12964).

## How to reproduce?

You can either use our built-in alpha calculation pipeline (see Choice 1) or implement an adapter to your own pipeline (see Choice 2).

### Choice 1: Stock data preparation

The built-in pipeline requires the Qlib library and locally stored stock data.

- READ THIS! We need some of the metadata (but not the actual stock price/volume data) provided by Qlib, so follow the data preparation process in [Qlib](https://github.com/microsoft/qlib#data-preparation) first.

- The actual stock data we use is retrieved from [baostock](http://baostock.com/baostock/index.php/%E9%A6%96%E9%A1%B5), due to concerns about the timeliness and truthfulness of the data source used by Qlib.

- The data can be downloaded by running the script `data_collection/fetch_baostock_data.py`. The newly downloaded data is saved into `~/.qlib/qlib_data/cn_data_baostock_fwdadj` by default. This path can be customized to fit your needs, but make sure to use the same path when loading the data: in `alphagen_qlib/stock_data.py`, the function `StockData._init_qlib` should pass the path to Qlib with `qlib.init(provider_uri=path)`.
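
As a minimal sketch of the path handling described above (not the repository's own code), the helper below expands the default directory and passes it to `qlib.init(provider_uri=...)`. The names `DEFAULT_DATA_DIR`, `resolve_provider_uri`, and `init_qlib` are hypothetical and exist only for illustration.

```python
import os

# Default location used by the download script, as stated above.
DEFAULT_DATA_DIR = "~/.qlib/qlib_data/cn_data_baostock_fwdadj"


def resolve_provider_uri(path: str = DEFAULT_DATA_DIR) -> str:
    """Expand '~' and return an absolute path suitable for provider_uri."""
    return os.path.abspath(os.path.expanduser(path))


def init_qlib(path: str = DEFAULT_DATA_DIR) -> None:
    """Point Qlib at the downloaded data directory (requires qlib)."""
    import qlib  # imported lazily so resolve_provider_uri works without Qlib

    qlib.init(provider_uri=resolve_provider_uri(path))
```

If you saved the data somewhere else, call `init_qlib("/your/custom/path")` with the same directory you gave the download script.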

### Choice 2: Adapt to external pipelines

If you have a better implementation of alpha calculation, you can write an adapter that implements `alphagen.data.calculator.AlphaCalculator`. The interface is defined as follows:

```python
from abc import ABCMeta, abstractmethod
from typing import List, Tuple

from alphagen.data.expression import Expression


class AlphaCalculator(metaclass=ABCMeta):
    @abstractmethod
    def calc_single_IC_ret(self, expr: Expression) -> float:
        """Calculate IC between a single alpha and a predefined target."""

    @abstractmethod
    def calc_single_rIC_ret(self, expr: Expression) -> float:
        """Calculate Rank IC between a single alpha and a predefined target."""

    @abstractmethod
    def calc_single_all_ret(self, expr: Expression) -> Tuple[float, float]:
        """Calculate both IC and Rank IC between a single alpha and a predefined target."""

    @abstractmethod
    def calc_mutual_IC(self, expr1: Expression, expr2: Expression) -> float:
        """Calculate IC between two alphas."""

    @abstractmethod
    def calc_pool_IC_ret(self, exprs: List[Expression], weights: List[float]) -> float:
        """First combine the alphas linearly, then calculate IC between
        the linear combination and a predefined target."""

    @abstractmethod
    def calc_pool_rIC_ret(self, exprs: List[Expression], weights: List[float]) -> float:
        """First combine the alphas linearly, then calculate Rank IC between
        the linear combination and a predefined target."""

    @abstractmethod
    def calc_pool_all_ret(self, exprs: List[Expression], weights: List[float]) -> Tuple[float, float]:
        """First combine the alphas linearly, then calculate both IC and Rank IC
        between the linear combination and a predefined target."""
```
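
To make the two metrics in this interface concrete, here is a dependency-free sketch of what an adapter might compute once it has evaluated an alpha into numbers. It assumes IC is the per-day cross-sectional Pearson correlation averaged over days, and Rank IC is the same with values replaced by their ranks. The function names and the days-by-stocks matrix layout are assumptions for illustration, not part of AlphaGen.

```python
from math import sqrt
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows are days, columns are stocks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def daily_ic(alpha: Matrix, target: Matrix) -> float:
    """Mean over days of the cross-sectional Pearson correlation."""
    per_day = [pearson(a, t) for a, t in zip(alpha, target)]
    return sum(per_day) / len(per_day)


def daily_rank_ic(alpha: Matrix, target: Matrix) -> float:
    """Mean over days of the cross-sectional rank (Spearman) correlation."""
    per_day = [pearson(ranks(a), ranks(t)) for a, t in zip(alpha, target)]
    return sum(per_day) / len(per_day)
```

A real adapter would first evaluate each `Expression` with your own pipeline to obtain such matrices, then return these values from the `calc_single_*` methods.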

Reminder: the values evaluated from different alphas may have drastically different scales, so we recommend normalizing them before combining them.
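
One possible way to do this, sketched under the assumption that a per-day z-score is an acceptable normalization (the repository may normalize differently), is to standardize each alpha's cross-section before taking the weighted sum. The helpers `zscore` and `combine` are hypothetical names.

```python
from math import sqrt
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize one day's cross-section to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def combine(alphas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of z-scored alphas for a single day."""
    normalized = [zscore(a) for a in alphas]
    n_stocks = len(normalized[0])
    return [
        sum(w * a[i] for w, a in zip(weights, normalized))
        for i in range(n_stocks)
    ]


# Two alphas on very different scales for the same three stocks.
small = [0.01, 0.02, 0.03]
large = [3000.0, 1000.0, 2000.0]
combined = combine([small, large], weights=[0.5, 0.5])
print(combined)
```

Without the z-score step, `large` would dominate the sum regardless of the weights, which is the situation the reminder warns about.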

### Before running

All principal components of our experiment are located in [train_maskable_ppo.py](train_maskable_ppo.py).

These parameters may help you build an `AlphaCalculator`:

- instruments (Set of instruments)

- start_time & end_time (Data range for each dataset)

- target (Target stock trend, e.g., 20d return rate)

These parameters define an RL run:

- batch_size (PPO batch size)

- features_extractor_kwargs (Arguments for LSTM shared net)

- device (PyTorch device)

- save_path (Path for checkpoints)

- tensorboard_log (Path for TensorBoard)

### Run!

```shell

python train_maskable_ppo.py --seed=SEED --pool=POOL_CAPACITY --code=INSTRUMENTS --step=NUM_STEPS

```

Here, `SEED` is the random seed (e.g., `1` or `1,2`), `POOL_CAPACITY` is the size of the combination model, and `NUM_STEPS` is the limit on RL steps.
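
If you prefer to launch one process per seed from Python, the command line above can be assembled programmatically. This is only a sketch: `build_command` is a hypothetical helper, and the pool size, instrument code, and step count below are placeholder values rather than recommended settings.

```python
import sys
from typing import List


def build_command(seed: int, pool: int, code: str, step: int) -> List[str]:
    """Assemble the argument list for one training run."""
    return [
        sys.executable,
        "train_maskable_ppo.py",
        f"--seed={seed}",
        f"--pool={pool}",
        f"--code={code}",
        f"--step={step}",
    ]


# Placeholder values for illustration only; substitute your own.
commands = [
    build_command(seed, pool=10, code="csi300", step=200_000)
    for seed in (1, 2)
]
for cmd in commands:
    print(" ".join(cmd))
    # To actually launch a run: subprocess.run(cmd, check=True)
```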

### After running

- Model checkpoints and alpha pools are located in `save_path`:

    - The model is compatible with [stable-baselines3](https://github.com/DLR-RM/stable-baselines3).

    - Alpha pools are formatted in human-readable JSON.

- TensorBoard logs are located in `tensorboard_log`.
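
Since the alpha pools are plain JSON, they can be inspected with the standard library. The sketch below ASSUMES the file is an object with two parallel lists under the hypothetical keys `"exprs"` and `"weights"`. Open one of your own pool files to confirm the actual layout and adjust the keys accordingly.

```python
import json
from pathlib import Path
from typing import List, Tuple


def load_pool(path: str) -> List[Tuple[str, float]]:
    """Read a pool file and pair each expression with its weight.

    ASSUMPTION: the file is a JSON object with parallel lists under the
    hypothetical keys "exprs" and "weights"; adjust to the actual layout.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    pairs = list(zip(data["exprs"], data["weights"]))
    # Largest absolute weight first, so the most influential alphas lead.
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)
```

Calling `load_pool` on a file in `save_path` then yields `(expression, weight)` pairs that are easy to print or compare across checkpoints.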

## Baselines

### GP-based methods

[gplearn](https://github.com/trevorstephens/gplearn) implements Genetic Programming, a commonly used method for symbolic regression. We maintain a modified version of gplearn to make it compatible with our task. The corresponding experiment script is [gp.py](gp.py).

### Deep Symbolic Regression

[DSO](https://github.com/brendenpetersen/deep-symbolic-optimization) is a mature deep learning framework for symbolic optimization tasks. We maintain a minimal version of DSO to make it compatible with our task. The corresponding experiment script is [dso.py](dso.py).

## Repository Structure

- `/alphagen` contains the basic data structures and the essential modules for starting an alpha mining pipeline;

- `/alphagen_qlib` contains the qlib-specific APIs for data preparation;

- `/alphagen_generic` contains data structures and utilities designed for our baselines, which basically follow the [gplearn](https://github.com/trevorstephens/gplearn) APIs, with modifications for the quant pipeline;

- `/gplearn` and `/dso` contain modified versions of our baselines.

## Trading (Experimental)

We implemented some trading strategies based on Qlib. See [backtest.py](backtest.py) and [trade_decision.py](trade_decision.py) for demos.

## Citing our work

```bibtex

@inproceedings{alphagen,

    author = {Yu, Shuo and Xue, Hongyan and Ao, Xiang and Pan, Feiyang and He, Jia and Tu, Dandan and He, Qing},

    title = {Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning},

    year = {2023},

    doi = {10.1145/3580305.3599831},

    booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},

}

```

## Contributing

Feel free to submit issues or pull requests.

## Contributors

This work is maintained by the MLDM research group, [IIP, ICT, CAS](http://iip.ict.ac.cn/).

Maintainers include:

- [Hongyan Xue](https://github.com/xuehongyanL)

- [Shuo Yu](https://github.com/Chlorie)

Thanks to the following contributors:

- [@yigaza](https://github.com/yigaza)

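Thanks to the authors of the following in-depth research on our project:

- *Factor-Based Stock Selection Series No. 95: DFQ Reinforcement Learning Factor Combination Mining System* (original title: *因子选股系列之九十五:DFQ强化学习因子组合挖掘系统*)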