https://github.com/chirindaopensource/continuous_time_rl_for_alm
End-to-end Python implementation of Huang's (2025) continuous-time RL methodology for asset-liability management. Features model-free soft actor-critic with adaptive exploration, entropy regularization, and Euler-Maruyama SDE simulation. Includes 6 baselines (SAC/PPO/DDPG/DCPPI/ACS/MBP), parallelized execution, and Wilcoxon statistical validation.
actor-critic algorithmic-trading asset-liability-management continuous-time deep-reinforcement-learning entropy-regularization financial-engineering gymnasium numerical-methods optimal-control policy-gradient portfolio-optimization python pytorch quantitative-finance reinforcement-learning risk-management soft-actor-critic stochastic-control stochastic-differential-equations
- Host: GitHub
- URL: https://github.com/chirindaopensource/continuous_time_rl_for_alm
- Owner: chirindaopensource
- License: mit
- Created: 2025-10-05T13:02:13.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-05T14:58:22.000Z (8 months ago)
- Last Synced: 2025-10-05T15:06:38.803Z (8 months ago)
- Topics: actor-critic, algorithmic-trading, asset-liability-management, continuous-time, deep-reinforcement-learning, entropy-regularization, financial-engineering, gymnasium, numerical-methods, optimal-control, policy-gradient, portfolio-optimization, python, pytorch, quantitative-finance, reinforcement-learning, risk-management, soft-actor-critic, stochastic-control, stochastic-differential-equations
- Language: Jupyter Notebook
- Homepage:
- Size: 108 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# **`README.md`**
# Continuous-Time Reinforcement Learning for Asset-Liability Management
--
**Repository:** `https://github.com/chirindaopensource/continuous_time_rl_for_alm`
**Owner:** © 2025 Craig Chirinda (Open Source Projects)
This repository contains an **independent**, professional-grade Python implementation of the research methodology from the 2025 paper entitled **"Continuous-Time Reinforcement Learning for Asset-Liability Management"** by:
* Yilie Huang
The project provides a complete, end-to-end computational framework for replicating the paper's novel continuous-time reinforcement learning approach to ALM. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous, reproducible experimental setup and parallelized simulation to comprehensive statistical analysis and the generation of all publication-quality figures and tables.
## Table of Contents
- [Introduction](#introduction)
- [Theoretical Background](#theoretical-background)
- [Features](#features)
- [Methodology Implemented](#methodology-implemented)
- [Core Components (Notebook Structure)](#core-components-notebook-structure)
- [Key Callables](#key-callables)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Input Data Structure](#input-data-structure)
- [Usage](#usage)
- [Output Structure](#output-structure)
- [Project Structure](#project-structure)
- [Customization](#customization)
- [Contributing](#contributing)
- [Recommended Extensions](#recommended-extensions)
- [License](#license)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)
## Introduction
This project provides a Python implementation of the methodologies presented in the 2025 paper "Continuous-Time Reinforcement Learning for Asset-Liability Management." The core of this repository is the Jupyter Notebook `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb`, which contains a comprehensive suite of functions to replicate the paper's findings, from initial configuration validation to the final generation of all analytical tables and figures.
The paper introduces a novel model-free, continuous-time reinforcement learning (RL) algorithm for the Asset-Liability Management (ALM) problem. It frames the problem as a Linear-Quadratic (LQ) control task and develops a soft actor-critic method with adaptive exploration to dynamically manage the surplus deviation between assets and liabilities. This codebase operationalizes this framework, allowing users to:
- Rigorously validate and manage the entire experimental configuration.
- Systematically generate reproducible, randomized market scenarios based on a stochastic differential equation (SDE) model.
- Execute large-scale, parallelized simulations comparing the proposed ALM-RL agent against six distinct baselines.
- Perform comprehensive statistical analysis using non-parametric tests to validate performance claims.
- Conduct a full suite of robustness analyses, including hyperparameter sensitivity, market parameter stress tests, and discretization analysis.
## Theoretical Background
The implemented methods are grounded in stochastic optimal control, reinforcement learning, and numerical methods for SDEs.
**1. ALM as a Linear-Quadratic (LQ) Control Problem:**
The core of the problem is to control the surplus deviation, `x(t)`, from a target. Its dynamics are modeled by the SDE:
$$
dx(t) = (A x(t) + B u(t))dt + (C x(t) + D u(t))dW(t)
$$
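where `u(t)` is the control action and `W(t)` is a standard Brownian motion. The objective is to maximize the expected value of a quadratic functional that penalizes deviations over a finite horizon `[0, T]`, with `Q` weighting the running penalty and `H` the terminal penalty: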
$$
\max_{u} \mathbb{E}\left[ \int_{0}^{T} -\frac{1}{2}Qx(t)^2 dt - \frac{1}{2}Hx(T)^2 \right]
$$
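The dynamics and objective above can be simulated with an Euler-Maruyama scheme. The following is a minimal sketch, not the notebook's environment: the function name `simulate_lq_episode`, the scalar parameter values, and the fixed linear feedback `u = gain * x` are all assumptions made for illustration.

```python
import numpy as np


def simulate_lq_episode(A, B, C, D, Q, H, gain, x0=1.0, T=1.0, dt=0.01, seed=0):
    """Simulate dx = (A x + B u) dt + (C x + D u) dW under u = gain * x.

    Returns the state path and the realized objective
    -0.5 * Q * integral(x^2 dt) - 0.5 * H * x(T)^2 for this single path.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.empty(n_steps + 1)
    x[0] = x0
    running_reward = 0.0
    for k in range(n_steps):
        u = gain * x[k]                     # illustrative linear feedback (assumption)
        dW = rng.normal(0.0, np.sqrt(dt))   # Brownian increment ~ N(0, dt)
        running_reward += -0.5 * Q * x[k] ** 2 * dt  # left-point Riemann sum
        x[k + 1] = x[k] + (A * x[k] + B * u) * dt + (C * x[k] + D * u) * dW
    terminal_reward = -0.5 * H * x[-1] ** 2
    return x, running_reward + terminal_reward


# Hypothetical parameter values, chosen only to make the example run.
path, reward = simulate_lq_episode(
    A=-0.5, B=1.0, C=0.2, D=0.1, Q=1.0, H=1.0, gain=-0.8
)
print(len(path), reward)
```

Averaging the returned reward over many seeds gives a Monte Carlo estimate of the expectation in the objective for that fixed feedback gain.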
**2. Continuous-Time Soft Actor-Critic:**
Since the market parameters `A, B, C, D` are unknown, a model-free RL approach is used. The paper develops a continuous-time soft actor-critic algorithm based on an entropy-regularized objective:
$$
J(t, x; \pi) = \mathbb{E}\left[ \int_{t}^{T} \left(-\frac{1}{2}Qx(s)^2 + \gamma p(s)\right) ds - \frac{1}{2}Hx(T)^2 \Big| x(t)=x \right]
$$
where `p(s)` is the entropy of the stochastic policy `π` at time `s` and `γ` is the entropy temperature that weights it.
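For intuition, when the policy is Gaussian, its entropy has a closed form that depends only on the variance. This is a standard identity rather than a formula quoted from the paper, and the symbols `μ` and `σ²` are notation introduced here for the policy's mean and variance:

$$
\pi(u) = \mathcal{N}(u; \mu, \sigma^2) \quad \Longrightarrow \quad p = -\int \pi(u) \ln \pi(u)\, du = \frac{1}{2}\ln\left(2\pi e \sigma^2\right)
$$

Under this form, a larger policy variance earns a larger bonus `γ p(s)`, which is how the regularizer rewards exploration.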
**3. Key Algorithmic Features:**
- **Parametric Forms:** Based on LQ theory, the value function `J` is parameterized as a quadratic function of `x`, and the policy `π` is a Gaussian distribution whose mean is linear in `x`.
- **Adaptive Exploration:** The policy's variance (actor exploration) is learned via policy gradient.
- **Scheduled Exploration:** The entropy temperature `γ` (critic exploration) follows a deterministic, decaying schedule.
- **Update Rules:** The agent learns via discretized versions of continuous-time temporal difference and policy gradient updates (Eqs. 16, 17, 18 in the paper).
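The parametric forms and the temperature schedule listed above might look roughly as follows. This is a sketch under stated assumptions: the time dependence of the critic coefficients, the parameter names `theta`, `phi1`, `phi2`, and the power-law schedule are hypothetical choices, and the paper's exact parameterization and update rules (Eqs. 16-18) may differ.

```python
import numpy as np


def value_fn(t, x, theta, T=1.0):
    """Critic that is quadratic in x: J = -0.5 * k1(t) * x**2 + k0(t).

    The coefficient shapes below (affine in the time-to-go T - t) are an
    illustrative assumption, not the paper's parameterization.
    """
    time_to_go = T - t
    k1 = theta[0] + theta[1] * time_to_go
    k0 = theta[2] * time_to_go
    return -0.5 * k1 * x ** 2 + k0


def sample_action(x, phi1, phi2, rng):
    """Gaussian policy whose mean is linear in x; phi2 > 0 is the learnable variance."""
    return rng.normal(phi1 * x, np.sqrt(phi2))


def temperature(n, gamma0=1.0, decay=0.5):
    """Deterministic decaying schedule gamma_n = gamma0 / (n + 1)**decay (hypothetical form)."""
    return gamma0 / (n + 1) ** decay


rng = np.random.default_rng(0)
u = sample_action(x=0.3, phi1=-0.8, phi2=0.25, rng=rng)
print(value_fn(0.0, 0.3, (1.0, 0.5, 0.3)), u, temperature(10))
```

Keeping the variance `phi2` as a separate learnable parameter is what lets actor exploration adapt during training, while the temperature follows its schedule regardless of the data.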
## Features
The provided Jupyter Notebook (`continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb`) implements the full research pipeline, including:
- **Modular, Multi-Task Architecture:** The entire pipeline is broken down into 13 distinct, modular tasks, each with its own orchestrator function, covering validation, setup, simulation, analysis, and reporting.
- **Configuration-Driven Design:** All experimental parameters are managed in an external `config.yaml` file, allowing for easy customization and replication without code changes.
- **Multi-Algorithm Support:** Complete, from-scratch implementations of the proposed **ALM-RL** agent and six baselines: **DCPPI**, **ACS**, **MBP**, **SAC**, **PPO**, and **DDPG**.
- **Rigorous Reproducibility:** A multi-level seeding protocol ensures bitwise reproducibility of market scenarios and isolates stochastic streams for fair agent comparison.
- **Parallelized Execution:** The main experimental pipeline is designed for parallel execution across multiple CPU cores, dramatically reducing the time required for the 200 independent runs.
- **Comprehensive Analysis Suite:** Implements the full statistical analysis from the paper, including moving average smoothing, terminal performance extraction, and one-sided Wilcoxon signed-rank tests.
- **Robustness Analysis Module:** Includes a full suite of post-hoc analyses to test hyperparameter sensitivity, robustness to extreme market conditions, and sensitivity to SDE discretization.
- **Automated Reporting:** Programmatic generation of all key tables and figures from the paper.
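One way to obtain the isolated stochastic streams described under Rigorous Reproducibility is NumPy's `SeedSequence.spawn`. The sketch below is an assumption about how such a hierarchy could be organized; the function name `build_seed_hierarchy` and the dictionary layout are hypothetical and not taken from the notebook.

```python
import numpy as np


def build_seed_hierarchy(master_seed, n_runs, agent_names):
    """Derive one market stream and one stream per agent for every run.

    Level 1: master seed -> one child SeedSequence per run.
    Level 2: each run    -> one child for the market plus one per agent.
    """
    root = np.random.SeedSequence(master_seed)
    hierarchy = []
    for run_seq in root.spawn(n_runs):
        market_seq, *agent_seqs = run_seq.spawn(1 + len(agent_names))
        hierarchy.append(
            {
                "market": np.random.default_rng(market_seq),
                "agents": {
                    name: np.random.default_rng(seq)
                    for name, seq in zip(agent_names, agent_seqs)
                },
            }
        )
    return hierarchy


streams = build_seed_hierarchy(master_seed=42, n_runs=3, agent_names=["ALM-RL", "SAC"])
print(streams[0]["market"].normal(), streams[0]["agents"]["SAC"].normal())
```

Because every agent in a run draws from its own generator, adding or removing a baseline does not shift the random numbers seen by the market or by the other agents.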
## Methodology Implemented
The core analytical steps directly implement the methodology from the paper:
1. **Validation (Task 1):** Ingests and rigorously validates the `config.yaml` for structural, mathematical, and logical consistency.
2. **Setup (Task 2):** Establishes the deterministic seeding hierarchy for the entire experiment.
3. **Initialization (Task 3):** Generates the 200 randomized market scenarios and the corresponding initial parameters for all agents.
4. **Agent & Environment Implementation (Tasks 4-7):** Provides complete, professional-grade implementations of all agents and the SDE environment.
5. **Execution (Task 8):** Runs the main simulation pipeline in parallel, executing 20,000 episodes for each of the 7 agents across all 200 market scenarios.
6. **Metrics & Analysis (Tasks 9-10):** Processes the raw simulation data to compute smoothed learning curves, terminal performance, and the final p-value matrix.
7. **Visualization (Task 11):** Generates the final, publication-quality plots and summary tables.
8. **Orchestration & Robustness (Tasks 12-13):** Provides top-level orchestrators to run the main pipeline and the additional robustness analyses.
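The smoothing and testing steps in Tasks 9-10 can be illustrated with NumPy and SciPy. This is a minimal sketch: the helper names `moving_average` and `one_sided_wilcoxon`, the synthetic data, and the assumption that a higher metric is better are all introduced here and may not match the notebook.

```python
import numpy as np
from scipy.stats import wilcoxon


def moving_average(values, window):
    """Simple moving average; the output has len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


def one_sided_wilcoxon(candidate, baseline):
    """p-value for H1: candidate tends to exceed baseline, paired by scenario."""
    return wilcoxon(candidate, baseline, alternative="greater").pvalue


# Synthetic terminal performance for 20 paired scenarios (illustrative data only).
baseline_perf = np.arange(20, dtype=float)
candidate_perf = baseline_perf + np.linspace(0.1, 2.0, 20)

print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
print(one_sided_wilcoxon(candidate_perf, baseline_perf))
```

Applying `one_sided_wilcoxon` to every ordered pair of agents would fill a p-value matrix of the kind listed among the pipeline outputs.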
## Core Components (Notebook Structure)
The `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` notebook is structured as a logical pipeline with modular orchestrator functions for each of the major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
## Key Callables
The project is designed around a single, top-level user-facing interface function:
- **`main`:** This master orchestrator function, located in the final section of the notebook, runs the entire automated research pipeline end to end. It can be configured to run the main reproduction experiment, the robustness analyses, or both. A single call to this function reproduces the entire computational portion of the project.
## Prerequisites
- Python 3.9+
- Core dependencies: `numpy`, `pandas`, `scipy`, `pyyaml`, `torch`, `gymnasium`, `matplotlib`, `seaborn`, `tqdm`.
## Installation
1. **Clone the repository:**
```sh
git clone https://github.com/chirindaopensource/continuous_time_rl_for_alm.git
cd continuous_time_rl_for_alm
```
2. **Create and activate a virtual environment (recommended):**
```sh
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
3. **Install Python dependencies:**
```sh
pip install numpy pandas scipy pyyaml torch gymnasium matplotlib seaborn tqdm
```
## Input Data Structure
The pipeline is driven by a single `config.yaml` file. No external datasets are required, as the market scenarios are procedurally generated based on the parameters within this file.
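To give a sense of what validating such a configuration involves, here is a minimal sketch that checks a dictionary of the kind `yaml.safe_load` would return. The keys `n_runs`, `n_episodes`, `horizon`, `dt`, `Q`, and `H` are hypothetical; the actual schema is defined by the repository's `config.yaml`.

```python
from numbers import Real


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means the config passed."""
    errors = []

    def is_number(value):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, Real) and not isinstance(value, bool)

    def require_positive(key, integer=False):
        value = cfg.get(key)
        ok = is_number(value) and value > 0
        if ok and integer:
            ok = isinstance(value, int)
        if not ok:
            kind = "integer" if integer else "number"
            errors.append(f"{key} must be a positive {kind}")
        return ok

    require_positive("n_runs", integer=True)
    require_positive("n_episodes", integer=True)
    has_horizon = require_positive("horizon")
    has_dt = require_positive("dt")
    if has_horizon and has_dt and cfg["dt"] >= cfg["horizon"]:
        errors.append("dt must be smaller than horizon")
    for key in ("Q", "H"):  # penalty weights of the quadratic objective
        value = cfg.get(key)
        if not is_number(value) or value < 0:
            errors.append(f"{key} must be a non-negative number")
    return errors


# Hypothetical configuration values, for illustration only.
example = {"n_runs": 200, "n_episodes": 20000, "horizon": 1.0, "dt": 0.01, "Q": 1.0, "H": 1.0}
print(validate_config(example))
```

Collecting all problems in a list, rather than raising on the first one, lets a user fix a misconfigured file in a single pass.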
## Usage
The `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell of the notebook, which calls the top-level `main` orchestrator:
```python
# Final cell of the notebook or in a main.py script
# Load the configuration from the YAML file.
STUDY_INPUTS = load_config('config.yaml')
# Run the entire study (reproduction and robustness analysis).
final_artifacts = main(
    study_params=STUDY_INPUTS,
    run_reproduction=True,
    run_robustness=True,
    num_workers=8,  # Adjust based on available CPU cores
)
# The `final_artifacts` dictionary will contain the key results DataFrames.
```
## Output Structure
The `main` function creates one or two output directories (`alm_rl_reproduction_output/` and `alm_rl_robustness_output/`) with the following structure:
```
output_directory/
│
├── data/
│   ├── seed_table.csv
│   ├── market_params_table.csv
│   ├── alm_rl_initial_table.csv
│   ├── baselines_initial_table.csv
│   ├── raw_results.npy
│   ├── learning_curves.csv
│   ├── terminal_performance.csv
│   └── p_value_matrix.csv
│
├── figures/
│   ├── figure1_learning_curves.png
│   └── figure2_p_value_heatmap.png
│
└── tables/
    └── table1_summary_statistics.html
```
## Project Structure
```
continuous_time_rl_for_alm/
│
├── continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb # Main implementation notebook
├── config.yaml # Master configuration file
├── requirements.txt # Python package dependencies
├── LICENSE # MIT license file
└── README.md # This documentation file
```
## Customization
The pipeline is highly customizable via the `config.yaml` file. Users can easily modify all experimental parameters, including the number of runs/episodes, SDE parameter distributions, agent hyperparameters, and evaluation settings, without altering the core Python code.
## Contributing
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
## Recommended Extensions
Future extensions could include:
- **Alternative SDE Models:** Integrating more complex market models, such as those with stochastic volatility (e.g., Heston model) or jumps.
- **Multi-Asset Formulations:** Extending the state and action spaces to handle a portfolio of multiple assets.
- **Automated Hyperparameter Tuning:** Wrapping the pipeline with a hyperparameter optimization library (e.g., Optuna) to automatically find the best settings for the ALM-RL agent.
- **Real-World Data Application:** Adapting the framework to use historical financial data by first estimating the SDE parameters from time series data.
## License
This project is licensed under the MIT License.
## Citation
If you use this code or the methodology in your research, please cite the original paper:
```bibtex
@inproceedings{huang2025continuous,
author = {Huang, Yilie},
title = {Continuous-Time Reinforcement Learning for Asset-Liability Management},
booktitle = {Proceedings of the 6th ACM International Conference on AI in Finance},
series = {ICAIF '25},
year = {2025},
publisher = {ACM},
note = {arXiv:2509.23280}
}
```
For the implementation itself, you may cite this repository:
```
Chirinda, C. (2025). A Professional-Grade Implementation of the "Continuous-Time RL for ALM" Framework.
GitHub repository: https://github.com/chirindaopensource/continuous_time_rl_for_alm
```
## Acknowledgments
- Credit to **Yilie Huang** for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including **NumPy, Pandas, SciPy, PyTorch, Gymnasium, Matplotlib, and Jupyter**, whose work makes complex computational analysis accessible and robust.
--
*This README was generated based on the structure and content of `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` and follows best practices for research software documentation.*