{"id":31696387,"url":"https://github.com/chirindaopensource/continuous_time_rl_for_alm","last_synced_at":"2026-05-08T07:32:41.112Z","repository":{"id":318158862,"uuid":"1070190279","full_name":"chirindaopensource/continuous_time_rl_for_alm","owner":"chirindaopensource","description":"End-to-end Python implementation of Huang's (2025) continuous-time RL methodology for asset-liability management. Features model-free soft actor-critic with adaptive exploration, entropy regularization, and Euler-Maruyama SDE simulation. Includes 7 baselines (SAC/PPO/DDPG/CPPI/ACS/MBP), parallelized execution, and Wilcoxon statistical validation.","archived":false,"fork":false,"pushed_at":"2025-10-05T14:58:22.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-05T15:06:38.803Z","etag":null,"topics":["actor-critic","algorithmic-trading","asset-liability-management","continuous-time","deep-reinforcement-learning","entropy-regularization","financial-engineering","gymnasium","numerical-methods","optimal-control","policy-gradient","portfolio-optimization","python","pytorch","quantitative-finance","reinforcement-learning","risk-management","soft-actor-critic","stochastic-control","stochastic-differential-equations"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chirindaopensource.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T13:02:13.000Z","updated_at":"2025-10-05T14:58:25.000Z","dependencies_parsed_at":"2025-10-05T15:06:41.500Z","dependency_job_id":"5ae62b78-7942-4160-82c3-341e1740a040","html_url":"https://github.com/chirindaopensource/continuous_time_rl_for_alm","commit_stats":null,"previous_names":["chirindaopensource/continuous_time_rl_for_alm"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/chirindaopensource/continuous_time_rl_for_alm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chirindaopensource%2Fcontinuous_time_rl_for_alm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chirindaopensource%2Fcontinuous_time_rl_for_alm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chirindaopensource%2Fcontinuous_time_rl_for_alm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chirindaopensource%2Fcontinuous_time_rl_for_alm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chirindaopensource","download_url":"https://codeload.github.com/chirindaopensource/continuous_time_rl_for_alm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chirindaopensource%2Fcontinuous_time_rl_for_alm/sbom","scorecard":null,"h
ost":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32771013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T02:36:36.067Z","status":"ssl_error","status_checked_at":"2026-05-08T02:36:07.210Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","algorithmic-trading","asset-liability-management","continuous-time","deep-reinforcement-learning","entropy-regularization","financial-engineering","gymnasium","numerical-methods","optimal-control","policy-gradient","portfolio-optimization","python","pytorch","quantitative-finance","reinforcement-learning","risk-management","soft-actor-critic","stochastic-control","stochastic-differential-equations"],"created_at":"2025-10-08T17:02:24.173Z","updated_at":"2026-05-08T07:32:41.100Z","avatar_url":"https://github.com/chirindaopensource.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **`README.md`**\n\n# Continuous-Time Reinforcement Learning for Asset-Liability Management\n\n\u003c!-- PROJECT SHIELDS --\u003e\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Python 
Version](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/)\n[![arXiv](https://img.shields.io/badge/arXiv-2509.23280-b31b1b.svg)](https://arxiv.org/abs/2509.23280)\n[![Conference](https://img.shields.io/badge/Conference-ICAIF%20'25-9cf)](https://icaif.acm.org/2025/)\n[![Year](https://img.shields.io/badge/Year-2025-purple)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Discipline](https://img.shields.io/badge/Discipline-Quantitative%20Finance%20%7C%20RL-00529B)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Primary Data](https://img.shields.io/badge/Data-Simulated%20SDE-lightgrey)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Core Method](https://img.shields.io/badge/Method-Continuous--Time%20RL-orange)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Key Concepts](https://img.shields.io/badge/Concepts-LQ%20Control%20%7C%20Soft%20Actor--Critic-red)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Baselines](https://img.shields.io/badge/Baselines-SAC%20%7C%20PPO%20%7C%20DDPG%20%7C%20CPPI-blueviolet)](https://github.com/chirindaopensource/continuous_time_reinforcement_learning_asset_liability_management)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Type Checking: 
mypy](https://img.shields.io/badge/type%20checking-mypy-blue)](http://mypy-lang.org/)\n[![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=flat\u0026logo=numpy\u0026logoColor=white)](https://numpy.org/)\n[![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=flat\u0026logo=pandas\u0026logoColor=white)](https://pandas.pydata.org/)\n[![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=flat\u0026logo=PyTorch\u0026logoColor=white)](https://pytorch.org/)\n[![SciPy](https://img.shields.io/badge/SciPy-%230C55A5.svg?style=flat\u0026logo=scipy\u0026logoColor=white)](https://scipy.org/)\n[![Gymnasium](https://img.shields.io/badge/Gymnasium-0086D1?style=flat)](https://gymnasium.farama.org/)\n[![Jupyter](https://img.shields.io/badge/Jupyter-%23F37626.svg?style=flat\u0026logo=Jupyter\u0026logoColor=white)](https://jupyter.org/)\n--\n\n**Repository:** `https://github.com/chirindaopensource/continuous_time_rl_for_alm`\n\n**Owner:** 2025 Craig Chirinda (Open Source Projects)\n\nThis repository contains an **independent**, professional-grade Python implementation of the research methodology from the 2025 paper entitled **\"Continuous-Time Reinforcement Learning for Asset-Liability Management\"** by:\n\n*   Yilie Huang\n\nThe project provides a complete, end-to-end computational framework for replicating the paper's novel continuous-time reinforcement learning approach to ALM. 
It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous, reproducible experimental setup and parallelized simulation to comprehensive statistical analysis and the generation of all publication-quality figures and tables.\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Theoretical Background](#theoretical-background)\n- [Features](#features)\n- [Methodology Implemented](#methodology-implemented)\n- [Core Components (Notebook Structure)](#core-components-notebook-structure)\n- [Key Callables](#key-callables)\n- [Prerequisites](#prerequisites)\n- [Installation](#installation)\n- [Input Data Structure](#input-data-structure)\n- [Usage](#usage)\n- [Output Structure](#output-structure)\n- [Project Structure](#project-structure)\n- [Customization](#customization)\n- [Contributing](#contributing)\n- [Recommended Extensions](#recommended-extensions)\n- [License](#license)\n- [Citation](#citation)\n- [Acknowledgments](#acknowledgments)\n\n## Introduction\n\nThis project provides a Python implementation of the methodologies presented in the 2025 paper \"Continuous-Time Reinforcement Learning for Asset-Liability Management.\" The core of this repository is the iPython Notebook `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb`, which contains a comprehensive suite of functions to replicate the paper's findings, from initial data validation to the final generation of all analytical tables and figures.\n\nThe paper introduces a novel model-free, continuous-time reinforcement learning (RL) algorithm for the Asset-Liability Management (ALM) problem. It frames the problem as a Linear-Quadratic (LQ) control task and develops a soft actor-critic method with adaptive exploration to dynamically manage the surplus deviation between assets and liabilities. 
This codebase operationalizes this framework, allowing users to:\n-   Rigorously validate and manage the entire experimental configuration.\n-   Systematically generate reproducible, randomized market scenarios based on a stochastic differential equation (SDE) model.\n-   Execute large-scale, parallelized simulations comparing the proposed ALM-RL agent against six distinct baselines.\n-   Perform comprehensive statistical analysis using non-parametric tests to validate performance claims.\n-   Conduct a full suite of robustness analyses, including hyperparameter sensitivity, market parameter stress tests, and discretization analysis.\n\n## Theoretical Background\n\nThe implemented methods are grounded in stochastic optimal control, reinforcement learning, and numerical methods for SDEs.\n\n**1. ALM as a Linear-Quadratic (LQ) Control Problem:**\nThe core of the problem is to control the surplus deviation, `x(t)`, from a target. Its dynamics are modeled by the SDE:\n$$\ndx(t) = (A x(t) + B u(t))dt + (C x(t) + D u(t))dW(t)\n$$\nwhere `u(t)` is the control action. The objective is to maximize the expected value of a quadratic functional that penalizes deviations over a finite horizon `[0, T]`:\n$$\n\\max_{u} \\mathbb{E}\\left[ \\int_{0}^{T} -\\frac{1}{2}Qx(t)^2 dt - \\frac{1}{2}Hx(T)^2 \\right]\n$$\n\n**2. Continuous-Time Soft Actor-Critic:**\nSince the market parameters `A, B, C, D` are unknown, a model-free RL approach is used. The paper develops a continuous-time soft actor-critic algorithm based on an entropy-regularized objective:\n$$\nJ(t, x; \\pi) = \\mathbb{E}\\left[ \\int_{t}^{T} \\left(-\\frac{1}{2}Qx(s)^2 + \\gamma p(s)\\right) ds - \\frac{1}{2}Hx(T)^2 \\Big| x(t)=x \\right]\n$$\nwhere `p(s)` is the entropy of the stochastic policy `π`.\n\n**3. 
Key Algorithmic Features:**\n-   **Parametric Forms:** Based on LQ theory, the value function `J` is parameterized as a quadratic function of `x`, and the policy `π` is a Gaussian distribution whose mean is linear in `x`.\n-   **Adaptive Exploration:** The policy's variance (actor exploration) is learned via policy gradient.\n-   **Scheduled Exploration:** The entropy temperature `γ` (critic exploration) follows a deterministic, decaying schedule.\n-   **Update Rules:** The agent learns via discretized versions of continuous-time temporal difference and policy gradient updates (Eqs. 16, 17, 18 in the paper).\n\n## Features\n\nThe provided iPython Notebook (`continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb`) implements the full research pipeline, including:\n\n-   **Modular, Multi-Task Architecture:** The entire pipeline is broken down into 13 distinct, modular tasks, each with its own orchestrator function, covering validation, setup, simulation, analysis, and reporting.\n-   **Configuration-Driven Design:** All experimental parameters are managed in an external `config.yaml` file, allowing for easy customization and replication without code changes.\n-   **Multi-Algorithm Support:** Complete, from-scratch implementations of the proposed **ALM-RL** agent and six baselines: **DCPPI**, **ACS**, **MBP**, **SAC**, **PPO**, and **DDPG**.\n-   **Rigorous Reproducibility:** A multi-level seeding protocol ensures bitwise reproducibility of market scenarios and isolates stochastic streams for fair agent comparison.\n-   **Parallelized Execution:** The main experimental pipeline is designed for parallel execution across multiple CPU cores, dramatically reducing the time required for the 200 independent runs.\n-   **Comprehensive Analysis Suite:** Implements the full statistical analysis from the paper, including moving average smoothing, terminal performance extraction, and one-sided Wilcoxon signed-rank tests.\n-   **Robustness Analysis 
Module:** Includes a full suite of post-hoc analyses to test hyperparameter sensitivity, robustness to extreme market conditions, and sensitivity to SDE discretization.\n-   **Automated Reporting:** Programmatic generation of all key tables and figures from the paper.\n\n## Methodology Implemented\n\nThe core analytical steps directly implement the methodology from the paper:\n\n1.  **Validation (Task 1):** Ingests and rigorously validates the `config.yaml` for structural, mathematical, and logical consistency.\n2.  **Setup (Task 2):** Establishes the deterministic seeding hierarchy for the entire experiment.\n3.  **Initialization (Task 3):** Generates the 200 randomized market scenarios and the corresponding initial parameters for all agents.\n4.  **Agent \u0026 Environment Implementation (Tasks 4-7):** Provides complete, professional-grade implementations of all agents and the SDE environment.\n5.  **Execution (Task 8):** Runs the main simulation pipeline in parallel, executing 20,000 episodes for each of the 7 agents across all 200 market scenarios.\n6.  **Metrics \u0026 Analysis (Tasks 9-10):** Processes the raw simulation data to compute smoothed learning curves, terminal performance, and the final p-value matrix.\n7.  **Visualization (Task 11):** Generates the final, publication-quality plots and summary tables.\n8.  **Orchestration \u0026 Robustness (Tasks 12-13):** Provides top-level orchestrators to run the main pipeline and the additional robustness analyses.\n\n## Core Components (Notebook Structure)\n\nThe `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` notebook is structured as a logical pipeline with modular orchestrator functions for each of the major tasks. 
All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.\n\n## Key Callables\n\nThe project is designed around a single, top-level user-facing interface function:\n\n-   **`main`:** This master orchestrator function, located in the final section of the notebook, runs the entire automated research pipeline end to end. It can be configured to run the main reproduction experiment, the robustness analyses, or both. A single call to this function reproduces the entire computational portion of the project.\n\n## Prerequisites\n\n-   Python 3.9+\n-   Core dependencies: `numpy`, `pandas`, `scipy`, `pyyaml`, `torch`, `gymnasium`, `matplotlib`, `seaborn`, `tqdm`.\n\n## Installation\n\n1.  **Clone the repository:**\n    ```sh\n    git clone https://github.com/chirindaopensource/continuous_time_rl_for_alm.git\n    cd continuous_time_rl_for_alm\n    ```\n\n2.  **Create and activate a virtual environment (recommended):**\n    ```sh\n    python -m venv venv\n    source venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\n    ```\n\n3.  **Install Python dependencies:**\n    ```sh\n    pip install numpy pandas scipy pyyaml torch gymnasium matplotlib seaborn tqdm\n    ```\n\n## Input Data Structure\n\nThe pipeline is driven by a single `config.yaml` file. No external datasets are required, as the market scenarios are procedurally generated based on the parameters within this file.\n\n## Usage\n\nThe `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` notebook provides a complete, step-by-step guide. 
The primary workflow is to execute the final cell of the notebook, which calls the top-level `main` orchestrator:\n\n```python\n# Final cell of the notebook or in a main.py script\n\n# Load the configuration from the YAML file.\nSTUDY_INPUTS = load_config('config.yaml')\n\n# Run the entire study (reproduction and robustness analysis).\nfinal_artifacts = main(\n    study_params=STUDY_INPUTS,\n    run_reproduction=True,\n    run_robustness=True,\n    num_workers=8  # Adjust based on available CPU cores\n)\n\n# The `final_artifacts` dictionary will contain the key results DataFrames.\n```\n\n## Output Structure\n\nThe `main` function creates one or two output directories (`alm_rl_reproduction_output/` and `alm_rl_robustness_output/`) with the following structure:\n\n```\noutput_directory/\n│\n├── data/\n│   ├── seed_table.csv\n│   ├── market_params_table.csv\n│   ├── alm_rl_initial_table.csv\n│   ├── baselines_initial_table.csv\n│   ├── raw_results.npy\n│   ├── learning_curves.csv\n│   ├── terminal_performance.csv\n│   └── p_value_matrix.csv\n│\n├── figures/\n│   ├── figure1_learning_curves.png\n│   └── figure2_p_value_heatmap.png\n│\n└── tables/\n    └── table1_summary_statistics.html\n```\n\n## Project Structure\n\n```\ncontinuous_time_rl_for_alm/\n│\n├── continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb # Main implementation notebook\n├── config.yaml                                                                   # Master configuration file\n├── requirements.txt                                                              # Python package dependencies\n├── LICENSE                                                                       # MIT license file\n└── README.md                                                                     # This documentation file\n```\n\n## Customization\n\nThe pipeline is highly customizable via the `config.yaml` file. 
Users can easily modify all experimental parameters, including the number of runs/episodes, SDE parameter distributions, agent hyperparameters, and evaluation settings, without altering the core Python code.\n\n## Contributing\n\nContributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.\n\n## Recommended Extensions\n\nFuture extensions could include:\n-   **Alternative SDE Models:** Integrating more complex market models, such as those with stochastic volatility (e.g., Heston model) or jumps.\n-   **Multi-Asset Formulations:** Extending the state and action spaces to handle a portfolio of multiple assets.\n-   **Automated Hyperparameter Tuning:** Wrapping the pipeline with a hyperparameter optimization library (e.g., Optuna) to automatically find the best settings for the ALM-RL agent.\n-   **Real-World Data Application:** Adapting the framework to use historical financial data by first estimating the SDE parameters from time series data.\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Citation\n\nIf you use this code or the methodology in your research, please cite the original paper:\n\n```bibtex\n@inproceedings{huang2025continuous,\n  author    = {Huang, Yilie},\n  title     = {Continuous-Time Reinforcement Learning for Asset-Liability Management},\n  booktitle = {Proceedings of the 6th ACM International Conference on AI in Finance},\n  series    = {ICAIF '25},\n  year      = {2025},\n  publisher = {ACM},\n  note      = {arXiv:2509.23280}\n}\n```\n\nFor the implementation itself, you may cite this repository:\n```\nChirinda, C. (2025). 
A Professional-Grade Implementation of the \"Continuous-Time RL for ALM\" Framework.\nGitHub repository: https://github.com/chirindaopensource/continuous_time_rl_for_alm\n```\n\n## Acknowledgments\n\n-   Credit to **Yilie Huang** for the foundational research that forms the entire basis for this computational replication.\n-   This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including **NumPy, Pandas, SciPy, PyTorch, Gymnasium, Matplotlib, and Jupyter**, whose work makes complex computational analysis accessible and robust.\n\n---\n\n*This README was generated based on the structure and content of `continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb` and follows best practices for research software documentation.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchirindaopensource%2Fcontinuous_time_rl_for_alm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchirindaopensource%2Fcontinuous_time_rl_for_alm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchirindaopensource%2Fcontinuous_time_rl_for_alm/lists"}