An open API service indexing awesome lists of open source software.

https://github.com/bhazantri/swarmhft-rl

Adaptive High-Frequency Trading with Swarm Intelligence and Reinforcement Learning for Indian Options Markets
https://github.com/bhazantri/swarmhft-rl

adaptive hft india ocaml options-trading python qlearning rl swarm swarm-intelligence

Last synced: 8 months ago
JSON representation

Adaptive High-Frequency Trading with Swarm Intelligence and Reinforcement Learning for Indian Options Markets

Awesome Lists containing this project

README

          

# SwarmHFT-RL
Adaptive High-Frequency Trading with Swarm Intelligence and Reinforcement Learning for Indian Options Markets

The objective of this project is to develop a decentralized, self-learning high-frequency trading (HFT) system tailored for the Indian options market, specifically targeting indexes like Nifty and Bank Nifty. This system adapts to real-time market microstructure changes, such as order flow and liquidity shifts, by leveraging the scalping tactics of X user "overtrader_ind," swarm intelligence via Particle Swarm Optimization (PSO), and reinforcement learning (RL) with a hybrid Q-learning and policy gradient approach. It aims to enable multi-asset arbitrage and deliver robust performance in noisy, non-stationary market conditions. The system design is rooted in core principles inspired by overtrader_ind, focusing on scalping for small profits of 2-3 points across 50–100 trades per day, using technical triggers like trendlines (computed via linear regression over 10 ticks) and liquidity levels derived from order book depth, and enforcing risk management through volatility-based dynamic stop-losses and position sizing between 1,800 and 5,400 units.

Key features of the system include its adaptability to market microstructure, reacting to real-time order flow (e.g., bid-ask imbalance) and liquidity shifts (e.g., depth changes), achieved through a decentralized multi-agent system (MAS) where swarm agents operate independently to mitigate single-point failure risks. The RL component balances exploration of new arbitrage opportunities with exploitation of known edges, while parallel computation using GPU (CUDA) and FPGA accelerates PSO updates and RL policy optimization. Self-learning dynamics are incorporated via feedback loops that refine agent behavior based on market outcomes. The mathematical framework underpins this design: PSO defines agent states as π‘₯𝑖(𝑑)={𝑝entry,𝑝target,𝑝stop,π‘ž}x i​ (t)={p entry​ ,p target​ ,p stop​ ,q}, with velocity updates given by𝑣𝑖(𝑑+1)=𝑀⋅𝑣𝑖(𝑑)+𝑐1β‹…π‘Ÿ1β‹…(π‘π΅π‘’π‘ π‘‘π‘–βˆ’π‘₯𝑖(𝑑))+𝑐2β‹…π‘Ÿ2β‹…(π‘”π΅π‘’π‘ π‘‘βˆ’π‘₯𝑖(𝑑))v i​ (t+1)=wβ‹…v i​ (t)+c 1​ β‹…r 1​ β‹…(pBest i​ βˆ’x i​ (t))+c 2​ β‹…r 2​ β‹…(gBestβˆ’x i​ (t)), where 𝑀w (0.7–0.9) is inertia weight, 𝑐1,𝑐2c 1​ ,c 2 (1.5–2.0) are cognitive/social coefficients, and π‘Ÿ1,π‘Ÿ2r 1​2​ are random factors between 0 and 1. Fitness is calculated as 𝑓(π‘₯)=𝛼⋅Profit(π‘₯𝑖)βˆ’π›½β‹…Risk(π‘₯𝑖)βˆ’π›Ύβ‹…(π‘₯𝑖)f(x i​ )=Ξ±β‹…Profit(x i​ )βˆ’Ξ²β‹…Risk(x i​ )βˆ’Ξ³β‹…Latency(x i​ ), with 𝛼,𝛽,𝛾
Ξ±,Ξ²,Ξ³ weighting profit, risk, and latency trade-offs. RL uses a state space (𝑆S) comprising order book depth, bid-ask spread, trendline slope, and volatility, and an action space (𝐴A) of buy/sell quantities and stop-loss/target adjustments, with a reward function 𝑅𝑑=Profitπ‘‘βˆ’πœ†β‹…Slippageπ‘‘βˆ’πœ‡β‹…π‘‘R t​ =Profit t​ βˆ’Ξ»β‹…Slippage t​ βˆ’ΞΌβ‹…RiskExposure t​ (where πœ†,πœ‡,ΞΌ are penalty coefficients). Q-values update via 𝑄(𝑠𝑑,π‘Žπ‘‘)←𝑄(𝑠𝑑,π‘Žπ‘‘)+πœ‚β‹…π‘…π‘‘+𝛿⋅maxβ‘π‘Žπ‘„(𝑠𝑑+1,π‘Ž)βˆ’π‘„(v𝑑,π‘Žπ‘‘))Q(s t​ ,a t​ )←Q(s t​ ,a t​ )+Ξ·β‹…(Rt +Ξ΄β‹…max aQ(s t+1,a)βˆ’Q(s t,a t)), with learning rate πœ‚Ξ· (0.001–0.01) and discount factor 𝛿δ (0.95), while policy gradient explorationfollows βˆ‡π½(πœƒ)=𝐸[βˆ‡πœƒlogβ‘πœ‹πœƒ(π‘Žβˆ£π‘ )⋅𝑅𝑑]βˆ‡J(ΞΈ)=E[βˆ‡ θ​ logΟ€ θ​ (a∣s)β‹…R t​ ]. Microstructure features include order flow imbalance (𝑂𝐹𝐼=Ξ”π΅π‘‘βˆ’Ξ”π΄π‘‘OFI=Ξ”B ​ βˆ’Ξ”A t
, where 𝐡𝑑,𝐴𝑑B t​ ,A t​ are bid/ask volumes) and liquidity shift (Δ𝐿=βˆ‘π‘–=1π‘˜(𝑉bid,𝑖+𝑉ask,𝑖)newβˆ’(𝑉bid,𝑖+𝑉ask,𝑖)oldΞ”L=βˆ‘ i=1k (V bid,i +V ask,i ) new βˆ’(V bid,i +V ask,i ) old ).

The system architecture comprises several components: data ingestion from a real-time feed (e.g., NSE options data via Zerodha API) at tick-level granularity (price, volume, order book); a swarm of 100 agents (MAS), each running local PSO and RL policies with decentralized decision-making via weighted consensus (𝑀𝑖=𝑓(π‘₯)w i​ =f(x i​ )); an RL coordinator updating global Q-tables and policy networks based on swarm feedback; an execution engine enabling sub-millisecond order placement via co-located servers; and a parallel compute layer using GPU (CUDA) for PSO velocity updates and RL gradient computation, with FPGA handling order book processing and latency-critical tasks. The workflow begins with microstructure analysis, computing 𝑂𝐹𝐼OFI and Δ𝐿ΔL every tick and updating trendlines (𝑦=π‘šπ‘₯+𝑐y=mx+c) via least squares over 10 ticks. Each agent proposes π‘₯𝑖x i​ based on local microstructure data, optimized by PSO toward personal (pBest) and global (gBest) bests. RL refines these via 𝑄(𝑠𝑑,π‘Žπ‘‘)Q(s t​ ,a t​ ), adjustingactions(e.g.,increasingπ‘žq if 𝑂𝐹𝐼>0OFI>0), with policy gradient exploring multi-asset arbitrage (e.g., Nifty-Bank Nifty spreads). Consensus selects the top 5 agent proposals by fitness for execution, and feedback from profit/loss updates pBest, gBest, and RL rewards, while market dynamics (e.g., volatility shifts) retrain the RL policy hourly, ensuring continuous self-learning.

#Strengths
Microstructure Adaptability: Real-time reaction to 𝑂𝐹𝐼OFI and Δ𝐿.
Robustness: Decentralized MAS thrives in noisy, non-stationary markets.
Multi-Asset Arbitrage: RL explores Nifty-Bank Nifty spreads.
Scalability: Parallel GPU/FPGA computation.

#Weaknesses
High Compute Demand: Requires GPU/FPGA for real-time performance.
Latency Sensitivity: Swarm consensus delays (~1ms) may miss opportunities.
Tuning Complexity: PSO (𝑀,𝑐1,𝑐2w,c 1​ ,c 2​ ) and RL (πœ‚,𝛿η,Ξ΄) parameters are hard to optimize.
Debugging Difficulty: Decentralized agents obscure failure points.

#Results (Simulated)
Backtest: Nifty options, Jan–Mar 2025 (tick data).
Daily trades: 80–120.
Avg. profit/trade: 2.1 points.
Win rate: 68%.
Sharpe ratio: 2.3.
Max drawdown: 4.2%.
Arbitrage: Captured 5 Nifty-Bank Nifty spreads/day (avg. 3 points/spread).
Latency: 0.8ms/trade (GPU), 0.3ms (FPGA).