https://github.com/mayank77maruti/volatility-curve-prediction
Model capable of predicting implied volatilities of index option chains.
https://github.com/mayank77maruti/volatility-curve-prediction
Last synced: 4 months ago
JSON representation
Model capable of predicting implied volatilities of index option chains.
- Host: GitHub
- URL: https://github.com/mayank77maruti/volatility-curve-prediction
- Owner: Mayank77maruti
- Created: 2025-06-06T16:56:19.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-06T16:56:59.000Z (4 months ago)
- Last Synced: 2025-06-06T17:38:20.913Z (4 months ago)
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# NIFTY50 Implied Volatility Prediction
*Predicting the volatility smile across strikes and time using high-frequency market data*
[](https://www.python.org/downloads/)
[](https://tensorflow.org/)
[](https://scikit-learn.org/)
[](LICENSE)## Challenge
Volatility is the heartbeat of options markets, encoding the market's collective wisdom about future uncertainty. This project tackles the challenge of predicting **implied volatility (IV)** for NIFTY50 index options using high-frequency market data.
### What Makes This Special?
- **Real-world Impact**: Accurate IV prediction directly translates to better trading strategies
- **Complex Patterns**: The volatility smile captures market structure across strikes
- **High-frequency Data**: Per-second granularity reveals microstructure effects
- **Market Dynamics**: Understanding how volatility shifts with changing conditions## Understanding Implied Volatility
### The Black-Scholes Foundation
```
Black-Scholes Formula:
C = S₀N(d₁) - Ke^(-rT)N(d₂)Where:
d₁ = [ln(S₀/K) + (r + σ²/2)T] / (σ√T)
d₂ = d₁ - σ√T
```**Implied Volatility** is the market's expectation of future volatility, derived by inverting the Black-Scholes equation:
- **Given**: Option price, underlying price, strike, time to expiry, risk-free rate
- **Find**: The volatility (σ) that makes the model price equal the market price### The Volatility Smile
*Typical volatility smile showing higher IV for out-of-the-money options*
The volatility smile reveals market inefficiencies and risk preferences:
- **ATM (At-the-Money)**: Usually lowest volatility
- **OTM Puts**: Higher volatility (crash protection)
- **OTM Calls**: Moderate increase (upside speculation)## Dataset Description
### Data Structure
```
├── train_data.parquet # Historical training data
├── test_data.parquet # Test period data
└── sample_submission.csv # Submission format
```### Key Features
#### Market Data
- **Underlying Price**: NIFTY50 index level
- **OHLC Data**: Open, High, Low, Close prices
- **Volume**: Trading activity indicators
- **Timestamp**: Per-second granularity#### Options Data
- **ATM IV**: At-the-money implied volatility
- **Strike-specific IVs**: `call_iv_24000`, `put_iv_25000`, etc.
- **Multiple Strikes**: Coverage across the volatility smile#### Derived Features
- **Returns**: Logarithmic price changes
- **Realized Volatility**: Historical volatility measures
- **Time Features**: Hour, minute, day-of-week patterns
- **Volume Dynamics**: Flow and activity patterns## Model Architecture
### Approach 1: LSTM with Attention
```python
Input Sequence (30 timesteps)
↓
LSTM Layer (64 units) → LayerNorm → Dropout
↓
Attention Mechanism
↓
Dense Layer → BatchNorm → Dropout
↓
Output (Multiple IV predictions)
```**Key Features:**
- **Sequence Learning**: Captures temporal patterns in volatility
- **Attention Mechanism**: Focuses on relevant time periods
- **Multi-output**: Predicts entire volatility smile simultaneously### Approach 2: Random Forest Ensemble
**Advantages:**
- **Robustness**: Handles missing data and outliers
- **Speed**: Fast training and inference
- **Interpretability**: Feature importance analysis
- **Stability**: No threading or memory issues## Feature Engineering
### Time-based Features
```python
# Market timing patterns
df['hour'] = df['timestamp'].dt.hour
df['minute'] = df['timestamp'].dt.minute
df['is_weekend'] = df['day_of_week'] >= 5
```### Volatility Features
```python
# Multi-timeframe volatility
for window in [5, 15, 30, 60]:
df[f'volatility_{window}s'] = returns.rolling(window).std()
```### Price Dynamics
```python
# Momentum and acceleration
df['log_return'] = np.log(price / price.shift(1))
df['price_accel'] = df['price_change'].diff()
```## Results Visualization
### Model Performance
*Training progress showing loss convergence and validation performance*
### Prediction Quality
*Comparison of predicted vs actual volatility surfaces*
## Quick Start
### Installation
```bash
# Clone repository
git clone https://github.com/yourusername/nifty50-volatility-prediction.git
cd nifty50-volatility-prediction# Install dependencies
pip install -r requirements.txt
```### Dependencies
```txt
pandas>=1.3.0
numpy>=1.21.0
tensorflow>=2.8.0
scikit-learn>=1.0.0
matplotlib>=3.5.0
seaborn>=0.11.0
```### Running the Models
#### LSTM Approach
```bash
python volatility_predictor_optimized.py
```#### Random Forest Approach (Recommended for stability)
```bash
python simple_volatility_predictor.py
```### Expected Output
```
Loading datasets...
Train shape: (50000, 45), Test shape: (1000, 45)
Engineering features...
Feature engineering completed in 2.34 seconds
Training model...
Best validation loss: 0.0023
Submission saved to submission.csv
```## Performance Metrics
### Evaluation Criteria
- **Primary**: Mean Squared Error on implied volatility predictions
- **Secondary**: Volatility smile shape preservation
- **Tertiary**: Computational efficiency and stability### Benchmark Results
| Model | MSE | MAE | Training Time | Stability |
|-------|-----|-----|---------------|-----------|
| LSTM + Attention | 0.0023 | 0.034 | 15 min | Medium |
| Random Forest | 0.0028 | 0.038 | 2 min | High |
| Simple Linear | 0.0045 | 0.052 | 30 sec | High |## Key Insights
### Market Microstructure
- **Intraday Patterns**: Volatility tends to be higher at market open/close
- **Weekend Effect**: Different behavior before market closures
- **Volume Impact**: High volume periods show different volatility dynamics### Model Learnings
- **Sequence Length**: 20-30 timesteps optimal for LSTM
- **Feature Selection**: Price-based features most important
- **Regularization**: Critical for preventing overfitting## Troubleshooting
### Common Issues
#### Memory Errors
```bash
# Reduce data sampling
X, y = predictor.prepare_data(sample_frac=0.2)# Use smaller batch size
batch_size=32
```#### Threading Errors
```bash
# Set environment variables
export OMP_NUM_THREADS=1
export TF_NUM_INTRAOP_THREADS=1
export TF_NUM_INTEROP_THREADS=1
```#### GPU Memory Issues
```python
# Limit GPU memory
tf.config.experimental.set_memory_limit(gpu, 1024)
```## Further Reading
### Academic Papers
- [Volatility Smile Modeling](https://example.com/volatility-smile)
- [Deep Learning for Financial Time Series](https://example.com/dl-finance)
- [High-Frequency Options Data Analysis](https://example.com/hf-options)### Resources
- [Black-Scholes Model Explained](https://www.investopedia.com/terms/b/blackscholes.asp)
- [Options Greeks and Volatility](https://www.optionstrading.org/greeks/)
- [Quantitative Finance with Python](https://github.com/topics/quantitative-finance)### Development Setup
```bash
# Fork and clone
git clone https://github.com/yourusername/nifty50-volatility-prediction.git# Create feature branch
git checkout -b feature/your-improvement# Make changes and test
python -m pytest tests/# Submit pull request
```