https://github.com/donpushme/price-predictor-updated
- Host: GitHub
- URL: https://github.com/donpushme/price-predictor-updated
- Owner: donpushme
- Created: 2025-08-05T14:59:37.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2025-08-13T02:31:00.000Z (10 months ago)
- Last Synced: 2025-10-06T21:40:58.533Z (8 months ago)
- Language: Python
- Size: 77.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Cryptocurrency and Commodity Volatility Prediction System
A PyTorch-based system that predicts the volatility, skewness, and kurtosis of cryptocurrency and commodity price changes. Its output is designed to feed Monte Carlo simulation: each forecast covers 288 steps (24 hours at 5-minute intervals).
## Features
- **Multi-Asset Support**: Bitcoin, Ethereum, Solana, and XAU (Gold)
- **288-Step Forecasting**: Predicts 24 hours ahead in 5-minute intervals
- **Statistical Moments**: Predicts volatility, skewness, and kurtosis
- **Real-Time Prediction**: Single input generates full 24-hour forecast
- **Monte Carlo Ready**: Outputs designed for Monte Carlo simulation
- **Database Integration**: MongoDB for scalable data and prediction storage
- **Confidence Intervals**: Monte Carlo dropout for uncertainty estimation
- **Time-Aware Models**: Captures US trading hour volatility patterns
## Project Structure
```
price-predict/
├── model.py # Neural network architecture
├── trainer.py # Model training functionality
├── predictor.py # Real-time prediction
├── data_processor.py # Data preprocessing and feature engineering
├── database_manager.py # MongoDB operations
├── multi_trainer.py # Multi-asset training script
├── multi_predictor.py # Multi-asset prediction script
├── config.py # Configuration settings
├── requirements.txt # Python dependencies
├── README.md # This file
├── training_data/ # Training data directory (create and add CSV files)
│ ├── bitcoin_5min.csv
│ ├── ethereum_5min.csv
│ ├── solana_5min.csv
│ └── xau_5min.csv
├── models/ # Saved models directory (auto-created)
└── predictions/ # Prediction outputs directory (auto-created)
```
## Installation
1. **Clone or download the project files**
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up MongoDB**:
```bash
# Automatic setup (Ubuntu/macOS/CentOS)
python3 mongodb_setup.py
# Or install manually:
# Ubuntu: sudo apt-get install mongodb-org
# macOS: brew install mongodb-community
# Windows: Download from mongodb.com
```
4. **Prepare training data**:
- Create `training_data/` directory
- Add CSV files with the following format:
```csv
timestamp,open,high,low,close
2022-02-03 14:50:00,36551.864,36618.271,36500.254,36500.254
2022-02-03 14:55:00,36500.254,36565.123,36480.567,36520.890
...
```
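Before training, it can help to confirm that a CSV file really matches this layout. The following is a minimal sketch, not part of the project: `check_ohlc_csv` is a hypothetical helper that uses only the standard library, assumes the `YYYY-MM-DD HH:MM:SS` timestamp format shown in the sample rows, and reports header mismatches, inconsistent bars, and gaps in the 5-minute spacing.

```python
import csv
import io
from datetime import datetime, timedelta

EXPECTED_COLUMNS = ["timestamp", "open", "high", "low", "close"]
STEP = timedelta(minutes=5)


def check_ohlc_csv(handle):
    """Return a list of human-readable problems found in a 5-minute OHLC CSV."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    previous = None
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        if not (l <= min(o, c) and max(o, c) <= h):
            problems.append(f"line {line_no}: high/low do not bracket open/close")
        if previous is not None and ts - previous != STEP:
            problems.append(f"line {line_no}: gap of {ts - previous} instead of 5 minutes")
        previous = ts
    return problems


sample = io.StringIO(
    "timestamp,open,high,low,close\n"
    "2022-02-03 14:50:00,36551.864,36618.271,36500.254,36500.254\n"
    "2022-02-03 14:55:00,36500.254,36565.123,36480.567,36520.890\n"
)
print(check_ohlc_csv(sample))  # prints: []
```

To check a real file, pass an open file object instead, for example `check_ohlc_csv(open("training_data/bitcoin_5min.csv", newline=""))`.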
## Usage
### 1. Training Models
#### Train all assets:
```bash
python multi_trainer.py --epochs 100 --batch-size 32
```
#### Train specific asset:
```bash
python multi_trainer.py --asset bitcoin --epochs 100
```
#### Training with custom parameters:
```bash
python multi_trainer.py \
  --data-dir training_data \
  --models-dir models \
  --epochs 150 \
  --batch-size 64 \
  --learning-rate 0.001 \
  --patience 25 \
  --device cuda
```
### 2. Making Predictions
#### Single asset prediction:
```bash
python multi_predictor.py --asset bitcoin --price 45000
```
#### Multi-asset prediction:
```bash
python multi_predictor.py --prices '{"bitcoin": 45000, "ethereum": 3000, "solana": 100, "xau": 2000}'
```
#### Prediction with confidence intervals:
```bash
python multi_predictor.py --asset bitcoin --price 45000 --confidence
```
#### Generate Monte Carlo scenarios:
```bash
python multi_predictor.py --prices '{"bitcoin": 45000}' --scenarios 1000
```
#### Save predictions to file:
```bash
python multi_predictor.py --asset bitcoin --price 45000 --output predictions/bitcoin_forecast.json
```
### 3. Using the API
```python
from multi_predictor import MultiAssetPredictor
from datetime import datetime
# Initialize predictor
predictor = MultiAssetPredictor(models_dir="models")
# Make prediction
current_time = datetime.now()
prediction = predictor.predict_single_asset(
    asset="bitcoin",
    current_price=45000.0,
    current_time=current_time
)
# Access predictions
volatility_forecast = prediction['volatility'] # 288 values
skewness_forecast = prediction['skewness'] # 288 values
kurtosis_forecast = prediction['kurtosis'] # 288 values
timestamps = prediction['timestamps'] # 288 timestamps
print(f"Next 5-min volatility: {volatility_forecast[0]:.6f}")
print(f"24-hour max volatility: {max(volatility_forecast):.6f}")
```
### 4. Monte Carlo Simulation Example
```python
import numpy as np
from multi_predictor import MultiAssetPredictor
from datetime import datetime
predictor = MultiAssetPredictor()
# Generate scenarios for simulation
scenarios = predictor.generate_monte_carlo_scenarios(
    assets=["bitcoin", "ethereum"],
    current_prices={"bitcoin": 45000, "ethereum": 3000},
    current_time=datetime.now(),
    n_scenarios=1000
)
# Use scenarios in Monte Carlo simulation
for scenario in scenarios["bitcoin"]:
    volatility_path = scenario['volatility']  # 288 values
    skewness_path = scenario['skewness']      # 288 values
    kurtosis_path = scenario['kurtosis']      # 288 values
    # Your simulation logic here
    # Generate price path using these statistical moments
```
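One possible way to fill in the "simulation logic" placeholder is sketched below. It is an illustration rather than the project's method: it assumes each volatility value is the standard deviation of a single 5-minute log return, and it uses the Cornish-Fisher expansion (a swapped-in technique, accurate only for modest skewness and excess kurtosis) to bend standard normal draws toward the predicted moments. The names `cornish_fisher` and `simulate_price_path` are hypothetical.

```python
import math
import random


def cornish_fisher(z, skew, kurt):
    """Adjust a standard normal draw z toward the given skewness and (non-excess) kurtosis."""
    excess = kurt - 3.0
    return (z
            + (z ** 2 - 1.0) * skew / 6.0
            + (z ** 3 - 3.0 * z) * excess / 24.0
            - (2.0 * z ** 3 - 5.0 * z) * skew ** 2 / 36.0)


def simulate_price_path(start_price, volatility, skewness, kurtosis, rng):
    """Simulate one price path; each input list holds one value per 5-minute step."""
    prices = [start_price]
    for vol, skew, kurt in zip(volatility, skewness, kurtosis):
        shock = cornish_fisher(rng.gauss(0.0, 1.0), skew, kurt)
        # Treat vol * shock as the log return of this step (assumption).
        prices.append(prices[-1] * math.exp(vol * shock))
    return prices


rng = random.Random(42)  # fixed seed so the run is repeatable
steps = 288
# Constant placeholder moments; in practice these come from one scenario.
path = simulate_price_path(45000.0, [0.001] * steps, [-0.1] * steps, [3.5] * steps, rng)
print(len(path))  # prints: 289 (start price plus 288 steps)
```

With zero skewness and a kurtosis of 3.0 the adjustment leaves the draw unchanged, so the sketch reduces to an ordinary log-normal random walk.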
## Model Architecture
The network combines the following components:
- **Time Embedding**: Captures cyclical patterns (hour, day of week, US trading hours)
- **Price Embedding**: Processes OHLC data and technical indicators
- **LSTM Layers**: Models temporal dependencies
- **Multi-Head Attention**: Focuses on relevant patterns
- **Multiple Output Heads**: Separate predictors for volatility, skewness, and kurtosis
### Key Features:
- **Real-life Volatility Patterns**: Models higher volatility during US trading hours
- **Technical Indicators**: RSI, moving averages, Parkinson volatility estimator
- **Statistical Moments**: Rolling calculations of volatility, skewness, and kurtosis
- **Gradient Clipping**: Prevents exploding gradients
- **Early Stopping**: Prevents overfitting
- **Learning Rate Scheduling**: Adaptive learning rate
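Of the indicators listed above, the Parkinson estimator is the least familiar: it estimates volatility from each bar's high/low range instead of from close-to-close returns. The sketch below implements the standard formula over a window of bars; the function name and the choice of window are illustrative and are not taken from `data_processor.py`.

```python
import math


def parkinson_volatility(highs, lows):
    """Parkinson range-based volatility estimate for one window of bars.

    sigma = sqrt( sum(ln(high / low)^2) / (4 * n * ln 2) )
    """
    n = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4.0 * n * math.log(2.0)))


# Two 5-minute bars taken from the sample CSV rows in this README.
highs = [36618.271, 36565.123]
lows = [36500.254, 36480.567]
print(f"{parkinson_volatility(highs, lows):.6f}")
```

The result is a per-bar volatility on the same scale as 5-minute log returns.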
## Data Requirements
### Input Format
CSV files with 5-minute OHLC data:
- `timestamp`: Datetime in `YYYY-MM-DD HH:MM:SS` format (for example, `2022-02-03 14:50:00`)
- `open`: Opening price
- `high`: Highest price
- `low`: Lowest price
- `close`: Closing price
### Data Processing
The system automatically:
- Calculates returns and log returns
- Computes technical indicators
- Generates time-based features
- Calculates rolling statistical moments
- Normalizes features for training
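The first and fourth steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the window length is arbitrary, the moments are population (biased) estimates, and kurtosis is reported in non-excess form so that a normal distribution gives 3.0. The actual `data_processor.py` may differ on all three points.

```python
import math


def log_returns(closes):
    """Log return between each pair of consecutive closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def moments(window):
    """Population volatility, skewness and (non-excess) kurtosis of one window."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    if m2 == 0.0:
        # Flat window: skewness and kurtosis are undefined, so fall back
        # to normal-like values (an assumption of this sketch).
        return 0.0, 0.0, 3.0
    return math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def rolling_moments(returns, window):
    """One (volatility, skewness, kurtosis) tuple per full trailing window."""
    return [moments(returns[i - window:i]) for i in range(window, len(returns) + 1)]


closes = [100.0, 100.5, 100.2, 100.9, 100.4, 100.6, 101.2]
for vol, skew, kurt in rolling_moments(log_returns(closes), window=4):
    print(f"vol={vol:.5f} skew={skew:+.3f} kurt={kurt:.3f}")
```

Six returns with a window of four yield three rows of moments, one per full window.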
## Configuration
Modify `config.py` to customize:
- Model architecture parameters
- Training hyperparameters
- Data processing settings
- Asset-specific configurations
- Loss function weights
- Prediction bounds
## Database
The system uses MongoDB for:
- **Price Data Storage**: Historical OHLC data with calculated features in time-series collections
- **Prediction Storage**: All forecasts with metadata and arrays stored as documents
- **Model Metadata**: Training configurations and performance metrics
- **Training History**: Loss curves and training statistics
- **Scalability**: Handles large datasets and real-time prediction workloads
- **Indexing**: Optimized time-based queries and symbol lookups
## Performance Monitoring
Track model performance through:
- Training/validation loss curves
- Individual loss components (volatility, skewness, kurtosis)
- Learning rate schedules
- Early stopping triggers
- Model comparison metrics
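To make the early-stopping trigger concrete, here is a minimal sketch of patience-based stopping on the validation loss. The `EarlyStopping` class is hypothetical and is not taken from `trainer.py`; it only illustrates the usual meaning of a patience setting such as the `--patience` flag: stop once the validation loss has failed to improve for that many consecutive epochs.

```python
class EarlyStopping:
    """Track validation loss and signal when it has stopped improving."""

    def __init__(self, patience=25, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.81, 0.79, 0.80, 0.80, 0.80, 0.70]
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # prints: 7
```

The final loss of 0.70 is never seen, which shows the trade-off: a small patience saves compute but can end training just before a late improvement.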
## Troubleshooting
### Common Issues:
1. **CUDA out of memory**: Reduce batch size or model hidden dimension
2. **Training data not found**: Check file paths in `training_data/` directory
3. **Model loading failed**: Ensure model was trained and saved properly
4. **Prediction errors**: Verify data processor is fitted and price history is available
5. **MongoDB connection failed**: Ensure MongoDB is running (`python3 mongodb_setup.py --check-only`)
6. **Database errors**: Check MongoDB service status and connection settings in `config.py`
### Debug Options:
- Use smaller models for testing (`hidden_dim=64, num_layers=1`)
- Reduce sequence length for memory constraints
- Check data quality and preprocessing steps
- Validate input data format and ranges
## Technical Details
### Neural Network
- **Input**: Price sequences (100 timesteps) + time features
- **Output**: 288-step forecasts for each statistical moment
- **Architecture**: LSTM + Attention + Multiple output heads
- **Regularization**: Dropout, weight decay, gradient clipping
### Time Features
- Hour/minute cyclical encoding
- Day of week patterns
- US trading hours indicator
- Weekend/weekday classification
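These features could be computed along the following lines. This is a sketch under assumptions, not the project's encoding: timestamps are taken to be in UTC, the feature names are invented for illustration, and US trading hours are approximated as 14:30 to 21:00 UTC, which matches 09:30 to 16:00 Eastern only outside daylight saving time.

```python
import math
from datetime import datetime


def time_features(ts, us_open_utc=(14, 30), us_close_utc=(21, 0)):
    """Encode a UTC timestamp as cyclical and indicator features."""
    minute_of_day = ts.hour * 60 + ts.minute
    # Map time of day and day of week onto a circle so that 23:55 and
    # 00:00 (or Sunday and Monday) end up close together.
    day_angle = 2.0 * math.pi * minute_of_day / 1440.0
    week_angle = 2.0 * math.pi * ts.weekday() / 7.0
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    open_minute = us_open_utc[0] * 60 + us_open_utc[1]
    close_minute = us_close_utc[0] * 60 + us_close_utc[1]
    in_us_hours = (not is_weekend) and open_minute <= minute_of_day < close_minute
    return {
        "time_sin": math.sin(day_angle),
        "time_cos": math.cos(day_angle),
        "dow_sin": math.sin(week_angle),
        "dow_cos": math.cos(week_angle),
        "is_weekend": float(is_weekend),
        "us_trading_hours": float(in_us_hours),
    }


features = time_features(datetime(2022, 2, 3, 14, 50))  # a Thursday
print(features["us_trading_hours"], features["is_weekend"])  # prints: 1.0 0.0
```

A production version would convert to the `America/New_York` time zone instead of hard-coding UTC offsets, so that the indicator follows daylight saving changes.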
### Statistical Moments
- **Volatility**: Standard deviation of returns (always positive)
- **Skewness**: Asymmetry measure (bounded by tanh)
- **Kurtosis**: Tail heaviness (predictions are bounded below by 3.0, the kurtosis of a normal distribution)
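Constraints like these are usually enforced by the activation applied to each output head. The sketch below shows one plausible mapping from unconstrained network outputs to valid moments. Only the tanh bound on skewness is stated in this README; the softplus activations and the `max_abs_skew` scale are assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); non-negative for every input."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def constrain_outputs(raw_vol, raw_skew, raw_kurt, max_abs_skew=2.0):
    """Map raw head outputs to (volatility, skewness, kurtosis)."""
    volatility = softplus(raw_vol)                 # kept above zero
    skewness = max_abs_skew * math.tanh(raw_skew)  # within +/- max_abs_skew
    kurtosis = 3.0 + softplus(raw_kurt)            # floor at the normal value
    return volatility, skewness, kurtosis


vol, skew, kurt = constrain_outputs(-2.0, 0.5, 1.0)
print(f"vol={vol:.4f} skew={skew:.4f} kurt={kurt:.4f}")
```

In a PyTorch model the same mapping would typically use `torch.nn.functional.softplus` and `torch.tanh` on the head outputs.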
### Loss Function
Weighted combination:
- 60% Volatility loss (most important for Monte Carlo)
- 20% Skewness loss
- 20% Kurtosis loss
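As a worked illustration of this weighting, the sketch below combines three per-moment losses with the 0.6 / 0.2 / 0.2 weights listed above. Mean squared error is an assumption here, since the README does not say which per-moment loss the trainer uses, and the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Weights taken from the list above.
WEIGHTS = {"volatility": 0.6, "skewness": 0.2, "kurtosis": 0.2}


def combined_loss(pred, target, weights=WEIGHTS):
    """Return the weighted total and the individual loss components."""
    components = {name: mse(pred[name], target[name]) for name in weights}
    total = sum(weights[name] * components[name] for name in weights)
    return total, components


pred = {"volatility": [0.010, 0.012], "skewness": [0.1, -0.2], "kurtosis": [3.5, 4.0]}
target = {"volatility": [0.011, 0.010], "skewness": [0.0, -0.1], "kurtosis": [3.2, 4.4]}
total, parts = combined_loss(pred, target)
print(f"total={total:.6f}")
for name, value in parts.items():
    print(f"  {name}: {value:.6f}")
```

Because the three moments live on very different scales (volatility near 0.01, kurtosis near 3 or higher), the raw weights alone do not determine each term's influence; normalizing the targets first is one common way to make the 60/20/20 split meaningful.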
## Citation
If you use this system in your research, please cite:
```
Cryptocurrency and Commodity Volatility Prediction System
Multi-Asset Neural Network for Monte Carlo Simulation
[Your Name/Organization], 2024
```
## License
This project is for educational and research purposes. Please ensure compliance with data usage rights and applicable regulations when using with real financial data.
## Support
For issues and questions:
1. Check the troubleshooting section
2. Verify your data format matches requirements
3. Test with smaller datasets first
4. Check GPU memory usage and requirements
## Future Enhancements
Potential improvements:
- Additional asset support
- Real-time data feeds
- Enhanced uncertainty quantification
- Ensemble models
- Alternative architectures (Transformers, GRU)
- Risk management integration