https://github.com/kerryyys/umh2025
Bridges the power of Hidden Markov Models (HMM) and Natural Language Processing (NLP) to detect market regimes and predict optimal trading strategies.
- Host: GitHub
- URL: https://github.com/kerryyys/umh2025
- Owner: kerryyys
- Created: 2025-04-10T16:21:45.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-04-19T23:56:25.000Z (6 months ago)
- Last Synced: 2025-05-01T08:15:09.881Z (6 months ago)
- Topics: bitcoin, cryptocurrency, hmm, metrics, nlp, regime, strategies, trading
- Language: Jupyter Notebook
- Homepage:
- Size: 65.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# UMH2025

Hi there! We are **Team Lima Biji**, participating in the **UMHackathon 2025** under **Domain 2 - Quantitative Trading**.

**Slides link**: [View Our Deck](https://www.canva.com/design/DAGkWFnoy34/IumXz3cmGOLTeMXOjEOGaw/edit?utm_content=DAGkWFnoy34&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton)

Our project integrates financial market data and online user sentiment to enhance crypto market regime detection using Hidden Markov Models (HMM), with the final goal of recommending BUY/SELL/HOLD strategies based on both on-chain data and Reddit discussion patterns.

---
## Introduction
Cryptocurrency markets are volatile and sentiment-driven. While traditional models rely purely on numerical indicators, our project attempts to answer:
> "Can combining on-chain whale behavior and Reddit user sentiment create more explainable, adaptive, and realistic trading strategies?"
We propose an explainable ML-driven trading assistant that identifies market regimes and gives contextual investment suggestions supported by public discussions.
---
## Project Goal
Our aim is to build an **alpha-generating crypto trading system** that:
- Detects **market regimes** using unsupervised learning (HMM).
- Integrates **Reddit sentiment** to capture behavioral shifts.
- Recommends trading actions (BUY/SELL/HOLD) along with **justifications** derived from sentiment trends.

---
## Hypotheses & Metrics
> Hypotheses:
- H1: Technical Indicators Improve Regime Detection
- H2: XGBoost Feature Selection Predicts Extreme Price Moves
- H3: Whale-Driven Features Define Market Regimes

> Metrics:
- Sharpe Ratio
- Max Drawdown
- Trade Frequency
- Strategy Win Rate

---
## Setup & Installation
```bash
# Clone the repo
git clone https://github.com/kerryyys/UMH2025.git
cd UMH2025

# Create virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the pipeline (example)
python models/new_model.py
```
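
The NLP step needs Reddit API credentials. Below is a minimal sketch of reading them from environment variables (for example, values loaded from a `.env` file) and building a PRAW client. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and both helper functions are assumptions for illustration, not names confirmed by this repo. If the values live in `.env`, calling `load_dotenv()` from the python-dotenv package first copies them into `os.environ`.

```python
import os

# Hypothetical variable names; adjust to whatever your .env actually defines.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(env=None):
    """Read Reddit API credentials from a mapping (defaults to os.environ).

    Fails early with a message listing every missing variable, instead of
    failing later inside the Reddit client.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def make_reddit_client(env=None):
    """Build a PRAW client from the credentials above.

    Without a username/password PRAW runs in read-only mode,
    which is enough for fetching public posts.
    """
    import praw  # imported lazily so the credential check works without praw installed

    return praw.Reddit(**load_reddit_credentials(env))
```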
> Note: For NLP sentiment analysis, make sure your Reddit credentials are set up via a `.env` file or passed directly to `praw`.

---
## Innovation Highlights
### NLP for Whale Behavior Tracking
Scrapes Reddit data to detect how the crowd reacts to sudden market flows. Uses **VADER Sentiment Analysis** to extract daily sentiment scores.
Integrates whale movement (inflows/outflows) with public opinion.
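
Turning per-post scores into the daily `avg_sentiment_score` and `post_volume` features is a simple group-by. The sketch below assumes each post is a `(day, text)` pair and takes the scorer as a parameter. With the `vaderSentiment` package, `score_fn` could be `lambda text: analyzer.polarity_scores(text)["compound"]`, where `analyzer` is a `SentimentIntensityAnalyzer()`. The function name and input shape are illustrative assumptions, not the repo's code.

```python
from collections import defaultdict


def aggregate_daily_sentiment(posts, score_fn):
    """Collapse per-post sentiment into one row per day.

    posts    -- iterable of (day, text) pairs; day is any sortable key,
                e.g. a datetime.date or an ISO date string
    score_fn -- callable mapping text to a score, e.g. VADER's compound
                score in [-1, 1] (assumed; any numeric scorer works)

    Returns {day: {"avg_sentiment_score": float, "post_volume": int}},
    ordered by day.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in posts:
        totals[day] += score_fn(text)
        counts[day] += 1
    return {
        day: {
            "avg_sentiment_score": totals[day] / counts[day],
            "post_volume": counts[day],
        }
        for day in sorted(counts)
    }
```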
---
## Feature Attribution for Transparency
Uses decision tree explanations and correlation matrices to expose how features drive decisions, explaining why the model triggers certain BUY/SELL calls.
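
A correlation matrix shows which features move together. It can flag, for instance, whether `exchange_inflow` and `avg_sentiment_score` carry overlapping information. Below is a minimal, dependency-free sketch of a Pearson correlation matrix over named feature columns, where `pandas.DataFrame.corr()` would do the same in practice. The helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns: {feature_name: list of values}. Returns {(a, b): r} for every pair."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```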
---
## Visual Insights
Heatmaps, clustering charts, and sentiment trendlines to explain strategies visually.
Model state visualizations (e.g., HMM transition maps, emission probabilities).
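
An HMM transition map is a heatmap of how often each regime follows another. This minimal sketch assumes regimes have already been decoded into integer labels. It counts observed transitions in that path and row-normalizes them, which gives the empirical matrix of the decoded path. If you use hmmlearn, the fitted model's learned matrix is in its `transmat_` attribute instead. The function name is hypothetical.

```python
def transition_matrix(states, n_states):
    """Row-normalized transition counts from a decoded regime sequence.

    states   -- list of integer regime labels in 0..n_states-1, in time order
    n_states -- number of regimes

    Entry [i][j] is the fraction of steps in regime i that moved to regime j
    next. A regime with no outgoing transitions keeps an all-zero row.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```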
---
## Feature Engineering
| Feature Type | Examples | Description |
|---|---|---|
| On-Chain | `exchange_inflow`, `whale_spikes` | Real-time behaviors of smart money |
| Sentiment | `avg_sentiment_score`, `post_volume` | Reddit NLP signals aggregated daily |
| Technical | `price`, `returns`, `volume` | Classical indicators |
| Engineered | `log_return`, `whale_sentiment_diff` | Combined sentiment-behavioral signals |

---
## Model Architecture
We combine:
- A **Gaussian HMM** for market regime detection.
- An **NLP pipeline** to extract public sentiment.
- A **strategy recommendation engine** based on regime + sentiment context.
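
To make the regime step concrete, here is a minimal, dependency-free sketch with two parts. First it decodes regimes from a one-dimensional Gaussian HMM with the Viterbi algorithm, for example over daily `log_return` values. Then it applies a toy regime-plus-sentiment rule. The HMM parameters are assumed to be already fitted; a library such as hmmlearn's `GaussianHMM` would normally estimate them with EM, and its `predict()` performs this kind of Viterbi decoding. `recommend()`, its threshold, and the choice of which regime counts as bullish are hypothetical illustrations, not the team's actual decision engine.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution N(mean, std**2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(obs, start_p, trans_p, means, stds):
    """Most likely regime path for a 1-D Gaussian HMM (log-space Viterbi).

    obs         -- observations in time order, e.g. daily log returns
    start_p     -- initial regime probabilities, length K
    trans_p     -- K x K matrix, trans_p[i][j] = P(next regime j | regime i)
    means, stds -- per-regime Gaussian emission parameters
    """
    k = len(start_p)
    # score[j]: best log-probability of any path ending in regime j so far
    score = [_log(start_p[j]) + gaussian_logpdf(obs[0], means[j], stds[j]) for j in range(k)]
    backpointers = []
    for x in obs[1:]:
        ptrs, new_score = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + _log(trans_p[i][j]))
            ptrs.append(best_i)
            new_score.append(score[best_i] + _log(trans_p[best_i][j])
                             + gaussian_logpdf(x, means[j], stds[j]))
        backpointers.append(ptrs)
        score = new_score
    # Trace back from the best final regime to recover the full path
    path = [max(range(k), key=lambda j: score[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


def recommend(regime, sentiment, bull_regime, bear_regime, threshold=0.05):
    """Hypothetical rule: act only when regime and crowd sentiment agree."""
    if regime == bull_regime and sentiment > threshold:
        return "BUY"
    if regime == bear_regime and sentiment < -threshold:
        return "SELL"
    return "HOLD"
```

Working in log space avoids numerical underflow on long price histories. Requiring agreement between regime and sentiment is one simple way to trade less often when the two signals conflict.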
### Model Architecture (Conceptual View)
```
┌────────────────────┐
│ On-chain Features  │ ◄── CryptoQuant / Glassnode / Coinglass
└────────────────────┘
          │
          ▼
┌────────────────────┐
│  Reddit Sentiment  │ ◄── NLP pipeline from r/CryptoCurrency, r/BitcoinMarkets
└────────────────────┘
          │
          ▼
┌────────────────────┐
│   Feature Merger   │
└────────────────────┘
          │
          ▼
┌────────────────────┐
│    Gaussian HMM    │
└────────────────────┘
          │
          ▼
┌────────────────────────────┐
│  Strategy Decision Engine  │ ──> SELL / HOLD / BUY
└────────────────────────────┘
```

### Class Diagram (Simplified Structure)
```
+---------------------+
| RedditSentiment     |
+---------------------+
| + fetch_posts()     |
| + analyze()         |
| + save_results()    |
+---------------------+

+---------------------+
| FeatureEngineer     |
+---------------------+
| + merge_sources()   |
| + clean_features()  |
+---------------------+

+---------------------+
| HMMTrader           |
+---------------------+
| + train_model()     |
| + predict_regime()  |
| + evaluate()        |
| + generate_signals()|
+---------------------+

+---------------------+
| Visualizer          |
+---------------------+
| + plot_regimes()    |
| + save_backtest()   |
+---------------------+
```

---
## File Structure
```
UMH2025/
├── archive/                  # Archived or deprecated files
├── data/
│   ├── cleaned/              # Cleaned datasets (e.g., cleaned/btc 2023-2024/)
│   ├── NLP/
│   │   ├── processed/        # Processed NLP sentiment data
│   │   ├── raw_unused_data/  # Raw unused Reddit post data
│   │   └── reddit_posts.csv  # Collected Reddit post data
│   ├── processed_data/       # Final processed datasets for modeling
│   └── raw_data/
│       ├── crypto_kmeans_clustering_output.csv
│       └── crypto_strategy_output.csv
├── models/                   # Currently unused; reserved for model scripts or checkpoints
├── results/                  # Folder to store visualizations or model outputs
├── src/
│   ├── 0_config/             # Configuration files and constants
│   ├── 1_fetch_data/         # Scripts to fetch or collect raw data
│   ├── 2_merge_data/         # Scripts to merge and align multiple data sources
│   ├── 3_clean_data/         # Scripts to clean and preprocess datasets
│   ├── 4_backtesting/        # Backtesting strategies and evaluation logic
│   ├── NLP/                  # NLP-specific analysis, sentiment scoring, etc.
│   ├── __pycache__/
│   ├── assets/               # Static files for Dash app styling
│   │   └── custom.css
│   └── dash/                 # Dash app components
│       ├── app.py            # Main entry point for the Dash dashboard
│       ├── callbacks.py      # Callback functions for interactivity
│       ├── data_loader.py    # Loads and prepares data for visualization
│       └── layout.py         # Dash app layout and structure
├── .gitattributes
├── .gitignore
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── run.bat                   # Batch script to execute project pipeline
├── setup_env.bat             # Batch script to set up the environment
└── ohlcv.csv                 # OHLCV (Open, High, Low, Close, Volume) data
```
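
The `src/4_backtesting/` stage evaluates strategies against the metrics listed under Hypotheses & Metrics. Below is a minimal sketch of those four metrics over plain Python lists. It assumes daily per-period strategy returns (hence 365 periods per year for an always-open crypto market) and a zero risk-free rate by default. The team's exact conventions are not documented here, and `trade_frequency()` in particular uses a hypothetical definition: the number of signal changes.

```python
import math


def sharpe_ratio(returns, periods_per_year=365, risk_free=0.0):
    """Annualized Sharpe ratio of per-period returns (sample standard deviation).

    Assumes a constant per-period risk-free rate; returns nan when the
    ratio is undefined (fewer than two returns or zero volatility).
    """
    n = len(returns)
    if n < 2:
        return float("nan")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / n
    var = sum((r - mean) ** 2 for r in excess) / (n - 1)
    if var == 0:
        return float("nan")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades with positive profit."""
    if not trade_pnls:
        return float("nan")
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


def trade_frequency(signals):
    """Hypothetical definition: count of changes in a BUY/SELL/HOLD signal series."""
    return sum(1 for prev, cur in zip(signals, signals[1:]) if cur != prev)
```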
---
## Citations
- **HMM On-Chain Data**: Credit to [CoinGlass](https://www.coinglass.com/), [CryptoQuant](https://cryptoquant.com/), [Glassnode](https://glassnode.com/)
- **Reddit Sentiment Data**: Credit to [Reddit](https://www.reddit.com/) via the PRAW API