https://github.com/kerryyys/umh2025

Bridges the power of Hidden Markov Models (HMM) and Natural Language Processing (NLP) to detect market regimes and predict optimal trading strategies.

# UMH2025 🚀
Hi there! We are **Team Lima Biji**, participating in the **UMHackathon 2025** under:

📊 **Domain 2 - Quantitative Trading**

📑 **Slides link**: [View Our Deck](https://www.canva.com/design/DAGkWFnoy34/IumXz3cmGOLTeMXOjEOGaw/edit?utm_content=DAGkWFnoy34&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton)

Our project integrates financial market data and online user sentiment to enhance crypto market regime detection using Hidden Markov Models (HMM), with the final goal of recommending BUY/SELL/HOLD strategies based on both on-chain data and Reddit discussion patterns.

---

## 🧠 Introduction

Cryptocurrency markets are volatile and sentiment-driven. While traditional models rely purely on numerical indicators, our project attempts to answer:

> "Can combining on-chain whale behavior and Reddit user sentiment create more explainable, adaptive, and realistic trading strategies?"

We propose an explainable ML-driven trading assistant that identifies market regimes and gives contextual investment suggestions supported by public discussions.

---

## 🎯 Project Goal

Our aim is to build an **alpha-generating crypto trading system** that:
- Detects **market regimes** using unsupervised learning (HMM).
- Integrates **Reddit sentiment** to capture behavioral shifts.
- Recommends trading actions (BUY/SELL/HOLD) along with **justifications** derived from sentiment trends.

---

## 🧪 Hypotheses & Metrics
> ✍️ Hypotheses:
- H1: Technical Indicators Improve Regime Detection
- H2: XGBoost Feature Selection Predicts Extreme Price Moves
- H3: Whale-Driven Features Define Market Regimes

> ✍️ Metrics:
- Sharpe Ratio
- Max Drawdown
- Trade Frequency
- Strategy Win Rate
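The four metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the project's evaluation code: the function names, the 365-period annualisation (crypto trades every day), and the definition of trade frequency as the share of periods in which the position changes are all assumptions.

```python
import math


def sharpe_ratio(returns, periods_per_year=365, risk_free_rate=0.0):
    """Annualised Sharpe ratio from per-period simple returns."""
    if len(returns) < 2:
        return 0.0
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 in the denominator).
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0:
        return 0.0
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    if not equity:
        return 0.0
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades that ended with a positive profit."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


def trade_frequency(positions):
    """Share of period-to-period transitions in which the position changed (assumed definition)."""
    if len(positions) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(positions, positions[1:]) if a != b)
    return changes / (len(positions) - 1)
```

For example, `max_drawdown([100, 120, 90, 110])` measures the drop from the peak of 120 to the trough of 90, which is a 25% drawdown.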

---

## 🛠️ Setup & Installation

```
# Clone the repo
git clone https://github.com/kerryyys/UMH2025.git
cd UMH2025

# Create virtual environment
python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the pipeline (example)
python models/new_model.py
```
> Note: For NLP sentiment analysis, make sure your Reddit API credentials are available, either in a `.env` file or passed directly to `praw`.
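One way to wire up the credentials is sketched below. The environment variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and the helper name are hypothetical; the repository may use different ones. The returned keys match the `client_id`, `client_secret`, and `user_agent` keyword arguments that `praw.Reddit` accepts.

```python
import os

# Hypothetical variable names; adjust them to whatever your .env file defines.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def reddit_credentials(env=None):
    """Collect Reddit API credentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
```

With `praw` installed, the result can be passed on as `praw.Reddit(**reddit_credentials())`. If the values live in a `.env` file, load it into the environment first, for instance with a loader such as `python-dotenv`.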

---

## 🌟 Innovation Highlights
### 💬 NLP for Whale Behavior Tracking
- Scrapes Reddit data to detect how the crowd reacts to sudden market flows.
- Uses **VADER Sentiment Analysis** to extract daily sentiment scores.
- Integrates whale movement (inflows/outflows) with public opinion.
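The daily aggregation step might look roughly like the sketch below. It assumes each post has already been scored with a VADER compound value in the range -1 to 1 and carries a UTC Unix timestamp (PRAW exposes this as `created_utc`). The function name and input shape are illustrative; the output keys reuse the `avg_sentiment_score` and `post_volume` feature names from this README.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_sentiment(posts):
    """Aggregate (unix_timestamp_utc, compound_score) pairs into per-day features."""
    buckets = defaultdict(list)
    for timestamp, score in posts:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        buckets[day].append(score)
    # ISO dates sort chronologically, so sorted() gives a time-ordered result.
    return {
        day: {
            "avg_sentiment_score": sum(scores) / len(scores),
            "post_volume": len(scores),
        }
        for day, scores in sorted(buckets.items())
    }


scored_posts = [
    (1704067200, 0.5),   # 2024-01-01 00:00 UTC
    (1704070800, -0.1),  # 2024-01-01 01:00 UTC
    (1704153600, 0.2),   # 2024-01-02 00:00 UTC
]
daily = daily_sentiment(scored_posts)
```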

---

## 🔎 Feature Attribution for Transparency
Uses decision tree explanations & correlation matrices to expose how features drive decisions.

Explains why the model triggers certain BUY/SELL calls.
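As a rough illustration of the correlation-matrix part, the sketch below computes pairwise Pearson correlations between feature columns in plain Python. The helper names and the sample columns are hypothetical, and a real pipeline would more likely call a dataframe library for this.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)


def correlation_matrix(features):
    """Map each pair of feature names to its Pearson correlation."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Made-up sample values, for illustration only.
sample = {
    "exchange_inflow": [1.0, 2.0, 3.0, 4.0],
    "post_volume": [2.0, 4.0, 6.0, 8.0],
    "avg_sentiment_score": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(sample)
```

Here `matrix["exchange_inflow"]["post_volume"]` is close to 1 and `matrix["exchange_inflow"]["avg_sentiment_score"]` is close to -1, because the sample columns are exact linear functions of each other.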

---

## 📊 Visual Insights
- Heatmaps, clustering charts, and sentiment trendlines to explain strategies visually.
- Model state visualizations (e.g., HMM transition maps, emission probabilities).
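A transition map needs a matrix of regime-to-regime probabilities. The sketch below estimates one empirically from a decoded regime sequence by counting transitions. This is an assumption about how such a map could be fed: a fitted HMM also carries its own learned transition matrix, which may differ from these counts.

```python
def transition_matrix(states, n_states):
    """Row-normalised counts of transitions between consecutive regime labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A regime that is never left (or never visited) gets an all-zero row.
        matrix.append([count / total if total else 0.0 for count in row])
    return matrix


empirical = transition_matrix([0, 0, 1, 1, 1, 0], n_states=2)
```

Each row sums to 1 when that regime has at least one outgoing transition, so the result can be passed straight to a heatmap.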

---

## 🧪 Feature Engineering

| Feature Type | Examples | Description |
|--------------|----------|-------------|
| On-Chain | exchange_inflow, whale_spikes | Real-time behaviors of smart money |
| Sentiment | avg_sentiment_score, post_volume | Reddit NLP signals aggregated daily |
| Technical | price, returns, volume | Classical indicators |
| Engineered | log_return, whale_sentiment_diff | Combined sentiment-behavioral signals |
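The engineered features could be derived along the lines of the sketch below. `log_return` follows the standard definition, ln(p_t / p_{t-1}). The README does not define `whale_sentiment_diff`, so the version here (standardised exchange inflow minus standardised sentiment) is purely a hypothetical reading, included only to show how two differently scaled signals can be combined.

```python
import math


def log_returns(prices):
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def zscores(values):
    """Standardise a series to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def whale_sentiment_diff(exchange_inflow, avg_sentiment_score):
    """HYPOTHETICAL definition: z-scored inflow minus z-scored sentiment, per day."""
    return [
        whale - mood
        for whale, mood in zip(zscores(exchange_inflow), zscores(avg_sentiment_score))
    ]
```

Under this reading, a large positive value marks a day where exchange inflows are unusually high while sentiment is unusually low.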

---

## 🧱 Model Architecture
We combine:

- A **Gaussian HMM** for market regime detection.
- An **NLP pipeline** to extract public sentiment.
- A **strategy recommendation engine** based on regime + sentiment context.

![image](https://github.com/user-attachments/assets/cdf75edd-bf0f-42df-869d-2d38e56d9cfc)
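To make the regime-detection step concrete, here is a small pure-Python Viterbi decoder for a Gaussian HMM with one-dimensional emissions. It is a sketch only: the two regimes, their means and standard deviations, and the transition probabilities are hand-picked assumptions, whereas the project would fit these parameters from data (typically with an HMM library) and use several features at once.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi(observations, start_prob, trans_prob, means, stds):
    """Most likely hidden-state path; all probabilities must be non-zero."""
    n_states = len(start_prob)
    # score[s] = log-probability of the best path that ends in state s.
    score = [
        math.log(start_prob[s]) + gaussian_logpdf(observations[0], means[s], stds[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda p: score[p] + math.log(trans_prob[p][s]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + math.log(trans_prob[best_prev][s])
                + gaussian_logpdf(x, means[s], stds[s])
            )
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Assumed parameters: state 0 = calm regime, state 1 = turbulent regime.
daily_returns = [0.001, 0.002, 0.0, 0.09, -0.08, 0.1]
regimes = viterbi(
    daily_returns,
    start_prob=[0.5, 0.5],
    trans_prob=[[0.95, 0.05], [0.10, 0.90]],
    means=[0.001, -0.002],
    stds=[0.01, 0.05],
)
```

With these parameters the small returns are assigned to the calm regime and the large swings to the turbulent one. The sticky transition matrix discourages flipping regimes on a single noisy observation.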

### 🧠 Model Architecture (Conceptual View):

    ┌────────────────────┐
    │ On-chain Features  │◄─── CryptoQuant / Glassnode / Coinglass
    └────────────────────┘
              │
              ▼
    ┌────────────────────┐
    │ Reddit Sentiment   │◄─── NLP pipeline from r/CryptoCurrency, r/BitcoinMarkets
    └────────────────────┘
              │
              ▼
    ┌────────────────────┐
    │ Feature Merger     │
    └────────────────────┘
              │
              ▼
    ┌────────────────────┐
    │ Gaussian HMM       │
    └────────────────────┘
              │
              ▼
    ┌─────────────────────────────┐
    │ Strategy Decision Engine    │ ──> 🔴 SELL / 🟡 HOLD / 🟢 BUY
    └─────────────────────────────┘
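The last box in the diagram can be illustrated with a simple rule table. Everything in this sketch is an assumption: the regime labels (`bull`, `bear`, `neutral`), the sentiment cut-offs of ±0.05 (the conventional VADER compound thresholds), and the rules themselves stand in for whatever logic the project actually uses.

```python
# Assumed cut-offs on the daily average VADER compound score.
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def recommend(regime, avg_sentiment_score):
    """Return (action, justification) for a hypothetical regime label and sentiment score."""
    if regime == "bull" and avg_sentiment_score >= POSITIVE_CUTOFF:
        return "BUY", "Bullish regime confirmed by positive Reddit sentiment."
    if regime == "bear" and avg_sentiment_score <= NEGATIVE_CUTOFF:
        return "SELL", "Bearish regime confirmed by negative Reddit sentiment."
    # Regime and sentiment disagree, or neither is decisive.
    return "HOLD", "Regime and sentiment do not agree on a direction."


action, reason = recommend("bull", 0.30)
```

Returning the justification alongside the action keeps each signal explainable, in line with the project goal of pairing every BUY/SELL/HOLD call with a sentiment-based reason.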

### 🧱 Class Diagram (Simplified Structure)

    +---------------------+
    | RedditSentiment     |
    +---------------------+
    | + fetch_posts()     |
    | + analyze()         |
    | + save_results()    |
    +---------------------+

    +---------------------+
    | FeatureEngineer     |
    +---------------------+
    | + merge_sources()   |
    | + clean_features()  |
    +---------------------+

    +---------------------+
    | HMMTrader           |
    +---------------------+
    | + train_model()     |
    | + predict_regime()  |
    | + evaluate()        |
    | + generate_signals()|
    +---------------------+

    +---------------------+
    | Visualizer          |
    +---------------------+
    | + plot_regimes()    |
    | + save_backtest()   |
    +---------------------+

---
## 🗂️ File Structure
```
UMH2025/
├── archive/                     # Archived or deprecated files
│
├── data/
│   ├── cleaned/                 # Cleaned datasets (e.g., cleaned/btc 2023-2024/)
│   ├── NLP/
│   │   ├── processed/           # Processed NLP sentiment data
│   │   ├── raw_unused_data/     # Raw unused Reddit post data
│   │   └── reddit_posts.csv     # Collected Reddit post data
│   ├── processed_data/          # Final processed datasets for modeling
│   └── raw_data/
│       ├── crypto_kmeans_clustering_output.csv
│       └── crypto_strategy_output.csv
│
├── models/                      # Currently unused; reserved for model scripts or checkpoints
│
├── results/                     # Folder to store visualizations or model outputs
│
├── src/
│   ├── 0_config/                # Configuration files and constants
│   ├── 1_fetch_data/            # Scripts to fetch or collect raw data
│   ├── 2_merge_data/            # Scripts to merge and align multiple data sources
│   ├── 3_clean_data/            # Scripts to clean and preprocess datasets
│   ├── 4_backtesting/           # Backtesting strategies and evaluation logic
│   ├── NLP/                     # NLP-specific analysis, sentiment scoring, etc.
│   ├── __pycache__/
│   ├── assets/                  # Static files for Dash app styling
│   │   └── custom.css
│   └── dash/                    # Dash app components
│       ├── app.py               # Main entry point for the Dash dashboard
│       ├── callbacks.py         # Callback functions for interactivity
│       ├── data_loader.py       # Loads and prepares data for visualization
│       └── layout.py            # Dash app layout and structure
│
├── .gitattributes
├── .gitignore
├── README.md                    # Project documentation
├── requirements.txt             # Python dependencies
├── run.bat                      # Batch script to execute project pipeline
├── setup_env.bat                # Batch script to set up the environment
└── ohlcv.csv                    # OHLCV (Open, High, Low, Close, Volume) data
```

---

## 📚 Citations
**HMM On-Chain Data**: Credit to [CoinGlass](https://www.coinglass.com/), [CryptoQuant](https://cryptoquant.com/), [Glassnode](https://glassnode.com/)

**Reddit Sentiment Data**: Credit to [Reddit](https://www.reddit.com/), accessed through the PRAW API