https://github.com/roderickqiu/data-mining-project
Data Mining (CS306) Project, Spring 2025, SUSTech CSE.
https://github.com/roderickqiu/data-mining-project
Last synced: 9 months ago
JSON representation
Data Mining (CS306) Project, Spring 2025, SUSTech CSE.
- Host: GitHub
- URL: https://github.com/roderickqiu/data-mining-project
- Owner: RoderickQiu
- Created: 2025-05-15T11:41:33.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-06-27T06:19:39.000Z (9 months ago)
- Last Synced: 2025-06-27T07:27:17.492Z (9 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 2.74 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Intelligent Learning Recommendation System
This repository contains the codebase for a CS306 (Data Mining) final project, which implements an intelligent learning recommendation system. The system features a deep learning-based knowledge tracing model and a personalized question recommendation engine, with a modern web frontend for user interaction.
## Project Structure
```
data-mining-project/
├── recommend/ # Standalone recommendation logic (legacy, see main.py for API)
├── frontend/ # Modern React + TypeScript web interface
├── logs/ # Training and experiment logs
├── saved_models/ # Trained model checkpoints (not tracked by git)
├── main.py # FastAPI backend server (primary API)
├── train.py # Model training script
├── test.py # Model evaluation script
├── dataset.py # Data loading utilities
├── requirements.txt # Python dependencies
└── ...
```
---
## Features
### 1. Knowledge Tracing & Prediction
- Implements a deep learning model (SAINT+) for sequential knowledge tracing.
- Predicts the probability of a student answering the next question correctly, based on their historical responses, response times, and question categories.
- Model is built with PyTorch for modularity and scalability.
### 2. Personalized Question Recommendation
- Question Recommendation: Suggests questions based on user mastery of knowledge tags, question difficulty, quality flags, tag overlap with user strengths/weaknesses, and benchmark tags.
- Available via API endpoint `/recommend_advanced`.
### 3. Modern Web Frontend
- Built with React, TypeScript, Tailwind CSS, and shadcn/ui.
- Provides interactive prediction and recommendation interfaces.
- Real-time data visualization with Recharts.
- Responsive design for desktop and mobile.
---
## Getting Started
### Backend (FastAPI)
#### Prerequisites
- Python 3.8+
- Install dependencies:
```bash
pip install -r requirements.txt
```
#### Running the API Server
1. Ensure the trained model checkpoint is available at `saved_models/best_model-v3.ckpt`.
2. Adjust dataset paths in `main.py` or `backend/config.py` as needed.
3. Start the server:
```bash
uvicorn main:app --reload
```
4. The API will be available at `http://localhost:8000`.
#### Key API Endpoints
- `POST /predict`: Predicts the probability of correct answers for a sequence.
- `POST /recommend_advanced`: Returns advanced personalized question recommendations.
- `POST /question_stats_by_ids`: Retrieves metadata for a list of question IDs.
### Model Training
- To train the model from scratch:
```bash
python train.py
```
- Training logs and metrics are saved in the `logs/` directory.
### Frontend
#### Setup & Run
```bash
cd frontend
npm install
npm start
```
- The app will be available at `http://localhost:3000`.
- API base URL is configured in `src/services/api.ts` (default: `http://localhost:8000`).
#### Features
- **Learning Prediction**: Input question IDs, response times, and categories to visualize predicted probabilities.
- **Question Recommendation**: Get personalized question suggestions using basic or advanced algorithms.
- **Data Visualization**: Interactive charts for predictions and recommendations.
#### Project Structure (Frontend)
```
frontend/
├── public/ # Static assets
├── src/
│ ├── components/ # React components
│ │ ├── ui/ # shadcn/ui base components
│ │ ├── MainTab.tsx # Main tab
│ ├── services/ # API services
│ │ └── api.ts # API interface
│ ├── lib/ # Utility functions
│ │ └── utils.ts # General utilities
│ ├── App.tsx # Main app component
│ └── index.tsx # App entry point
├── package.json # Project config
└── tailwind.config.js # Tailwind config
```
---
## Datasets
- The system uses the [Riiid! Answer Prediction](https://www.kaggle.com/competitions/riiid-test-answer-prediction/data) dataset.
- Please download the dataset and adjust paths in `main.py` or `backend/config.py` as needed.
---
## Python Dependencies
- torch
- pytorch-lightning
- pandas
- numpy
- scikit-learn
- fastapi
- uvicorn
As declared in `requirements.txt`.
---
## Logs & Model Checkpoints
- Training logs are stored in `logs/`.
- Model checkpoints are saved in `saved_models/` (not tracked by git due to size).