https://github.com/iftekheraziz/causality-mining
Data Mining
https://github.com/iftekheraziz/causality-mining
casuality computer-science data-mining data-mining-algorithms data-mining-python masters-degree university-coursework
Last synced: 3 months ago
JSON representation
Data Mining
- Host: GitHub
- URL: https://github.com/iftekheraziz/causality-mining
- Owner: IftekherAziz
- Created: 2024-11-24T22:25:22.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-24T22:33:54.000Z (7 months ago)
- Last Synced: 2025-01-25T18:43:33.471Z (5 months ago)
- Topics: casuality, computer-science, data-mining, data-mining-algorithms, data-mining-python, masters-degree, university-coursework
- Language: Jupyter Notebook
- Homepage:
- Size: 122 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Here is the updated `README.md` file reflecting the use of the Jupyter Notebook file:
---
### README.md
# Assignment: Causality in Data Mining
This repository contains solutions and explanations for **Exercise Sheet 3: Causality** from the Data Mining course at the University of Vienna, Winter Semester 2024/25. The assignment explores concepts of causality using statistical methods, with a focus on Granger causality, multivariate causal models, and bivariate additive noise models.
---
## Assignment Overview
### **Exercise 3-1: Granger Causal Test (6 points)**
The objective is to analyze causal relationships between variables using Granger causality with real-world data on chickens (\(Y\)) and eggs (\(X\)) from U.S. egg production (1930–1935).1. **Auto-regression for \(X\) and \(Y\) (1.5 points)**
Compute regression coefficients \((\beta_0, \beta_1, \beta_2)\) for predicting \(Y\) based on lagged values of \(X\) and \(Y\). Compute \(RSS1\) (Residual Sum of Squares for the full model).2. **Auto-regression without \(Y\) (1.5 points)**
Repeat the regression excluding \(Y\) and compute \(RSS2\) (Residual Sum of Squares for the reduced model).3. **Statistical Test (1.5 points)**
Use the Granger-Sargent test to determine if \(X\) Granger-causes \(Y\) with \(\alpha = 0.05\).4. **Causal Test for the Reverse Direction (1.5 points)**
Repeat the above steps to test if \(Y\) Granger-causes \(X\). Interpret the results to address the "chicken or egg" question.---
### **Exercise 3-2: Multivariate Granger Model (2 points)**
Analyze causal relationships in high-dimensional settings using multivariate Granger models.1. **Consistency of Granger Models (1 point)**
Evaluate the consistency of causal inference with ordinary least squares and adaptive Lasso penalties. Discuss if the method generalizes well in high-dimensional data.2. **HMMLGA Algorithm (1 point)**
Explain the **High-order Mixed Model Lasso Granger Algorithm (HMMLGA)** and describe its key hyperparameters, including regularization parameter \((\lambda)\), lag order \((d)\), and stopping criteria.---
### **Exercise 3-3: Bivariate Causal Models on Non-Temporal Data (2 points)**
Explore causal relationships using bivariate additive noise models.1. **Non-identifiability in Linear-Gaussian Cases (1 point)**
Explain why causal relationships between variables \(X\) and \(Y\) are non-identifiable when the noise is Gaussian and relationships are linear. Discuss the role of symmetry in the joint distribution.2. **Causal and Anti-Causal Directions (1 point)**
For \(X =\) age and \(Y =\) blood pressure, explain:
- How to change \(P(\text{effect}|\text{cause})\) without affecting \(P(\text{cause})\).
- How to change \(P(\text{cause})\) without affecting \(P(\text{effect}|\text{cause})\).
- Challenges in the anti-causal direction.---
## Key Topics Covered
- **Granger Causality**:
- Statistical framework for analyzing temporal causal relationships.
- Tests based on differences in residual sums of squares.
- **Multivariate Granger Models**:
- Extensions to handle high-dimensional data with sparse penalties like Lasso.
- **Bivariate Additive Noise Models**:
- Challenges of causal inference in non-temporal, symmetric distributions.
- **Real-World Applications**:
- Example datasets and domain-specific interpretations (e.g., chickens and eggs, age and blood pressure).---
## Repository Structure
```plaintext
├── Exercise_Sheet_3_Solution.ipynb # Jupyter Notebook containing all solutions
├── README.md # Overview of the assignment
```---
## Instructions
1. Clone this repository:
```bash
git clone https://github.com/username/causality-assignment.git
cd causality-assignment
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the solutions:
Open the Jupyter Notebook in your preferred environment (e.g., Jupyter Lab or Jupyter Notebook):
```bash
jupyter notebook Exercise_Sheet_3_Solution.ipynb
```
- **Exercise 3-1**: Granger causality tests.
- **Exercise 3-2**: Multivariate Granger models.
- **Exercise 3-3**: Bivariate additive noise models.---
## Acknowledgments
This assignment is part of the **Data Mining** course at the University of Vienna, guided by **RNDr. Katerina Schindlerová, CSc., Privatdoz.** and **Pranava Mummoju, MSc.**.