https://github.com/zerodiscord/ai-ml

Uni course labs monorepo

Host: GitHub
URL: https://github.com/zerodiscord/ai-ml
Owner: ZeroDiscord
Created: 2024-10-09T09:27:21.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2025-05-02T10:31:57.000Z (about 1 year ago)
Last Synced: 2025-07-18T18:26:56.623Z (11 months ago)
Language: Jupyter Notebook
Size: 4.05 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# [Lab Experiments](https://github.com/ZeroDiscord/AI-ML/tree/master/lab_4-6)

## Lab 4-6: Data Import, EDA, and Preprocessing

### Overview  

This repository covers three experiments:  

1. **Experiment 4**: Import/export data and display basic statistics.  

2. **Experiment 5**: Perform Exploratory Data Analysis (EDA).  

3. **Experiment 6**: Handle missing values, outliers, and preprocess data.  
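The three experiments can be sketched in a few lines of `pandas`. This is a minimal illustration, not the code in `helpers/`: the inline CSV, its column names (`age`, `income`, `city`), the median fill, and the 1.5 × IQR outlier rule are all assumptions chosen for the example, and plotting is left out.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/sample.csv; the real columns are not documented.
RAW_CSV = """age,income,city
25,48000,Pune
32,52000,Delhi
,61000,Pune
41,58000,Mumbai
29,,Delhi
35,900000,Pune
"""

# Experiment 4 (import): load the data and display basic statistics.
df = pd.read_csv(io.StringIO(RAW_CSV))
print(df.describe())

# Experiment 5: a few common EDA checks.
missing_per_column = df.isna().sum()        # count of missing values per column
print(missing_per_column)
print(df["city"].value_counts())            # frequency of each category
print(df[["age", "income"]].corr())         # correlation between numeric columns

# Experiment 6: fill missing numeric values with each column's median ...
numeric_cols = ["age", "income"]
clean = df.copy()
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# ... then drop rows whose income lies outside 1.5 * IQR of the quartiles.
q1, q3 = clean["income"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

# Experiment 4 (export): serialize the preprocessed data back to CSV text.
exported_csv = clean.to_csv(index=False)
print(exported_csv)
```

Passing a file path to `to_csv` instead of leaving it empty would write the result to disk, for example somewhere under `output/`.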

### File Structure 

```
lab_4-6/
│
├── data/            # Directory for input datasets
│   └── sample.csv   # Placeholder for the dataset used in experiments
│
├── helpers/         # Contains scripts for individual experiments
│   ├── exp4_impexp.py       # Code for Experiment 4
│   ├── exp5_eda.py          # Code for Experiment 5
│   └── exp6_preprocess.py   # Code for Experiment 6
│
├── output/          # Stores exported results, plots, and preprocessed data
│   └── ...
│
└── driver.py        # Main driver script to run experiments
```

### Requirements  

- Python 3.x  

- Libraries: `pandas`, `matplotlib`, `seaborn`, `scipy`  

### How to Run  

1. Place your dataset in the `data/` directory, or skip this step and let the driver script import the sample dataset.

2. Execute the main script:  

   ```bash
   python driver.py
   ```

3. Check the `output/` directory for the results to be evaluated.






# [Project](https://github.com/ZeroDiscord/AI-ML/tree/master/project)

### Machine Learning Pipeline

This project demonstrates how to create and train a machine learning pipeline using `scikit-learn`. The pipeline consists of two main components:

1. **StandardScaler**: Standardizes the features by removing the mean and scaling to unit variance.

2. **RandomForestClassifier**: A classifier that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
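To make the scaling step concrete, here is a small sketch of what `StandardScaler` does to a feature matrix. The array values are made up for illustration and are not taken from the project's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: two columns on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn per-column mean/std, then transform

print(scaler.mean_)           # per-column means learned from X
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

After the transform, both columns are on the same scale even though the raw values differ by two orders of magnitude.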

#### Code Overview

#### Steps:

1. **Import Libraries**: Import the necessary libraries from `scikit-learn`.

2. **Define Pipeline**: Create a pipeline that first scales the data using `StandardScaler` and then applies the `RandomForestClassifier`.

3. **Train the Model**: Fit the pipeline to the training data (`X_train`, `y_train`).

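This setup ensures that the scaler is fitted on the training data and that the same transformation is applied automatically before the classifier at prediction time. Random forests are largely insensitive to feature scaling, so the scaling step matters most if the classifier is later swapped for a scale-sensitive model.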

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Create a pipeline: scale the features, then classify
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('rf', RandomForestClassifier())
])

# Fit the pipeline (X_train and y_train are assumed to be prepared beforehand)
pipe.fit(X_train, y_train)
```
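The snippet above relies on `X_train` and `y_train` already existing. As a rough end-to-end sketch, the version below builds them from synthetic data with `make_classification` and evaluates on a held-out split. The synthetic data, the split ratio, and the fixed `random_state` values are assumptions for illustration; the project itself references the Sparkify dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 samples, 8 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same two-step pipeline as above, with a fixed seed for reproducibility.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])
pipe.fit(X_train, y_train)

# The pipeline applies the fitted scaler before predicting on unseen data.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Because the scaler lives inside the pipeline, calling `predict` or `score` on the test split reuses the statistics learned from the training split rather than refitting on test data.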

### References

- [scikit-learn: Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)

- [Dataset: Sparkify](https://udacity-dsnd.s3.amazonaws.com/sparkify/mini_sparkify_event_data.json)
