https://github.com/tansexe/ad-lab
Basics of Data Analysis & ML
data-visualization eda ml
- Host: GitHub
- URL: https://github.com/tansexe/ad-lab
- Owner: tansexe
- License: mit
- Created: 2025-01-14T06:01:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-25T14:36:02.000Z (about 1 year ago)
- Last Synced: 2025-03-25T15:41:42.584Z (about 1 year ago)
- Topics: data-visualization, eda, ml
- Language: Jupyter Notebook
- Homepage:
- Size: 5.05 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Applications Development Lab
This project is part of the **Applications Development Lab** course in the 6th semester. The project explores data analysis, correlation identification, application of machine learning models, and the creation of an end-to-end machine learning pipeline.
## Project Overview
The main objective of this project is to analyze a dataset, explore different correlations, try various machine learning models, and develop a machine learning pipeline to streamline the entire process.
### Key Highlights:
- **Data Analysis**: Performed exploratory data analysis (EDA) to understand the dataset and its underlying patterns.
- **Correlation Analysis**: Investigated correlations between various features of the dataset and visualized the findings.
- **Machine Learning Models**: Tried several machine learning models to predict target variables and evaluated their performance.
- **ML Pipeline**: Developed a machine learning pipeline to automate the process from data preprocessing to model evaluation.
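As an illustration of the correlation analysis step, here is a minimal sketch that computes a Pearson correlation matrix with pandas and lists the most strongly related feature pairs. The data is synthetic, and the column names (`hours_studied`, `score`, `noise`) and the helper `top_correlations` are hypothetical; they are not taken from the project's notebooks.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical columns, not the project's dataset).
rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 10, n)
df = pd.DataFrame({
    "hours_studied": hours,
    "score": 5.0 * hours + rng.normal(0, 2.0, n),  # strongly tied to hours
    "noise": rng.normal(0, 1.0, n),                # unrelated feature
})

# Pearson correlation matrix over all (numeric) columns.
corr = df.corr()


def top_correlations(corr: pd.DataFrame, k: int = 3) -> pd.Series:
    """Return the k feature pairs with the largest absolute correlation."""
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation of 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)


print(top_correlations(corr))
```

The same matrix can be passed to `seaborn.heatmap(corr, annot=True)` to visualize it.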
## Installation
To run this project locally, you need Python 3.x and the following libraries:
- pandas
- NumPy
- Matplotlib
- seaborn
- scikit-learn
You can install the required libraries using pip:
```bash
pip install -r requirements.txt
```
## Usage
1. **Data Preprocessing**:
   - The dataset is loaded and cleaned.
   - Missing values are handled, and categorical variables are encoded.
2. **Exploratory Data Analysis (EDA)**:
   - Visualizations are created to explore the data and understand the relationships between features.
3. **Modeling**:
   - Various machine learning models, such as Linear Regression, Decision Trees, and Random Forest, are tested.
4. **Machine Learning Pipeline**:
   - A pipeline is created to automate data preprocessing, model training, and evaluation.
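The preprocessing, modeling, and pipeline steps above can be sketched with scikit-learn's `Pipeline` and `ColumnTransformer`. This is a minimal sketch under stated assumptions, not the project's actual code: the dataset is synthetic, the column names (`age`, `income`, `city`, `target`) are hypothetical, and a Random Forest classifier is chosen only as one of the model families mentioned.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical columns, not the project's dataset).
rng = np.random.default_rng(42)
n = 300
age = rng.integers(18, 70, n).astype(float)
income = rng.normal(50_000, 15_000, n)
city = rng.choice(["A", "B", "C"], n)
target = ((income > 50_000) & (age > 30)).astype(int)

df = pd.DataFrame({"age": age, "income": income, "city": city, "target": target})
# Simulate missing values in one numeric column.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Preprocessing and model are chained so both are fit on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=42)),
])

X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping the preprocessing inside the pipeline means the imputer and scaler learn their statistics from the training split alone, which avoids leaking information from the test split.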
## Evaluation
Model performance is evaluated with metrics suited to the task, such as accuracy, precision, recall, and F1-score for classification models.
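For reference, here is a small worked example of these four metrics using `sklearn.metrics`. The labels are made up for illustration, and the helper `classification_metrics` is hypothetical rather than part of the project.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up binary labels: 2 true positives, 2 false negatives,
# 1 false positive, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]


def classification_metrics(y_true, y_pred) -> dict:
    """Collect the four metrics for a binary classification task."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),    # (TP + TN) / total
        "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
        "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
        "f1": f1_score(y_true, y_pred),                # 2TP / (2TP + FP + FN)
    }


metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these labels, accuracy is 5/8, precision is 2/3, recall is 2/4, and F1 is 4/7.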
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Inspired by the concepts covered in the Applications Development Lab curriculum.
- Special thanks to the course instructors for their support and guidance.