
https://github.com/samuele-lolli/data-analytics-techniques

A practical approach to the data analytics pipeline.

numpy pandas pytorch scikit-learn


# Data Analytics Techniques

This project uses a subset of the Million Song Dataset that contains acoustic features and a `year` column. The goal is to predict a track's release year from its acoustic features.
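As a rough illustration of this setup, the sketch below separates the `year` target from the feature columns and computes a naive mean-prediction baseline. The data is a tiny synthetic stand-in, and the feature column names (`feat_0`, `feat_1`, `feat_2`) are hypothetical; only the `year` column name comes from the description above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; feature names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 3)), columns=["feat_0", "feat_1", "feat_2"])
df["year"] = rng.integers(1960, 2011, size=6)

# Features are every column except the target.
X = df.drop(columns="year")
y = df["year"]

# Naive baseline: always predict the mean year.
baseline_mae = (y - y.mean()).abs().mean()
print(f"Baseline MAE: {baseline_mae:.2f} years")
```

Any trained model should beat this baseline error to be considered useful.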

The project covers a standard data analytics pipeline end to end:

1. **Data Visualization**

2. **Preprocessing**
- Handling outliers (Winsorization and other techniques)
- Various types of scaling and normalization
- Principal Component Analysis (PCA)

3. **Training Classical Machine Learning Models**
- Random Forest
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Linear Regression

4. **Training a Feedforward Neural Network using PyTorch**

5. **Using PyTorch Tabular Models**
- TabNet
- TabTransformer

6. **Hyperparameter Tuning**

7. **Model Evaluation on Test Set**
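The classical part of the pipeline above (winsorization, scaling, PCA, a regression model, hyperparameter tuning, and test-set evaluation) can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the notebooks' actual code: the data is synthetic, the `Winsorizer` class is a hypothetical helper, and the percentile bounds, grid values, and choice of Linear Regression are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Winsorizer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: clip each feature to percentiles learned in fit()."""

    def __init__(self, lower=1.0, upper=99.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# Synthetic stand-in for acoustic features and release years.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = 1990 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipeline = Pipeline([
    ("winsorize", Winsorizer()),   # outlier handling
    ("scale", StandardScaler()),   # scaling
    ("pca", PCA()),                # dimensionality reduction
    ("model", LinearRegression()),
])

# Tune the number of principal components with cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"pca__n_components": [4, 8, 12]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)

# Final evaluation on the held-out test set.
test_mae = mean_absolute_error(y_test, search.predict(X_test))
print(search.best_params_, f"test MAE: {test_mae:.2f} years")
```

Wrapping the preprocessing steps in a `Pipeline` means the clipping bounds, scaler statistics, and PCA components are fitted only on the training folds during tuning, which avoids leaking test information into preprocessing.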

For details, see the following Jupyter notebooks:
- `data-visualization.ipynb`
- `ml-sklearn.ipynb`
- `feedforward-pytorch.ipynb`
- `tabular-pytorch.ipynb`

This project walks through the full data analytics workflow, from preprocessing to evaluation on the test set, using both classical machine learning models and neural network models.
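For the neural network side, a feedforward regressor in PyTorch might look like the sketch below. It is illustrative only: the data is synthetic, and the layer sizes, learning rate, batch size, epoch count, and target standardization are assumptions rather than the settings used in `feedforward-pytorch.ipynb`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for acoustic features and release years.
n_samples, n_features = 512, 12
X = torch.randn(n_samples, n_features)
year = 1990 + 5 * X[:, 0] - 3 * X[:, 1] + 2 * torch.randn(n_samples)

X_train, X_test = X[:400], X[400:]
year_train, year_test = year[:400], year[400:]

# Standardize the target using training statistics only.
year_mean, year_std = year_train.mean(), year_train.std()
y_train = ((year_train - year_mean) / year_std).unsqueeze(1)
y_test = ((year_test - year_mean) / year_std).unsqueeze(1)

# Small feedforward network with two hidden layers (sizes are assumptions).
model = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)

for epoch in range(30):
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

# Evaluate on the held-out split; convert the error back to years.
model.eval()
with torch.no_grad():
    pred = model(X_test)
    test_mse = loss_fn(pred, y_test).item()
    test_mae_years = ((pred - y_test).abs().mean() * year_std).item()

print(f"test MSE (standardized): {test_mse:.3f}, MAE: {test_mae_years:.2f} years")
```

Standardizing the target keeps the loss on a scale of roughly one, which tends to make training with a default optimizer setup more stable than regressing raw year values near 2000.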