https://github.com/samuele-lolli/data-analytics-techniques
A practical approach to the data analytics pipeline.
numpy pandas pytorch scikit-learn
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/samuele-lolli/data-analytics-techniques
- Owner: samuele-lolli
- Created: 2025-01-10T03:30:46.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-01-10T19:55:08.000Z (over 1 year ago)
- Last Synced: 2025-03-15T14:12:36.708Z (over 1 year ago)
- Topics: numpy, pandas, pytorch, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 10.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analytics Techniques
This project uses a subset of the Million Song Dataset, which contains acoustic features and a `year` column. The goal is to predict a track's release year from its acoustic features.
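As a minimal sketch of this setup, the snippet below separates the `year` target from the feature columns with pandas. The data is a small synthetic stand-in, and the column names `f0`, `f1`, `f2` and the helper `split_features_target` are hypothetical; the real subset and its column layout live in the notebooks.

```python
import numpy as np
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "year"):
    """Separate the feature columns from the release-year target column."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Synthetic stand-in for the real data: 5 tracks, 3 made-up acoustic features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(5, 3)), columns=["f0", "f1", "f2"])
df["year"] = [1995, 2001, 1987, 2010, 1999]

X, y = split_features_target(df)
print(X.shape, y.shape)  # (5, 3) (5,)
```

Keeping the target out of the feature matrix from the start avoids leaking `year` into later preprocessing steps such as scaling or PCA.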
The project covers a standard data analytics pipeline from start to finish, including the following steps:
1. **Data Visualization**
2. **Preprocessing**
   - Handling outliers (Winsorization and other techniques)
   - Various types of scaling and normalization
   - Principal Component Analysis (PCA)
3. **Training Classical Machine Learning Models**
   - Random Forest
   - K-Nearest Neighbors (KNN)
   - Support Vector Machine (SVM)
   - Linear Regression
4. **Training a Feedforward Neural Network using PyTorch**
5. **Using PyTorch Tabular Models**
   - TabNet
   - TabTransformer
6. **Hyperparameter Tuning**
7. **Model Evaluation on Test Set**
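The preprocessing and classical-model steps above could be chained roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the notebooks' actual code: the data is synthetic, the `Winsorizer` class is a hypothetical helper written for this example, and the percentile bounds, the 95% variance threshold, and the choice of mean absolute error as the metric are illustrative.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Winsorizer(TransformerMixin, BaseEstimator):
    """Hypothetical helper: clip each feature to percentile bounds learned in fit."""

    def __init__(self, lower=1.0, upper=99.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-feature bounds come from the training data only.
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# Synthetic stand-in data: 500 "tracks", 12 features, year loosely tied to feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 12))
y = 2000 + 5 * X[:, 0] + rng.normal(scale=2.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("winsorize", Winsorizer(lower=1.0, upper=99.0)),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),  # keep components explaining 95% of variance
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# Evaluate once on the held-out test split.
mae = mean_absolute_error(y_test, pipeline.predict(X_test))
print(f"Test MAE: {mae:.2f} years")
```

Wrapping the steps in a `Pipeline` means the clipping bounds, scaling statistics, and PCA components are all fitted on the training split only, and the final estimator can be swapped for a random forest, KNN, or SVM regressor without touching the preprocessing.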
For details, see the following Jupyter notebooks:
- `data-visualization.ipynb`
- `ml-sklearn.ipynb`
- `feedforward-pytorch.ipynb`
- `tabular-pytorch.ipynb`
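As a rough illustration of the kind of model the feedforward step refers to, the sketch below trains a small fully connected regressor in PyTorch. It does not reproduce `feedforward-pytorch.ipynb`: the data is synthetic, and the layer sizes, learning rate, and epoch count are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 256 samples, 12 features, one standardized target.
X = torch.randn(256, 12)
y = 0.8 * X[:, :1] + 0.1 * torch.randn(256, 1)

# Small feedforward network: one hidden layer with a ReLU activation.
model = nn.Sequential(
    nn.Linear(12, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Full-batch training loop; a real run would use mini-batches and a validation split.
losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

When the target is a release year, standardizing it before training (and inverting the transform for reporting) usually keeps the loss on a scale the optimizer handles well.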
Overall, the project walks through data analytics from visualization and preprocessing to test-set evaluation, using both classical machine learning models and neural network models.