https://github.com/samuele-lolli/data-analytics-techniques
A practical approach to data analytics pipeline.
numpy pandas pytorch scikit-learn
- Host: GitHub
- URL: https://github.com/samuele-lolli/data-analytics-techniques
- Owner: samuele-lolli
- Created: 2025-01-10T03:30:46.000Z (14 days ago)
- Default Branch: master
- Last Pushed: 2025-01-10T19:55:08.000Z (14 days ago)
- Last Synced: 2025-01-22T04:15:48.630Z (2 days ago)
- Topics: numpy, pandas, pytorch, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 10.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analytics Techniques
This project uses a subset of the Million Song Dataset, in which each track is described by a set of acoustic features and a `year` column. The goal is to predict a track's release year from its acoustic features.
The project covers the complete standard data analytics pipeline, including the following steps:
1. **Data Visualization**
2. **Preprocessing**
   - Handling outliers (Winsorization and other techniques)
   - Various types of scaling and normalization
   - Principal Component Analysis (PCA)
3. **Training Classical Machine Learning Models**
   - Random Forest
   - K-Nearest Neighbors (KNN)
   - Support Vector Machine (SVM)
   - Linear Regression
4. **Training a Feedforward Neural Network using PyTorch**
5. **Using PyTorch Tabular Models**
   - TabNet
   - TabTransformers
6. **Hyperparameter Tuning**
7. **Model Evaluation on Test Set**
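
The preprocessing, classical-model, tuning, and evaluation steps above can be chained roughly as in the sketch below. It is an illustration, not the code from the notebooks. The data is synthetic and only stands in for the acoustic features and `year` column. The `Winsorizer` class is a hypothetical helper written for this example. The 1st/99th percentile limits, the choice of linear regression, and the grid over PCA components are all assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Winsorizer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: clip each column to percentiles learned in fit."""

    def __init__(self, lower=1.0, upper=99.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-column limits are learned from the training data only.
        self.lo_ = np.percentile(X, self.lower, axis=0)
        self.hi_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lo_, self.hi_)


# Synthetic stand-in for the dataset (assumption: 20 numeric features).
rng = np.random.default_rng(0)
n_samples, n_features = 600, 20
X = rng.normal(size=(n_samples, n_features))
y = 1998.0 + X @ rng.normal(size=n_features) + rng.normal(scale=0.5, size=n_samples)
X[::40] *= 15.0  # inflate a few rows after computing y to mimic outliers

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Outlier handling -> scaling -> PCA -> model, fitted as a single estimator.
pipeline = Pipeline(
    steps=[
        ("winsorize", Winsorizer()),
        ("scale", StandardScaler()),
        ("pca", PCA()),
        ("model", LinearRegression()),
    ]
)

# Hyperparameter tuning with cross-validation on the training split only.
search = GridSearchCV(
    pipeline,
    param_grid={"pca__n_components": [5, 10, 20]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)

# Final evaluation on the held-out test split.
test_mae = mean_absolute_error(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test MAE (years):", round(test_mae, 3))
```

Keeping every preprocessing step inside the `Pipeline` means the clipping limits, scaling statistics, and PCA components are re-estimated within each cross-validation fold, so the test split does not influence them.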
For more information, you can view the following Jupyter Notebooks:
- `data-visualization.ipynb`
- `ml-sklearn.ipynb`
- `feedforward-pytorch.ipynb`
- `tabular-pytorch.ipynb`

This project demonstrates a comprehensive approach to data analytics, from preprocessing to model evaluation, using both classical machine learning models and more advanced neural network models.
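
For the neural-network side, a minimal sketch of a feedforward regressor in PyTorch might look like the following. It does not reproduce `feedforward-pytorch.ipynb`. The layer sizes, optimizer, learning rate, batch size, and epoch count are assumptions, and the tensors are synthetic placeholders for scaled acoustic features and a centered `year` target.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data (assumption: 20 already-scaled features).
n_samples, n_features = 512, 20
X = torch.randn(n_samples, n_features)
y = X @ torch.randn(n_features, 1) + 0.1 * torch.randn(n_samples, 1)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Small fully connected network with a single regression output.
model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)


def evaluate(net, features, targets):
    """Return the mean squared error without tracking gradients."""
    net.eval()
    with torch.no_grad():
        return loss_fn(net(features), targets).item()


initial_test_mse = evaluate(model, X_test, y_test)

batch_size = 64
for epoch in range(50):
    model.train()
    # Reshuffle the training rows every epoch, then step through mini-batches.
    perm = torch.randperm(X_train.shape[0])
    for start in range(0, X_train.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(X_train[idx]), y_train[idx])
        loss.backward()
        optimizer.step()

final_test_mse = evaluate(model, X_test, y_test)
print("test MSE before training:", round(initial_test_mse, 3))
print("test MSE after training:", round(final_test_mse, 3))
```

On real data the features would typically go through the same preprocessing as the classical models, and a separate validation split would guide choices such as early stopping.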