Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gblssroman/vk_internship
🏢 EDA (incl. KNN) & CatBoost & Optuna for predicting the most accurate scores for the business objects success.
catboost geolocation knn machine-learning vk
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/gblssroman/vk_internship
- Owner: gblssroman
- Created: 2024-03-20T19:13:23.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-03-20T23:38:38.000Z (10 months ago)
- Last Synced: 2024-10-31T06:25:21.945Z (3 months ago)
- Topics: catboost, geolocation, knn, machine-learning, vk
- Language: Jupyter Notebook
- Homepage:
- Size: 32.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🏢 VK Internship March '24
EDA (incl. KNN), CatBoost and Optuna for producing the most accurate scores in the object success prediction task. Repo for the VK Internship Data Science task.
```.zip``` archive included.
**Instructions:**
1. Install the dependencies from ```requirements.txt``` via ```pip install -r requirements.txt``` (no junk :))
2. Launch ```generate_submission.py```
3. Get the final ```submission.csv``` (it has already been generated in the folder ```output``` for fast reference).

Other files description:
* ```classifier.cbm``` - Trained CatBoost model (regressor, not classifier)
* ```do_eda.py``` - Script used in ```generate_submission.py``` for given datasets preparation
* ```datasets``` - Folder containing datasets
* ```cols_to-drop.pkl``` - Columns to be dropped because they cause multicollinearity (stored as a dict).

---
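As a rough illustration of how the files listed above might be used together, here is a minimal sketch. It is not the code from ```generate_submission.py```. The helper names (`load_pickle`, `drop_columns`, `load_regressor`) are hypothetical, and the internal layout of the dict in ```cols_to-drop.pkl``` is not documented here, so turning it into a flat list of column names is left to the caller. The one point taken directly from the README is that ```classifier.cbm``` holds a regressor, so it is loaded with `CatBoostRegressor` despite the file name.

```python
import pickle


def load_pickle(path):
    """Unpickle an object such as cols_to-drop.pkl (trusted files only)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def drop_columns(record, columns):
    """Return a copy of one record (a dict) without the listed columns."""
    unwanted = set(columns)
    return {name: value for name, value in record.items() if name not in unwanted}


def load_regressor(path="classifier.cbm"):
    """Load the saved model as a regressor, despite the file name."""
    # Lazy import: the helpers above stay usable without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor()
    model.load_model(path)
    return model
```

With CatBoost installed, `load_regressor().predict(features)` would return the predicted scores, assuming `features` has been prepared the same way as the training data.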
**Instructions (Russian version):**
1. Install the dependencies from ``requirements.txt`` via ``pip install -r requirements.txt``
2. Run ``generate_submission.py``
3. Get the final ``submission.csv`` (it has already been generated in the ``output`` folder for quick reference).

Description of the other files:
* ```classifier.cbm``` - Trained CatBoost regressor
* ``do_eda.py`` - Script used in ``generate_submission.py`` to prepare the datasets and features
* ``datasets`` - Folder containing the datasets
* ``cols_to-drop.pkl`` - Columns to be dropped: they cause multicollinearity and in practice carry no useful information.