{"id":18798395,"url":"https://github.com/molodchina/msu-ml-prac","last_synced_at":"2026-05-10T03:47:09.583Z","repository":{"id":213399390,"uuid":"730566977","full_name":"Molodchina/MSU-ML-Prac","owner":"Molodchina","description":"MSU-CMC-SP Machine Learning practicum","archived":false,"fork":false,"pushed_at":"2024-08-22T10:00:47.000Z","size":20152,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-29T18:20:31.603Z","etag":null,"topics":["clustering","decision-trees","encodings","ensemble","gradient-boosting","knn","machine-learning","matplotlib","numpy","pandas","plotly","seaborn","svm"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Molodchina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-12T07:52:52.000Z","updated_at":"2024-08-22T10:01:50.000Z","dependencies_parsed_at":"2024-01-01T08:13:39.746Z","dependency_job_id":"79acf672-2334-40b4-b2c8-dfce5a265d2c","html_url":"https://github.com/Molodchina/MSU-ML-Prac","commit_stats":{"total_commits":2,"total_committers":1,"mean_commits":2.0,"dds":0.0,"last_synced_commit":"9a90f1a8830241cf0e96887431bb37be90152362"},"previous_names":["championsh/msu-ml-prac","molodchina/msu-ml-prac"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Molodchina%2FMSU-ML-Prac","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Molodchina%2FMSU-ML-Prac/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Molodchina%2FMSU-ML-Prac/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Molodchina%2FMSU-ML-Prac/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Molodchina","download_url":"https://codeload.github.com/Molodchina/MSU-ML-Prac/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239727055,"owners_count":19687099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","decision-trees","encodings","ensemble","gradient-boosting","knn","machine-learning","matplotlib","numpy","pandas","plotly","seaborn","svm"],"created_at":"2024-11-07T22:11:50.300Z","updated_at":"2026-01-02T00:30:16.567Z","avatar_url":"https://github.com/Molodchina.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MSU-ML-Prac\n\nThis repository is devoted to **Machine Learning MSU Practicum**.\n\n**Mastered themes:**\n1. Handling **tabular data** using the **Pandas** library, visualization using the **Matplotlib** library, **Seaborn**, **Plotly**,\n2. **Vector computation** using the **NumPy** library,\n3. **K Nearest Neighbors (KNN)** algorithm for solving **classification** and **regression** tasks,\n4. **Linear models**\n    - **Overtraining** experience,\n    - **Dealing** with **overtraining**,\n    - **Regularization** Techniques,\n    - **Regression** issue.\n5. 
**Preprocessing** categorical features:\n    - **One-Hot** Encoding,\n    - **Count** Encoding.\n6. **Support Vector Machine (SVM)**:\n    - Plotting a **nonlinear decision boundary**,\n    - **Optimal selection** of the **hyperparameter**,\n    - **Principal Component Analysis (PCA)** for dimensionality reduction,\n    - The **Posterior Probability** for SVM,\n    - **Solving an ML task** using ***ensemble learning***.\n7. **Decision Trees**:\n   \u003e Used to predict *real estate prices in California* with **RandomForestRegressor**, **ExtraTreesRegressor** and **LinearSVR**.\n   - Training, predicting with and visualizing a **DecisionTreeRegressor**,\n   - Improving prediction results using **Ensemble learning**, including testing **stacking**, **bagging** and **boosting** techniques,\n   - **Transforming** multidimensional matrices into 1D vectors,\n   - Using **Pipeline** to chain multiple estimators into one,\n   - Using **GridSearchCV** to tune the hyperparameters of an estimator,\n   - **Solving ML tasks**: *predicting the value of some energy for each physical potential* with ExtraTreesRegressor, using ***PotentialTransformer*** and ***data preprocessing*** (centering).\n8. **Gradient Boosting**:\n   \u003e Used to predict the *price of used cars in a number of countries* with **XGBoost**, **LightGBM**, **CatBoost** and **Hyperopt**.\n   - **Dataset preprocessing**: replacing missing values with averages, cell separation, feature selection and encoding,\n   - Using **Hyperopt** to tune the hyperparameters of an estimator,\n   - **Solving ML tasks**: *predicting the number of awards for a film* using ***ensemble learning***.\n9. 
**Clustering**:\n   - **Unsupervised machine learning methods**: **clustering** and **dimensionality reduction**,\n   - Using **PyTorch** and **TensorFlow**,\n   - Using the dimensionality reduction algorithms **TSNE**, **UMAP**, **Isomap**, **KernelPCA**,\n   - Using **Transfer Learning** to map objects into a more representative feature space, where they lie on a manifold that is easier to represent.\n\n## Project Tree\n```\n.\n├── Clustarization\n│       └── clusterization.ipynb\n├── Decision Trees\n│       ├── decision_trees.ipynb\n│       ├── decision_trees_ml.py\n│       └── decision_trees_unit-tests.py\n├── Gradient Boosting\n│       ├── gradient_boosting.ipynb\n│       └── gradient_boosting_ml.py\n├── KNN\n│       ├── cross_val.py\n│       ├── KNN_2023.ipynb\n│       └── scalers.py\n├── Linear Models: classification\n│       ├── Linear_Models_classification .ipynb\n│       └── Task.py\n├── Linear Models: regression\n│       └── Linear_Models_regression.ipynb\n├── numpy-pandas-matplotlib\n│       ├── functions.py\n│       ├── functions_vectorised.py\n│       └── Numpy_pandas_matplotlib.ipynb\n├── Python Introduction\n│       ├── task15.py\n│       ├── task6.py\n│       └── task7.py\n├── README.md\n└── SVM\n    ├── SVM.ipynb\n    └── svm_solution.py\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmolodchina%2Fmsu-ml-prac","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmolodchina%2Fmsu-ml-prac","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmolodchina%2Fmsu-ml-prac/lists"}