{"id":22731152,"url":"https://github.com/FreeAnalyticsPR/Kaggle","last_synced_at":"2025-08-08T09:31:39.604Z","repository":{"id":162342084,"uuid":"569923734","full_name":"FreeAnalyticsPR/Kaggle","owner":"FreeAnalyticsPR","description":"16 optimizing insights on ensemble learning with Python.","archived":false,"fork":false,"pushed_at":"2024-09-28T22:54:47.000Z","size":32664,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-07T00:22:33.859Z","etag":null,"topics":["deep-learning","insights","jupyter-notebook","kaggle","lightgbm","machine-learning","optimization-methods","python","xgboost"],"latest_commit_sha":null,"homepage":"https://www.coursera.org/user/3df13832d0fc4d5a1f5d652a5fec09cb","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FreeAnalyticsPR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-23T23:43:58.000Z","updated_at":"2024-09-28T22:54:50.000Z","dependencies_parsed_at":"2025-01-07T08:28:40.918Z","dependency_job_id":null,"html_url":"https://github.com/FreeAnalyticsPR/Kaggle","commit_stats":null,"previous_names":["free-analytics/kaggle","freeanalyticspr/kaggle"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FreeAnalyticsPR/Kaggle","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreeAnalyticsPR%2FKaggle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreeAnalyticsPR%2FKaggle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreeAnalyticsPR%2FKaggle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreeAnalyticsPR%2FKaggle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FreeAnalyticsPR","download_url":"https://codeload.github.com/FreeAnalyticsPR/Kaggle/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreeAnalyticsPR%2FKaggle/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269400468,"owners_count":24410926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-08T02:00:09.200Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","insights","jupyter-notebook","kaggle","lightgbm","machine-learning","optimization-methods","python","xgboost"],"created_at":"2024-12-10T19:19:38.379Z","updated_at":"2025-08-08T09:31:36.657Z","avatar_url":"https://github.com/FreeAnalyticsPR.png","language":"J
upyter Notebook","readme":"## Awarded 16 analytical reports with Python at 16GB Kernel of Kaggle.\n### About the Author\n- Name: Satoru Shibata / 柴田 怜\n- Job: Sr. Data Scientist\n- Titles:\n  - [3x Kaggle Expert](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Evidence_3x_Kaggle_Expert.pdf)\n    - Retired from Kaggle at 2021 to focus on business as a data scientist.\n  - [4x Certified Professional](https://www.coursera.org/user/3df13832d0fc4d5a1f5d652a5fec09cb)\n    - [IBM Data Science Professional Certificate](https://www.credly.com/badges/c401bae6-9e5c-4071-8301-871a4283e4b2) at 2022.\n    - [IBM Applied Data Science Specialization Certificate](https://www.coursera.org/account/accomplishments/specialization/UYB8WV8FQDSH?utm_product=s12n) at 2023.\n    - [SAS Statistical Business Analyst Professional Certificate](https://www.credly.com/badges/91f1e7d7-33d0-4893-a55e-2270c40e5055) at 2024.\n    - [IBM Generative AI for Data Scientists Specialization Certificate](https://www.coursera.org/account/accomplishments/specialization/EQMNLGETBUM3?utm_product=s12n) at 2024.\n\n### Score Table\n| Departments      | Top Levels| Highest Rank | Awarded Medals     | \n| :---:            | :-------: | :----------: | :----------------: |\n| **Code**         | 0.2%      | 317/161,898  | 3 Silver 13 Bronze |\n| **Discussion**   | 0.3%      | 588/188,433  | 100 Bronze         |\n| **Datasets**     | 1%        | 354/34,643   | 3 Bronze           |\n| **Competitions** | 20-30%\n\n### Abstracts\n#### 3 Silver Medals\n1. [Optimized LightGBM with Optuna adding SAKT Model](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Silver_Medal_Optimized_LightGBM_with_Optuna_adding_SAKT_Model.ipynb)\n    - Lead sentences\n      - Submitted code using two Ensemble Learning Methods.\n      - Used 100 million rows of training data for prediction on a 16GB Kernel removing unnecessary objects.\n    - Issue\n      - Algorithms for TOEIC Learning Applications\n    - Significance\n      - Predict percentage of correct answers based on user's behavioral history.\n      - User's percentage of correct answers will increase with the number of problems solved.\n    - Purpose\n      - Optimize Binary Classification for AUC.\n    - Methodology:\n      - Ensemble Learning of LightGBM and SAKT.\n      - Hyperparameter Optimization with Optuna.\n    - Results\n      - Score: AUC = 0.781.\n      - Code: 31 Points.\n    - Considerations\n      - Obsessed with Models, Feature Engineering Remains a Challenge.\n      - Systematizing Multiple Models will also be a challenge in the future.\n    - Conclusion\n      - Code Silver Medal\n2. 
2. [LightGBM Classifier and Logistic Regression Report](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Silver_Medal_LightGBM_Classifier_and_Logistic_Regression_Report.ipynb)
    - Lead sentences
      - Optimized Classification of anonymized raw stock-market data on a 16GB Kernel.
      - Contributed code that systematizes Ensemble Learning and Logistic Regression.
    - Issue
      - Utility Function Optimization for Supply and Demand Forecasting in securities markets.
    - Significance
      - Compute the utility from indicators of whether a stock return occurs and how large it is.
      - Optimize the decision of whether or not to trade.
    - Purpose
      - AI Dev for Profit Maximization.
    - Methodology
      - Optimized classification with LightGBM.
      - Logit Transformation of the target variables based on their probability distributions.
    - Results
      - Score: 3741.118 (Outside of Medal Zone).
      - Code: 33 Points.
    - Considerations
      - The utility function was not fully deciphered, which left some issues for literature surveys.
      - The report was appreciated by other Kagglers.
    - Conclusion
      - Code Silver Medal.
3. [Optimize LightGBM HyperParameter with Optuna and GPU](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Silver_Medal_Optimize_LightGBM_HyperParameter_with_Optuna_and_GPU.ipynb)
    - Lead sentences
      - Unprecedented LightGBM Hyperparameter Optimization on GPU.
      - The attached procedure was highly evaluated.
    - Issue
      - Preliminary work for "LightGBM Classifier and Logistic Regression Report".
    - Significance
      - Hyperparameter Optimization.
      - There were few precedents for LightGBM.
    - Purpose
      - Code submission for optimizing LightGBM Hyperparameters on GPU.
    - Methodology
      - A survey of prior case studies using Optuna for LightGBM.
      - Procedure for submissions.
    - Results
      - Run: 953.9s on GPU.
      - Code: 31 Points.
    - Consideration
      - The hyperparameter optimization remains available for future work.
    - Conclusion
      - Code Silver Medal.

#### 13 Bronze Medals
1. [Optimized Logit LightGBM Classifier and CNN Models](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Optimized_Logit_LightGBM_Classifier_and_CNN_Model.ipynb)
    - Lead sentences
      - Submitted a simulation of Multiple Model Systematization.
      - Based on this failure, I was able to concentrate on LightGBM Optimization and Inference.
    - Issue
      - Exploring Optimization Models.
    - Significance
      - Iterated simulations of the Optimization Model.
    - Purpose
      - Optimize the Utility Function by systematizing multiple models.
    - Methodology
      - Applying the Logit Transform to LightGBM (a minimal sketch follows below).
      - Explore combining with a CNN.
    - Results
      - Score: 3344.738 (Outside of Medal Zone).
      - Code: 15 Points.
    - Considerations
      - This code ran LightGBM and a CNN at the same time, which was prone to overflow.
      - From then on, I focused on optimizing one model at a time.
    - Conclusion
      - Code Bronze Medal.
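
    A minimal sketch of combining two probability outputs (e.g. LightGBM and a CNN) by averaging in logit space rather than in probability space. The arrays are illustrative placeholders, not the competition predictions.

    ```python
    import numpy as np
    from scipy.special import expit, logit

    p_lgbm = np.array([0.12, 0.55, 0.91, 0.40])  # placeholder LightGBM probabilities
    p_cnn = np.array([0.20, 0.48, 0.85, 0.35])   # placeholder CNN probabilities

    eps = 1e-6  # clip so the logit stays finite at 0 and 1
    z = 0.5 * logit(np.clip(p_lgbm, eps, 1 - eps)) + 0.5 * logit(np.clip(p_cnn, eps, 1 - eps))
    p_blend = expit(z)  # back to probabilities
    print(p_blend)
    ```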
1. [Optimized LightGBM with Optuna](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Optimized_LightGBM_with_Optuna.ipynb)
    - Lead sentences
      - Dev of a Baseline Model for the Code Competition, processing 100 million rows of training data.
      - Prediction had to run within the minimum 16GB Kernel.
    - Issue
      - 100 million rows of training data had to be used for prediction on a 16GB Kernel.
    - Significance
      - This is the cornerstone of the final submission model.
      - Preprocessing and Feature Engineering were adjusted for further optimization.
    - Purpose
      - Baseline Model Dev.
    - Methodology
      - Binary Classification by LightGBM Optimization.
    - Results
      - Score: AUC = 0.774.
      - Code: 12 Points.
    - Considerations
      - Set the policy for additional development on top of the Baseline Model.
      - The improvement in AUC from the additional development was only 0.007, which left some issues.
    - Conclusion
      - Code Bronze Medal.
1. [LightGBM on GPU with Feature Engineering, Optuna, and Visualization](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_LightGBM_on_GPU_with_Feature_Engineering_Optuna_and_Visualization.ipynb)
    - Lead sentence
      - Won a Code Bronze Medal on my first attempt at submitting code.
    - Issue
      - This was my first real effort on Kaggle.
    - Significance
      - Visualized results in a timely manner and studied features.
      - Optuna was also used for the first time and applied in later work.
    - Purpose
      - Work on Feature Engineering.
    - Methodology
      - I read and referred to code posted by a Kaggle Grandmaster.
    - Results
      - Code: 11 Points.
    - Consideration
      - I gained experience implementing LightGBM with Optuna on GPU.
    - Conclusion
      - Code Bronze Medal.
1. [LightGBM with the Inference and Empirical Analysis](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_LightGBM_with_the_Inference_and_Empirical_Analysis.ipynb)
    - Lead sentences
      - In the first scored submission code, AUC = 0.76.
      - The remaining challenges became the cornerstone of later development experience.
    - Issue
      - Obtain a score by developing additions to the code submitted for my first challenge.
    - Significance
      - A single run was limited to generating the Model Object.
    - Purpose
      - To further improve the performance of the Prediction Model.
    - Methodology
      - Inference was added to improve the Score.
      - Empirical Analysis between the raw data and the predicted results (a minimal sketch follows below).
      - Detected significant differences in the Gaussian Distributions.
    - Results
      - Score: AUC = 0.76.
      - Code: 12 Points.
    - Consideration
      - This submission left an insufficient understanding of inference as an open issue.
    - Conclusion
      - Code Bronze Medal.
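
    A minimal sketch of the empirical distribution check described above, assuming a two-sample Kolmogorov–Smirnov test on synthetic stand-ins; the notebook's exact test and data are not reproduced here.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    raw = rng.normal(loc=0.0, scale=1.0, size=2000)    # stand-in for the raw target values
    pred = rng.normal(loc=0.1, scale=0.9, size=2000)   # stand-in for the model's predictions

    stat, p_value = stats.ks_2samp(raw, pred)          # two-sample Kolmogorov–Smirnov test
    print(f"KS statistic = {stat:.3f}, p-value = {p_value:.4f}")
    if p_value < 0.05:
        print("The two distributions differ significantly at the 5% level.")
    ```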
1. [Submission and the Inference of LightGBM](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Submission_and_the_Inference_of_LightGBM.ipynb)
    - Lead sentences
      - My first prototype of scoring submission code.
      - With few prior examples of Empirical Analysis, I won a Code Bronze Medal.
    - Issue
      - Prototype version of the submission code for the first scoring.
    - Significance
      - Implementing the scoring submission code.
    - Purpose
      - Gaining development experience.
    - Methodology
      - Model objects were coded for scoring.
      - Empirical Analysis detected a significant difference in the Gaussian Distributions.
    - Result
      - Code: 7 Points.
    - Considerations
      - The actual scoring submission code became a separate file.
      - This was an opportunity for me to feel the challenge of coding.
      - I focused on it afterwards.
    - Conclusion
      - Code Bronze Medal.
1. [Market Prediction XGBoost with GPU Modified](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Market_Prediction_XGBoost_with_GPU_Modified.ipynb)
    - Lead sentences
      - Performance comparison with LightGBM through XGBoost Optimization.
      - LightGBM takes the cake.
    - Issue
      - I had sometimes seen good results with XGBoost.
    - Significance
      - Simulate on models other than LightGBM and search for the Optimized Model.
    - Purpose
      - Score improvement with XGBoost.
    - Methodology
      - GPU implementation in XGBoost Optimization.
    - Results
      - Score: 3308.824 (Outside of Medal Zone).
      - Code: 8 Points.
    - Considerations
      - XGBoost is easy to implement thanks to its many precedents.
      - LightGBM was superior in the performance comparison, which led me to focus on it.
    - Conclusion
      - Code Bronze Medal.
1. [Cassava Leaf Disease Best Keras CNN Tuning](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Cassava_Leaf_Disease_Best_Keras_CNN_Tuning.ipynb)
    - Lead sentences
      - I also participated in an image-analysis competition, challenging myself with raw data of various properties.
      - Some issues remained on the theoretical side, which gave me an opportunity to work from theoretical books.
    - Issue
      - I wanted to try my hand at image analysis and find out what I am good at.
    - Significance
      - I wanted to gain experience in Keras implementation (a minimal sketch follows below).
      - Deepen my understanding of CNNs.
    - Purpose
      - Learn to understand and implement acoustic analysis and image analysis.
    - Methodology
      - I complemented the advanced submission code.
    - Results
      - Score: Accuracy = 0.885.
      - Code: 18 Points.
    - Considerations
      - Theoretical aspects of acoustic analysis and image analysis remained a challenge.
      - An opportunity to recognize the need to start with a survey of theoretical papers.
    - Conclusion
      - Code Bronze Medal.
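
    A minimal sketch of a Keras CNN for a five-class image problem like the cassava dataset; the input shape, layer sizes, and commented training call are illustrative assumptions, not the tuned submission model.

    ```python
    from tensorflow.keras import layers, models

    num_classes = 5  # the cassava competition labels four diseases plus healthy

    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    # model.fit(train_ds, validation_data=val_ds, epochs=10)  # image datasets not shown here
    ```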
1. [RFCX Residual Network with TPU Customized](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_RFCX_Residual_Network_with_TPU_Customized.ipynb)
    - Lead sentences
      - I also participated in an acoustic-analysis competition and tried my hand at raw data of various properties.
      - Some issues remained on the theoretical side, which gave me an opportunity to work from theoretical books.
    - Issue
      - I wanted to try my hand at acoustic analysis and find out what I am good at.
    - Significance
      - I wanted to gain experience in Keras implementation.
      - Deepen my understanding of CNNs.
    - Purpose
      - Learn to understand and implement acoustic analysis and image analysis.
    - Methodology
      - I complemented the advanced submission code.
    - Results
      - Score: 0.772.
      - Code: 12 Points.
    - Considerations
      - Theoretical aspects of acoustic analysis and image analysis remained a challenge.
      - An opportunity to recognize the need to start with a survey of theoretical papers.
    - Conclusion
      - Code Bronze Medal.
1. [Research with Customized Sharp Weighted](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Research_with_Customized_Sharp_Weighted.ipynb)
    - Lead sentences
      - Worked on clarifying the Custom Metrics and systematizing Hyperparameter Optimization in LightGBM.
      - Generating an optimization object at each milestone is still important.
    - Issue
      - Private Custom Metrics were used as the Evaluation Function.
    - Significance
      - Improve prediction accuracy by elucidating the Private Custom Metrics.
      - Reproducibility is determined by the Evaluation Function.
    - Purpose
      - Custom Metrics Clarification.
    - Methodology
      - LightGBM Hyperparameter Optimization.
      - Systematization with Custom Metrics decoding examples.
    - Results
      - Generated each Parameter Optimization Object.
      - Code: 6 Points.
    - Consideration
      - The importance of generating an optimization object at each milestone was reaffirmed.
    - Conclusion
      - Code Bronze Medal.
1. [Optimize CatBoost HyperParameter with Optuna and GPU](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Optimize_CatBoost_HyperParameter_with_Optuna_and_GPU.ipynb)
    - Lead sentences
      - A performance comparison was performed across optimized Ensemble Learning methods.
      - LightGBM won on prediction accuracy.
    - Issue
      - I was new to CatBoost and wanted to compare its performance with LightGBM.
    - Significance
      - Performance comparison of Ensemble Learning methods: LightGBM, XGBoost, CatBoost, etc.
    - Purpose
      - Algorithm selection for Prediction Models.
    - Methodology
      - Hyperparameter optimization (a minimal sketch follows below).
      - CatBoost implementation.
    - Results
      - Score: AUC = 0.500.
      - Code: 17 Points.
    - Consideration
      - At the Baseline Model stage, I gave the edge to LightGBM.
    - Conclusion
      - Code Bronze Medal.
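
    A minimal sketch of an Optuna search over a few CatBoost hyperparameters with GPU training enabled via `task_type="GPU"`; the parameter ranges and synthetic data are illustrative assumptions.

    ```python
    import optuna
    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

    def objective(trial):
        model = CatBoostClassifier(
            iterations=300,
            depth=trial.suggest_int("depth", 4, 10),
            learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            l2_leaf_reg=trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
            task_type="GPU",   # use "CPU" when no GPU is available
            verbose=False,
        )
        model.fit(X_tr, y_tr)
        return roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)
    print(study.best_params)
    ```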
1. [LightGBM on Lyft Tabular Data added Inference and Tuning](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_LGBM_on_Lyft_Tabular_Data_Inference_Tuning.ipynb)
    - Lead sentences
      - Regression Prediction with LightGBM using Grid Search and multiple Evaluation Functions.
      - A harvest that uncovered all sorts of challenges!
    - Issue
      - A Regression Problem on tabular data related to automated driving.
    - Significance
      - I wanted to work on Regression Prediction with LightGBM.
      - Gain further development experience.
      - Implement multiple evaluation functions to improve accuracy.
    - Purpose
      - Improving the accuracy of Regression Prediction.
    - Methodology
      - Set the evaluation functions of LightGBM to MSE and RMSE.
      - Parameter search by grid search (a minimal sketch follows below).
    - Results
      - Score: 356.084.
      - Code: 10 Points.
    - Considerations
      - Grid search showed that this style of hyperparameter optimization is inefficient.
      - I reaffirmed the need to use Feature Engineering and inference.
    - Conclusion
      - Code Bronze Medal.
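
    A minimal sketch of an exhaustive grid search over a small LightGBM parameter grid for regression, scored by (negative) MSE with RMSE reported afterwards; the grid and synthetic data are placeholders for the competition's tabular features.

    ```python
    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.datasets import make_regression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = make_regression(n_samples=3000, n_features=15, noise=5.0, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

    param_grid = {
        "num_leaves": [31, 63],
        "learning_rate": [0.05, 0.1],
        "n_estimators": [200, 400],
    }
    search = GridSearchCV(LGBMRegressor(), param_grid,
                          scoring="neg_mean_squared_error", cv=3)
    search.fit(X_tr, y_tr)

    pred = search.best_estimator_.predict(X_va)
    rmse = np.sqrt(mean_squared_error(y_va, pred))
    print("best params:", search.best_params_, "| validation RMSE:", round(rmse, 3))
    ```

    As the considerations note, exhaustive grids scale poorly; a sampler such as Optuna's covers the same space far more efficiently.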
1. [COVID-19 with H2OAutoML Baseline Model](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_COVID-19_with_H2OAutoML_Baseline.ipynb)
    - Lead sentences
      - Experimented with AutoML performance, but found original development to be more powerful.
      - This led to the original development of the LightGBM optimization.
    - Issue
      - The COVID-19 infection explosion and new global challenges.
    - Significance
      - Improvement of coding techniques for anonymized tabular data.
      - Accumulate experience using AutoML.
    - Purpose
      - Optimized Regression Prediction with AutoML.
    - Methodology
      - Set RMSLE as the evaluation function for Regression Prediction with H2O.
      - Extract the optimized Regression Prediction Models: Deep Learning, XGBoost, GLM, GBM, etc.
    - Results
      - Score: RMSLE = 0.086.
      - Code: 6 Points.
    - Considerations
      - Original development was more powerful than H2OAutoML.
      - An opportunity to work on Optimized Regression Prediction with LightGBM.
    - Conclusion
      - Code Bronze Medal.
1. [Optimized Predictive Model with H2OAutoML](https://github.com/Satoru-Shibata-JPN/Kaggle/blob/main/Kaggle_Python3_Bronze_Medal_Optimized_Predictive_Model_with_H2OAutoML.ipynb)
    - Lead sentences
      - Even in Binary Classification, AutoML was found to be inferior to proprietary development.
      - The difference is thought to come from Preprocessing and Feature Engineering.
    - Issue
      - Regression Prediction by H2OAutoML was inferior to original development.
    - Significance
      - It was unclear whether the results would be similar to the Regression Prediction case.
    - Purpose
      - Experiment with H2OAutoML on Binary Classification.
    - Methodology
      - Set RMSLE as the evaluation function for Binary Classification with H2O.
      - Extract the optimized Binary Classification Models: Deep Learning, XGBoost, GLM, GBM, etc. (a minimal H2OAutoML sketch follows below).
    - Results
      - Score: AUC = 0.850.
      - Code: 5 Points.
    - Considerations
      - The performance was higher than in the Regression Prediction case.
      - Preprocessing and Feature Engineering themselves are not automated.
      - They have to be developed independently.
    - Conclusion
      - Code Bronze Medal.
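
    A minimal H2OAutoML sketch in the spirit of the two notebooks above, assuming a local H2O cluster (`h2o.init()` requires a Java runtime); the toy frame, column names, and RMSLE sort metric are illustrative placeholders, not the competition schema.

    ```python
    import h2o
    import numpy as np
    import pandas as pd
    from h2o.automl import H2OAutoML

    h2o.init()

    # Small synthetic regression frame with a strictly positive target for RMSLE.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"feature_1": rng.normal(size=500),
                       "feature_2": rng.normal(size=500)})
    df["target"] = np.exp(0.5 * df["feature_1"] + rng.normal(scale=0.1, size=500))

    frame = h2o.H2OFrame(df)
    train, valid = frame.split_frame(ratios=[0.8], seed=1)

    aml = H2OAutoML(max_models=10, seed=1, sort_metric="RMSLE")
    aml.train(x=["feature_1", "feature_2"], y="target",
              training_frame=train, validation_frame=valid)

    # The leaderboard typically contains GBM, XGBoost, GLM, Deep Learning, and stacked ensembles.
    print(aml.leaderboard.head(rows=5))
    ```

    For the Binary Classification follow-up, the same pattern applies with a categorical target (converted via `asfactor()`) and a classification sort metric such as AUC.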