{"id":15157782,"url":"https://github.com/sanjiv856/machine_learning_scikit-learn","last_synced_at":"2026-02-27T13:10:49.471Z","repository":{"id":254511086,"uuid":"845237453","full_name":"sanjiv856/machine_learning_scikit-learn","owner":"sanjiv856","description":"Repository for machine learning in Python using Scikit-learn.","archived":false,"fork":false,"pushed_at":"2024-08-24T08:14:58.000Z","size":491,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-13T17:18:04.623Z","etag":null,"topics":["pipelines","python","scikit-learn","sklearn","titanic-kaggle","titanic-survival-prediction"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sanjiv856.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-20T21:26:20.000Z","updated_at":"2024-08-24T10:31:46.000Z","dependencies_parsed_at":"2024-11-03T03:41:44.379Z","dependency_job_id":"3a367e05-7267-47f0-8452-d4a0ef5fe617","html_url":"https://github.com/sanjiv856/machine_learning_scikit-learn","commit_stats":{"total_commits":4,"total_committers":1,"mean_commits":4.0,"dds":0.0,"last_synced_commit":"3c2af31d418efdfa0931a46dc0c0c404f75dceca"},"previous_names":["sanjiv856/machine_learning_scikit-learn"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanjiv856%2Fmachine_learning_scikit-learn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanjiv856%2Fmachine_learning_scikit-learn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanjiv856%2Fmachine_learning_scikit-learn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanjiv856%2Fmachine_learning_scikit-learn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sanjiv856","download_url":"https://codeload.github.com/sanjiv856/machine_learning_scikit-learn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247675628,"owners_count":20977376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pipelines","python","scikit-learn","sklearn","titanic-kaggle","titanic-survival-prediction"],"created_at":"2024-09-26T20:03:37.346Z","updated_at":"2025-11-01T03:04:43.882Z","avatar_url":"https://github.com/sanjiv856.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Titanic Survival Prediction\n\nThis project implements a machine learning pipeline to predict the survival of passengers on the Titanic using various classification algorithms. The project involves data preprocessing, feature engineering, model training, hyperparameter tuning, and model evaluation.\n\n## Project Structure\n- `data/` - Directory containing the dataset (`train.csv` and `test.csv`).\n- `python_scikit-learn_titanic.py` - Main script containing the code for loading data, preprocessing, training models, and generating submissions.\n\n## Running the Code\n\n### Feature Engineering \u0026 Preprocessing:\nFeature engineering is applied to create new features such as Family_Size, Is_Alone, Title, Age_Group, etc.\nPreprocessing pipelines are defined for numerical and categorical features.\n\n### Model Training \u0026 Hyperparameter Tuning:\n\nSeveral classifiers are trained and tuned using GridSearchCV, including:\n- Random Forest\n- Extra Trees\n- XGBoost\n- Decision Tree\n- Logistic Regression\n- Gaussian Naive Bayes\n- K-Nearest Neighbors\n\nBest models and their parameters are saved as .pkl files.\n\n### Key Libraries Used\n- pandas - Data manipulation and analysis.\n- numpy - Numerical computations.\n- matplotlib \u0026 seaborn - Data visualization.\n- scikit-learn - Machine learning library for model building and evaluation.\n- xgboost - Implementation of gradient boosting algorithm.\n\n### Feature Engineering\nThe following features are engineered:\n- Family_Size - Number of family members onboard.\n- Is_Alone - Binary feature indicating if the passenger was alone.\n- Title - Extracted from passenger names.\n- Age_Group - Binned age groups.\n- Ticket_Number, Ticket_Location - Extracted from ticket information.\n- Cabin_Alphabet, Cabin_Recorded - Extracted from cabin information.\n\n### Hyperparameter Tuning\nHyperparameters are tuned using GridSearchCV with cross-validation to find the best model configuration.\n\n### Feature Importance\nFeature importance is plotted for the top 20 features for each model.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanjiv856%2Fmachine_learning_scikit-learn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsanjiv856%2Fmachine_learning_scikit-learn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanjiv856%2Fmachine_learning_scikit-learn/lists"}