{"id":26378382,"url":"https://github.com/bimadevs/supervised_regression_salaryprediction","last_synced_at":"2025-07-02T14:09:30.439Z","repository":{"id":271877796,"uuid":"914851756","full_name":"bimadevs/Supervised_Regression_SalaryPrediction","owner":"bimadevs","description":"This project aims to predict the salary of employees based on their years of experience using supervised machine learning techniques.","archived":false,"fork":false,"pushed_at":"2025-01-10T12:55:05.000Z","size":235,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-02T14:09:26.648Z","etag":null,"topics":["artificial-intelligence","dataanalytics","datascience","dibimbing","machine-learning","ml","predictive-modeling"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bimadevs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-10T12:47:04.000Z","updated_at":"2025-01-18T02:09:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"1fe34d54-1389-4ee2-8454-69f9237cf378","html_url":"https://github.com/bimadevs/Supervised_Regression_SalaryPrediction","commit_stats":null,"previous_names":["bimadevs/supervised_regression_salaryprediction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bimadevs/Supervised_Regression_SalaryPrediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bimadevs%2FSupervised_Regression_SalaryPrediction","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/bimadevs%2FSupervised_Regression_SalaryPrediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bimadevs%2FSupervised_Regression_SalaryPrediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bimadevs%2FSupervised_Regression_SalaryPrediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bimadevs","download_url":"https://codeload.github.com/bimadevs/Supervised_Regression_SalaryPrediction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bimadevs%2FSupervised_Regression_SalaryPrediction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263154351,"owners_count":23422010,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","dataanalytics","datascience","dibimbing","machine-learning","ml","predictive-modeling"],"created_at":"2025-03-17T04:51:40.840Z","updated_at":"2025-07-02T14:09:30.428Z","avatar_url":"https://github.com/bimadevs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Salary Prediction: Regression Analysis\n\n## Project Overview\nThis project aims to predict the salary of employees based on their years of experience using supervised machine learning techniques. 
Two regression models, Linear Regression and Decision Tree Regression, are implemented and compared.\n\n\u003cimg src=\"download.png\" alt=\"Salary Prediction overview\"\u003e\n\n---\n\n## Dataset Information\n- **Dataset Name**: `salary_data.csv`\n- **Columns**:\n  - `employee_id`: Unique identifier for each employee.\n  - `experience_years`: Years of experience of the employee.\n  - `salary`: Salary of the employee.\n\n---\n\n## Workflow\n### 1. Data Ingestion\n- The dataset is loaded with `pandas`.\n- Initial exploration uses:\n  - `data.head()`\n  - `data.info()`\n  - `data.describe()`\n\n### 2. Exploratory Data Analysis (EDA)\n- A scatter plot visualizes the relationship between `experience_years` and `salary` using `seaborn` and `matplotlib`.\n\n### 3. Data Preparation\n- Checked for duplicate rows and removed them.\n- Checked for missing values (none found).\n- The data was split into the predictor (`X`: `experience_years`) and the target (`y`: `salary`).\n- The data was then split into training and test sets:\n  - Train/Test Ratio: 75/25\n\n### 4. Model Implementation\n#### a. Linear Regression\n- Fitted a linear regression model using `sklearn.linear_model.LinearRegression`.\n- Plotted actual vs. predicted values.\n- Evaluated using:\n  - Mean Squared Error (MSE)\n  - R-squared (R²) score\n\n#### b. 
Decision Tree Regression\n- Fitted a decision tree regressor using `sklearn.tree.DecisionTreeRegressor`.\n- Plotted actual vs. predicted values.\n- Evaluated using:\n  - Mean Squared Error (MSE)\n  - R-squared (R²) score\n\n---\n\n## Results\n### Linear Regression\n- Model Equation: `y = 1641.366 + 103.197 * x`, where `x` is `experience_years` and `y` is the predicted `salary`\n- Evaluation Metrics:\n  - **Train MSE**: 107699.85\n  - **Test MSE**: 128111.12\n  - **Train R²**: 0.77\n  - **Test R²**: 0.63\n\n### Decision Tree Regression\n- Evaluation Metrics:\n  - **Train MSE**: 88.12\n  - **Test MSE**: 128311.56\n  - **Train R²**: 1.00\n  - **Test R²**: 0.61\n\n---\n\n## Dependencies\n- Python 3.8+\n- Libraries:\n  - `pandas`\n  - `numpy`\n  - `matplotlib`\n  - `seaborn`\n  - `scikit-learn`\n\n---\n\n## Usage\n1. Install required libraries:\n   ```bash\n   pip install pandas numpy matplotlib seaborn scikit-learn\n   ```\n2. Place the `salary_data.csv` file in the working directory.\n3. Run the notebook (or script) in your Python environment.\n\n---\n\n## Key Insights\n- There is a strong positive correlation between years of experience and salary.\n- Linear regression is the simpler model and generalizes slightly better than the decision tree on this dataset (test R² 0.63 vs. 0.61; test MSE 128111.12 vs. 128311.56).\n- The decision tree overfits: it fits the training data almost perfectly (train R² 1.00, train MSE 88.12) but does no better than linear regression on the test data.\n\n---\n\n## Future Improvements\n- Use additional features to improve model performance.\n- Experiment with more advanced regression models such as Random Forest or Gradient Boosting.\n- Tune the decision tree's hyperparameters (e.g., `max_depth`, `min_samples_leaf`) to reduce overfitting.\n\n---\n\n## Contact\nFor questions or contributions, please reach out at: 
[bimadev06@gmail.com](mailto:bimadev06@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbimadevs%2Fsupervised_regression_salaryprediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbimadevs%2Fsupervised_regression_salaryprediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbimadevs%2Fsupervised_regression_salaryprediction/lists"}