{"id":18075619,"url":"https://github.com/gurramankit/censusproject_ml_randomforest","last_synced_at":"2026-04-14T04:03:21.149Z","repository":{"id":260133778,"uuid":"880429847","full_name":"gurramankit/CensusProject_ML_RandomForest","owner":"gurramankit","description":"The objective of this project is to build a classification model using the Census Income dataset from the UCI Machine Learning Repository. The model predicts whether an individual's income exceeds $50,000 per year, based on their demographic and employment-related attributes.","archived":false,"fork":false,"pushed_at":"2024-10-29T18:03:55.000Z","size":707,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T19:41:38.347Z","etag":null,"topics":["matplotlib","numpy","pandas","python","random-forest","scikit-learn","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gurramankit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-29T17:55:14.000Z","updated_at":"2024-10-29T18:52:28.000Z","dependencies_parsed_at":"2024-10-29T19:45:01.924Z","dependency_job_id":null,"html_url":"https://github.com/gurramankit/CensusProject_ML_RandomForest","commit_stats":null,"previous_names":["gurramankit/censusproject_ml_randomforest"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gurramankit/CensusProject_ML_RandomForest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurramanki
t%2FCensusProject_ML_RandomForest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurramankit%2FCensusProject_ML_RandomForest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurramankit%2FCensusProject_ML_RandomForest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurramankit%2FCensusProject_ML_RandomForest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gurramankit","download_url":"https://codeload.github.com/gurramankit/CensusProject_ML_RandomForest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gurramankit%2FCensusProject_ML_RandomForest/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278464337,"owners_count":25991182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matplotlib","numpy","pandas","python","random-forest","scikit-learn","seaborn"],"created_at":"2024-10-31T11:06:36.848Z","updated_at":"2025-10-05T14:22:15.342Z","avatar_url":"https://github.com/gurramankit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CensusProject_ML_RandomForest\nThe objective of this project is to build a classification model using the Census 
Income dataset from the UCI Machine Learning Repository. The model predicts whether an individual's income exceeds $50,000 per year, based on their demographic and employment-related attributes.\n## Overview\nThis project aims to classify whether an individual earns more than $50,000 per year, using data from the 1994 US Census. This classification task leverages demographic and employment-related attributes to predict income, which makes it useful for income prediction and socioeconomic analysis.\n\n## Dataset\n- **Source**: [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/census+income)\n- **Size**: 48,842 instances, covering demographic, employment, and income data.\n\n## Problem Statement\nThe goal is to predict if a person's income exceeds $50,000 per year based on their demographic attributes and work-related features.\n\n## Project Tasks\n### 1. Data Preprocessing\n- **Handling Missing Values**: Managed missing entries to maintain data quality.\n- **Encoding Categorical Variables**: Used encoding techniques to convert categorical data into numerical form.\n- **Feature Scaling**: Applied scaling for algorithm compatibility and performance.\n\n### 2. Exploratory Data Analysis (EDA)\nConducted EDA to explore feature relationships and visualize income distribution patterns. Key insights included trends based on age, education, occupation, and work hours.\n\n### 3. Model Building and Evaluation\nDeveloped multiple machine learning models for classification, with a focus on accuracy and robustness:\n- **Best Model**: Random Forest Classifier, achieving **84% accuracy**.\n- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-score for comprehensive assessment.\n\n## Results\n- The Random Forest model achieved an accuracy of 84% on the income classification task.\n\n## Installation and Usage\n1. Clone this repository:\n    ```bash\n    git clone https://github.com/gurramankit/CensusProject_ML_RandomForest.git\n    ```\n2. Install the necessary libraries:\n    ```bash\n    pip install -r requirements.txt\n    ```\n3. Run the project:\n    ```bash\n    python main.py\n    ```\n\n## Project Structure\n- `data/` - Contains the Census Income dataset.\n- `notebooks/` - Jupyter notebooks for EDA, data preprocessing, and model development.\n- `src/` - Python scripts for data processing and model implementation.\n- `README.md` - Overview and instructions.\n- `requirements.txt` - Dependencies for running the project.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgurramankit%2Fcensusproject_ml_randomforest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgurramankit%2Fcensusproject_ml_randomforest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgurramankit%2Fcensusproject_ml_randomforest/lists"}