{"id":31836259,"url":"https://github.com/friendotjava/income-prediction","last_synced_at":"2025-10-12T01:28:43.702Z","repository":{"id":317003556,"uuid":"1065602662","full_name":"FrienDotJava/income-prediction","owner":"FrienDotJava","description":"A machine learning project classifying whether someone has income \u003e$50K or \u003c$50K using several models. Integrated with DVC Pipeline.","archived":false,"fork":false,"pushed_at":"2025-09-28T06:58:31.000Z","size":12,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-28T07:22:52.865Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FrienDotJava.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-28T03:55:31.000Z","updated_at":"2025-09-28T06:58:35.000Z","dependencies_parsed_at":"2025-10-01T06:32:43.077Z","dependency_job_id":null,"html_url":"https://github.com/FrienDotJava/income-prediction","commit_stats":null,"previous_names":["friendotjava/income-prediction"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/FrienDotJava/income-prediction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrienDotJava%2Fincome-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrienDotJava%2Fincome-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrienDotJava%2Fincome-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrienDotJava%2Fincome-prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FrienDotJava","download_url":"https://codeload.github.com/FrienDotJava/income-prediction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrienDotJava%2Fincome-prediction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279009766,"owners_count":26084647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-12T01:28:42.485Z","updated_at":"2025-10-12T01:28:43.697Z","avatar_url":"https://github.com/FrienDotJava.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Income Prediction (Adult Census)\n\n\u003e A modular machine-learning project that predicts whether a person’s annual income is **\u003e $50K** or **≤ $50K**, built with a clean Cookiecutter Data Science structure, tracked with **DVC** (and DVCLive), and instrumented for **MLflow** experiment logging. A small **Streamlit** dashboard is included for quick exploration.\n\n---\n\nTry it here: https://income-prediction1.streamlit.app/\n\n## 📌 Project goals\n\n- Train solid baseline \u0026 boosted tree models for the Adult/Census Income task (binary classification \u003e$50K).\n- Keep work **reproducible** (DVC pipelines + parameters), **trackable** (DVCLive/MLflow), and **organized** (Cookiecutter DS layout).\n- Provide a minimal **dashboard** to poke the model and visualize results.\n\n---\n\n## 🗂 Repository structure\n\n```\n├── data/                 # raw/ → interim/ → processed/ (DVC-managed)\n├── docs/                 # (optional) project docs\n├── dvclive/              # live metrics/artifacts from runs\n├── income_classification/\n│   ├── __init__.py\n│   ├── config.py\n│   ├── dataset.py        # data download/prepare helpers\n│   ├── features.py       # feature engineering\n│   └── modeling/\n│       ├── __init__.py\n│       ├── predict.py    # inference script\n│       └── train.py      # training script\n├── notebooks/            # EDA \u0026 scratch work\n├── references/           # data dictionary, notes, etc.\n├── dashboard.py          # Streamlit mini app\n├── dvc.yaml              # DVC pipeline (stages \u0026 deps)\n├── dvc.lock              # DVC lockfile (auto-generated)\n├── params.yaml           # central hyperparams \u0026 config\n├── requirements.txt      # Python dependencies\n├── Makefile              # convenience commands\n└── README.md\n```\n\n---\n\n## 📦 Dataset\n\nThis project uses the **Adult (Census Income)** dataset: **48,842** rows, **14** features, binary target (\u003e $50K).  \nYou can obtain it from UCI or Kaggle.\n\n\u003e Place raw files under `data/raw/`.\n\n---\n\n## 🛠️ Quickstart\n\n### 1) Setup environment\n\n```bash\ngit clone https://github.com/FrienDotJava/income-prediction.git\ncd income-prediction\n\npython -m venv .venv\nsource .venv/bin/activate  # Windows: .venv\\Scripts\\activate\n\npip install -r requirements.txt\n```\n\n### 2) Get the data\n\nDownload the Adult/Census Income data and put it here:\n\n```\ndata/\n└── raw/\n    └── adult.csv\n```\n\n### 3) Reproduce the pipeline (DVC)\n\n```bash\ndvc repro\n```\n\n- **Stages** and dependencies live in `dvc.yaml`; `dvc repro` runs data prep → feature building → training → evaluation.\n- Metrics and plots are logged in `dvclive/`.\n\n### 4) Tweak parameters \u0026 rerun\n\nEdit **`params.yaml`** to change model settings, then:\n\n```bash\ndvc repro\n```\n\n### 5) Track experiments (MLflow)\n\n```bash\nmlflow ui\n```\n\nOpen [http://127.0.0.1:5000](http://127.0.0.1:5000) to explore experiments.\n\n### 6) Run the dashboard\n\n```bash\nstreamlit run dashboard.py\n```\n\n---\n\n## 🧪 Pipeline overview\n\n- **Data prep**: clean \u0026 split the Adult dataset.  \n- **Feature engineering**: encode categoricals, scale numerics.  \n- **Model training**: Logistic Regression, RandomForest, GradientBoosting, etc.  \n- **Evaluation**: accuracy, precision, recall, F1, ROC-AUC, confusion matrix.  \n- **Experiment tracking**: DVC, DVCLive, MLflow.\n\n---\n\n## 📈 Example results\n\nTypical accuracy: **80–86%** (depends on preprocessing and model).\n\n---\n\n## ▶️ Makefile shortcuts\n\n```bash\nmake train      # Run training\nmake clean      # Clean temp artifacts\nmake dashboard  # Launch Streamlit dashboard\n```\n\n---\n\n## 📚 References\n\n- Adult (Census Income) dataset (UCI)\n- Kaggle: Adult Census Income\n- Cookiecutter Data Science\n\n---\n\n## 💡 Tips\n\n- If only `params.yaml` changes, rerun `dvc repro`.  \n- Use `dvc commit \u0026\u0026 dvc push` to sync data to remote storage.  \n- Use Streamlit for fast visual validation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendotjava%2Fincome-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffriendotjava%2Fincome-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffriendotjava%2Fincome-prediction/lists"}