{"id":43263828,"url":"https://github.com/jonasneves/aipi510-project3","last_synced_at":"2026-02-01T15:03:03.778Z","repository":{"id":325660179,"uuid":"1101962332","full_name":"jonasneves/aipi510-project3","owner":"jonasneves","description":"Duke AIPI 510 Project 3 • AI/ML Salary Predictor • XGBoost model trained on H1B, Linkedin, Adzuna data • FastAPI + React","archived":false,"fork":false,"pushed_at":"2025-12-21T02:16:03.000Z","size":9518,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-22T23:19:27.932Z","etag":null,"topics":["aws-s3","fastapi","machine-learning","mlflow","python","react","salary-prediction","xgboost"],"latest_commit_sha":null,"homepage":"https://aisalary.neevs.io/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonasneves.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-22T15:07:50.000Z","updated_at":"2025-12-21T02:16:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jonasneves/aipi510-project3","commit_stats":null,"previous_names":["jonasneves/aipi510-project3"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jonasneves/aipi510-project3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonasneves%2Faipi510-project3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonasneves
%2Faipi510-project3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonasneves%2Faipi510-project3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonasneves%2Faipi510-project3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonasneves","download_url":"https://codeload.github.com/jonasneves/aipi510-project3/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonasneves%2Faipi510-project3/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28980855,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T13:38:33.235Z","status":"ssl_error","status_checked_at":"2026-02-01T13:38:32.912Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-s3","fastapi","machine-learning","mlflow","python","react","salary-prediction","xgboost"],"created_at":"2026-02-01T15:02:32.393Z","updated_at":"2026-02-01T15:03:03.770Z","avatar_url":"https://github.com/jonasneves.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Salary Prediction Pipeline\n\n[![API Status](https://img.shields.io/endpoint?url=https://aisalary.neevs.io/api/badge/api\u0026label=API)](https://aisalary.neevs.io/api)\n[![App 
Status](https://img.shields.io/endpoint?url=https://aisalary.neevs.io/api/badge/app\u0026label=App)](https://aisalary.neevs.io)\n\n[![ML Pipeline](https://github.com/jonasneves/aipi510-project3/actions/workflows/ml-pipeline.yml/badge.svg)](https://github.com/jonasneves/aipi510-project3/actions/workflows/ml-pipeline.yml)\n\n**Live Demo:** [aisalary.neevs.io](https://aisalary.neevs.io) | [API Docs](https://aisalary.neevs.io/api/docs) | [Reports Portal (EDA + MLflow)](https://jonasneves.github.io/aipi510-project3/)\n\n## Overview\n\nPredict AI/ML salaries using machine learning. Built for Duke AIPI 510 Module Project 3.\n\n**Problem:** Estimate salary ranges for AI/ML roles based on job title, location, experience, and skills.\n\n**Solution:** XGBoost regression model trained on H1B visa filings, LinkedIn job postings, and Adzuna market data, deployed as a FastAPI service with a React frontend.\n\n## Dataset\n\n| Source | Description | Priority | Records |\n|--------|-------------|----------|---------|\n| [H1B Visa Data](https://www.dol.gov/agencies/eta/foreign-labor/performance) | DOL certified visa applications with actual salaries | 1 | ~10,000 AI/ML jobs |\n| [LinkedIn](https://www.linkedin.com) | Job postings with detailed salary, seniority, skills data | 1 | ~1,000+ (growing) |\n| [Adzuna](https://developer.adzuna.com/) | Job postings with salary ranges | 2 | ~16,500 |\n\nData hosted on AWS S3. 
Pipeline downloads and merges sources automatically.\n\n## Model\n\n**Architecture:** XGBoost Regressor\n\n| Parameter | Value |\n|-----------|-------|\n| n_estimators | 200 |\n| max_depth | 6 |\n| learning_rate | 0.1 |\n\n**Evaluation Metrics:**\n- MAE: ~$36,000\n- RMSE: ~$52,000\n- MAPE: ~23%\n\n**Top Features:** Years of experience, company tier, role type (researcher/scientist/analyst), entry-level indicator\n\n## Experiment Tracking\n\nMLflow is used for experiment tracking during model training.\n\n→ **[MLflow Overview](docs/MLflow-Overview.pdf)** | **[Feature Importance](docs/MLflow-FeatureImportance.pdf)**\n\n## Architecture\n\n![Architecture Diagram](architecture.png)\n\n## Quick Start\n\n```bash\nmake install           # Install Python dependencies\nmake frontend-install  # Install frontend dependencies\nmake pipeline          # Collect data, merge, and train model\nmake api               # Start FastAPI server (port 8000)\nmake frontend          # Start React dev server (port 5173)\n```\n\nTo train only the model: `make train`\n\nTo test the API locally: `curl http://localhost:8000/api/health`\n\n## Tech Stack\n\n| Layer | Technology |\n|-------|------------|\n| ML | XGBoost, scikit-learn, pandas |\n| API | FastAPI, Pydantic |\n| Frontend | React, Vite, Tailwind CSS |\n| Tracking | MLflow |\n| Cloud Storage | AWS S3 (data hosting) |\n| Cloud Deployment | Cloudflare Tunnel (API + frontend) |\n| CI/CD | GitHub Actions |\n\n## Project Structure\n\n```\nsrc/                  # ML pipeline (collectors, processing, models)\napi/                  # FastAPI endpoints\nfrontend-react/       # React frontend\nconfigs/              # YAML configuration files\nconfig.yaml           # Main pipeline configuration\nMakefile              # Build commands\nDockerfile            # Container build\n```\n\n## API\n\n```bash\n# Predict salary\ncurl -X POST https://aisalary.neevs.io/api/predict \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"job_title\": \"ML Engineer\", 
\"location\": \"CA\", \"experience_years\": 5}'\n\n# Get options\ncurl https://aisalary.neevs.io/api/options\n```\n\n## Documentation\n\n- [Setup Guide](docs/SETUP.md) - Local development\n- [Deployment Guide](docs/DEPLOYMENT.md) - Cloud deployment \u0026 AWS setup\n\n## Limitations \u0026 Ethical Considerations\n \n- **Geographic bias:** H1B data skews toward CA, NY, WA where most visa sponsors operate\n- **Role coverage:** Limited to AI/ML titles; doesn't cover adjacent roles well\n- **Temporal lag:** H1B filings reflect offers made 6-12 months prior\n- **Company representation:** Large tech companies overrepresented vs. startups\n- **Responsible Use:** Use predictions as one data point among many; avoid anchoring salary negotiations solely on model outputs\n\n## AI Usage Acknowledgement\n\n**AI Assistants:**\n- Claude Code (Anthropic) - code development, documentation, and research\n- Gemini 3 Pro Image / Nano Banana Pro (Google) - visual design\n\nAll code and analysis were reviewed, tested, and thoroughly understood by the team. The team takes full responsibility for the implementation and can explain all design decisions.\n\n## Authors\n\nJonas De Oliveira Neves \u0026 Omkar Sreekanth\n\nDuke University - AIPI 510, 2025\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonasneves%2Faipi510-project3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonasneves%2Faipi510-project3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonasneves%2Faipi510-project3/lists"}