{"id":50861192,"url":"https://github.com/mohsinraza2999/generous-tipper","last_synced_at":"2026-06-14T21:34:49.153Z","repository":{"id":337828768,"uuid":"1152864517","full_name":"mohsinraza2999/generous-tipper","owner":"mohsinraza2999","description":"A production level modular data science project aims to predict generous tippers for taxi drivers.","archived":false,"fork":false,"pushed_at":"2026-02-11T19:08:35.000Z","size":1061,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-11T21:49:30.659Z","etag":null,"topics":["backend-development","ci-pipeline","data-analysis","data-cleaning-and-preprocessing","docker","exploratory-data-analysis","fastapi","feature-engineering","front-end","hypothesis-testing","logistic-regression","randon-forest","understanding-business-problem","xgboost-classifier"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mohsinraza2999.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-08T14:59:13.000Z","updated_at":"2026-02-11T19:20:36.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mohsinraza2999/generous-tipper","commit_stats":null,"previous_names":["mohsinraza2999/generous-tipper"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mohsinraza2999/generous-tipper","repository_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsinraza2999%2Fgenerous-tipper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsinraza2999%2Fgenerous-tipper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsinraza2999%2Fgenerous-tipper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsinraza2999%2Fgenerous-tipper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mohsinraza2999","download_url":"https://codeload.github.com/mohsinraza2999/generous-tipper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsinraza2999%2Fgenerous-tipper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34339195,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backend-development","ci-pipeline","data-analysis","data-cleaning-and-preprocessing","docker","exploratory-data-analysis","fastapi","feature-engineering","front-end","hypothesis-testing","logistic-regression","randon-forest","understanding-business-problem","xgboost-classifier"],"created_at":"2026-06-14T21:34:47.697Z","updated_at":"2026-06-14T21:34:49.148Z","avatar_url":"https://github.com/mohsinraza2999.png","language":"Ju
pyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Generous Tip Giver Prediction\n\n## Problem\nTaxi ride-hailing platforms rely heavily on tips as a key component of driver income, yet passenger tipping behavior is highly variable and difficult to predict. This unpredictability limits the platform’s ability to optimize driver–rider matching, incentives, and service quality. Large volumes of trip, fare, temporal, and behavioral data are generated but remain underutilized for tipping prediction. A data science and machine learning approach can identify patterns that distinguish generous tippers from others. Ultimately, this leads to higher service quality, better retention, and increased platform efficiency.\n\n## Solution\nBuilt a full ML pipeline including:\n- Data ingestion \u0026 cleaning\n- Feature engineering\n- Model training (XGBoost, Random Forest, Logistic Regression)\n- FastAPI deployment\n- Dockerized application\n\n## 📊 Dataset\n\n* **Type:** Yellow Taxi Trip dataset from Kaggle\n* **Target:** Generous Tipper\n* **Features:** Eighteen numerical and encoded categorical attributes\n* **Size:** 22,700 observations\n\n## Tech Stack\nPython, Pandas, Scikit-learn, XGBoost, FastAPI, Docker\n\n## Architecture\n```text\ngenerous-tipper/\n│\n├── data/                 # raw \u0026 processed data\n├── config/               # data \u0026 training configurations\n├── frontend/             # Core frontend logic with dockerization\n├── notebooks/            # Training and data cleaning notebooks\n├── src/                  # Core data, training and backend pipeline logic\n├── tests/                # Basic unit tests of data, training, API pipelines\n├── docker-compose.yaml   # runs backend and frontend with a health check every 30 seconds\n├── Dockerfile            # multi-stage build for clean containerization\n├── pyproject.toml\n├── README.md\n└── LICENSE\n```\n\n---\n\n## 🚀 Quick Start\n\n```bash\ngit clone 
https://github.com/mohsinraza2999/generous-tipper.git\ncd generous-tipper\npython src/cli.py preprocess\npython src/cli.py train\npython src/cli.py route\n```\n\n---\n\n## 🔮 Making Predictions\n```bash\npython src/cli.py route\n```\nFor the backend and Swagger UI only:\n```text\nhttp://localhost:8000/docs\n```\nExample response:\n\n```json\n{\n  \"prediction\": \"generous\",\n  \"processed_at\": \"10-02-2026T07:30:21S\",\n  \"latency_ms\": 0.04\n}\n```\n\n---\n\n## 🧪 Testing\n\nRun all unit and integration tests:\n\n```bash\npip install pytest\npytest tests/\n```\n\nTests cover:\n\n* Data preprocessing pipeline\n* API routes\n* Model inference behavior\n\n---\n\n## 🧱 Docker Build\nBuilds and runs the backend and frontend containers, with a health check every 30 seconds.\n```bash\ndocker-compose up --build\n```\n\n1. Open in a browser for both the frontend and backend:\n```text\nhttp://localhost:3000\n```\n2. For the backend and Swagger UI only:\n```text\nhttp://localhost:8000/docs\n```\nExample response:\n\n```json\n{\n  \"prediction\": \"generous\",\n  \"processed_at\": \"10-02-2026T07:30:21S\",\n  \"latency_ms\": 0.04\n}\n```\n\n---\n\n## 🔧 Configuration\n\n* All hyperparameters are stored in YAML files\n* Data paths, training parameters, and inference behavior are configurable\n* Environment-agnostic (local or containerized)\n\n---\n\n## 🧠 Design Decisions \u0026 Trade-offs\n\n* **Why Machine Learning?**\n  Because tree-based models perform well on tabular data, neural networks were not chosen. The project instead focuses on practicing model abstraction, extensibility, and deployment workflows.\n\n* **Why config-driven pipelines?**\n  To separate experimentation from code changes and improve reproducibility.\n\n* **Why both CLI and scripts?**\n  CLI serves developers; scripts support automation and CI.\n\n---\n\n## Future Improvements\n  * Model monitoring \u0026 drift detection\n  * Cloud deployment\n\n---\n\n## 🧠 Key Learnings\n\n* ML systems should be designed as maintainable software\n* Testing pipelines prevents silent 
failures\n* Separation of training and inference is critical\n\n---\n\n## 📜 CI \u0026 Automation\n\n* GitHub Actions pipeline:\n  * Runs tests on push\n  * Ensures build stability\n* Docker build validation included\n\n---\n\n## 📬 Contact\n\n**Author:** Mohsin Raza\n**Target Role:** Machine Learning Engineer / AI Engineer\n**GitHub:** [github/mohsinraza2999](https://github.com/mohsinraza2999)\n**LinkedIn:** *[linkedin/mohsin-raza](https://www.linkedin.com/in/mohsin-raza-b7ab73328)*","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohsinraza2999%2Fgenerous-tipper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohsinraza2999%2Fgenerous-tipper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohsinraza2999%2Fgenerous-tipper/lists"}