{"id":37510128,"url":"https://github.com/5umitpandey/iit_madras_dataverse","last_synced_at":"2026-01-16T08:01:20.054Z","repository":{"id":330703072,"uuid":"1117938524","full_name":"5umitpandey/IIT_Madras_Dataverse","owner":"5umitpandey","description":"This project presents an end to end data analysis and market intelligence framework built on a global food restaurant dataset.","archived":false,"fork":false,"pushed_at":"2026-01-01T05:14:51.000Z","size":4998,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-05T23:35:47.767Z","etag":null,"topics":["dataverse","eda","hackathon","iit","iitmadras","zomato"],"latest_commit_sha":null,"homepage":"https://unstop.com/competitions/dataverse-the-business-analytics-challenge-iit-madras-1586925","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/5umitpandey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-17T03:16:09.000Z","updated_at":"2026-01-01T05:14:54.000Z","dependencies_parsed_at":"2025-12-29T04:00:28.883Z","dependency_job_id":null,"html_url":"https://github.com/5umitpandey/IIT_Madras_Dataverse","commit_stats":null,"previous_names":["5umitpandey/iit_madras_dataverse"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/5umitpandey/IIT_Madras_Dataverse","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5umitpandey%2FIIT_Madras_Datavers
e","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5umitpandey%2FIIT_Madras_Dataverse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5umitpandey%2FIIT_Madras_Dataverse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5umitpandey%2FIIT_Madras_Dataverse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/5umitpandey","download_url":"https://codeload.github.com/5umitpandey/IIT_Madras_Dataverse/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5umitpandey%2FIIT_Madras_Dataverse/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478047,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataverse","eda","hackathon","iit","iitmadras","zomato"],"created_at":"2026-01-16T08:00:46.383Z","updated_at":"2026-01-16T08:01:19.929Z","avatar_url":"https://github.com/5umitpandey.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IIT Madras | Dataverse 
Hackathon\n\n![6930a51ac4598_dataverse-the-business-analytics-challenge](https://github.com/user-attachments/assets/b1ee218d-2a35-4ba5-a14e-2adbdac20e6c)\n\n\n# Food Restaurant Data Intelligence \u0026 Market Strategy Analysis\n\nThis project presents an end to end data analysis and market intelligence framework built on a global food restaurant dataset.  \nThe objective is to uncover strategic insights across markets, cuisines, pricing, engagement, and growth opportunities using data driven reasoning.\n\nThe analysis was developed as part of a hackathon submission and focuses on **business impact**, not just exploratory analysis.\n\n---\n\n## Dataset Overview\n\nThe dataset represents a large scale view of restaurant ecosystems across multiple countries and cities.\n\n### Key Statistics\n\n- **Total restaurants**: 9,551  \n- **Countries covered**: 15  \n- **Cities covered**: 141  \n- **Restaurants in India**: 8,652  \n- **India share**: 90.6 percent  \n- **Rated restaurants**: 7,403  \n- **Rating coverage**: 77.5 percent  \n- **Final feature dimensions**: 22  \n\n![Restaurants By Countries](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/Rest_By_Country.png)\n\nThis immediately establishes India as the dominant market in both scale and engagement.\n\n---\n\n## Strategic Market Positioning\n\nIndia is the core driver of platform activity.\n\nMore than 90 percent of restaurants belong to India, which means:\n- Any improvement in ranking, pricing, discovery, or quality directly impacts the majority of users\n- India is the most effective market for experimentation and optimization\n- Non India markets are better suited for niche and premium strategies\n\n![India Density Map](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/India_dense_map.png)\n\n**Conclusion**  \nIndia should be treated as the primary experimentation and monetization market.\n\n---\n\n## Data Cleaning \u0026 Standardization\n\nStrong insights depend on clean 
data. The following steps were applied to ensure reliability.\n\n### Country and Currency Fixes\n\n- Numeric country codes were mapped to country names\n- Zero missing values after mapping\n- Incorrect Philippines currency was fixed\n- All costs were standardized to INR for fair comparison\n\n### Column Optimization\n\nRemoved non informative columns including:\n- Locality Verbose  \n- Rating color  \n- Rating text  \n- Switch to order menu  \n- Country code  \n\n**Business Impact**  \nStandardized pricing and clean features enable reliable pricing intelligence and cross market analysis.\n\n---\n\n## Cross Country Cost Structure Analysis\n\nAverage cost for two was analyzed across countries after INR normalization.\n\n### Market Segmentation\n\n- **India**  \n  - Average cost around 623 INR  \n  - Large volume market  \n  - Best suited for scale driven growth  \n\n- **United States and emerging markets**  \n  - Average cost between 2,222 and 2,353 INR  \n  - Medium scale  \n  - Suitable for feature testing  \n\n- **UAE and United Kingdom**  \n  - Average cost between 4,113 and 5,785 INR  \n  - Small sample size  \n  - Premium positioning markets  \n\n![avg_cost_country](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/avg_cost_country.png)\n\n**Insight**  \nNon India markets have limited samples and should focus on premium and niche strategies rather than scale.\n\n---\n\n## Cuisine Intelligence \u0026 Portfolio Strategy\n\nRestaurants serve multiple cuisines stored as comma separated values.  \nA hybrid feature engineering approach was used to preserve restaurant identity while enabling cuisine level analysis.\n\n### Feature Engineering Flow\n\n1. Original dataset with 9,551 restaurants  \n2. Controlled cuisine explosion to 19,710 cuisine rows  \n3. 
Validation of 145 unique cuisines  \n\n---\n\n## Popularity vs Satisfaction Gap\n\n### Most Popular Cuisines\n\n- North Indian: 3,960 restaurants  \n- Chinese: 2,735 restaurants  \n- Fast Food: 1,986 restaurants  \n\n### Highest Rated Cuisines with scale\n\n- Italian: 3.56 average rating  \n- Continental: 3.52 average rating \n- Cafe: 3.32 average rating  \n\n### Popular but Underperforming\n\n- North Indian: 2.51 rating  \n- Chinese: 2.62 rating  \n- Fast Food: 2.56 rating  \n\n**Key Insight**  \nPopularity does not imply satisfaction.\n\nHighly popular cuisines require quality standardization, while high rated cuisines should be boosted in discovery systems.\n\n![cuisine_rating_comparison](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/AvgRating_VS_NoOfRest.png)\n\n---\n\n## Menu Structure \u0026 Engagement Dynamics\n\nMenu diversity plays a critical role in engagement and ratings.\n\n### Cuisine Count Distribution\n\n![Cuisine Count By Restaurants](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/cusineperrest.png)\n\n### Engagement Patterns\n\n- High engagement cuisines consistently exceed 3.5 rating  \n- Low engagement cuisines cluster below 3.0 rating  \n\n**Recommendation**  \nRecommendation systems should balance popularity with engagement metrics, not popularity alone.\n\n---\n\n## City Level Dynamics in India\n\n### High Volume Cities\n\n![city_visibility_gap](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/Unrated_Vs_City.png)\n\n**Insight**  \nThis is a discovery problem, not a demand problem.  
\nUser engagement exists but ratings are not being captured.\n\n---\n\n## Locality Level Quality Variation\n\nCity level strategies are too coarse.\n\n### Example Localities\n\n- Connaught Place  \n  - 3.69 average rating  \n\n- Rajouri Garden  \n  - 3.59 average rating  \n\n- Shahdara  \n  - 1.41 average rating  \n\n\n**Business Impact**  \nLocality specific onboarding, audits, and promotions deliver higher ROI than city wide actions.\n\n---\n\n## 📈 Pricing Dynamics \u0026 Rating Drivers\n\n### Model Performance\n\n- RMSE: 0.41 on a 5 point scale  \n- Strong predictive accuracy  \n\n### Correlations\n\n- Price range vs rating: 0.44  \n- Votes vs rating: 0.31  \n\n### Feature Importance\n\n- Votes: 71 percent  \n- Average cost for two: 20 percent  \n- Online delivery: 4 percent  \n- Price range: 3 percent  \n- Table booking: 2 percent  \n\n![feature_importance](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/Feature_Importance.png)\n\n**Strategic Conclusion**  \nEngagement is the single strongest driver of ratings.  
\nInvestment in engagement tools outperforms cosmetic feature additions.\n\n---\n\n### Service Availability Impact on Restaurant Ratings\n\nThis section analyzes how **online delivery** and **table booking** features influence restaurant ratings and customer engagement.\nThe visuals compare rating distributions for restaurants with and without these services, highlighting their relative impact.\n\n\n| Online Delivery Impact | Table Booking Impact |\n|------------------------|----------------------|\n| ![online_delivery_impact](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/Ratings_vs_online_delivery.png) | ![table_booking_impact](https://github.com/5umitpandey/IIT_Madras_Dataverse/blob/main/Images/ratings_Vs_Table.png) |\n\n---\n\n\n\n## Hidden High Quality Growth Opportunities\n\nCertain cuisines show strong ratings with limited competition.\n\n### High Potential Cuisines\n\n| Cuisine          | Restaurant Count | Average Rating |\n|------------------|------------------|----------------|\n| Sandwich         | 53               | 4.086038       |\n| Steak            | 62               | 3.985484       |\n| Sushi            | 75               | 3.973333       |\n| Breakfast        | 41               | 3.965854       |\n| Mediterranean    | 112              | 3.948214       |\n| Bar Food         | 39               | 3.933333       |\n| Indian           | 70               | 3.918571       |\n| European         | 148              | 3.910811       |\n| BBQ              | 33               | 3.903030       |\n| Seafood          | 174              | 3.862069       |\n\n\n**Strategy**  \nThese cuisines are ideal for premium discovery, editorial promotion, and supply expansion.\n\n---\n\n## Final Takeaways\n\n- India is the core growth and experimentation market  \n- Engagement drives ratings more than pricing or features  \n- Popular cuisines need quality improvement  \n- High quality niche cuisines need visibility  \n- Locality level actions outperform 
city level strategies  \n\n---\n\n## Notebooks and Code\n\nAll notebooks used for data cleaning, feature engineering, modeling, and analysis are available here:\n\n🔗 **Drive Link** : https://drive.google.com/drive/folders/1KS_0ilJyk-KP8BpQ8iAdJ--b2_5zZUYC\n\n---\n\n## Team\n\n**Team Name**: ASHSUM\n\u003cbr\u003e\n**Team Members**:  \n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\n\u003cimg src=\"https://github.com/ashir1s.png\" width=\"100px;\" alt=\"Ashirwad Sinha\"/\u003e\u003cbr/\u003e\n\u003ca href=\"https://github.com/ashir1s\"\u003eAshirwad Sinha\u003c/a\u003e\n\n\u003c/td\u003e\n\n\u003ctd align=\"center\"\u003e\n\u003cimg src=\"https://github.com/5umitpandey.png\" width=\"100px;\" alt=\"Sumit Pandey\"/\u003e\u003cbr/\u003e\n\u003ca href=\"https://github.com/5umitpandey\"\u003eSumit Pandey\u003c/a\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\nThis project was built as a hackathon submission with a strong focus on real world business decision making using data.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F5umitpandey%2Fiit_madras_dataverse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F5umitpandey%2Fiit_madras_dataverse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F5umitpandey%2Fiit_madras_dataverse/lists"}