{"id":18483919,"url":"https://github.com/computingvictor/yelp_stars","last_synced_at":"2025-05-13T20:37:24.159Z","repository":{"id":171077340,"uuid":"586606468","full_name":"ComputingVictor/Yelp_Stars","owner":"ComputingVictor","description":"Final project of the Machine Learning subject using the Yelp dataset to set a business case and create a predictive model","archived":false,"fork":false,"pushed_at":"2023-01-09T16:39:19.000Z","size":8167,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-16T21:41:38.284Z","etag":null,"topics":["business","cunef","jupyter","machine-learning","networkx","python","yelp-dataset"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ComputingVictor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-08T18:12:41.000Z","updated_at":"2023-09-18T09:22:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"f87b7cc5-ed18-4539-8b84-72c79bc18052","html_url":"https://github.com/ComputingVictor/Yelp_Stars","commit_stats":null,"previous_names":["computingvictor/yelp_stars"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ComputingVictor%2FYelp_Stars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ComputingVictor%2FYelp_Stars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ComputingVictor%2FYelp_Stars/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ComputingVictor%2FYelp_Stars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ComputingVictor","download_url":"https://codeload.github.com/ComputingVictor/Yelp_Stars/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254021912,"owners_count":22001019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business","cunef","jupyter","machine-learning","networkx","python","yelp-dataset"],"created_at":"2024-11-06T12:37:53.321Z","updated_at":"2025-05-13T20:37:24.132Z","avatar_url":"https://github.com/ComputingVictor.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Yelp Stars Prediction\n\n\n\u003cp align=\"center\"\u003e\n\n![imagen_readme.jpeg](./images/readme_image.png)\n\n\u003c/p\u003e\n\n\n\n\n\n## Project\n\n\n\nThis is the final project for the Machine Learning subject of CUNEF's Master's in Data Science. The objective of this project is to study the [Yelp dataset](https://www.yelp.com/dataset/download), find business cases, and create predictive models.\n\n\n## Business Case\n\n\nThe business objective is to use the information registered by businesses on the platform, together with their characteristics, to predict whether the average score they receive from users will be high (\u003e=4 stars) or low (\u003c4 stars).\n\n\nTo achieve this, the data was preprocessed: the JSON files were processed and exported as Parquet files. 
Then came an exploratory analysis of the data, the creation of pipelines to treat the selected variables according to their data type, and the testing of different models.\n\nFinally, we computed local and global explainability to obtain the importance of the variables, and we created a graph with a business case applicable to the dataset used in our model.\n\n## How to run the Project?\n\n\nTo run the project, install the environment by running the following in the shell:\n\npip3 install -r requirements.txt\n\nThen, download the Yelp dataset, extract it, and move it to the data/raw folder.\n\n## What did we use?\n\n\n- Python 3.9.13\n\n- Visual Studio Code\n\n- Jupyter Notebook\n\n- NetworkX\n\n## Index\n\n\n\n0. Data Preprocessing\n\n1. EDA\n2. Feature Engineering\n3. Models\n\n    - Base Model (Dummy Model)\n    - Logistic Regression Lasso\n    - Random Forest\n    - LightGBM\n    - Support Vector Machine\n    - XGBoost\n\n\n4. Model Selection\n5. Interpretability\n6. Graph \n\n## Content of the repository\n\n\n\n- `data`:\n\n\t- raw: Documents downloaded from the source of the dataset.\n\n\t- processed: Processed data dictionary and processed data. \n\t\n    - maps: Map used to load the number of stars by state.\n    \n    - graphs: Folder where the graph will be exported.\n\n\n- `images`: Pictures used in the different notebooks.\n\n\n\n- `html`: Notebooks exported as HTML files.\n\n\n\n- `notebooks`: Notebooks of the project and .py files with functions.\n\n\n\n- `models`: Pickles of the different models. 
\n\n- `env`: Requirements of the environment.\n\n\n\n## Authors\n\n\n\nVictor Viloria Vázquez\n\n- Email: victor.viloria@cunef.edu\n\n- LinkedIn: https://www.linkedin.com/in/vicviloria/\n\n\n\n\n\nAntonio Nogués Podadera\n\n- Email: antonionpodadera@gmail.com\n\n- LinkedIn: https://www.linkedin.com/in/antonio-nogu%C3%A9s-podadera/\n\n\n\n## Project Link\n\nhttps://github.com/ComputingVictor/Yelp_Stars\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomputingvictor%2Fyelp_stars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcomputingvictor%2Fyelp_stars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomputingvictor%2Fyelp_stars/lists"}