{"id":18553538,"url":"https://github.com/quantumudit/analyzing-yell-cafes","last_synced_at":"2026-05-02T03:08:35.297Z","repository":{"id":128744282,"uuid":"424968862","full_name":"quantumudit/Analyzing-Yell-Cafes","owner":"quantumudit","description":"This project focuses on scraping data related to cafes and coffee shops in London, England from the Yellow Pages (Yell.com) website; performing necessary transformations on the scraped data and then analyzing \u0026 visualizing it using Jupyter Notebook and Power BI.","archived":false,"fork":false,"pushed_at":"2021-12-10T19:53:57.000Z","size":1156,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-15T11:49:27.478Z","etag":null,"topics":["data-analysis","data-science","data-transformation","data-visualization","etl","jupyter-notebook","power-bi","python","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quantumudit.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-05T14:04:08.000Z","updated_at":"2022-02-25T01:35:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"c9df7df4-2c25-4346-b243-d39fc3623a06","html_url":"https://github.com/quantumudit/Analyzing-Yell-Cafes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/quantumudit/Analyzing-Yell-Cafes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FAnalyzing-Yell-Caf
es","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FAnalyzing-Yell-Cafes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FAnalyzing-Yell-Cafes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FAnalyzing-Yell-Cafes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quantumudit","download_url":"https://codeload.github.com/quantumudit/Analyzing-Yell-Cafes/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FAnalyzing-Yell-Cafes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32521132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","data-transformation","data-visualization","etl","jupyter-notebook","power-bi","python","webscraping"],"created_at":"2024-11-06T21:17:29.790Z","updated_at":"2026-05-02T03:08:35.281Z","avatar_url":"https://github.com/quantumudit.png","language":"Python","funding_links":["https://www.patreon.com/quantumudit"],"categories":[],"sub_categories":[],"readme":"![Project Logo][project_logo]\n\n---\n\n\u003ch4 align=\"center\"\u003eScraping \u0026 Analyzing top café 
\u0026 coffee shops in London from the \u003ca href=\"https://www.yell.com/\" target=\"_blank\"\u003eYell.com\u003c/a\u003e website with \u003ca href=\"https://en.wikipedia.org/wiki/Python_(programming_language)\" target=\"_blank\"\u003ePython\u003c/a\u003e and \u003ca href=\"https://en.wikipedia.org/wiki/Microsoft_Power_BI\" target=\"_blank\"\u003ePower BI\u003c/a\u003e\u003c/h4\u003e\n\n\u003cp align='center'\u003e\n\u003cimg src=\"https://i.ibb.co/KxfMMsP/built-with-love.png\" alt=\"built-with-love\" border=\"0\"\u003e\n\u003cimg src=\"https://i.ibb.co/MBDK1Pk/powered-by-coffee.png\" alt=\"powered-by-coffee\" border=\"0\"\u003e\n\u003cimg src=\"https://i.ibb.co/CtGqhQH/cc-nc-sa.png\" alt=\"cc-nc-sa\" border=\"0\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#overview\"\u003eOverview\u003c/a\u003e •\n  \u003ca href=\"#prerequisites\"\u003ePrerequisites\u003c/a\u003e •\n  \u003ca href=\"#architecture\"\u003eArchitecture\u003c/a\u003e •\n  \u003ca href=\"#demo\"\u003eDemo\u003c/a\u003e •\n  \u003ca href=\"#support\"\u003eSupport\u003c/a\u003e •\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n## Overview\n\nThis project focuses on scraping top café \u0026 coffee shops in London and their associated metrics from [Yell.com][website_link], performing exploratory data analysis to generate insights, and visualizing them with the help of Power BI.\n\nThe repository directory structure is as follows:\n\nAnalyzing-Yell-Cafes\u003cbr\u003e\n├─ 01_WEBSCRAPING\u003cbr\u003e\n├─ 02_ETL\u003cbr\u003e\n├─ 03_DATA\u003cbr\u003e\n├─ 04_ANALYSIS\u003cbr\u003e\n├─ 05_DASHBOARD\u003cbr\u003e\n├─ 06_RESOURCES\u003cbr\u003e\n\nThe type of content present in the directories is as follows:\n\n**01_WEBSCRAPING**\n\nThis directory contains the Python script that scrapes data from the website, along with the flat file that holds the scraped data.\n\n**02_ETL**\n\nThis directory contains the ETL script that takes the scraped dataset as input, transforms 
it and exports an analysis-ready dataset into the _03_DATA_ directory.\n\n**03_DATA**\n\nThis directory contains the data that can be directly used for exploratory data analysis and data visualization purposes.\n\n**04_ANALYSIS**\n\nThis directory contains the Python notebooks that analyze the clean dataset to generate insights.\n\n**05_DASHBOARD**\n\nThis directory contains the Python notebook with an embedded Power BI report that visualizes the data. The Power BI dashboard contains slicers, cross-filtering, and other advanced capabilities that end users can use to visualize a specific facet of the data or to get additional insights.\n\n**06_RESOURCES**\n\nThis directory contains images, icons, layouts, etc. that are used in this project.\n\n## Prerequisites\n\nThe major skills required as prerequisites to fully understand this project are as follows:\n\n- Basics of Python\n- Python libraries: Requests-HTML, Pandas, datetime, concurrent.futures\n- Basics of Python Notebooks\n- Basics of Power BI\n\nTo complete the project, I used the following applications and libraries:\n\n- Python\n- Python libraries listed in the requirements.txt file\n- Jupyter Notebook\n- Visual Studio Code\n- Microsoft Power BI\n\n\u003e The choice of applications \u0026 their installation might vary based on individual preferences \u0026 system settings.\n\n## Architecture\n\nThe project architecture is quite straightforward and can be explained through the image below:\n\n![Process Architecture][process_workflow]\n\nAs the above workflow suggests, we first scrape the data from the website using the Python script and collect it in a flat file, which is then processed and cleaned with a separate ETL-specific Python script.\n\nFinally, we use the clean \u0026 analysis-ready dataset for exploratory data analysis (EDA) in Jupyter Notebook and to create an insightful report in Power BI.\n\n## Demo\n\nThe graphic below shows the scraping of data from 
the website:\n\n![Scraping Graphic][scraping_graphic]\n\nFrom this graphic, we can observe a significant reduction in scraping time due to the use of multi-threading.\n\n## Support\n\nIf you have any doubts, queries, or suggestions, please connect with me on any of the following platforms:\n\n[![Linkedin Badge][linkedinbadge]][linkedin] [![Twitter Badge][twitterbadge]][twitter]\n\nIf you like my work, you may support me on Patreon:\n\n\u003ca href=\"https://www.patreon.com/quantumudit\" target=\"_blank\"\u003e\n\u003cimg src=\"https://i.ibb.co/94bkJwp/become-a-patreon.png\" alt=\"become-a-patreon\" border=\"0\" width=\"170\" height=\"50\"\u003e\n\u003c/a\u003e\n\n## License\n\n\u003ca href = 'https://creativecommons.org/licenses/by-nc-sa/4.0/' target=\"_blank\"\u003e\n    \u003cimg src=\"https://i.ibb.co/mvmWGkm/by-nc-sa.png\" alt=\"by-nc-sa\" border=\"0\" width=\"88\" height=\"31\"\u003e\n\u003c/a\u003e\n\nThis license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. 
If you remix, adapt, or build upon the material, you must license the modified material under identical terms.\n\n\u003c!-- Image Links --\u003e\n\n[project_logo]: 06_RESOURCES/project_cover_image.png\n[process_workflow]: 06_RESOURCES/process_architecture.png\n[scraping_graphic]: 06_RESOURCES/scraping_graphic.gif\n\n\u003c!-- External Links --\u003e\n\n[website_link]: https://www.yell.com/ucs/UcsSearchAction.do?keywords=Cafes+And+Coffee+Shops\u0026location=london\u0026scrambleSeed=850094717\u0026pageNum=1\n\n\u003c!-- Profile Links --\u003e\n\n[linkedin]: https://www.linkedin.com/in/uditkumarchatterjee/\n[twitter]: https://twitter.com/quantumudit\n\n\u003c!-- Shields Profile Links --\u003e\n\n[linkedinbadge]: https://img.shields.io/badge/-uditkumarchatterjee-0e76a8?style=flat\u0026labelColor=0e76a8\u0026logo=linkedin\u0026logoColor=white\n[twitterbadge]: https://img.shields.io/badge/-@quantumudit-1ca0f1?style=flat\u0026labelColor=1ca0f1\u0026logo=twitter\u0026logoColor=white\u0026link=https://twitter.com/quantumudit\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantumudit%2Fanalyzing-yell-cafes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantumudit%2Fanalyzing-yell-cafes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantumudit%2Fanalyzing-yell-cafes/lists"}