{"id":18553533,"url":"https://github.com/quantumudit/test-store-data-analysis","last_synced_at":"2026-04-11T02:48:30.756Z","repository":{"id":220859721,"uuid":"752763796","full_name":"quantumudit/Test-Store-Data-Analysis","owner":"quantumudit","description":"This repository showcases a web scraper with a pipeline structure for efficient data extraction and transformation from websites. The tool can be tailored to leverage its capabilities for insightful data analysis, providing valuable insights and informed decision-making.","archived":false,"fork":false,"pushed_at":"2024-02-05T10:36:01.000Z","size":493,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-15T11:50:03.484Z","etag":null,"topics":["data","data-visualization","dataanalytics","python","python-webscraping","webscraper","webscraping-data"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quantumudit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-04T18:32:47.000Z","updated_at":"2024-02-04T20:21:05.000Z","dependencies_parsed_at":"2024-12-26T08:42:17.201Z","dependency_job_id":"bfb7f83a-2c76-4f47-a226-9682c5faad4a","html_url":"https://github.com/quantumudit/Test-Store-Data-Analysis","commit_stats":null,"previous_names":["quantumudit/test-store-data-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/quantumudit/Test-Store-Data-Analysis","repository_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/quantumudit%2FTest-Store-Data-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FTest-Store-Data-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FTest-Store-Data-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FTest-Store-Data-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quantumudit","download_url":"https://codeload.github.com/quantumudit/Test-Store-Data-Analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quantumudit%2FTest-Store-Data-Analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265786687,"owners_count":23828315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-visualization","dataanalytics","python","python-webscraping","webscraper","webscraping-data"],"created_at":"2024-11-06T21:17:27.355Z","updated_at":"2026-04-11T02:48:25.735Z","avatar_url":"https://github.com/quantumudit.png","language":"Jupyter Notebook","readme":"\u003c!-- # ![Project Logo][project_logo] --\u003e\n# Test Store Data Analysis\n\n---\n\n\u003ch4 align=\"center\"\u003eEmpowering users to scrape product data from the \u003ca href=\"https://gopher1.extrkt.com/\" target=\"_blank\"\u003eJohn's Test Store\u003c/a\u003e website.\u003c/h4\u003e 
\n\n\u003c!-- This web application, developed with \u003ca href=\"https://www.python.org/\" target=\"_blank\"\u003ePython\u003c/a\u003e and \u003ca href=\"https://streamlit.io/\" target=\"_blank\"\u003eStreamlit\u003c/a\u003e, streamlines the process of downloading books that match their preferences.\u003c/h4\u003e --\u003e\n\n\u003cp align='center'\u003e\n\u003cimg src=\"https://forthebadge.com/images/badges/built-with-love.svg\" alt=\"built-with-love\" border=\"0\"\u003e\n\u003cimg src=\"https://forthebadge.com/images/badges/powered-by-coffee.svg\" alt=\"powered-by-coffee\" border=\"0\"\u003e\n\u003cimg src=\"https://forthebadge.com/images/badges/cc-nc-sa.svg\" alt=\"cc-nc-sa\" border=\"0\"\u003e\n\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#overview\"\u003eOverview\u003c/a\u003e •\n  \u003ca href=\"#prerequisites\"\u003ePrerequisites\u003c/a\u003e •\n  \u003ca href=\"#architecture\"\u003eArchitecture\u003c/a\u003e •\n  \u003ca href=\"#demo\"\u003eDemo\u003c/a\u003e •\n  \u003ca href=\"#support\"\u003eSupport\u003c/a\u003e •\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n## Overview\n\nThe primary goal of this project is to retrieve comprehensive product data from the [John's Test Store][website_link] website and analyze it.\n\n\u003cp align='center'\u003e\n  \u003ca href=\"https://gopher1.extrkt.com/\"\u003e\n    \u003cimg src=\"./images/website_snippet.png\" alt=\"website-snippet\" style=\"0\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003c!-- The web application has been meticulously designed to cater to on-demand web scraping. In essence, it selectively extracts essential book information based on the user's specified choices regarding category, subject, and topic.\n\nOnce the user designates a category, the application promptly generates a list of associated subjects for the user to select from. 
Likewise, upon selecting a subject, the application dynamically populates a dropdown menu with relevant topics (if available).\n\n\u003cp align='center'\u003e\n  \u003ca href=\"https://ebooks-extractor-app.streamlit.app/\"\u003e\n    \u003cimg src=\"./images/webapp_image.png\" alt=\"webapp_image\" style=\"0\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nArmed with these three choices, users can effortlessly obtain their desired information in the form of a downloadable CSV file, simply by clicking the \"Get Data\" button. --\u003e\n\nThe project repository exhibits the following structure:\n\n```\nTest-Store-Data-Analysis/\n├── ⚙️.env\n├── 📜.gitignore\n├── ⚙️.pre-commit-config.yaml\n├── 🔑LICENSE\n├── 🐍main.py\n├── 🔒poetry.lock\n├── 📇pyproject.toml\n├── 📝README.md\n├── 🗒️requirements.txt\n├── 🐍setup.py\n├── 🐍template.py\n├── 📁.github\n│   └── 📂workflows\n│       └── 📃actions.yaml\n├── 📁conf\n│   └── 📃configs.yaml\n├── 📁data\n│   ├── 📂external\n│   │   ├── 📑products_link.csv\n│   │   └── 📑scraped_products.csv\n│   └── 📂processed\n│       └── 📑products.csv\n├── 📁images\n│   └── 🖼️topmate_featured.png\n├── 📁logs\n│   └── 🧾2024_02_04_02_44_21_PM.log\n├── 📁notebooks\n│   ├── 📙01_web_scraping_tests.ipynb\n│   └── 📙02_data_preprocessing.ipynb\n├── 📁reports\n│   └── .gitkeep\n└── 📁src\n    ├── 🐍constants.py\n    ├── 🐍exception.py\n    ├── 🐍logger.py\n    ├── 🐍__init__.py\n    ├── 📂components\n    │   ├── 🐍data_preprocessor.py\n    │   ├── 🐍link_extraction.py\n    │   └── 🐍product_scraper.py\n    ├── 📂pipelines\n    │   ├── 🐍stage_01_data_extraction.py\n    │   └── 🐍stage_02_data_preprocessor.py\n    └── 📂utils\n        └── 🐍basic_utils.py\n\n```\n\u003c!-- The Streamlit application is driven by two fundamental Python scripts:\n\n- **🐍[app.py][app]**: This script capitalizes on functions from the [scraper_functions.py][scraper_funcs] file, enabling seamless web scraping. 
Moreover, it stands as the cornerstone of the Streamlit application.\n\n- **🐍[scraper_functions.py][scraper_funcs]**: This file houses a collection of functions specifically designed for data extraction via web scraping techniques. --\u003e\n\n\n## Prerequisites\n\nTo fully grasp the concepts and processes involved in this project, it is recommended to have a solid understanding of the following skills:\n\n- Fundamental knowledge of Python \u0026 modular coding\n- Familiarity with the Python libraries listed in the 🗒️[requirements.txt][requirements] file\n- Basic familiarity with data analytics and Power BI\n\nHaving these skills as a foundation will help to ensure a smooth and effective experience while working on this project.\n\n\u003e The selection of applications and their installation process may differ depending on personal preferences and computer configurations.\n\n## Architecture\n\n[CONTENT TO BE ADDED]\n\n\u003c!-- The architectural design of this project is transparent and can be readily comprehended with the assistance of the accompanying diagram illustrated below:\n\n![Process Architecture][process_workflow]\n\nThe project's architectural framework encompasses the following key steps:\n\n### User Interaction\n\nThe user initiates the process by selecting their desired category from the available options.\nBased on the chosen category, the web application dynamically scrapes and presents a list of related subjects for the user's selection.\n\nUpon subject selection, the web app proceeds to scrape topics associated with the selected subject (if available).\n\nThe user can then finalize their selection by choosing \"Get Data\"\n\n### Data Retrieval\n\nSubsequently, the web application conducts a comprehensive scraping operation to gather book-related information. 
This gathered data is then structured into a CSV file format.\n\n### User Output\n\nThe user is provided with a downloadable CSV file containing the acquired book data, facilitating easy access to the information they require. --\u003e\n\n\n## Demo\n\n[CONTENT TO BE ADDED]\n\n\u003c!-- The following illustration demonstrates the process of collecting data by providing necessary inputs to the web application: --\u003e\n\n\u003c!-- \u003cp align='center'\u003e\n  \u003ca href=\"https://ebooks-extractor-app.streamlit.app/\"\u003e\n    \u003cimg src=\"./images/webapp_graphic.gif\" alt=\"webapp-graphic\" style=\"0\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003e Access the web application by clicking here: **[Ebooks Extractor App][webapp_link]** --\u003e\n\n\n\n## Support\n\nIf you have any questions, concerns, or suggestions, feel free to reach out to me through any of the following channels:\n\n[![Linkedin Badge][linkedinbadge]][linkedin] [![Twitter Badge][twitterbadge]][twitter] [![Medium Badge][mediumbadge]][medium]\n\n\nIf you find my work valuable, you can show your appreciation by [buying me a coffee][buy_me_a_coffee].\n\n\u003ca href=\"https://www.buymeacoffee.com/quantumudit\" target=\"_blank\"\u003e\n\u003cimg src=\"https://i.ibb.co/9cyrq6m/buy-me-a-coffee.png\" alt=\"buy-me-a-coffee\" border=\"0\" width=\"170\" height=\"50\"\u003e\n\u003c/a\u003e\n\n## License\n\n\u003ca href = 'https://creativecommons.org/licenses/by-nc-sa/4.0/' target=\"_blank\"\u003e\n    \u003cimg src=\"https://i.ibb.co/mvmWGkm/by-nc-sa.png\" alt=\"by-nc-sa\" border=\"0\" width=\"88\" height=\"31\"\u003e\n\u003c/a\u003e\n\nThis license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. 
If you remix, adapt, or build upon the material, you must license the modified material under identical terms.\n\n---\n\u003cp align='center'\u003e\n  \u003ca href=\"https://topmate.io/quantumudit\"\u003e\n    \u003cimg src=\"./images/topmate_featured.png\" alt=\"topmate-udit\" style=\"0\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n\u003c!-- Image Links --\u003e\n\n[project_logo]: ./images/ebooks_logo.png\n[process_workflow]: ./images/process_workflow.png\n\n\u003c!-- External Links --\u003e\n\n[website_link]: https://gopher1.extrkt.com/\n[webapp_link]: https://ebooks-extractor-app.streamlit.app/\n[requirements]: ./requirements.txt\n\n\n\u003c!-- Project Specific Links --\u003e\n\n[app]: ./app.py\n[scraper_funcs]: ./scraper_functions.py \n\n\u003c!-- Profile Links --\u003e\n\n[linkedin]: https://www.linkedin.com/in/uditkumarchatterjee/\n[twitter]: https://twitter.com/quantumudit\n[medium]: https://medium.com/@quantumudit\n[buy_me_a_coffee]: https://www.buymeacoffee.com/quantumudit\n\n\u003c!-- Shields Profile Links --\u003e\n\n[linkedinbadge]: https://img.shields.io/badge/-uditkumarchatterjee-0e76a8?style=flat\u0026labelColor=0e76a8\u0026logo=linkedin\u0026logoColor=white\n[twitterbadge]: https://img.shields.io/badge/-quantumudit-000000?style=flat\u0026labelColor=000000\u0026logo=x\u0026logoColor=white\u0026link=https://twitter.com/quantumudit\n[mediumbadge]: https://img.shields.io/badge/-quantumudit-02b875?style=flat\u0026labelColor=02b875\u0026logo=medium\u0026logoColor=white","funding_links":["https://www.buymeacoffee.com/quantumudit"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantumudit%2Ftest-store-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantumudit%2Ftest-store-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantumudit%2Ftest-store-data-analysis/lists"}