https://github.com/quantumudit/test-store-data-analysis

This repository showcases a web scraper with a pipeline structure for extracting and transforming data from websites. The tool can be adapted to support data analysis and informed decision-making.

data data-visualization dataanalytics python python-webscraping webscraper webscraping-data


# Test Store Data Analysis

---

Empowering users to scrape product data from the John's Test Store website.




- Overview
- Prerequisites
- Architecture
- Demo
- Support
- License

## Overview

The primary goal of this project is to retrieve comprehensive product data from the [John's Test Store][website_link] website and analyze it.
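
To make the extraction step concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The markup in `SAMPLE_HTML`, the class names (`product`, `price`), and the `ProductParser` and `parse_products` names are all assumptions for illustration; the store's real page structure and the libraries this project actually uses are not shown in this README.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page structure of the store is an assumption.
SAMPLE_HTML = """
<ul>
  <li class="product"><a href="/product/1">Blue Mug</a><span class="price">$12.50</span></li>
  <li class="product"><a href="/product/2">Red Cap</a><span class="price">$8.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect link, name, and price records from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._current = None  # record currently being built
        self._field = None    # text field we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._current = {"link": None, "name": "", "price": ""}
        elif self._current is not None and tag == "a":
            self._current["link"] = attrs.get("href")
            self._field = "name"
        elif (
            self._current is not None
            and tag == "span"
            and attrs.get("class") == "price"
        ):
            self._field = "price"

    def handle_data(self, data):
        # Only keep text that belongs to a field of the current product.
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.products.append(self._current)
            self._current = None


def parse_products(html):
    """Return a list of product dictionaries parsed from an HTML string."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products
```

Calling `parse_products(SAMPLE_HTML)` returns one dictionary per product with `link`, `name`, and `price` keys.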




The repository has the following structure:

```
Test-Store-Data-Analysis/
├── ⚙️.env
├── 📜.gitignore
├── ⚙️.pre-commit-config.yaml
├── 🔑LICENSE
├── 🐍main.py
├── 🔒poetry.lock
├── 📇pyproject.toml
├── 📝README.md
├── 🗒️requirements.txt
├── 🐍setup.py
├── 🐍template.py
├── 📁.github
│   └── 📂workflows
│       └── 📃actions.yaml
├── 📁conf
│   └── 📃configs.yaml
├── 📁data
│   ├── 📂external
│   │   ├── 📑products_link.csv
│   │   └── 📑scraped_products.csv
│   └── 📂processed
│       └── 📑products.csv
├── 📁images
│   └── 🖼️topmate_featured.png
├── 📁logs
│   └── 🧾2024_02_04_02_44_21_PM.log
├── 📁notebooks
│   ├── 📙01_web_scraping_tests.ipynb
│   └── 📙02_data_preprocessing.ipynb
├── 📁reports
│   └── .gitkeep
└── 📁src
    ├── 🐍constants.py
    ├── 🐍exception.py
    ├── 🐍logger.py
    ├── 🐍__init__.py
    ├── 📂components
    │   ├── 🐍data_preprocessor.py
    │   ├── 🐍link_extraction.py
    │   └── 🐍product_scraper.py
    ├── 📂pipelines
    │   ├── 🐍stage_01_data_extraction.py
    │   └── 🐍stage_02_data_preprocessor.py
    └── 📂utils
        └── 🐍basic_utils.py
```
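
The file names under `src/pipelines` and `data` suggest a two-stage flow: raw scraped rows are saved first, then cleaned into a processed file. The sketch below illustrates that idea under stated assumptions. The function names, the `name` and `price` columns, and the cleaning rules are hypothetical and are not taken from the project's code; only the CSV file names come from the tree above.

```python
import csv
from pathlib import Path

FIELDS = ["name", "price"]  # hypothetical columns, not confirmed by the project


def write_rows(path, rows):
    """Write dictionaries to a CSV file, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV file back into a list of dictionaries (all values are strings)."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def stage_01_data_extraction(scraped_rows, raw_path):
    """Stage 1 (assumed behavior): store the scraped rows untouched."""
    write_rows(raw_path, scraped_rows)


def stage_02_data_preprocessing(raw_path, processed_path):
    """Stage 2 (assumed behavior): trim names, parse prices, drop duplicates."""
    seen, cleaned = set(), []
    for row in read_rows(raw_path):
        name = row["name"].strip()
        price = float(row["price"].replace("$", "").replace(",", ""))
        if (name, price) not in seen:
            seen.add((name, price))
            cleaned.append({"name": name, "price": price})
    write_rows(processed_path, cleaned)
    return cleaned


def run_pipeline(scraped_rows, data_dir):
    """Run both stages in order, using the file names shown in the tree."""
    data_dir = Path(data_dir)
    raw = data_dir / "external" / "scraped_products.csv"
    processed = data_dir / "processed" / "products.csv"
    stage_01_data_extraction(scraped_rows, raw)
    return stage_02_data_preprocessing(raw, processed)
```

Keeping the raw file separate from the processed one means the cleaning stage can be rerun or changed without scraping the site again.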

## Prerequisites

To fully grasp the concepts and processes involved in this project, a solid understanding of the following is recommended:

- Fundamental knowledge of Python and modular coding
- Familiarity with the Python libraries listed in the 🗒️[requirements.txt][requirements] file
- Basic familiarity with data analytics and Power BI

These skills will help ensure a smooth and effective experience while working on this project.

> The selection of applications and their installation process may differ depending on personal preferences and computer configurations.

## Architecture

[CONTENT TO BE ADDED]

## Demo

[CONTENT TO BE ADDED]

## Support

If you have any questions, concerns, or suggestions, feel free to reach out to me through any of the following channels:

[![Linkedin Badge][linkedinbadge]][linkedin] [![Twitter Badge][twitterbadge]][twitter] [![Medium Badge][mediumbadge]][medium]

If you find my work valuable, you can show your appreciation by [buying me a coffee][buy_me_a_coffee].



## License


This project is licensed under CC BY-NC-SA.

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.

---




[project_logo]: ./images/ebooks_logo.png
[process_workflow]: ./images/process_workflow.png

[website_link]: https://gopher1.extrkt.com/
[webapp_link]: https://ebooks-extractor-app.streamlit.app/
[requirements]: ./requirements.txt

[app]: ./app.py
[scraper_funcs]: ./scraper_functions.py

[linkedin]: https://www.linkedin.com/in/uditkumarchatterjee/
[twitter]: https://twitter.com/quantumudit
[medium]: https://medium.com/@quantumudit
[buy_me_a_coffee]: https://www.buymeacoffee.com/quantumudit

[linkedinbadge]: https://img.shields.io/badge/-uditkumarchatterjee-0e76a8?style=flat&labelColor=0e76a8&logo=linkedin&logoColor=white
[twitterbadge]: https://img.shields.io/badge/-quantumudit-000000?style=flat&labelColor=000000&logo=x&logoColor=white&link=https://twitter.com/quantumudit
[mediumbadge]: https://img.shields.io/badge/-quantumudit-02b875?style=flat&labelColor=02b875&logo=medium&logoColor=white