https://github.com/quantumudit/ebooks-extractor-app
An efficient web scraping tool for gathering detailed book information from eBooks.com, with user-friendly selection options for categories, subjects, and topics, culminating in the generation of downloadable CSV files.
- Host: GitHub
- URL: https://github.com/quantumudit/ebooks-extractor-app
- Owner: quantumudit
- License: other
- Created: 2023-10-14T17:55:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-17T09:20:31.000Z (about 1 year ago)
- Last Synced: 2023-10-18T09:13:01.016Z (about 1 year ago)
- Topics: python, streamlit, webapp, websraping
- Language: Python
- Homepage: https://ebooks-extractor-app.streamlit.app/
- Size: 3.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
# ![Project Logo][project_logo]
---
A web application, built with Python and Streamlit, that lets users download book data from the Ebooks website, filtered to match their preferences.
Overview • Prerequisites • Architecture • Demo • Support • License
## Overview
This project retrieves detailed book data from the [Ebooks][website_link] website.
The web application scrapes on demand: it extracts book information only for the category, subject, and topic that the user selects.
Once the user designates a category, the application promptly generates a list of associated subjects for the user to select from. Likewise, upon selecting a subject, the application dynamically populates a dropdown menu with relevant topics (if available).
Armed with these three choices, users can effortlessly obtain their desired information in the form of a downloadable CSV file, simply by clicking the "Get Data" button.
The repository has the following structure:
```
Ebooks-Extractor-App/
├─ 📁.streamlit/
│  └─ ⚙️config.toml
├─ 🐍app.py
├─ 🐍scraper_functions.py
├─ 🗒️readme.md
├─ 🗒️requirements.txt
├─ 📜.gitignore
├─ 🔑LICENSE
└─ 📁images/
   ├─ 🖼️books_image.jpg
   ├─ 🖼️ebooks_logo.png
   ├─ 🖼️process_workflow.png
   ├─ 🖼️webapp_graphic.gif
   ├─ 🖼️webapp_image.png
   └─ 🖼️website_snippet.png
```
The Streamlit application is driven by two Python scripts:
- **🐍[app.py][app]**: The entry point of the Streamlit application. It calls functions from the [scraper_functions.py][scraper_funcs] file to perform the web scraping.
- **🐍[scraper_functions.py][scraper_funcs]**: A collection of functions for extracting data via web scraping.
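As a rough illustration of the kind of helper a scraping module like this might contain, the sketch below pulls book titles out of an HTML fragment using only the Python standard library. The markup, the `book-title` class name, and the `TitleParser` and `extract_titles` names are assumptions made for this example; the functions in the repository and the actual structure of the Ebooks pages may differ.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <a class="book-title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "book-title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching anchor
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self._in_title = False
            self.titles[-1] = self.titles[-1].strip()


def extract_titles(html: str) -> list[str]:
    """Return the stripped text of each matching anchor, in document order."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Keeping parsing helpers like this separate from the user interface code means they can be exercised against saved HTML fragments without launching the app.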
## Prerequisites
To fully grasp the concepts and processes involved in this project, it is recommended to have a solid understanding of the following skills:
- Fundamental knowledge of Python, APIs, and Streamlit
- Familiarity with the Python libraries listed in the 🗒️[requirements.txt][requirements] file
- Basic familiarity with browser developer tools

Having these skills as a foundation will help to ensure a smooth and effective experience while working on this project.
> The selection of applications and their installation process may differ depending on personal preferences and computer configurations.
## Architecture
The diagram below shows the project's architecture:
![Process Architecture][process_workflow]
The workflow consists of the following key steps:
### User Interaction
The user initiates the process by selecting their desired category from the available options.
Based on the chosen category, the web application dynamically scrapes and presents a list of related subjects for the user's selection.
Upon subject selection, the web app proceeds to scrape topics associated with the selected subject (if available).
The user then finalizes the selection by clicking the "Get Data" button.
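The dependent selections described above can be sketched without Streamlit. In this sketch the `CATALOG` dictionary is a hypothetical stand-in for the lists the app scrapes live, and every category, subject, and topic name in it is invented for illustration.

```python
# Hypothetical stand-in for data the app scrapes on demand.
CATALOG = {
    "Fiction": {
        "Mystery": ["Cozy", "Noir"],
        "Romance": [],  # a subject with no topics
    },
    "Academic": {
        "Mathematics": ["Algebra", "Statistics"],
    },
}


def subjects_for(category: str) -> list[str]:
    """Return the subjects to offer once a category is chosen."""
    return sorted(CATALOG.get(category, {}))


def topics_for(category: str, subject: str) -> list[str]:
    """Return the topics to offer once a subject is chosen (may be empty)."""
    return sorted(CATALOG.get(category, {}).get(subject, []))
```

In a Streamlit app, each returned list would typically become the options of an `st.selectbox`, with the topic dropdown skipped when the list is empty.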
### Data Retrieval
Subsequently, the web application conducts a comprehensive scraping operation to gather book-related information. This gathered data is then structured into a CSV file format.
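One way to structure scraped records as CSV is shown below, as a minimal sketch using the standard `csv` module. The `books_to_csv` name and the `title`, `author`, and `price` fields in the usage example are assumptions; the columns the app actually produces are not documented here.

```python
import csv
import io


def books_to_csv(books: list[dict]) -> str:
    """Serialize a list of book records to CSV text with a header row.

    Column order follows the keys of the first record.
    """
    if not books:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(books[0]))
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()
```

For example, with two hypothetical records:

```python
books = [
    {"title": "Dune", "author": "Frank Herbert", "price": "9.99"},
    {"title": "Emma, Annotated", "author": "Jane Austen", "price": "4.50"},
]
print(books_to_csv(books))
```

The `csv` module quotes fields that contain commas, so a title such as `Emma, Annotated` stays in a single column.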
### User Output
The user is provided with a downloadable CSV file containing the acquired book data, facilitating easy access to the information they require.
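Preparing the file for download could look roughly like the sketch below, which turns CSV text into bytes and derives a file name from the user's selections. The `download_payload` helper and its naming scheme are hypothetical and are not taken from the repository.

```python
import re


def download_payload(
    csv_text: str, category: str, subject: str, topic: str = ""
) -> tuple[bytes, str]:
    """Return (file bytes, file name) for a CSV download."""
    # Skip the topic when the chosen subject has none
    parts = [p for p in (category, subject, topic) if p]
    # Lowercase each part and collapse runs of other characters into hyphens
    slug = "_".join(
        re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts
    )
    # utf-8-sig prepends a byte order mark, which helps spreadsheet
    # applications detect the encoding
    return csv_text.encode("utf-8-sig"), f"{slug}.csv"
```

In Streamlit, the two values could be passed to `st.download_button` as its `data` and `file_name` arguments, together with `mime="text/csv"`.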
## Demo
The following illustration demonstrates the process of collecting data by providing necessary inputs to the web application:
> Access the web application by clicking here: **[Ebooks Extractor App][webapp_link]**
## Support
If you have any questions, concerns, or suggestions, feel free to reach out to me through any of the following channels:
[![Linkedin Badge][linkedinbadge]][linkedin] [![Twitter Badge][twitterbadge]][twitter] [![Medium Badge][mediumbadge]][medium]
If you find my work valuable, you can show your appreciation by [buying me a coffee][buy_me_a_coffee].
## License
This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.
---
[project_logo]: ./images/ebooks_logo.png
[process_workflow]: ./images/process_workflow.png
[website_link]: https://www.ebooks.com/
[webapp_link]: https://ebooks-extractor-app.streamlit.app/
[requirements]: ./requirements.txt
[app]: ./app.py
[scraper_funcs]: ./scraper_functions.py
[linkedin]: https://www.linkedin.com/in/uditkumarchatterjee/
[twitter]: https://twitter.com/quantumudit
[medium]: https://medium.com/@quantumudit
[buy_me_a_coffee]: https://www.buymeacoffee.com/quantumudit
[linkedinbadge]: https://img.shields.io/badge/-uditkumarchatterjee-0e76a8?style=flat&labelColor=0e76a8&logo=linkedin&logoColor=white
[twitterbadge]: https://img.shields.io/badge/-quantumudit-000000?style=flat&labelColor=000000&logo=x&logoColor=white&link=https://twitter.com/quantumudit
[mediumbadge]: https://img.shields.io/badge/-quantumudit-02b875?style=flat&labelColor=02b875&logo=medium&logoColor=white