Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/niravjoshi33/news_crunch
App to scrape articles data and display in single page
data-mining data-science gui webscraping
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/niravjoshi33/news_crunch
- Owner: NiravJoshi33
- Created: 2023-10-10T01:45:49.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-09T10:23:55.000Z (9 months ago)
- Last Synced: 2024-02-10T08:25:38.997Z (9 months ago)
- Topics: data-mining, data-science, gui, webscraping
- Language: Python
- Homepage:
- Size: 614 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
News Crunch
An app to scrape data from news websites and display the articles in web GUI.
## About The Project
![Product Name Screen Shot](https://github.com/NiravJoshi33/news_crunch/blob/main/app_screenshot.png)
This app scrapes news article details such as title, date, and author from different news websites, processes the data, and shows the results on a single page.
This app is inspired by [inshorts](https://m.inshorts.com/en/read)
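
As a rough illustration of the scrape, process, and display flow described above, the sketch below models one scraped article and a simple cleaning step. The `Article` class, its `source` field, and the `clean_article` helper are hypothetical names chosen for this example; they are not taken from the project's code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Article:
    """One scraped news item (hypothetical structure, not the project's own)."""
    title: str
    date: str
    author: str
    source: str  # assumed field: name of the website the article came from


def clean_article(article: Article) -> Article:
    """Collapse runs of whitespace and use a placeholder for a missing author."""
    def tidy(text: str) -> str:
        # split() with no arguments drops leading, trailing, and repeated whitespace
        return " ".join(text.split())

    return replace(
        article,
        title=tidy(article.title),
        date=tidy(article.date),
        author=tidy(article.author) or "Unknown",
        source=tidy(article.source),
    )


raw = Article(title="  Markets\n rally  ", date=" 2024-02-09 ", author="", source="Example News")
print(clean_article(raw))
```

Keeping the record immutable and returning a cleaned copy makes it easy to compare raw and processed data while debugging a scraper.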
### Built With
[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)
## Getting Started
### Prerequisites & Installation
Before starting, make sure the following dependencies are installed on your machine:
* [Python](https://www.python.org/downloads/)
* [pip](https://pypi.org/project/pip/)

After the above dependencies are installed, follow the instructions below:
* Clone the repo
```sh
git clone https://github.com/NiravJoshi33/news_crunch.git
```
* Navigate to the project folder using CLI
* Install the other dependencies with the following command
```sh
pip install -r requirements.txt
```
Wait for the packages to be installed.

## How to Use
Follow the instructions below to run the project:
* Run the following script
```sh
python main.py
```
* After the script starts, a browser window should open and display the GUI. If it doesn't, open a browser manually and go to the following URL
```
http://localhost:8501
```
* By default, a sidebar loads with the page. From there, you can deselect any website you don't want to see news from and use the slider to select the number of articles to show.

## To Do
- [ ] Resolve Major Bugs with the current Basic Version
- [X] Run the app with a single script
- [X] OpenSSL Error occurring sometimes
- [X] Clean Data before showing in GUI
- [X] Dates from all websites in same format
- [ ] Inconsistent card size due to different size of thumbs and excerpts
- [ ] Test app on macOS
- [ ] Data storage and access from an online database

See the [open issues](https://github.com/NiravJoshi33/news_crunch/issues) for a full list of proposed features (and known issues).
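
For the to-do item about showing dates from all websites in the same format, one possible approach is to try a short list of known input formats and re-emit the first one that parses. This is only a sketch: the `CANDIDATE_FORMATS` list and the `normalize_date` helper are hypothetical, and the formats actually used by the scraped websites may differ.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; extend this list to match the real websites.
# Note: %B matches month names in the current locale (English by default).
CANDIDATE_FORMATS = (
    "%Y-%m-%d",    # 2024-02-09
    "%d %B %Y",    # 09 February 2024
    "%B %d, %Y",   # February 9, 2024
    "%d/%m/%Y",    # 09/02/2024 (day first)
)


def normalize_date(raw: str, output_format: str = "%Y-%m-%d") -> Optional[str]:
    """Return the date in one common format, or None if no known format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(output_format)
        except ValueError:
            continue  # this format did not match; try the next one
    return None


for sample in ("2024-02-09", "February 9, 2024", "09/02/2024", "yesterday"):
    print(sample, "->", normalize_date(sample))
```

Returning `None` for unrecognized input, rather than raising, lets the display layer decide whether to hide the date or show the raw string.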
## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Contact
Nirav Joshi \
Email - [email protected] \
Project Link: [https://github.com/NiravJoshi33/news_crunch](https://github.com/NiravJoshi33/news_crunch)

## Acknowledgments
* [Scrapy Course - Python Web Scraping for Beginners](https://www.youtube.com/watch?v=mBoX_JCKZTE&pp=ygUNc2NyYXB5IGNvdXJzZQ%3D%3D) by freecodecamp.org
* [Python Streamlit Full Course](https://www.youtube.com/watch?v=RjiqbTLW9_E&list=PLa6CNrvKM5QU7AjAS90zCMIwi9RTFNIIW)
* [Best-README-Template](https://github.com/othneildrew/Best-README-Template) by [Othneil Drew](https://github.com/othneildrew)
* Awesome community on [stackoverflow](https://stackoverflow.com/)
* [ChatGPT](https://chat.openai.com/) by [OpenAI](https://openai.com/) for some debugging