News Crunch


An app that scrapes data from news websites and displays the articles in a web GUI.



Table of Contents

1. About The Project
2. Getting Started
3. How to Use
4. To Do
5. Contributing
6. Contact
7. Acknowledgments

## About The Project

![Product Name Screen Shot](https://github.com/NiravJoshi33/news_crunch/blob/main/app_screenshot.png)

This app scrapes article details such as the title, date, and author from several news websites, processes the data, and shows the results on a single page.

It is inspired by [inshorts](https://m.inshorts.com/en/read).
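
As a rough illustration of the scrape, process, and display flow, the sketch below models one scraped article and a small cleaning step. It is not taken from the project's code: the `Article` class, the `source` and `url` fields, and the `clean_article` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    """One scraped news article (hypothetical shape, not the project's actual model)."""
    title: str
    date: str
    author: Optional[str]
    source: str  # assumed: name of the website the article came from
    url: str     # assumed: link back to the full article


def clean_article(raw: dict) -> Article:
    """Collapse stray whitespace in scraped fields and treat an empty author as missing."""
    def tidy(value) -> str:
        # str.split() with no arguments drops newlines, tabs and repeated spaces
        return " ".join(str(value or "").split())

    return Article(
        title=tidy(raw.get("title")),
        date=tidy(raw.get("date")),
        author=tidy(raw.get("author")) or None,
        source=tidy(raw.get("source")),
        url=tidy(raw.get("url")),
    )
```

Keeping the record immutable (`frozen=True`) is one possible design choice; it makes it harder to change an article by accident after the cleaning step.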


### Built With

[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)


## Getting Started

### Prerequisites & Installation

Before starting, make sure the following dependencies are installed on your machine:
* [Python](https://www.python.org/downloads/)
* [pip](https://pypi.org/project/pip/)

Once the above dependencies are installed, follow these steps:
* Clone the repo
```sh
git clone https://github.com/NiravJoshi33/news_crunch.git
```
* Navigate to the project folder in your terminal
* Install the remaining dependencies with the following command
```sh
pip install -r requirements.txt
```
Wait for the packages to finish installing.
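
If you want to confirm that the installation succeeded, a small check along the following lines may help. This is only a sketch: `requirement_names` and `missing_packages` are hypothetical helpers, and the parser handles plain `name==version` style lines rather than every form pip accepts.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and pip options such as "-r other.txt"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the project folder, `missing_packages(requirement_names(open("requirements.txt")))` should return an empty list once everything is installed.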

## How to Use

Follow the instructions below to run the project:

* Run the following script
```
python main.py
```
* Once the script is running, a browser window should open and display the GUI. If it doesn't, open a browser manually and go to the following URL
```
http://localhost:8501
```
* By default, a sidebar loads with the page. From there, you can deselect any website you don't want to see news from and use the slider to choose how many articles to show.
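
The sidebar behaviour described above boils down to filtering by website and capping the number of articles. The sketch below shows that logic as a plain function; `select_articles` and the `source` key are hypothetical and not taken from the project's code. If the GUI is built with Streamlit, as the acknowledgments and the default port suggest, the two inputs could come from `st.sidebar.multiselect` and `st.sidebar.slider`.

```python
from typing import Iterable


def select_articles(articles: Iterable[dict], enabled_sites: set, limit: int) -> list:
    """Keep articles from the enabled websites, capped at `limit` items."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # "source" is an assumed key naming the website each article came from
    kept = [a for a in articles if a.get("source") in enabled_sites]
    return kept[:limit]


if __name__ == "__main__":
    sample = [
        {"title": "A", "source": "site-one"},
        {"title": "B", "source": "site-two"},
        {"title": "C", "source": "site-one"},
    ]
    # "site-two" deselected in the sidebar, slider set to 1
    print(select_articles(sample, {"site-one"}, 1))
```

Keeping the filter separate from the GUI code means it can be tested without starting the web app.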

## To Do

- [ ] Resolve Major Bugs with the current Basic Version
- [X] Run the app with a single script
- [X] OpenSSL error occurring sometimes
- [X] Clean Data before showing in GUI
- [X] Dates from all websites in same format
- [ ] Inconsistent card size due to differing thumbnail and excerpt sizes
- [ ] Test app on macOS
- [ ] Data storage and access from an online database

See the [open issues](https://github.com/NiravJoshi33/news_crunch/issues) for a full list of proposed features (and known issues).
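
For the "dates from all websites in same format" item, one common approach is to try a list of known input formats and emit a single output format. The sketch below illustrates that idea only; it is not the project's implementation, and `KNOWN_FORMATS` and `normalize_date` are hypothetical names with assumed input formats.

```python
from datetime import datetime

# Assumed input formats; real news sites may use others.
# Month names (%B) are parsed according to the current locale (English by default).
KNOWN_FORMATS = (
    "%Y-%m-%d",   # 2023-05-01
    "%d %B %Y",   # 1 May 2023
    "%B %d, %Y",  # May 1, 2023
    "%d/%m/%Y",   # 01/05/2023
)


def normalize_date(text: str, output_format: str = "%Y-%m-%d") -> str:
    """Return `text` re-formatted as `output_format`, trying each known input format in turn."""
    cleaned = " ".join(text.split())  # collapse stray whitespace from scraping
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime(output_format)
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Raising on unknown formats, rather than passing the text through unchanged, makes it obvious when a newly added website needs another entry in the format list.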


## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request


## Contact

Nirav Joshi \
Email - [email protected] \
Project Link: [https://github.com/NiravJoshi33/news_crunch](https://github.com/NiravJoshi33/news_crunch)


## Acknowledgments

* [Scrapy Course - Python Web Scraping for Beginners](https://www.youtube.com/watch?v=mBoX_JCKZTE&pp=ygUNc2NyYXB5IGNvdXJzZQ%3D%3D) by freeCodeCamp.org
* [Python Streamlit Full Course](https://www.youtube.com/watch?v=RjiqbTLW9_E&list=PLa6CNrvKM5QU7AjAS90zCMIwi9RTFNIIW)
* [Best-README-Template](https://github.com/othneildrew/Best-README-Template) by [Othneil Drew](https://github.com/othneildrew)
* The awesome community on [Stack Overflow](https://stackoverflow.com/)
* [ChatGPT](https://chat.openai.com/) by [OpenAI](https://openai.com/) for some debugging
