https://github.com/manastiwari-ir/news-web-scraper
The News Scraper Web Application is a project designed to scrape and present the latest news headlines along with their associated images
beautifulsoup4 cors flask javascript python
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/manastiwari-ir/news-web-scraper
- Owner: manasTiwari-IR
- Created: 2024-11-17T19:14:47.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-17T19:51:55.000Z (2 months ago)
- Last Synced: 2024-11-17T20:38:23.955Z (2 months ago)
- Topics: beautifulsoup4, cors, flask, javascript, python
- Language: Python
- Homepage: https://news-web-scraper-jade.vercel.app
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- Changelog: news_scrape.py
README
# News Scraper Web Application
## About the Project
The **News Scraper Web Application** is a project designed to scrape and present the latest news headlines along with their associated images. Users can specify a category (e.g., international news, business, or innovation) to fetch relevant news. The application employs a Python-based backend for web scraping and a modern frontend interface for data presentation. This project bridges web scraping and interactive web development, allowing users to retrieve dynamic information from a third-party news source (here, bbc.com) and display it in a visually appealing way.
## Key Features
- Scrapes news headlines, associated images, and additional metadata from a news website (e.g., BBC News).
- Provides the ability to filter news by category.
- Presents the scraped data through an interactive web interface.
## Technologies Used
### Backend
- **Python**: The core language used for web scraping and data processing.
- **Flask**: A lightweight web framework used to create the backend API for the application.
- **BeautifulSoup**: A Python library for parsing HTML and extracting relevant data from web pages.
- **Requests**: A library for making HTTP requests to fetch web page content.
### Frontend
- **HTML**: The structure of the web page.
- **CSS**: For styling the web interface.
- **JavaScript (Axios)**: To interact with the Flask backend and dynamically display news data.
### Additional Tools
- **Flask-CORS**: Enables cross-origin requests, allowing the frontend and backend to communicate seamlessly.
## Workflow
1. **User Input**: The user specifies a news category through the frontend.
2. **Backend Processing**: The Flask backend receives the request and uses the Requests and BeautifulSoup libraries to scrape relevant news from the specified website.
3. **Data Filtering and Saving**: Data is filtered to remove invalid or placeholder images.
4. **Frontend Display**: The data is sent to the frontend and displayed in a clean, user-friendly interface.
## How It Works
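The four workflow steps can be sketched end to end in a single Flask module. This is a minimal illustration, not the repository's actual code: the `category` parameter name, the `ALLOWED_CATEGORIES` allow-list, the mapping of categories onto URL paths, the `<article>`-based markup, and the `is_placeholder` heuristic are all assumptions made for the example.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify, request

app = Flask(__name__)

BASE_URL = "https://www.bbc.com/"  # news source named in this README
# Assumption: each category maps directly onto a URL path such as /business.
ALLOWED_CATEGORIES = {"news", "business", "innovation"}


def fetch_html(category):
    """Download the category page; raises a requests exception on failure."""
    response = requests.get(urljoin(BASE_URL, category), timeout=10)
    response.raise_for_status()
    return response.text


def is_placeholder(src):
    """Hypothetical heuristic: reject empty, inline data: and 'placeholder' URLs."""
    return not src or src.startswith("data:") or "placeholder" in src.lower()


def extract_articles(html):
    """Pair each headline with the first image inside its enclosing <article>."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    # Assumed markup for illustration; the real page structure may differ.
    for card in soup.find_all("article"):
        headline = card.find(["h2", "h3"])
        image = card.find("img")
        src = image.get("src", "") if image else ""
        if headline is None or is_placeholder(src):
            continue  # step 3: drop entries with missing or placeholder images
        articles.append({"headline": headline.get_text(strip=True), "image": src})
    return articles


@app.route("/scrape")
def scrape():
    # Step 1: read the user's category from the query string (assumed name).
    category = request.args.get("category", "news")
    if category not in ALLOWED_CATEGORIES:
        return jsonify({"error": f"unknown category: {category}"}), 400
    try:
        html = fetch_html(category)  # step 2: fetch the page
    except requests.RequestException as exc:
        return jsonify({"error": str(exc)}), 502
    # Step 4: return the filtered data as JSON for the frontend to render.
    return jsonify({"category": category, "articles": extract_articles(html)})
```

Keeping `extract_articles` separate from the route means the parsing and filtering logic can be checked against a saved HTML string without making any network request.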
### Backend
- **Flask Endpoint**: The `/scrape` endpoint receives a GET request with a query parameter specifying the news category.
- **Web Scraping**: Using BeautifulSoup, the backend extracts headlines, images, and paragraphs from the targeted news website.
- **Response**: The backend responds with a JSON object containing the filtered news data.
### Frontend
- **Axios**: The frontend sends requests to the backend using Axios and retrieves the scraped data.
- **Dynamic Rendering**: JavaScript dynamically renders the data into the webpage, providing a seamless user experience.
## Challenges and Solutions
### CORS Issues
- **Challenge**: Cross-origin requests between the frontend and backend caused errors.
- **Solution**: Implemented `Flask-CORS` to allow requests from different origins.
### Placeholder Images
- **Challenge**: Some scraped images were placeholders, reducing the quality of the output.
- **Solution**: Filtered out entries with missing or placeholder images.
## Future Improvements
- Add support for scraping from multiple news sources.
- Implement user authentication and personalized news feeds.
- Enhance the web interface with frameworks like React or Angular.
- Use a database to store and retrieve historical news data.
## Conclusion
The News Scraper Web Application demonstrates the power of combining web scraping with web development. It provides users with a practical and interactive tool for staying updated on the latest news in their areas of interest.