https://github.com/notshrirang/article-reader-app
This Python script is designed to extract structured data from various news articles. It utilizes web scraping techniques to extract information such as article titles and bodies from different news websites. The script supports multiple websites, and you can easily extend it to include more by adding functions for each website.
- Host: GitHub
- URL: https://github.com/notshrirang/article-reader-app
- Owner: NotShrirang
- Created: 2024-02-02T12:12:49.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-05T11:29:56.000Z (over 2 years ago)
- Last Synced: 2025-02-11T12:36:29.837Z (over 1 year ago)
- Topics: beautifulsoup, regex
- Language: Python
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Article-Reader-App
This Python script extracts structured data, such as article titles and bodies, from news websites using web scraping. It supports multiple websites, and you can extend it to cover more by adding an extraction function for each new site.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
## Prerequisites
- Python 3.x
- Required Python libraries (install via `pip install -r requirements.txt`):
  - `requests`
  - `bs4`
## Installation
1. Clone the repository:
```bash
git clone https://github.com/NotShrirang/Article-Reader-App
```
2. Navigate to the project directory:
```bash
cd Article-Reader-App
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
## Usage
1. Edit the `config.json` file to list the news article URLs you want to process.
2. Run the main script:
```bash
python main.py
```
The extracted data is saved as `output.json` in the project directory.
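The repository's source is not reproduced in this README, so the following is only a minimal sketch of how per-website extraction functions could be wired together. The names `EXTRACTORS`, `extract_example_news`, and `extract_article`, the domain `example-news.com`, and the `url`/`title`/`body` output fields are all hypothetical. To stay runnable without network access or third-party packages, the sketch applies the standard-library `re` module to an HTML string instead of fetching pages with `requests` and parsing them with `bs4`.

```python
import json
import re
from urllib.parse import urlparse


def strip_tags(text):
    """Remove any inline HTML tags and surrounding whitespace."""
    return re.sub(r"<[^>]+>", "", text).strip()


def extract_example_news(html):
    """Hypothetical extractor for one site: first <h1> as title, <p> tags as body."""
    title_match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.DOTALL)
    paragraphs = re.findall(r"<p[^>]*>(.*?)</p>", html, re.DOTALL)
    return {
        "title": strip_tags(title_match.group(1)) if title_match else None,
        "body": "\n".join(strip_tags(p) for p in paragraphs),
    }


# Hypothetical registry: supporting a new site means adding one entry here.
EXTRACTORS = {
    "example-news.com": extract_example_news,
}


def extract_article(url, html):
    """Pick the extractor registered for the URL's domain and run it."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    extractor = EXTRACTORS.get(domain)
    if extractor is None:
        raise ValueError(f"No extractor registered for {domain!r}")
    return {"url": url, **extractor(html)}


sample_html = (
    "<html><body><h1>Sample <em>headline</em></h1>"
    "<p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
article = extract_article("https://www.example-news.com/story/1", sample_html)
print(json.dumps(article, indent=2))
```

Keeping the site-specific logic in small functions behind a lookup table means the rest of the script does not need to change when a new website is added.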
## Configuration
`config.json`: This file holds the script's configuration, namely the list of news article URLs to extract data from. Add or remove URLs as needed.
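The exact schema of `config.json` is not shown in this README. As a sketch only, assuming the file holds a JSON object with a hypothetical `urls` key, loading and sanity-checking it could look like this (the function name `load_urls` is also an assumption):

```python
import json
from urllib.parse import urlparse


def is_http_url(value):
    """Return True for strings that look like absolute http(s) URLs."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def load_urls(config_path="config.json"):
    """Read the config file and return its list of article URLs.

    Assumes a hypothetical layout: {"urls": ["https://...", ...]}.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Silently skip entries that are not usable URLs.
    return [u for u in config.get("urls", []) if is_http_url(u)]
```

Calling `load_urls()` with no arguments reads `config.json` from the current working directory, which matches running `python main.py` from the project directory.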