https://github.com/notshrirang/news-web-scraper

This Python script is designed to scrape news data from Google, Yahoo, and Bing search engines for a list of companies. The scraped data is then saved into a CSV file.

Host: GitHub
URL: https://github.com/notshrirang/news-web-scraper
Owner: NotShrirang
License: apache-2.0
Created: 2024-01-23T16:06:49.000Z
Default Branch: main
Last Pushed: 2024-01-30T10:02:38.000Z
Last Synced: 2025-02-11T12:36:31.672Z
Topics: beautifulsoup, webscraping
Language: Python
Size: 155 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# News Web Scraper

This Python script is designed to scrape news data from Google, Yahoo, and Bing search engines for a list of companies. The scraped data is then saved into a CSV file.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [License](#license)
- [Authors](#authors)

## Prerequisites

- Python 3.x
- Required Python libraries (install via `pip install -r requirements.txt`):
  - `pandas`
  - `bs4` (Beautiful Soup)
  - `requests`
  - `tqdm`
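
To confirm that the libraries listed above are available before running the scraper, a quick check along these lines may help. This is a minimal sketch and not part of the repository; the helper name `missing_modules` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec

# Import names of the libraries listed above
# (bs4 is the import name used by Beautiful Soup).
REQUIRED_MODULES = ["pandas", "bs4", "requests", "tqdm"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported as missing, `pip install -r requirements.txt` should install it.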

## Installation

1. Clone the repository:

```bash
git clone https://github.com/NotShrirang/News-Web-Scraper.git
```

2. Navigate to the project directory:

```bash
cd News-Web-Scraper
```

3. Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

1. Edit the `config.json` file to configure the companies, keywords, and search engines.

2. Run the main script:

```bash
python main.py
```

3. The scraped data will be saved as `news.csv` in the project directory.
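
Once the run finishes, the output can be inspected with a few lines of Python. The sketch below is not part of the repository. It assumes `news.csv` is UTF-8 encoded and starts with a header row; the column names are not documented here, so it simply reports whatever it finds. The helper name `summarize_csv` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


if __name__ == "__main__":
    output = Path("news.csv")
    if output.exists():
        columns, rows = summarize_csv(output)
        print(f"{rows} rows; columns: {columns}")
    else:
        print("news.csv not found; run main.py first.")
```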

## Configuration

- **config.json**: Holds the script's settings, including the list of companies, the keywords, the search engines to query, and other parameters.
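
The exact schema of `config.json` is not documented here, so the following is only an illustrative sketch of how such a configuration could drive the search queries. The key names (`companies`, `keywords`, `search_engines`), the placeholder company names, the endpoint table, and the helper `build_search_urls` are all assumptions made for this example and may not match what `main.py` actually uses.

```python
import json
from itertools import product
from urllib.parse import urlencode

# Hypothetical config layout; the real config.json may use different keys.
EXAMPLE_CONFIG = """
{
  "companies": ["Acme Corp", "Globex"],
  "keywords": ["merger"],
  "search_engines": ["google", "bing"]
}
"""

# Assumed news-search endpoints: (base URL, query parameter, extra parameters).
ENGINE_ENDPOINTS = {
    "google": ("https://www.google.com/search", "q", {"tbm": "nws"}),
    "yahoo": ("https://news.search.yahoo.com/search", "p", {}),
    "bing": ("https://www.bing.com/news/search", "q", {}),
}


def build_search_urls(config):
    """Yield (company, keyword, engine, url) for every combination in the config."""
    combos = product(
        config["companies"], config["keywords"], config["search_engines"]
    )
    for company, keyword, engine in combos:
        base, query_param, extra = ENGINE_ENDPOINTS[engine]
        params = {query_param: f"{company} {keyword}", **extra}
        yield company, keyword, engine, f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    for row in build_search_urls(config):
        print(row)
```

Keeping the engine details in one lookup table means another search engine can be added without touching the loop.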

## License

This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.

## Authors

- Shrirang Mahajan
- Chirantan Degloorkar
- Aditi Mokashi
- Shivam Dandavate
- Sakshi Panhalkar
