https://github.com/notshrirang/news-web-scraper
This Python script is designed to scrape news data from Google, Yahoo, and Bing search engines for a list of companies. The scraped data is then saved into a CSV file.
- Host: GitHub
- URL: https://github.com/notshrirang/news-web-scraper
- Owner: NotShrirang
- License: apache-2.0
- Created: 2024-01-23T16:06:49.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-30T10:02:38.000Z (over 2 years ago)
- Last Synced: 2025-02-11T12:36:31.672Z (over 1 year ago)
- Topics: beautifulsoup, webscraping
- Language: Python
- Homepage:
- Size: 155 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# News Web Scraper
This Python script scrapes news results from the Google, Yahoo, and Bing search engines for a list of companies and saves the scraped data to a CSV file.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [License](#license)
- [Authors](#authors)
## Prerequisites
- Python 3.x
- Required Python libraries (install via `pip install -r requirements.txt`):
  - `pandas`
  - `bs4` (Beautiful Soup)
  - `requests`
  - `tqdm`
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/NotShrirang/News-Web-Scraper.git
   ```
2. Navigate to the project directory:
   ```bash
   cd News-Web-Scraper
   ```
3. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
## Usage
1. Edit `config.json` to set the companies, keywords, and search engines.
2. Run the main script:
   ```bash
   python main.py
   ```
3. The scraped data is saved as `news.csv` in the project directory.
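
To illustrate the final step, here is a minimal sketch of how scraped rows could be written to a CSV file and read back. It uses the standard-library `csv` module to stay dependency-free, although the project itself lists `pandas`. The column names (`company`, `engine`, `title`, `link`) and the helpers `save_rows` and `load_rows` are hypothetical; the real `news.csv` may be laid out differently.

```python
import csv
import os
import tempfile

# Hypothetical column names; the real news.csv may use different ones.
FIELDNAMES = ["company", "engine", "title", "link"]


def save_rows(rows, path="news.csv"):
    """Write rows (dicts keyed by FIELDNAMES) to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path="news.csv"):
    """Read the CSV file back into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Made-up sample row, written to a temporary directory so that an
# existing news.csv in the project directory is not overwritten.
sample = [
    {
        "company": "Acme Corp",
        "engine": "bing",
        "title": "Example headline",
        "link": "https://example.com/story",
    }
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "news.csv")
    save_rows(sample, demo_path)
    loaded = load_rows(demo_path)

print(loaded[0]["title"])
```

Passing `newline=""` when opening the file follows the `csv` module's guidance and avoids blank lines between rows on Windows.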
## Configuration
- **config.json**: This file contains the configuration for the script. It includes the list of companies, keywords, search engines, and other parameters.
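
The exact schema of `config.json` is not documented here, so the following is only a sketch of how such a file might be loaded and turned into search queries. The keys `companies`, `keywords`, and `search_engines`, the helpers `load_config` and `build_queries`, and the URL patterns are all assumptions for illustration; the script's actual configuration and request URLs may differ.

```python
import json
from urllib.parse import urlencode

# Hypothetical config layout; the real config.json keys may differ.
EXAMPLE_CONFIG = """
{
  "companies": ["Acme Corp", "Globex"],
  "keywords": ["earnings"],
  "search_engines": ["google", "bing", "yahoo"]
}
"""

# Assumed news-search URL patterns: (base URL, query parameter, extra params).
SEARCH_URLS = {
    "google": ("https://www.google.com/search", "q", {"tbm": "nws"}),
    "bing": ("https://www.bing.com/news/search", "q", {}),
    "yahoo": ("https://news.search.yahoo.com/search", "p", {}),
}


def load_config(text):
    """Parse JSON text and check that each assumed key holds a list."""
    config = json.loads(text)
    for key in ("companies", "keywords", "search_engines"):
        if not isinstance(config.get(key), list):
            raise ValueError(f"config is missing a list under {key!r}")
    return config


def build_queries(config):
    """Return one (company, engine, url) tuple per company/keyword/engine."""
    queries = []
    for company in config["companies"]:
        for keyword in config["keywords"]:
            for engine in config["search_engines"]:
                base, param, extra = SEARCH_URLS[engine]
                params = {param: f"{company} {keyword}", **extra}
                url = f"{base}?{urlencode(params)}"
                queries.append((company, engine, url))
    return queries


config = load_config(EXAMPLE_CONFIG)
for company, engine, url in build_queries(config):
    print(company, engine, url)
```

Validating the config up front means a missing or mistyped key fails immediately with a clear message rather than partway through a scraping run.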
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## Authors
- Shrirang Mahajan
- Chirantan Degloorkar
- Aditi Mokashi
- Shivam Dandavate
- Sakshi Panhalkar