https://github.com/oxylabs/news-scraping

A tutorial for scraping news

news news-scraper web-scraping

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/oxylabs/news-scraping
Owner: oxylabs
Created: 2022-02-28T11:59:29.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-06-26T08:23:24.000Z (4 months ago)
Last Synced: 2025-06-26T09:31:01.899Z (4 months ago)
Topics: news, news-scraper, web-scraping
Homepage:
Size: 8.79 KB
Stars: 13
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# News Scraping

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877&utm_medium=affiliate&groupid=877&utm_content=news-scraping-github&transaction_id=102f49063ab94276ae8f116d224b67)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)

- [Fetch HTML Page](#fetch-html-page)

- [Parsing HTML](#parsing-html)

- [Extracting Text](#extracting-text)

This article introduces news scraping, including its benefits and use cases, and shows how to use Python to build an article scraper that fetches a page, parses its HTML, and extracts text.

For a detailed explanation, see our [blog post](https://oxy.yt/YrD0).

## Fetch HTML Page

Install the `requests` library:

```shell
pip3 install requests
```

Create a new Python file and enter the following code:

```python
import requests

response = requests.get('https://quotes.toscrape.com')
print(response.text)  # Prints the entire HTML of the webpage.
```
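A request can hang or come back with an error status, so a slightly more defensive fetch may be useful. The following is a minimal sketch, not part of the original tutorial: `fetch_html` is a hypothetical helper name and the 10-second default timeout is an assumption. It relies only on the `timeout` argument of `requests.get` and on `Response.raise_for_status()`.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the body of `url` as text, raising on network or HTTP errors.

    `fetch_html` is a hypothetical helper used for illustration only.
    """
    # Without a timeout, requests may wait indefinitely for a response.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://quotes.toscrape.com')` should then return the same HTML as `response.text` in the example above.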

## Parsing HTML

Install Beautiful Soup together with the `lxml` parser:

```shell
pip3 install lxml beautifulsoup4
```

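```python
import requests
from bs4 import BeautifulSoup

response = requests.get('https://quotes.toscrape.com')

# Parse the HTML with the lxml parser.
soup = BeautifulSoup(response.text, 'lxml')

# Find the first <title> tag in the document.
title = soup.find('title')
```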

## Extracting Text

```python
print(title.get_text())  # Prints page title.
```
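`soup.find()` returns `None` when no matching tag exists, so calling `get_text()` on the result fails on pages that lack the tag. The sketch below guards against that case. It is an illustration under stated assumptions: `text_or_none` and `SAMPLE_HTML` are hypothetical names, the inline HTML stands in for a fetched page, and Python's built-in `html.parser` is used so the example runs without `lxml` (pass `'lxml'` instead to match the tutorial).

```python
from bs4 import BeautifulSoup

# Assumed sample document, used so the example runs without a network request.
SAMPLE_HTML = """
<html>
  <head><title>Quotes to Scrape</title></head>
  <body><p>No heading here.</p></body>
</html>
"""


def text_or_none(soup, tag_name):
    """Return the stripped text of the first `tag_name` tag, or None if absent."""
    tag = soup.find(tag_name)
    if tag is None:
        return None
    return tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(text_or_none(soup, "title"))  # Quotes to Scrape
print(text_or_none(soup, "h1"))     # None
```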

### Fine Tuning

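To narrow a search, pass attribute filters to `find()`. For example, select the `<small>` tag whose `itemprop` attribute is `author`:

```python
soup.find('small', itemprop="author")
```

The same element can also be matched by its CSS class. Note the trailing underscore in `class_`, which is needed because `class` is a reserved word in Python:

```python
soup.find('small', class_="author")
```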
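To see why these filters matter, the sketch below compares an unfiltered lookup with the two filtered ones and with the equivalent CSS selector via `select_one()`. The markup in `SAMPLE_HTML` is an assumption, loosely modeled on a quote block from quotes.toscrape.com with an extra `<small>` tag added for contrast. The built-in `html.parser` is used here so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration; the "note" tag is invented for contrast.
SAMPLE_HTML = """
<div class="quote">
  <small class="note">Sample note</small>
  <span class="text" itemprop="text">An example quote.</span>
  <span>by <small class="author" itemprop="author">Jane Doe</small></span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Unfiltered: returns the first <small> tag, which is not the author.
first_small = soup.find("small")

# Filtered by attribute, by class, and by CSS selector.
by_attribute = soup.find("small", itemprop="author")
by_class = soup.find("small", class_="author")
by_selector = soup.select_one("small.author")

print(first_small.get_text())   # Sample note
print(by_attribute.get_text())  # Jane Doe
print(by_class.get_text())      # Jane Doe
print(by_selector.get_text())   # Jane Doe
```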

### Extracting Headlines

```python
# Find every element whose itemprop attribute is "text".
headlines = soup.find_all(itemprop="text")

for headline in headlines:
    print(headline.get_text())
```
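A scraper usually needs each headline paired with related fields such as its author. One way to do that, sketched here under assumptions, is to iterate over the enclosing blocks and search inside each one. `extract_items`, `SAMPLE_HTML`, and the `div.quote` wrapper are illustrative; adjust the selectors to the markup of the target site. The built-in `html.parser` is used so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration, with two items in the same structure.
SAMPLE_HTML = """
<div class="quote">
  <span class="text" itemprop="text">First example headline</span>
  <small class="author" itemprop="author">Jane Doe</small>
</div>
<div class="quote">
  <span class="text" itemprop="text">Second example headline</span>
  <small class="author" itemprop="author">John Roe</small>
</div>
"""


def extract_items(html):
    """Return a list of {"text", "author"} dicts, one per complete block."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for block in soup.find_all("div", class_="quote"):
        # Searching within the block keeps each text with its own author.
        text = block.find(itemprop="text")
        author = block.find(itemprop="author")
        if text is None or author is None:
            continue  # Skip blocks that are missing either field.
        items.append({
            "text": text.get_text(strip=True),
            "author": author.get_text(strip=True),
        })
    return items


for item in extract_items(SAMPLE_HTML):
    print(f'{item["author"]}: {item["text"]}')
```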

To learn more about news scraping, see our [blog post](https://oxy.yt/YrD0).
