https://github.com/kotartemiy/topic-labeled-news-dataset

# topic-labeled-news-dataset
100k+ topic-labeled news articles published by thousands of news websites

### Context

We're the [NewsCatcher](https://newscatcherapi.com/) team: we collect and index news articles, and we provide a News API for finding relevant news data.

We contribute a lot to the open-source community by sharing our work (see the other links at the bottom of this description).

### Content

We collected over 100k articles for 8 different news topics:

| Topic | Articles |
| --- | --- |
| `BUSINESS` | 15000 |
| `ENTERTAINMENT` | 15000 |
| `HEALTH` | 15000 |
| `NATION` | 15000 |
| `SCIENCE` | 3774 |
| `SPORTS` | 15000 |
| `TECHNOLOGY` | 15000 |
| `WORLD` | 15000 |

These articles were published during the first half of August 2020.

All `topics` have 15,000 articles except `SCIENCE`, which has 3,774. The articles were published by thousands of different news websites.
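As a quick sanity check on the figures above, here is a minimal Python sketch that sums the listed per-topic counts and tallies topic labels from a delimited file. The file path, the `;` delimiter, and the `topic` column name are assumptions made for illustration and are not stated in this README, so adjust them to match the file you actually download.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping

# Per-topic article counts exactly as listed in this README.
EXPECTED_COUNTS = {
    "BUSINESS": 15000,
    "ENTERTAINMENT": 15000,
    "HEALTH": 15000,
    "NATION": 15000,
    "SCIENCE": 3774,
    "SPORTS": 15000,
    "TECHNOLOGY": 15000,
    "WORLD": 15000,
}

# Sum of the listed counts, i.e. the "100k+" figure spelled out.
TOTAL_ARTICLES = sum(EXPECTED_COUNTS.values())


def count_topics(rows: Iterable[Mapping[str, str]],
                 topic_field: str = "topic") -> Counter:
    """Tally rows per topic label.

    `topic_field` is an assumed column name, not one confirmed by the README.
    """
    return Counter(row[topic_field] for row in rows)


def load_topic_counts(path: str,
                      delimiter: str = ";",
                      topic_field: str = "topic") -> Counter:
    """Read a delimited text file and count its rows per topic.

    The path, delimiter, and column name are all hypothetical defaults.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return count_topics(reader, topic_field)


print(TOTAL_ARTICLES)  # 108774
```

Comparing the result of `load_topic_counts(...)` against `EXPECTED_COUNTS` is one way to confirm a download is complete. The smaller `SCIENCE` class is also worth keeping in mind when splitting the data for training and evaluation.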

### Other Useful Links
[newscatcher Py package](https://github.com/kotartemiy/newscatcher) - Programmatically collect normalized news from (almost) any website.

[pygooglenews](https://github.com/kotartemiy/pygooglenews) - If Google News had a Python library

### Support Us

The best thing you can do for us is to let people know about our [News API](https://newscatcherapi.com/news-api).

### Need a bigger dataset?

Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com