https://github.com/kotartemiy/topic-labeled-news-dataset
100k+ topic labeled news articles published from thousands of news websites
media news nlp topic topic-modeling topics
- Host: GitHub
- URL: https://github.com/kotartemiy/topic-labeled-news-dataset
- Owner: kotartemiy
- License: mit
- Created: 2020-08-18T12:49:09.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-08-18T13:08:14.000Z (almost 5 years ago)
- Last Synced: 2025-01-18T08:44:47.166Z (5 months ago)
- Topics: media, news, nlp, topic, topic-modeling, topics
- Homepage: https://newscatcherapi.com/
- Size: 10.6 MB
- Stars: 18
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# topic-labeled-news-dataset
100k+ topic labeled news articles published from thousands of news websites

### Context
We're the [NewsCatcher](https://newscatcherapi.com/) team: we collect and index news articles. We provide a News API to find relevant news data.
We contribute a lot to the open-source community by sharing our work (find other links at the bottom of the description).
### Content
We collected over 100k articles across 8 different news topics:

| Topic | Articles |
| --- | --- |
| `BUSINESS` | 15000 |
| `ENTERTAINMENT` | 15000 |
| `HEALTH` | 15000 |
| `NATION` | 15000 |
| `SCIENCE` | 3774 |
| `SPORTS` | 15000 |
| `TECHNOLOGY` | 15000 |
| `WORLD` | 15000 |

Those articles got published over the first half of August 2020.
Every topic has 15k articles except `SCIENCE`, which has 3774. The articles were published by thousands of different news websites.
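
Since `SCIENCE` has far fewer articles than the other topics, anyone training a classifier on this data may want to account for the imbalance. Below is a minimal sketch that uses only the counts listed above to compute the total and a set of inverse-frequency class weights. The `total / (n_classes * count)` heuristic is one common choice and an assumption here, not something the dataset prescribes. The names `TOPIC_COUNTS` and `balanced_weights` are illustrative.

```python
# Article counts per topic, copied from the counts listed in this README.
TOPIC_COUNTS = {
    "BUSINESS": 15000,
    "ENTERTAINMENT": 15000,
    "HEALTH": 15000,
    "NATION": 15000,
    "SCIENCE": 3774,
    "SPORTS": 15000,
    "TECHNOLOGY": 15000,
    "WORLD": 15000,
}


def balanced_weights(counts):
    """Return inverse-frequency weights: total / (n_classes * count).

    This is an assumed balancing heuristic for illustration only.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {topic: total / (n_classes * n) for topic, n in counts.items()}


total_articles = sum(TOPIC_COUNTS.values())
weights = balanced_weights(TOPIC_COUNTS)

print(f"total articles: {total_articles}")
for topic in sorted(weights):
    print(f"{topic:<13} count={TOPIC_COUNTS[topic]:>5} weight={weights[topic]:.3f}")
```

The total comes to 108,774 articles, which matches the "100k+" figure. Under this heuristic `SCIENCE` gets a larger weight than the 15k topics.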
### Other Useful Links
- [newscatcher Py package](https://github.com/kotartemiy/newscatcher) - Programmatically collect normalized news from (almost) any website.
- [pygooglenews](https://github.com/kotartemiy/pygooglenews) - If Google News had a Python library
### Support Us
The best thing you can do for us is to let people know about our [News API](https://newscatcherapi.com/news-api).
### Need a bigger dataset?
Connect with me on LinkedIn or email me at artem [at] newscatcherapi [dot] com