https://github.com/jonathanporta/bbc-news-article-archive
Thrown together to grab a pickle of this data for a model training run.
- Host: GitHub
- URL: https://github.com/jonathanporta/bbc-news-article-archive
- Owner: JonathanPorta
- Created: 2019-02-20T18:00:38.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-20T18:01:34.000Z (over 7 years ago)
- Last Synced: 2025-01-24T19:37:09.974Z (over 1 year ago)
- Language: Python
- Size: 2.29 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.TXT
README
Consists of 2225 documents from the BBC news website corresponding to stories in five topical areas from 2004-2005.
Natural Classes: 5 (business, entertainment, politics, sport, tech)
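As a rough illustration of how a pickle of this data might be produced, here is a minimal sketch. It assumes the articles are stored as plain-text files in one subdirectory per class (for example `business/001.txt`); that layout, and the names `load_articles` and `save_pickle`, are hypothetical and not taken from this repository.

```python
import pickle
from pathlib import Path

# The five class names come from the README above; the on-disk layout
# (one subdirectory per class, one .txt file per article) is an assumption.
CLASSES = ("business", "entertainment", "politics", "sport", "tech")


def load_articles(root):
    """Return a list of (label, text) pairs read from root/<class>/*.txt."""
    articles = []
    for label in CLASSES:
        class_dir = Path(root) / label
        if not class_dir.is_dir():
            continue  # tolerate a partial copy of the data
        for path in sorted(class_dir.glob("*.txt")):
            # errors="replace" keeps stray non-UTF-8 bytes from raising
            text = path.read_text(encoding="utf-8", errors="replace")
            articles.append((label, text))
    return articles


def save_pickle(articles, out_path):
    """Serialize the (label, text) pairs to a pickle file."""
    with open(out_path, "wb") as fh:
        pickle.dump(articles, fh)
```

Under those assumptions, calling `load_articles` on the data directory and passing the result to `save_pickle` would give a single file that a training script can read back with `pickle.load`.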
If you make use of the dataset, please consider citing the publication:
- D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.
All rights, including copyright, in the content of the original articles are owned by the BBC.
Contact Derek Greene for further information.
http://mlg.ucd.ie/datasets/bbc.html