https://github.com/yutkin/lenta.ru-news-dataset
Corpus of Russian news articles collected from Lenta.Ru
- Host: GitHub
- URL: https://github.com/yutkin/lenta.ru-news-dataset
- Owner: yutkin
- Created: 2017-04-04T06:56:45.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-11-19T00:08:16.000Z (almost 2 years ago)
- Last Synced: 2023-11-07T19:44:52.469Z (about 1 year ago)
- Topics: asynchronous, asyncio, corpus, dataset, lenta, lenta-ru, news, nlp, parser, python, russian
- Language: Python
- Homepage:
- Size: 14.6 KB
- Stars: 133
- Watchers: 8
- Forks: 22
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# Corpus of news articles of Lenta.Ru
* Size: 337 MB (2 GB uncompressed)
* News articles: 800K+
* Dates: 30/08/1999 - 14/12/2019+

[Script](../master/download_lenta.py) for downloading the news (Python **3.7**+ is required).
# Download
* [GitHub](https://github.com/yutkin/Lenta.Ru-News-Dataset/releases)
* [Kaggle](https://www.kaggle.com/yutkin/corpus-of-russian-news-articles-from-lenta/)

# Decompression
`bzip2 -d lenta-ru-news.csv.bz2`
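
If decompressing to disk is not convenient, the archive can probably also be read directly from Python. The following is a minimal sketch, not part of the repository: the helper name `iter_rows` is hypothetical, and it assumes the CSV is UTF-8 encoded and starts with a header row (neither is stated above, so check a few rows before relying on it).

```python
import bz2
import csv
from itertools import islice


def iter_rows(path, limit=None):
    """Yield rows of a bzip2-compressed CSV as dicts keyed by the header row.

    Assumptions (not confirmed by the README): UTF-8 text, first row is a header.
    """
    # bz2.open in text mode decompresses on the fly, so the 2 GB file
    # never has to be written to disk. newline="" is what the csv module expects.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        # islice with limit=None iterates until the file is exhausted.
        yield from islice(reader, limit)
```

For example, `for row in iter_rows("lenta-ru-news.csv.bz2", limit=3): print(row)` would print the first three records and reveal the actual column names. Streaming row by row keeps memory use low, at the cost of decompressing again on every pass.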