https://github.com/yutkin/lenta.ru-news-dataset

Corpus of Russian news articles collected from Lenta.Ru

asynchronous asyncio corpus dataset lenta lenta-ru news nlp parser python russian

# Corpus of news articles from Lenta.Ru
* Size: 337 MB (2 GB uncompressed)
* News articles: 800K+
* Dates: 30/08/1999 - 14/12/2019

+ [Script](../master/download_lenta.py) for downloading the news (requires Python **3.7**+).

# Download
* [GitHub](https://github.com/yutkin/Lenta.Ru-News-Dataset/releases)
* [Kaggle](https://www.kaggle.com/yutkin/corpus-of-russian-news-articles-from-lenta/)

# Decompression
`bzip2 -d lenta-ru-news.csv.bz2`
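
Note that `bzip2 -d` replaces the archive with the decompressed file; add `-k` to keep the `.bz2` as well.

If you would rather not unpack 2 GB to disk, the archive can also be read as a stream. The following is a minimal sketch using only the Python standard library. It assumes the CSV is UTF-8 encoded and that its first row is a header; neither is stated above, so check against your copy. The function names `iter_rows` and `preview` are illustrative and are not part of this repository.

```python
import bz2
import csv
from itertools import islice

# Precaution: article bodies may be longer than the csv module's default
# per-field limit, so raise it to a large value.
csv.field_size_limit(2**31 - 1)


def iter_rows(path, encoding="utf-8"):
    """Yield each CSV record as a dict keyed by the header row.

    The UTF-8 default is an assumption, not something the dataset documents.
    """
    # Text mode decompresses on the fly; newline="" lets the csv module
    # handle line breaks that occur inside quoted fields.
    with bz2.open(path, mode="rt", encoding=encoding, newline="") as handle:
        yield from csv.DictReader(handle)


def preview(path, limit=3):
    """Return the first `limit` records without reading the whole archive."""
    return list(islice(iter_rows(path), limit))
```

For example, `preview("lenta-ru-news.csv.bz2")` returns the first three records as dictionaries, which is a quick way to see which columns your copy of the file contains.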