https://github.com/ivanbongiorni/datasets

# datasets
Useful and/or interesting datasets for ML, for quick experimentation.

- `/sentiment140`: processed subset of Kaggle's [Sentiment140 dataset](https://www.kaggle.com/kazanova/sentiment140). Tokens are already vectorized into integers and ready to be fed to ML models. The folder contains 500k training observations, 150k test observations, their corresponding labels, and a Python dict to reverse the vectorization.
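
As a rough illustration of how the reverse-vectorization dict might be used, here is a minimal sketch. It assumes the dict maps integer ids to token strings and that `0` is a padding id; neither detail is confirmed here, so check the actual files first. The names `decode`, `toy_vocab`, `pad_id`, and `unk` are hypothetical, and the toy dict stands in for the real one.

```python
from typing import Dict, List


def decode(ids: List[int], index_to_token: Dict[int, str],
           pad_id: int = 0, unk: str = "<unk>") -> str:
    """Turn a sequence of integer ids back into a space-joined string.

    Assumptions (not confirmed by the repository): the dict maps
    int -> token, and pad_id marks padding that should be skipped.
    """
    # Skip padding; fall back to a placeholder for ids missing from the dict.
    tokens = [index_to_token.get(i, unk) for i in ids if i != pad_id]
    return " ".join(tokens)


# Toy stand-in for the dict shipped in the folder (hypothetical contents).
toy_vocab = {1: "i", 2: "love", 3: "this"}

print(decode([1, 2, 3, 0, 0], toy_vocab))
```

If the shipped dict turns out to map tokens to ids instead, it can be inverted first with `{v: k for k, v in d.items()}` before calling a function like this.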