https://github.com/juliaml/mldatasets.jl
Utility package for accessing common Machine Learning datasets in Julia
- Host: GitHub
- URL: https://github.com/juliaml/mldatasets.jl
- Owner: JuliaML
- License: mit
- Created: 2016-09-08T09:16:50.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-09-16T16:52:45.000Z (8 months ago)
- Last Synced: 2025-04-02T03:18:01.504Z (28 days ago)
- Topics: dataset, julia, machine-learning
- Language: Julia
- Homepage: https://juliaml.github.io/MLDatasets.jl/stable
- Size: 2.6 MB
- Stars: 226
- Watchers: 9
- Forks: 47
- Open Issues: 49
Metadata Files:
- Readme: README.md
- License: LICENSE
# MLDatasets.jl
[Docs (stable)](https://JuliaML.github.io/MLDatasets.jl/stable)
[Docs (dev)](https://JuliaML.github.io/MLDatasets.jl/dev)
[GitHub Actions](https://github.com/JuliaML/MLDatasets.jl/actions)

This package represents a community effort to provide a common interface for accessing common Machine Learning datasets.
In contrast to other data-related Julia packages, the focus of MLDatasets.jl is specifically on downloading, unpacking, and accessing benchmark datasets.
Data processing and visualization functionality is provided only where it is specific to a particular dataset.

This package is a part of the
[JuliaML](https://github.com/JuliaML) ecosystem.
Its functionality is built on top of the package
[DataDeps.jl](https://github.com/oxinabox/DataDeps.jl).

## Available Datasets
Datasets are grouped into different categories. Click on the links below for a full list of datasets available in each category.
- [Graphs](https://juliaml.github.io/MLDatasets.jl/dev/datasets/graphs) - Datasets with an underlying graph structure: Cora, PubMed, CiteSeer, ...
- [Misc](https://juliaml.github.io/MLDatasets.jl/dev/datasets/misc/) - Datasets that do not fall into any of the other categories: Iris, BostonHousing, ...
- [Text](https://juliaml.github.io/MLDatasets.jl/dev/datasets/text/) - Datasets for language models.
- [Vision](https://juliaml.github.io/MLDatasets.jl/dev/datasets/vision/) - Vision related datasets such as MNIST, CIFAR10, CIFAR100, ...

## Installation
To install MLDatasets.jl, start Julia and enter the following code snippet in the REPL. It uses Julia's built-in package manager.
```julia
import Pkg
Pkg.add("MLDatasets")
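
# Optional: load the package to confirm that the installation succeeded.
using MLDatasets

# A minimal usage sketch (not from the original README), assuming the
# MLDatasets v0.7+ API. It is left commented out because the first call
# downloads the data through DataDeps.jl and asks for confirmation.
# trainset = MNIST(split=:train)
# X, y = trainset[1:10]   # features and targets of the first ten samples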
```

## Contributing to MLDatasets
Pull requests contributing new datasets are warmly welcome. See the source code of any existing dataset for
implementation examples.

## Other data repositories for Julia
If you can't find the dataset you are looking for here, please let us know by opening an issue.
You can also check these other packages to find what you need:

- [OutlierDetectionData.jl](https://github.com/OutlierDetectionJL/OutlierDetectionData.jl)
- [MarketData.jl](https://github.com/JuliaQuant/MarketData.jl)
- [ForecastData.jl](https://github.com/viraltux/ForecastData.jl)
- [RDatasets.jl](https://github.com/JuliaStats/RDatasets.jl)
- [CDSAPI.jl](https://github.com/JuliaClimate/CDSAPI.jl)
- [HuggingFaceDatasets.jl](https://github.com/CarloLucibello/HuggingFaceDatasets.jl)

## License
This code is free to use under the terms of the MIT license.