{"id":13718893,"url":"https://github.com/red-data-tools/red-datasets","last_synced_at":"2025-04-13T04:58:57.012Z","repository":{"id":38147570,"uuid":"100153342","full_name":"red-data-tools/red-datasets","owner":"red-data-tools","description":"A RubyGem that provides common datasets","archived":false,"fork":false,"pushed_at":"2025-04-08T06:50:51.000Z","size":374,"stargazers_count":32,"open_issues_count":74,"forks_count":26,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-13T04:58:50.012Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/red-data-tools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-13T04:15:54.000Z","updated_at":"2025-04-08T06:50:54.000Z","dependencies_parsed_at":"2023-11-18T07:23:09.511Z","dependency_job_id":"4bbc6d93-4d80-4203-a010-63c7c53a4f37","html_url":"https://github.com/red-data-tools/red-datasets","commit_stats":{"total_commits":277,"total_committers":22,"mean_commits":"12.590909090909092","dds":"0.35018050541516244","last_synced_commit":"4fa77ae7526fb7129081834df7dc17a645bd36cb"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/red-data-tools%2Fred-datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/red-data-tools%2Fred-datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/red-data-tools%2Fred-datasets/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/red-data-tools%2Fred-datasets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/red-data-tools","download_url":"https://codeload.github.com/red-data-tools/red-datasets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248665759,"owners_count":21142123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T01:00:39.170Z","updated_at":"2025-04-13T04:58:56.993Z","avatar_url":"https://github.com/red-data-tools.png","language":"Ruby","funding_links":[],"categories":["Data sets","Ruby"],"sub_categories":[],"readme":"# Red Datasets\n\n[![Gem Version](https://badge.fury.io/rb/red-datasets.svg)](https://badge.fury.io/rb/red-datasets)\n\n## Description\n\nRed Datasets provides classes for accessing common datasets, such as the Iris dataset.\n\nEach dataset can be accessed in multiple ways, such as `#each` and Apache Arrow Record Batch, which makes the datasets easy to use.\n\n## Install\n\n```console\n% gem install red-datasets\n```\n\n## Available datasets\n\n* Adult Dataset\n* Aozora Bunko\n* California Housing\n* CIFAR-10 Dataset\n* CIFAR-100 Dataset\n* CLDR language plural rules\n* Communities and crime\n* Diamonds Dataset\n* E-Stat Japan\n* Fashion-MNIST\n* Fuel Economy Dataset\n* Geolonia Japanese Addresses\n* Hepatitis\n* House of Councillors of Japan\n* House of Representatives of Japan\n* Iris Dataset\n* Libsvm\n* MNIST database\n* Mushroom\n* Penguins\n* The Penn Treebank Project\n* PMJT - Pre-Modern 
Japanese Text dataset list\n* Postal Codes in Japan\n* Rdatasets\n* Seaborn\n* Sudachi Synonym Dictionary\n* Wikipedia\n* Wine Dataset\n\n## Usage\n\nHere is an example of accessing the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/iris) with `#each`, `Table#to_h`, or `Table#fetch_values`.\n\n```ruby\nrequire \"datasets\"\n\niris = Datasets::Iris.new\niris.each do |record|\n  p [\n     record.sepal_length,\n     record.sepal_width,\n     record.petal_length,\n     record.petal_width,\n     record.label,\n  ]\nend\n# =\u003e [5.1, 3.5, 1.4, 0.2, \"Iris-setosa\"]\n# =\u003e [4.9, 3.0, 1.4, 0.2, \"Iris-setosa\"]\n#    :\n# =\u003e [7.0, 3.2, 4.7, 1.4, \"Iris-versicolor\"]\n\n\niris_hash = iris.to_table.to_h\np iris_hash[:sepal_length]\n# =\u003e [5.1, 4.9, .. , 7.0, ..\np iris_hash[:sepal_width]\n# =\u003e [3.5, 3.0, .. , 3.2, ..\np iris_hash[:petal_length]\n# =\u003e [1.4, 1.4, .. , 4.7, ..\np iris_hash[:petal_width]\n# =\u003e [0.2, 0.2, .. , 1.4, ..\np iris_hash[:label]\n# =\u003e [\"Iris-setosa\", \"Iris-setosa\", .. , \"Iris-versicolor\", ..\n\n\niris_table = iris.to_table\np iris_table.fetch_values(:sepal_length, :sepal_width, :petal_length, :petal_width).transpose\n# =\u003e [[5.1, 3.5, 1.4, 0.2],\n#     [4.9, 3.0, 1.4, 0.2],\n#     :\n#     [7.0, 3.2, 4.7, 1.4],\n#     :\n\np iris_table[:label]\n# =\u003e [\"Iris-setosa\", \"Iris-setosa\", .. 
, \"Iris-versicolor\", ..\n```\n\nHere is an example of accessing [The CIFAR-10/100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) with `#each`:\n\n**CIFAR-10**\n\n```ruby\nrequire \"datasets\"\n\ncifar = Datasets::CIFAR.new(n_classes: 10, type: :train)\ncifar.metadata\n#=\u003e #\u003cstruct Datasets::Metadata name=\"CIFAR-10\", url=\"https://www.cs.toronto.edu/~kriz/cifar.html\", licenses=nil, description=\"CIFAR-10 is 32x32 image dataset\"\u003e\ncifar.each do |record|\n  p record.pixels\n  # =\u003e [59, 43, 50, 68, 98, 119, 139, 145, 149, 143, .....]\n  p record.label\n  # =\u003e 6\nend\n```\n\n**CIFAR-100**\n\n```ruby\nrequire \"datasets\"\n\ncifar = Datasets::CIFAR.new(n_classes: 100, type: :test)\ncifar.metadata\n#=\u003e #\u003cstruct Datasets::Metadata name=\"CIFAR-100\", url=\"https://www.cs.toronto.edu/~kriz/cifar.html\", licenses=nil, description=\"CIFAR-100 is 32x32 image dataset\"\u003e\ncifar.each do |record|\n  p record.pixels\n  #=\u003e [199, 196, 195, 195, 196, 197, 198, 198, 199, .....]\n  p record.coarse_label\n  #=\u003e 10\n  p record.fine_label\n  #=\u003e 49\nend\n```\n\n**MNIST**\n\n```ruby\nrequire \"datasets\"\n\nmnist = Datasets::MNIST.new(type: :train)\nmnist.metadata\n#=\u003e #\u003cstruct Datasets::Metadata name=\"MNIST-train\", url=\"http://yann.lecun.com/exdb/mnist/\", licenses=nil, description=\"a training set of 60,000 examples\"\u003e\n\nmnist.each do |record|\n  p record.pixels\n  # =\u003e [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .....]\n  p record.label\n  # =\u003e 5\nend\n```\n\n## NArray compatibility\n\n* [red-datasets-numo-narray](https://github.com/red-data-tools/red-datasets-numo-narray)\n\n## How to develop Red Datasets\n\n1. Fork https://github.com/red-data-tools/red-datasets\n2. Create a feature branch from master\n3. Develop in the feature branch\n4. 
Pull request from the feature branch to https://github.com/red-data-tools/red-datasets\n\n## License\n\nThe MIT license. See `LICENSE.txt` for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fred-data-tools%2Fred-datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fred-data-tools%2Fred-datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fred-data-tools%2Fred-datasets/lists"}