{"id":13478241,"url":"https://github.com/google/grain","last_synced_at":"2026-01-14T10:21:04.312Z","repository":{"id":56780431,"uuid":"521562289","full_name":"google/grain","owner":"google","description":"Library for reading and processing ML training data.","archived":false,"fork":false,"pushed_at":"2026-01-12T16:20:53.000Z","size":5638,"stargazers_count":648,"open_issues_count":131,"forks_count":61,"subscribers_count":12,"default_branch":"main","last_synced_at":"2026-01-12T21:57:14.050Z","etag":null,"topics":["data-pr","jax","machine-learning","python"],"latest_commit_sha":null,"homepage":"https://google-grain.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-05T08:29:51.000Z","updated_at":"2026-01-12T10:44:12.000Z","dependencies_parsed_at":"2025-12-25T21:06:42.280Z","dependency_job_id":null,"html_url":"https://github.com/google/grain","commit_stats":{"total_commits":160,"total_committers":19,"mean_commits":8.421052631578947,"dds":0.575,"last_synced_commit":"5d86144ac9af047b14c4a4f95bb5887e1ca39a59"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/google/grain","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fgrain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fgrain/tags","releases_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fgrain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fgrain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google","download_url":"https://codeload.github.com/google/grain/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fgrain/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28416943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T10:18:03.274Z","status":"ssl_error","status_checked_at":"2026-01-14T10:16:11.865Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pr","jax","machine-learning","python"],"created_at":"2024-07-31T16:01:54.484Z","updated_at":"2026-01-14T10:21:04.308Z","avatar_url":"https://github.com/google.png","language":"Python","readme":"# Grain - Feeding JAX Models\n\n[![Continuous integration](https://github.com/google/grain/actions/workflows/tests.yml/badge.svg)](https://github.com/google/grain/actions/workflows/tests.yml)\n[![PyPI version](https://img.shields.io/pypi/v/grain)](https://pypi.org/project/grain/)\n\n[**Installation**](#installation)\n| [**Quickstart**](#quickstart)\n| [**Reference 
docs**](https://google-grain.readthedocs.io/en/latest/)\n| [**Change logs**](https://google-grain.readthedocs.io/en/latest/changelog.html)\n\nGrain is a Python library for reading and processing data for training and\nevaluating JAX models. It is flexible, fast and deterministic.\n\nGrain lets you define data processing steps in a simple, declarative way:\n\n```python\nimport grain\n\ndataset = (\n    grain.MapDataset.source([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n    .shuffle(seed=42)  # Shuffles elements globally.\n    .map(lambda x: x + 1)  # Maps each element.\n    .batch(batch_size=2)  # Batches consecutive elements.\n)\n\nfor batch in dataset:\n  ...  # Training step.\n```\n\nGrain is designed to work with JAX models but it does not require JAX to run\nand can be used with other frameworks as well.\n\n## Installation\n\nGrain is available on [PyPI](https://pypi.org/project/grain/) and can be\ninstalled with `pip install grain`.\n\n### Supported platforms\n\nGrain does not directly use GPU or TPU in its transformations; the processing\nwithin Grain is done on the CPU by default.\n\n|         |  Linux  |   Mac   | Windows |\n|---------|---------|---------|---------|\n| x86_64  | yes     | no      | yes     |\n| aarch64 | yes     | yes     | n/a     |\n\n## Quickstart\n\n- [Basic `Dataset` tutorial](https://google-grain.readthedocs.io/en/latest/tutorials/dataset_basic_tutorial.html)\n\n## Citing Grain\n\nTo cite this repository:\n\n```\n@software{grain2023github,\n  author = {Marvin Ritter and Ihor Indyk and Aayush Singh and Andrew Audibert and Anoosha Seelam and Camelia Hanes and Eric Lau and Jacek Olesiak and Jiyang Kang and Xihui Wu},\n  title = {{Grain} - Feeding JAX Models},\n  url = {http://github.com/google/grain},\n  version = {0.2.12},\n  year = {2023},\n}\n```\n\nThe version number is intended to be that from [pyproject.toml](https://github.com/google/grain/blob/main/pyproject.toml), and the year corresponds to the project's open-source release.\n\n## 
Existing users\n\nGrain is used by [MaxText](https://github.com/google/maxtext/tree/main),\n[Gemma](https://github.com/google-deepmind/gemma),\n[kauldron](https://github.com/google-research/kauldron),\n[maxdiffusion](https://github.com/AI-Hypercomputer/maxdiffusion) and multiple\ninternal Google projects.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fgrain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle%2Fgrain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fgrain/lists"}