https://github.com/benchopt/benchmark_nanogpt
https://github.com/benchopt/benchmark_nanogpt
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/benchopt/benchmark_nanogpt
- Owner: benchopt
- License: bsd-3-clause
- Created: 2025-07-02T11:07:20.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2026-01-15T08:19:25.000Z (5 months ago)
- Last Synced: 2026-01-15T14:55:35.328Z (5 months ago)
- Language: Python
- Size: 47.9 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.rst
- License: LICENSE.md
Awesome Lists containing this project
README
Benchmarking deep learning optimization with nanoGPT
====================================================
|Build Status| |Python 3.10+|
This benchmark is dedicated to evaluate new deep learning optimization methods
on the nanoGPT architecture.
The optimization problem is defined as in the original speedrun of nanoGPT (see `modded nanogpt`_):
- The training and validation is perfromed on `FineWeb`_ -- Do not change the dataloaders.
- The training is stopped once the validation loss is below ``3.28``. (Still todo)
For now, the repository contains a single solver, Adam, and run on CPU.
The dataloaders are working but with fixed sequence length of 128 tokens.
We used the original code from nanoGPT (`GPT2 from llm.c`_), but use the simple dataloader from `modded-nanogpt`_.
TODO:
- Tweak the dataloaders to make it more efficient/less error prone.
- See if we want to add imporevments to the architecture (QK-norm, Rotary embeddings, etc.).
Install
--------
This benchmark can be run using the following commands:
.. code-block::
$ pip install -U benchopt
$ git clone https://github.com/tomMoral/benchmark_nanogpt
$ benchopt run benchmark_nanogpt
Apart from the problem, options can be passed to ``benchopt run``, to restrict the benchmarks to some solvers or datasets, e.g.:
.. code-block::
$ benchopt run benchmark_nanogpt -s solver1 -d dataset2 --max-runs 10 --n-repetitions 10
Use ``benchopt run -h`` for more details about these options, or visit https://benchopt.github.io/api.html.
.. |Build Status| image:: https://github.com/tomMoral/benchmark_nanogpt/actions/workflows/main.yml/badge.svg
:target: https://github.com/tomMoral/benchmark_nanogpt/actions
.. |Python 3.10+| image:: https://img.shields.io/badge/python-3.10%2B-blue
:target: https://www.python.org/downloads/release/python-3100/
.. _FineWeb: https://huggingface.co/datasets/HuggingFaceFW/fineweb
.. _modded nanogpt: https://github.com/KellerJordan/modded-nanogpt
.. _GPT2 from llm.c: https://github.com/karpathy/llm.c