https://github.com/benchopt/benchmark_nanogpt


Host: GitHub
URL: https://github.com/benchopt/benchmark_nanogpt
Owner: benchopt
License: bsd-3-clause
Created: 2025-07-02T11:07:20.000Z (12 months ago)
Default Branch: main
Last Pushed: 2026-01-15T08:19:25.000Z (5 months ago)
Last Synced: 2026-01-15T14:55:35.328Z (5 months ago)
Language: Python
Size: 47.9 KB
Stars: 1
Watchers: 0
Forks: 1
Open Issues: 5
Metadata Files:
- Readme: README.rst
- License: LICENSE.md

          
Benchmarking deep learning optimization with nanoGPT
====================================================

|Build Status| |Python 3.10+|

This benchmark is dedicated to evaluating new deep learning optimization methods
on the nanoGPT architecture.

The optimization problem is defined as in the original speedrun of nanoGPT (see `modded nanogpt`_):

- Training and validation are performed on `FineWeb`_. Do not change the dataloaders.

- The training is stopped once the validation loss is below ``3.28``. (Still todo)
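As a minimal sketch of the stopping rule described above (not the benchmark's actual implementation, which is still to do), the check could look like the following. The function names and the loss history are hypothetical and only serve as illustration; the ``3.28`` threshold is the one stated in this README.

```python
TARGET_VAL_LOSS = 3.28  # threshold stated in this README


def should_stop(val_loss, target=TARGET_VAL_LOSS):
    """Return True once the validation loss is strictly below the target."""
    return val_loss < target


def first_step_below_target(val_losses, target=TARGET_VAL_LOSS):
    """Return the index of the first evaluation below the target, or None."""
    for step, loss in enumerate(val_losses):
        if should_stop(loss, target):
            return step
    return None


# Hypothetical validation losses, for illustration only.
history = [4.10, 3.55, 3.30, 3.27, 3.25]
print(first_step_below_target(history))
```

Using a strict comparison means a loss of exactly ``3.28`` does not stop training, which matches the wording "below ``3.28``".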

For now, the repository contains a single solver, Adam, and runs on CPU.
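For readers unfamiliar with Adam, here is a minimal pure-Python sketch of the standard update rule on a scalar parameter. It is not the repository's solver: the function name ``adam_step``, the hyperparameter defaults, and the toy objective ``f(x) = x**2`` are assumptions chosen for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter (t is the 1-based step)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (assumption): minimize f(x) = x**2, whose gradient is 2 * x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(abs(x))
```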

The dataloaders work, but only with a fixed sequence length of 128 tokens.

The model code comes from the original nanoGPT (`GPT2 from llm.c`_), while the dataloader is the simple one from `modded nanogpt`_.
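To illustrate what a fixed sequence length of 128 tokens means in practice, here is a minimal sketch that cuts a flat token stream into next-token prediction pairs. It is not the dataloader used in this repository: the helper ``make_sequences`` and the toy token stream are hypothetical, and dropping leftover tokens is an assumption.

```python
SEQ_LEN = 128  # fixed sequence length mentioned in this README


def make_sequences(tokens, seq_len=SEQ_LEN):
    """Split a flat token list into (inputs, targets) pairs of length seq_len.

    Targets are the inputs shifted by one token. Trailing tokens that do not
    fill a complete pair are dropped.
    """
    pairs = []
    # Each pair needs seq_len + 1 tokens: seq_len inputs plus one extra target.
    for start in range(0, len(tokens) - seq_len, seq_len):
        chunk = tokens[start:start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


# Toy token stream (assumption): integers standing in for token ids.
tokens = list(range(300))
pairs = make_sequences(tokens)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```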

TODO:

- Tweak the dataloaders to make them more efficient and less error-prone.

- See if we want to add improvements to the architecture (QK-norm, rotary embeddings, etc.).

Install
--------

This benchmark can be run using the following commands:

.. code-block::

   $ pip install -U benchopt
   $ git clone https://github.com/tomMoral/benchmark_nanogpt
   $ benchopt run benchmark_nanogpt

Apart from the problem, options can be passed to ``benchopt run``, to restrict the benchmarks to some solvers or datasets, e.g.:

.. code-block::

	$ benchopt run benchmark_nanogpt -s solver1 -d dataset2 --max-runs 10 --n-repetitions 10

Use ``benchopt run -h`` for more details about these options, or visit https://benchopt.github.io/api.html.

.. |Build Status| image:: https://github.com/tomMoral/benchmark_nanogpt/actions/workflows/main.yml/badge.svg
   :target: https://github.com/tomMoral/benchmark_nanogpt/actions

.. |Python 3.10+| image:: https://img.shields.io/badge/python-3.10%2B-blue
   :target: https://www.python.org/downloads/release/python-3100/

.. _FineWeb: https://huggingface.co/datasets/HuggingFaceFW/fineweb

.. _modded nanogpt: https://github.com/KellerJordan/modded-nanogpt

.. _GPT2 from llm.c: https://github.com/karpathy/llm.c
