https://github.com/Kipok/NeMo-Skills

A pipeline to improve skills of large language models
https://github.com/Kipok/NeMo-Skills

Last synced: 6 months ago
JSON representation

A pipeline to improve skills of large language models

Host: GitHub
URL: https://github.com/Kipok/NeMo-Skills
Owner: Kipok
License: apache-2.0
Created: 2024-02-11T23:29:26.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-11-11T23:01:57.000Z (6 months ago)
Last Synced: 2024-11-12T00:17:51.766Z (6 months ago)
Language: Python
Homepage: https://kipok.github.io/NeMo-Skills/
Size: 5.82 MB
Stars: 187
Watchers: 10
Forks: 41
Open Issues: 31
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - Kipok/NeMo-Skills

README

        # NeMo Skills

NeMo-Skills is a collection of pipelines to improve "skills" of large language models.

We mainly focus on the ability to solve mathematical problems, but you can use our pipelines for many other tasks as well.

Here are some of the things we support.

- [Flexible inference](https://kipok.github.io/NeMo-Skills/basics/inference): Seamlessly switch between API providers, local server and large-scale slurm jobs for LLM inference.

- [Multiple formats](https://kipok.github.io/NeMo-Skills/pipelines/checkpoint-conversion): Use any of the [NeMo](https://github.com/NVIDIA/NeMo), [vLLM](https://github.com/vllm-project/vllm)

  and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) servers and easily convert checkpoints from one format to another.

- [Model evaluation](https://kipok.github.io/NeMo-Skills/pipelines/evaluation): Evaluate your models on many popular benchmarks

    - Math problem solving: gsm8k, math, amc23, aime24, omni-math (and many more)

    - Coding skills: human-eval, mbpp

    - Chat/instruction following: ifeval, arena-hard

    - General knowledge: mmlu (generative)

- [Model training](https://kipok.github.io/NeMo-Skills/pipelines/training): Train models at speed-of-light using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner/).

You can find the full documentation [here](https://kipok.github.io/NeMo-Skills/).

## OpenMathInstruct-2

Using our pipelines we created [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)

which consists of 14M question-solution pairs (600K unique questions), making it nearly eight times larger

than the previous largest open-source math reasoning dataset.

The models trained on this dataset achieve strong results on common mathematical benchmarks.

  

    model

    GSM8K

    MATH

    AMC 2023

    AIME 2024

    Omni-MATH

  

  

    Llama3.1-8B-Instruct

    84.5

    51.9

    9/40

    2/30

    12.7

  

  

    OpenMath2-Llama3.1-8B (nemo | HF)

    91.7

    67.8

    16/40

    3/30

    22.0

  

  

    + majority@256

    94.1

    76.1

    23/40

    3/30

    24.6

  

  

    Llama3.1-70B-Instruct

    95.1

    68.0

    19/40

    6/30

    19.0

  

  

    OpenMath2-Llama3.1-70B (nemo | HF)

    94.9

    71.9

    20/40

    4/30

    23.1

  

  

    + majority@256

    96.0

    79.6

    24/40

    6/30

    27.6

  

We provide all instructions to [fully reproduce our results](https://kipok.github.io/NeMo-Skills/openmathinstruct2).

See our [paper](https://arxiv.org/abs/2410.01560) for ablations studies and more details!

## Nemo Inspector

We also provide a convenient [tool](/nemo_inspector/Readme.md) for visualizing inference and data analysis

|                                              Overview                                               |                                                     Inference Page                                                      |                                                    Analyze Page                                                     |

| :-------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------: |

| [![Demo of the tool](/nemo_inspector/images/demo.png)](https://www.youtube.com/watch?v=EmBFEl7ydqE) | [![Demo of the inference page](/nemo_inspector/images/inference_page.png)](https://www.youtube.com/watch?v=6utSkPCdNks) | [![Demo of the analyze page](/nemo_inspector/images/analyze_page.png)](https://www.youtube.com/watch?v=cnPyDlDmQXg) |

## Papers

If you find our work useful, please consider citing us!

```bibtex

@article{toshniwal2024openmathinstruct2,

  title   = {OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data},

  author  = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},

  year    = {2024},

  journal = {arXiv preprint arXiv: Arxiv-2410.01560}

}

```

```bibtex

@article{toshniwal2024openmathinstruct1,

  title   = {OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset},

  author  = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},

  year    = {2024},

  journal = {arXiv preprint arXiv: Arxiv-2402.10176}

}

```

Disclaimer: This project is strictly for research purposes, and not an official product from NVIDIA.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Kipok/NeMo-Skills

Awesome Lists containing this project

README