https://github.com/siddhant-k-code/llm-parallelism-explorer-poc

It is a cutting-edge research tool designed to optimize parallelism strategies for large language models, with a particular focus on Mixture of Experts (MoE) architectures.

Host: GitHub
URL: https://github.com/siddhant-k-code/llm-parallelism-explorer-poc
Owner: Siddhant-K-code
Created: 2024-10-07T07:02:54.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-17T07:44:45.000Z (over 1 year ago)
Last Synced: 2025-04-24T05:47:55.724Z (about 1 year ago)
Topics: hydra, llm, parallelism, poc
Language: Python
Homepage: https://dev.to/siddhantkcode/exploring-parallelism-in-large-language-models-llms-5991
Size: 6.84 KB
Stars: 2
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

# LLM Parallelism Explorer PoC

LLM Parallelism Explorer PoC is a proof-of-concept research tool for choosing parallelism strategies for large language models, with a particular focus on Mixture of Experts (MoE) architectures. It searches across parallelism configurations, estimates the memory usage of each one, and identifies setups suited to efficient training on distributed systems. Configuration is managed with [facebookresearch/hydra](https://github.com/facebookresearch/hydra).
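To make the idea of a configuration search concrete, the following is a minimal sketch of how candidate layouts for a fixed GPU count could be enumerated. It is not taken from this repository: the names `Layout` and `enumerate_layouts` are hypothetical, and the validity rules (the GPU count factors as TP × PP × CP × DP, the layer count is divisible by PP, and EP divides DP) are assumptions based on common Megatron-style conventions.

```python
from itertools import product
from typing import Iterator, NamedTuple, Sequence


class Layout(NamedTuple):
    """One candidate parallelism layout (hypothetical container)."""
    tp: int  # tensor parallel size
    pp: int  # pipeline parallel size
    cp: int  # context parallel size
    ep: int  # expert parallel size
    dp: int  # data parallel size, derived from the GPU count


def enumerate_layouts(
    ngpus: int,
    tp_range: Sequence[int],
    pp_range: Sequence[int],
    cp_range: Sequence[int],
    ep_range: Sequence[int],
    num_layers: int,
) -> Iterator[Layout]:
    """Yield every layout that satisfies the assumed validity rules."""
    for tp, pp, cp, ep in product(tp_range, pp_range, cp_range, ep_range):
        model_parallel = tp * pp * cp
        # Assumption: the GPU count must factor exactly as TP * PP * CP * DP.
        if ngpus % model_parallel != 0:
            continue
        dp = ngpus // model_parallel
        # Assumption: layers are split evenly across pipeline stages.
        if num_layers % pp != 0:
            continue
        # Assumption: expert parallel groups are carved out of the DP group.
        if dp % ep != 0:
            continue
        yield Layout(tp, pp, cp, ep, dp)


layouts = list(
    enumerate_layouts(
        ngpus=8,
        tp_range=[1, 2, 4, 8],
        pp_range=[1, 2],
        cp_range=[1],
        ep_range=[1, 2],
        num_layers=4,
    )
)
for layout in layouts:
    print(layout)
```

Each surviving layout would then be passed to a memory estimator, and layouts that exceed the per-GPU memory budget would be discarded.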

## Features

- Supports advanced parallelism techniques:
  - Tensor Parallelism (TP)
  - Pipeline Parallelism (PP)
  - Expert Parallelism (EP)
  - Context Parallelism (CP)
  - Data Parallelism (DP)
- Precise memory estimation for model parameters, optimizer states, and activations
- Flexible search space for parallelism configurations
- Multiple data parallel sharding strategies
- CSV output for in-depth analysis of results
- Hydra-powered configuration management
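As an illustration of how model-state memory depends on the data parallel sharding strategy, here is a rough sketch that uses the per-parameter accounting from the ZeRO paper for mixed-precision Adam: 2 bytes for half-precision weights, 2 bytes for half-precision gradients, and 12 bytes for fp32 optimizer states. The function name, the strategy labels, and the byte counts are assumptions for this example; the formulas used by this project may differ.

```python
# Per-parameter byte counts for mixed-precision Adam (assumed, ZeRO-style accounting).
BYTES_PARAM = 2    # fp16/bf16 weights
BYTES_GRAD = 2     # fp16/bf16 gradients
BYTES_OPTIM = 12   # fp32 master weights + Adam momentum + Adam variance


def model_state_bytes_per_gpu(params_per_gpu: float, dp: int, strategy: str = "none") -> float:
    """Estimate weight + gradient + optimizer memory on one GPU.

    params_per_gpu: parameters held by this GPU after TP/PP/EP splitting.
    dp: data parallel size over which states may be sharded.
    strategy: hypothetical label for what is sharded across DP ranks.
    """
    if strategy == "none":               # everything replicated on every DP rank
        per_param = BYTES_PARAM + BYTES_GRAD + BYTES_OPTIM
    elif strategy == "optimizer":        # optimizer states sharded
        per_param = BYTES_PARAM + BYTES_GRAD + BYTES_OPTIM / dp
    elif strategy == "optimizer+grads":  # optimizer states and gradients sharded
        per_param = BYTES_PARAM + (BYTES_GRAD + BYTES_OPTIM) / dp
    elif strategy == "full":             # weights, gradients, and optimizer states sharded
        per_param = (BYTES_PARAM + BYTES_GRAD + BYTES_OPTIM) / dp
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return params_per_gpu * per_param


for name in ("none", "optimizer", "optimizer+grads", "full"):
    gb = model_state_bytes_per_gpu(7.5e9, dp=64, strategy=name) / 1e9
    print(f"{name:>16}: {gb:.1f} GB")
```

Activation memory is not covered by this sketch; it depends on batch size, sequence length, and the TP, PP, and CP sizes, and has to be estimated separately.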

## Installation

Install the required dependencies with:

```bash
pip install -r requirements.txt
```

## Usage

Run the explorer with a specific configuration file and override the GPU range on the command line:

```bash
python main.py \
  --config-name llama3.1-405b.yaml \
  +ngpus_range="[8, 128, 1024, 10240]"
```

## Configuration

Configuration is managed with Hydra. Set the following parameters in your YAML config file:

- Model architecture details (e.g., hidden size, number of layers)
- MoE-specific settings (e.g., number of experts, expert frequency)
- Training parameters (e.g., global batch size, data types)
- Parallelism search ranges (e.g., TP, PP, EP ranges)
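The four parameter groups above could be modeled as nested dataclasses, which is also the shape Hydra's structured configs expect. This is only a sketch of one possible schema: every field name and default value below is a hypothetical placeholder, not a key or value taken from the project's shipped YAML files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelConfig:
    hidden_size: int = 4096        # placeholder value
    num_layers: int = 32           # placeholder value
    num_experts: int = 8           # MoE: experts per MoE layer (placeholder)
    expert_frequency: int = 1      # MoE: assumed to mean "an MoE layer every N layers"


@dataclass
class TrainingConfig:
    global_batch_size: int = 1024  # placeholder value
    param_dtype: str = "bf16"      # placeholder value


@dataclass
class SearchConfig:
    tp_range: List[int] = field(default_factory=lambda: [1, 2, 4, 8])
    pp_range: List[int] = field(default_factory=lambda: [1, 2, 4])
    ep_range: List[int] = field(default_factory=lambda: [1, 2, 4, 8])


@dataclass
class ExplorerConfig:
    model: ModelConfig = field(default_factory=ModelConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)
    search: SearchConfig = field(default_factory=SearchConfig)


def validate(cfg: ExplorerConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    if cfg.model.hidden_size <= 0 or cfg.model.num_layers <= 0:
        problems.append("model dimensions must be positive")
    if cfg.model.num_experts < 1 or cfg.model.expert_frequency < 1:
        problems.append("MoE settings must be at least 1")
    if cfg.training.global_batch_size <= 0:
        problems.append("global_batch_size must be positive")
    for name in ("tp_range", "pp_range", "ep_range"):
        values = getattr(cfg.search, name)
        if not values or any(v < 1 for v in values):
            problems.append(f"{name} must be a non-empty list of positive integers")
    return problems


print(validate(ExplorerConfig()))
```

Checking ranges up front keeps an invalid entry such as a zero-sized TP group from surfacing later as a division error inside the search loop.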

## Output

The script writes `memory_estimation.csv`, which contains memory estimates for each valid parallelism configuration, including:

- Total memory usage
- Model and optimizer states memory
- Activations memory
- Expert and non-expert parameters
- Component-specific activation memory
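A CSV like this can be filtered with the standard library to find the configurations that fit a per-GPU memory budget. The sketch below is illustrative only: the column names (`tp`, `pp`, `ep`, `cp`, `dp`, `total_memory_gb`) are assumed rather than taken from the actual file, and the three sample rows are made-up numbers used to keep the example self-contained.

```python
import csv
import io
from typing import Dict, List, TextIO


def configs_within_budget(
    csv_file: TextIO, budget_gb: float, column: str = "total_memory_gb"
) -> List[Dict[str, str]]:
    """Return rows whose memory column is within budget, smallest first."""
    rows = [row for row in csv.DictReader(csv_file) if float(row[column]) <= budget_gb]
    return sorted(rows, key=lambda row: float(row[column]))


# Made-up sample data standing in for memory_estimation.csv (hypothetical columns).
sample = io.StringIO(
    "tp,pp,ep,cp,dp,total_memory_gb\n"
    "8,1,1,1,1,95.0\n"
    "8,2,1,1,1,61.5\n"
    "8,4,2,1,2,40.2\n"
)

fits = configs_within_budget(sample, budget_gb=80.0)
for row in fits:
    print(row["tp"], row["pp"], row["ep"], row["total_memory_gb"])
```

To run this against real output, pass `open("memory_estimation.csv", newline="")` in place of `sample` and set `column` to whatever the total-memory column is actually called in the file.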

## Credits

This project builds upon state-of-the-art parallelism techniques from recent research:

- [Tensor Parallelism: Shoeybi et al., "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism" (2019)](https://arxiv.org/abs/1909.08053)
- [Pipeline Parallelism: Huang et al., "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism" (2019)](https://arxiv.org/abs/1811.06965)
- [Expert Parallelism: Lepikhin et al., "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding" (2020)](https://arxiv.org/abs/2006.16668)
- [Context Parallelism: Korthikanti et al., "Reducing Activation Recomputation in Large Transformer Models" (2022)](https://arxiv.org/abs/2205.05198)
