https://github.com/lixin97/wirelessmathlm

WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning - Official repository for the WirelessMathLM paper
datasets large-language-models machine-learning mathematical-reasoning mathmatics reinforcement-learning wireless wireless-communication

Host: GitHub
URL: https://github.com/lixin97/wirelessmathlm
Owner: LiXin97
Created: 2025-09-27T09:36:27.000Z (10 months ago)
Default Branch: master
Last Pushed: 2025-09-28T09:30:58.000Z (10 months ago)
Last Synced: 2025-09-28T11:39:07.793Z (10 months ago)
Topics: datasets, large-language-models, machine-learning, mathematical-reasoning, mathmatics, reinforcement-learning, wireless, wireless-communication
Language: HTML
Homepage: http://lixin.ai/WirelessMathLM/
Size: 1.02 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning

[![Website](https://img.shields.io/badge/Website-Live-blue)](https://lixin.ai/WirelessMathLM)

[![arXiv](https://img.shields.io/badge/arXiv-Coming%20Soon-red)](https://arxiv.org/)

[![Code](https://img.shields.io/badge/Code-Coming%20Soon-green)](https://github.com/)

> **Authors:** [Xin Li](https://lixin.ai/), [Mengbing Liu](https://liumengbing.com/), [Yiyang Zhu](https://scholar.google.com/citations?user=LWh42_8AAAAJ), Wenhe Zhang, [Li Wei](https://scholar.google.com.sg/citations?user=zdSz9-gAAAAJ), [Jiancheng An](https://scholar.google.com/citations?user=QbTi47kAAAAJ), [Chau Yuen](https://blogs.ntu.edu.sg/chau-yuen/)

> **Affiliation:** Nanyang Technological University

## 📖 Abstract

Large language models (LLMs) excel at general mathematical reasoning but fail catastrophically on specialized technical mathematics. In wireless communications, where problems require precise manipulation of information-theoretic bounds, optimization constraints, and signal processing formulations, even state-of-the-art models struggle to achieve competent performance.

We present **WirelessMathLM**, demonstrating that compact models (0.5B–7B parameters) can match or exceed much larger models through domain-specific reinforcement learning with verifiable rewards. Our key insight is that wireless mathematics problems possess a unique property—verifiable correctness—that enables effective reinforcement learning without human feedback.

## 🎯 Key Contributions

- **WirelessMathBench-XL**: A comprehensive benchmark of 4,027 problems from 970 papers in wireless communications

- **Domain-specific RL**: Group Relative Policy Optimization (GRPO) with binary verification rewards, training directly from base checkpoints without supervised warm-start

- **Efficient Performance**: Our 7B model achieves 39.5% accuracy, approaching GPT-4o (40.4%) while using ~100× fewer parameters than DeepSeek-R1 (671B, 57.4%)

- **Transfer Learning**: Positive transfer to general mathematics benchmarks (+8.4 points average across MATH, Minerva-Math, OlympiadBench, AMC, and AIME)
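
As a quick check on the "~100×" figure in the Efficient Performance bullet, the ratio of the two parameter counts quoted there works out to:

```latex
\frac{671\ \text{B}}{7\ \text{B}} \approx 95.9 \approx 10^{2}
```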

## 📊 Results Overview

### Model Performance on WirelessMathBench-XL

| Model | Parameters | Accuracy |
|-------|------------|----------|
| **WirelessMathLM-7B** | 7B | **39.5%** |
| GPT-4o | ~1.8T | 40.4% |
| DeepSeek-R1 | 671B | 57.4% |

### GRPO Training Impact

GRPO training improves performance at every model scale, with the 3B and 7B models gaining the most:

- **0.5B**: +11% improvement

- **3B**: +103% improvement

- **7B**: +81% improvement
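
The README does not state whether these percentages are relative gains or absolute point differences. Assuming they are relative gains over each base model, the base accuracy implied by a final accuracy can be recovered by dividing by one plus the gain. A minimal sketch using the 7B figures reported here (the function name `implied_base_accuracy` is hypothetical, not part of the project):

```python
def implied_base_accuracy(final_accuracy: float, relative_gain: float) -> float:
    """Return the base accuracy implied by a final accuracy and a relative gain.

    Assumes final = base * (1 + relative_gain), i.e. the gain is relative,
    not an absolute difference in percentage points.
    """
    if relative_gain <= -1.0:
        raise ValueError("relative_gain must be greater than -1")
    return final_accuracy / (1.0 + relative_gain)


# 7B model: 39.5% final accuracy with a reported +81% improvement.
base_7b = implied_base_accuracy(39.5, 0.81)
print(f"Implied 7B base accuracy: {base_7b:.1f}%")
```

Under that assumption, the 7B base checkpoint would sit at roughly 22% accuracy before GRPO training.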

## 📋 Dataset: WirelessMathBench-XL

WirelessMathBench-XL contains **4,027 mathematical problems** extracted from **970 research papers** in wireless communications, covering:

- Information theory and channel capacity

- Signal processing and beamforming

- Optimization in wireless networks

- MIMO systems and spatial diversity

- Resource allocation and scheduling

- Network coding and cooperative communications

## 🔬 Methodology

### Group Relative Policy Optimization (GRPO)

Our approach uses GRPO with binary verification rewards:

1. **No Supervised Fine-tuning**: Train directly from base model checkpoints

2. **Verifiable Rewards**: Leverage the mathematical nature of wireless problems for automatic verification

3. **Domain-specific Training**: Focus specifically on wireless communications mathematics

4. **Efficient Scaling**: Achieve strong performance with compact models
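
The "verifiable rewards" idea in step 2 can be sketched as a function that returns 1 for a correct final answer and 0 otherwise. The README does not describe the actual verifier, so everything below is an assumption for illustration: that the model writes its final answer inside `\boxed{...}`, that whitespace-insensitive string equality counts as a match, and the helper names themselves.

```python
import re
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a response, or None."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Remove whitespace so trivially different spellings compare equal."""
    return re.sub(r"\s+", "", answer)


def binary_reward(response: str, reference: str) -> float:
    """Return 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


response = "The capacity is \\boxed{\\log_2(1 + \\frac{P}{N})}"
reward = binary_reward(response, "\\log_2(1+\\frac{P}{N})")
```

Exact string matching is the simplest possible check; a real verifier would likely need to recognize mathematically equivalent expressions that are written differently.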

### Training Pipeline

```
Base Model → GRPO Training → WirelessMathLM
    ↑              ↑              ↓
Qwen2.5    Binary Rewards   Wireless Math
                              Expertise
```
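
For the GRPO stage of the pipeline, the core idea is that several responses are sampled for the same problem and each one is scored relative to the others in its group, so no separate value model is needed. The sketch below shows only that advantage computation with binary rewards. It is not the authors' training code: the function name, the epsilon value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same problem.

    Each advantage is (reward - group mean) / (group std + eps), so correct
    responses are pushed up and incorrect ones pushed down relative to the
    group. If every response gets the same reward, all advantages are zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one problem; two were verified as correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

A group in which all responses are correct, or all are wrong, yields zero advantages and therefore no learning signal for that problem.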

## 📈 Transfer Learning Results

Our models show positive transfer to general mathematics:

| Benchmark | Improvement |
|-----------|-------------|
| MATH | +8.2 points |
| Minerva-Math | +7.9 points |
| OlympiadBench | +9.1 points |
| AMC | +8.7 points |
| AIME | +8.5 points |
| **Average** | **+8.4 points** |

## 📚 Citation

```bibtex
@article{li2025wirelessmathlm,
  title={WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning},
  author={Li, Xin and Liu, Mengbing and Zhu, Yiyang and Zhang, Wenhe and Wei, Li and An, Jiancheng and Yuen, Chau},
  journal={arXiv preprint},
  year={2025}
}
```

## 🔗 Resources

- **Paper**: Coming soon on arXiv

- **Code**: Will be released upon publication

- **Website**: [Project Homepage](website/index.html)

- **Overview**: [WirelessMathLM-Overview.pdf](arXiv_WirelessMathLM/WirelessMathLM-Overview.pdf)

## 📧 Contact

For questions or collaborations, please contact:

- **Xin Li**: [xin019@ntu.edu.sg](mailto:xin019@ntu.edu.sg)

---

**Nanyang Technological University** | **Project Maxwell**
