# paper2summary

[![Model](https://img.shields.io/badge/HuggingFace-Model-blue)](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA)
[![License](https://img.shields.io/badge/License-MIT-green)](./LICENSE)

A lightweight scientific paper summarizer that combines LoRA fine-tuning with
RAG-based question answering. It runs locally on consumer hardware.

## Overview

Reading scientific papers is time-consuming due to knowledge gaps and high
publication volumes. While LLMs like ChatGPT can help, they lack intuitive
citations and have limited scope.

**paper2summary** addresses this by:
- Fine-tuning a lightweight model (1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)

## Demo

RAG-based paper Q&A using [Kotaemon](https://github.com/Cinnamon/kotaemon)
with GPT-4o-mini:
- [Single Document Q&A](https://youtu.be/SqDGsJR9OOI) - Query the
"Attention Is All You Need" paper
- [Multi-Document Q&A](https://youtu.be/NsxGwMrflAE) - Compare Transformer
and LoRA papers

## Results

Evaluated on 6,440 test samples with beam search (beam size = 4):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|-------|---------|---------|---------|---------|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| **Llama-PaperSummarization-LoRA** | **41.56** | **11.31** | **2.67** | **21.86** |

The [LoRA model](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA) improves ROUGE-2 by **51%** and ROUGE-3 by **37%** relative to the baseline.
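
The relative gains can be reproduced from the table with a few lines of Python. This is only a sketch of the arithmetic: the scores are copied from the table above, and the helper name `relative_gain` is made up for illustration.

```python
# ROUGE scores copied from the results table above.
baseline = {"ROUGE-1": 36.69, "ROUGE-2": 7.47, "ROUGE-3": 1.95, "ROUGE-L": 19.36}
lora = {"ROUGE-1": 41.56, "ROUGE-2": 11.31, "ROUGE-3": 2.67, "ROUGE-L": 21.86}


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


gains = {metric: relative_gain(lora[metric], baseline[metric]) for metric in baseline}

for metric, gain in gains.items():
    print(f"{metric}: +{gain:.1f}%")
```

By the same calculation, ROUGE-1 and ROUGE-L each improve by roughly 13%.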

## Installation

```bash
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary

uv venv && uv sync
uv run python -m spacy download en_core_web_sm
```

## Usage

### Training
```bash
uv run python src/train.py
```

### Testing
```bash
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora

# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
```
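
The benchmark reports ROUGE-N scores. As a rough illustration of what ROUGE-N measures, the sketch below computes clipped n-gram overlap in plain Python. It is a simplified stand-in and not the code in `src/utils/eval.py`: whitespace tokenization, lowercasing, and the function names `ngrams` and `rouge_n` are assumptions made for this example, and a real evaluation may add stemming or other preprocessing.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Simplified ROUGE-N: precision, recall, and F1 over clipped n-gram matches."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


summary = "the cat sat on the mat"
abstract = "the cat lay on the mat"
print(rouge_n(summary, abstract, n=1))
print(rouge_n(summary, abstract, n=2))
```

Higher-order n-grams are harder to match, which is why ROUGE-2 and ROUGE-3 are much lower than ROUGE-1 in the results table.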

## Project Structure

```
paper2summary/
├── src/
│   ├── train.py              # LoRA fine-tuning script
│   ├── test.py               # Model evaluation script
│   ├── paper_dataset.py      # Dataset loading utilities
│   ├── config/
│   │   └── lora_config.py    # Training hyperparameters
│   └── utils/
│       ├── eval.py           # ROUGE evaluation metrics
│       └── testing_utils.py
├── output/                   # Model checkpoints (generated)
└── pyproject.toml
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
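
The trainable parameter count follows from the rank and target modules. The sketch below works through the arithmetic. The attention dimensions (hidden size 2048, 16 layers, 8 key/value heads of dimension 64) and the approximate 1.24B base parameter count are assumptions taken from the public Llama-3.2-1B configuration, not from this repository, and `lora_params` is a hypothetical helper.

```python
# Assumed Llama-3.2-1B dimensions (from the public model config, not this README).
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64
BASE_PARAMS = 1.24e9  # approximate size of the frozen base model

RANK = 8  # LoRA rank from the table above


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * (in_features + out_features)


# q_proj maps hidden -> hidden; v_proj maps hidden -> num_kv_heads * head_dim.
q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)

trainable = NUM_LAYERS * (q_proj + v_proj)
share = trainable / (BASE_PARAMS + trainable) * 100.0

print(f"trainable parameters: {trainable:,}")
print(f"share of all parameters: {share:.2f}%")
```

Under these assumptions the result is consistent with the ~850K (0.07%) figure in the table.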

## Dataset

Fine-tuned on 10% of
[ccdv/arxiv-summarization](https://huggingface.co/datasets/ccdv/arxiv-summarization):

| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|-------|---------|---------------------|----------------------|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |

## RAG Architecture

The RAG pipeline uses [Kotaemon](https://github.com/Cinnamon/kotaemon)
for document Q&A:

| Component | Implementation |
|-----------|----------------|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
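
To make the data flow concrete, here is a toy retrieve-then-prompt sketch in plain Python. It does not use Kotaemon's API: the bag-of-words `embed` function stands in for text-embedding-3-small, the in-memory list stands in for Chroma, and all function names are hypothetical. Reranking and the LLM call are left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Number the retrieved passages so the answer can cite its sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered sources and cite them.\n"
        f"{sources}\n"
        f"Question: {query}"
    )


# In-memory "vector store": in practice these would be parsed paper chunks.
chunks = [
    "LoRA freezes the pretrained weights and trains low-rank update matrices.",
    "The Transformer relies entirely on attention mechanisms.",
    "Beam search keeps the highest scoring partial sequences.",
]

query = "How does LoRA update the weights?"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

Numbering the passages in the prompt is one simple way to let the model point back to its sources, which is the basis for the source references mentioned in the overview.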

## References

- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
(Hu et al., ICLR 2022)
- [A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents](https://arxiv.org/abs/1804.05685)
(Cohan et al., NAACL 2018)

## License

- **Code**: [MIT License](./LICENSE)
- **Llama 3.2**: [Llama 3.2 Community License](https://www.llama.com/llama-downloads)
- **Third-party**: [THIRD_PARTY_LICENSES.md](./THIRD_PARTY_LICENSES.md)