https://github.com/gabe-zhang/paper2summary

Lightweight scientific paper summarizer using LoRA fine-tuning and RAG-based Q&A
arxiv fine-tuning huggingface llama lora-training nlp paper-summarization peft pytorch rag transformers

Last synced: 4 months ago
Host: GitHub
URL: https://github.com/gabe-zhang/paper2summary
Owner: gabe-zhang
License: mit
Created: 2024-11-06T21:59:29.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2026-01-25T08:33:23.000Z (5 months ago)
Last Synced: 2026-01-25T11:04:31.050Z (5 months ago)
Language: Python
Homepage:
Size: 267 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# paper2summary

[![Model](https://img.shields.io/badge/HuggingFace-Model-blue)](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA)

[![License](https://img.shields.io/badge/License-MIT-green)](./LICENSE)

A lightweight scientific paper summarizer combining LoRA fine-tuning with RAG-based question answering. Runs locally on consumer hardware.

## Overview

Reading scientific papers is time-consuming because of knowledge gaps and the high volume of publications. General-purpose LLMs such as ChatGPT can help, but they lack intuitive citations back to the source text and have limited scope.

**paper2summary** addresses this by:

- Fine-tuning a lightweight model (Llama-3.2-1B-Instruct, 1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)

## Demo

RAG-based paper Q&A using [Kotaemon](https://github.com/Cinnamon/kotaemon) with GPT-4o-mini:

- [Single Document Q&A](https://youtu.be/SqDGsJR9OOI) - Query the "Attention Is All You Need" paper
- [Multi-Document Q&A](https://youtu.be/NsxGwMrflAE) - Compare the Transformer and LoRA papers

## Results

Evaluated on 6,440 test samples with beam search (beam size = 4):

| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|-------|---------|---------|---------|---------|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| **Llama-PaperSummarization-LoRA** | **41.56** | **11.31** | **2.67** | **21.86** |

The [LoRA model](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA) outperforms the baseline on all four metrics, with relative gains of **+51%** on ROUGE-2 and **+37%** on ROUGE-3.
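The relative gains follow directly from the table as (LoRA - baseline) / baseline. A short check in Python, using only the numbers from the results table:

```python
# ROUGE scores copied from the results table.
BASELINE = {"ROUGE-1": 36.69, "ROUGE-2": 7.47, "ROUGE-3": 1.95, "ROUGE-L": 19.36}
LORA = {"ROUGE-1": 41.56, "ROUGE-2": 11.31, "ROUGE-3": 2.67, "ROUGE-L": 21.86}


def relative_gain(base: float, new: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100


for metric, base in BASELINE.items():
    print(f"{metric}: {relative_gain(base, LORA[metric]):+.1f}%")
# ROUGE-1: +13.3%
# ROUGE-2: +51.4%
# ROUGE-3: +36.9%
# ROUGE-L: +12.9%
```

The same calculation shows smaller but still positive gains of roughly +13% for ROUGE-1 and ROUGE-L.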

## Installation

```bash
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary
uv venv && uv sync
uv run python -m spacy download en_core_web_sm
```

## Usage

### Training

```bash
uv run python src/train.py
```

### Testing

```bash
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora

# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
```
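The test script reports ROUGE scores, computed in `src/utils/eval.py`, whose implementation is not reproduced here. As a simplified sketch of what the metrics measure: ROUGE-N is the F1 of overlapping n-grams between a generated summary and the reference abstract, and ROUGE-L is the F1 based on their longest common subsequence. This version assumes lowercase whitespace tokenization with no stemming, so its numbers will not exactly match library implementations such as `rouge-score`; the results table appears to report scores scaled by 100.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, candidate_total: int, reference_total: int) -> float:
    """Harmonic mean of precision (vs. candidate) and recall (vs. reference)."""
    if overlap == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches (min of counts)
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(f"ROUGE-1: {rouge_n(candidate, reference, 1):.3f}")  # 0.833
print(f"ROUGE-2: {rouge_n(candidate, reference, 2):.3f}")  # 0.600
print(f"ROUGE-L: {rouge_l(candidate, reference):.3f}")  # 0.833
```

Swapping one word breaks two bigrams but only one unigram, which is why ROUGE-2 drops faster than ROUGE-1; higher-order scores such as ROUGE-3 are correspondingly lower in the results table.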

## Project Structure

```
paper2summary/
├── src/
│   ├── train.py           # LoRA fine-tuning script
│   ├── test.py            # Model evaluation script
│   ├── paper_dataset.py   # Dataset loading utilities
│   ├── config/
│   │   └── lora_config.py # Training hyperparameters
│   └── utils/
│       ├── eval.py        # ROUGE evaluation metrics
│       └── testing_utils.py
├── output/                # Model checkpoints (generated)
└── pyproject.toml
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
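The trainable-parameter figure follows from the settings above. LoRA attaches two low-rank matrices, A (r × d_in) and B (d_out × r), to each adapted linear layer, adding r · (d_in + d_out) parameters per layer. The sketch below checks the ~850K (0.07%) figure; it assumes the Llama-3.2-1B dimensions from the model's public configuration (hidden size 2048, 16 decoder layers, 8 key/value heads of size 64, MLP intermediate size 8192, a 128,256-token vocabulary, tied input/output embeddings), none of which are stated in this README.

```python
# Assumed Llama-3.2-1B dimensions (from the model's public config, not from this repo).
HIDDEN = 2048
NUM_LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64
INTERMEDIATE = 8192
VOCAB = 128_256

RANK = 8  # LoRA rank from the table
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # grouped-query attention: k/v project to 512 dims


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def base_params() -> int:
    """Approximate frozen parameter count of the base model."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v (no biases)
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, down projections
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    embeddings = VOCAB * HIDDEN  # shared with the output head (tied)
    return NUM_LAYERS * (attention + mlp + norms) + embeddings + HIDDEN  # + final norm


# Target modules from the table: q_proj (HIDDEN -> HIDDEN) and v_proj (HIDDEN -> KV_DIM).
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = NUM_LAYERS * per_layer
total = base_params()

print(f"trainable: {trainable:,}")  # trainable: 851,968
print(f"base model: {total:,}")  # base model: 1,235,814,400
print(f"fraction: {100 * trainable / total:.2f}%")  # fraction: 0.07%
```

Because of grouped-query attention, `v_proj` projects to only 512 dimensions, so it contributes fewer LoRA parameters than `q_proj` at the same rank.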

## Dataset

Fine-tuned on roughly 10% of the training and validation splits of [ccdv/arxiv-summarization](https://huggingface.co/datasets/ccdv/arxiv-summarization); the full test split is used for evaluation:

| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|-------|---------|---------------------|----------------------|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |

## RAG Architecture

The RAG pipeline uses [Kotaemon](https://github.com/Cinnamon/kotaemon) for document Q&A:

| Component | Implementation |
|-----------|----------------|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
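Kotaemon handles retrieval internally. To illustrate the retrieve-then-cite flow that the table describes, here is a toy, dependency-free sketch: a bag-of-words vector stands in for `text-embedding-3-small`, an in-memory list stands in for Chroma, and the reranking step is omitted. The chunk ids play the role of the source references shown to the user. All function and chunk names are hypothetical and do not come from Kotaemon's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[tuple[str, str]], k: int = 2) -> list[tuple[float, str, str]]:
    """Score every (source_id, text) chunk against the query and return the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), source_id, text) for source_id, text in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, str, str]]) -> str:
    """Place retrieved chunks in the LLM prompt, tagged with ids the answer can cite."""
    context = "\n".join(f"[{source_id}] {text}" for _, source_id, text in hits)
    return f"Answer using only these sources and cite their ids.\n{context}\n\nQuestion: {query}"


chunks = [
    ("transformer-p1", "the transformer relies entirely on attention"),
    ("lora-p1", "lora freezes the pretrained weights and injects trainable low rank matrices"),
    ("transformer-p2", "multi head attention runs several attention layers in parallel"),
]
question = "how does lora adapt pretrained weights"
hits = retrieve(question, chunks)
print(hits[0][1])  # lora-p1
print(build_prompt(question, hits))
```

Keeping the source id attached to each chunk all the way into the prompt is what lets the generated answer point back to the passage it came from, which is the basis for highlighting sources for fact-checking.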

## References

- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) (Hu et al., ICLR 2022)
- [A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents](https://arxiv.org/abs/1804.05685) (Cohan et al., NAACL 2018)

## License

- **Code**: [MIT License](./LICENSE)

- **Llama 3.2**: [Llama 3.2 Community License](https://www.llama.com/llama-downloads)

- **Third-party**: [THIRD_PARTY_LICENSES.md](./THIRD_PARTY_LICENSES.md)
