https://github.com/gabe-zhang/paper2summary
Lightweight scientific paper summarizer using LoRA fine-tuning and RAG-based Q&A
- Host: GitHub
- URL: https://github.com/gabe-zhang/paper2summary
- Owner: gabe-zhang
- License: mit
- Created: 2024-11-06T21:59:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-25T08:33:23.000Z (5 months ago)
- Last Synced: 2026-01-25T11:04:31.050Z (5 months ago)
- Topics: arxiv, fine-tuning, huggingface, llama, lora-training, nlp, paper-summarization, peft, pytorch, rag, transformers
- Language: Python
- Homepage:
- Size: 267 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# paper2summary
[Model on Hugging Face](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA) |
[MIT License](./LICENSE)
A lightweight scientific paper summarizer that combines LoRA fine-tuning with
RAG-based question answering. It runs locally on consumer hardware.
## Overview
Reading scientific papers is time-consuming: readers often lack background
knowledge, and the volume of new publications is high. General-purpose LLMs
such as ChatGPT can help, but they lack intuitive citations and have limited
scope.
**paper2summary** addresses this by:
- Fine-tuning a lightweight model (1.3GB) for paper summarization
- Providing source references with highlighting for fact-checking
- Supporting local deployment on laptops
- Enabling flexible switching to larger models via APIs
- Keeping the codebase minimal (~300 lines for fine-tuning)
## Demo
RAG-based paper Q&A using [Kotaemon](https://github.com/Cinnamon/kotaemon)
with GPT-4o-mini:
- [Single Document Q&A](https://youtu.be/SqDGsJR9OOI) - Query the
"Attention Is All You Need" paper
- [Multi-Document Q&A](https://youtu.be/NsxGwMrflAE) - Compare Transformer
and LoRA papers
## Results
Evaluated on 6,440 test samples with beam search (beam size = 4):
| Model | ROUGE-1 | ROUGE-2 | ROUGE-3 | ROUGE-L |
|-------|---------|---------|---------|---------|
| Llama-3.2-1B-Instruct (baseline) | 36.69 | 7.47 | 1.95 | 19.36 |
| **Llama-PaperSummarization-LoRA** | **41.56** | **11.31** | **2.67** | **21.86** |
The [LoRA model](https://huggingface.co/gabe-zhang/Llama-PaperSummarization-LoRA) improves on the baseline by **51% on ROUGE-2** and **37% on ROUGE-3** in relative terms.
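The percentages quoted for ROUGE-2 and ROUGE-3 are relative gains over the baseline, not absolute point differences. A minimal sketch of that arithmetic, using only the scores from the table (the helper name `relative_gain` is illustrative and not part of this repository):

```python
# Scores copied from the results table.
baseline = {"ROUGE-1": 36.69, "ROUGE-2": 7.47, "ROUGE-3": 1.95, "ROUGE-L": 19.36}
lora = {"ROUGE-1": 41.56, "ROUGE-2": 11.31, "ROUGE-3": 2.67, "ROUGE-L": 21.86}


def relative_gain(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Relative gain per metric, keyed by metric name.
gains = {name: relative_gain(baseline[name], lora[name]) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

By the same calculation, ROUGE-1 and ROUGE-L each improve by roughly 13% relative to the baseline.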
## Installation
```bash
git clone https://github.com/gabe-zhang/paper2summary.git
cd paper2summary
uv venv && uv sync
uv run python -m spacy download en_core_web_sm
```
## Usage
### Training
```bash
uv run python src/train.py
```
### Testing
```bash
# Quick test (10 samples)
uv run python src/test.py --model_path ./output/lora
# Full benchmark (6,440 samples)
uv run python src/test.py --model_path ./output/lora --num_samples 6440
```
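For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a generated summary and a reference. The sketch below is a simplified illustration that assumes lowercase word tokenization with no stemming. It is not the code in `src/utils/eval.py`, which may use a different tokenizer or library, and this README does not state whether the reported numbers are precision, recall, or F1.

```python
import re
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict[str, float]:
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall, and F1."""
    ref = ngrams(re.findall(r"\w+", reference.lower()), n)
    cand = ngrams(re.findall(r"\w+", candidate.lower()), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2)
print(scores)
```

Higher-order n-grams are harder to match, which is consistent with ROUGE-3 scores being much lower than ROUGE-1 scores in the results table.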
## Project Structure
```
paper2summary/
├── src/
│ ├── train.py # LoRA fine-tuning script
│ ├── test.py # Model evaluation script
│ ├── paper_dataset.py # Dataset loading utilities
│ ├── config/
│ │ └── lora_config.py # Training hyperparameters
│ └── utils/
│ ├── eval.py # ROUGE evaluation metrics
│ └── testing_utils.py
├── output/ # Model checkpoints (generated)
└── pyproject.toml
```
## Training Details
| Parameter | Value |
|-----------|-------|
| Base Model | Llama-3.2-1B-Instruct (1.3GB) |
| LoRA Rank | 8 |
| Target Modules | q_proj, v_proj |
| Trainable Parameters | ~850K (0.07%) |
| Context Length | 10,182 tokens |
| Gradient Accumulation | 4 steps |
| Training Steps | 5,000 |
| Evaluation Interval | Every 20 steps |
| Training Time | ~28 hours on RTX A6000 |
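The trainable-parameter figure in the table can be sanity-checked with a little arithmetic. LoRA adds two small matrices next to each adapted weight, of shapes `rank x in_features` and `out_features x rank`. The sketch below assumes the published Llama-3.2-1B shape (hidden size 2048, 16 decoder layers, 8 key/value heads of dimension 64, roughly 1.24B parameters in total). None of these dimensions are stated in this README, so treat them as assumptions.

```python
# Assumed Llama-3.2-1B dimensions (not stated in this README).
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64
TOTAL_PARAMS = 1.24e9  # approximate size of the base model

RANK = 8  # LoRA rank from the table


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


# q_proj maps hidden -> hidden; v_proj maps hidden -> (kv heads * head dim).
q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)

trainable = NUM_LAYERS * (q_proj + v_proj)
fraction_pct = trainable / TOTAL_PARAMS * 100.0

print(f"trainable parameters: {trainable:,}")
print(f"share of base model: {fraction_pct:.2f}%")
```

Under these assumptions the count lands close to the ~850K (0.07%) reported in the table.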
## Dataset
Fine-tuned on 10% of
[ccdv/arxiv-summarization](https://huggingface.co/datasets/ccdv/arxiv-summarization):
| Split | Samples | Avg. Article Tokens | Avg. Abstract Tokens |
|-------|---------|---------------------|----------------------|
| Train | ~20,000 | 6,038 | 299 |
| Validation | ~640 | 5,894 | 172 |
| Test | 6,440 | 5,905 | 174 |
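One way to draw such a subset is the split-slicing syntax of the Hugging Face `datasets` library. The sketch below is an assumption about how this could be done, not a copy of `src/paper_dataset.py`: the README does not say whether the 10% is the first slice or a random sample, and the `"section"` configuration name is assumed here. The helper names `subset_spec` and `load_subsets` are hypothetical.

```python
def subset_spec(split: str, percent: int) -> str:
    """Build a split-slicing string such as 'train[:10%]'."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def load_subsets() -> dict:
    """Download the subsets; needs the `datasets` package and network access."""
    # Imported lazily so that subset_spec has no third-party dependency.
    from datasets import load_dataset

    name = "ccdv/arxiv-summarization"
    config = "section"  # assumed configuration name
    return {
        # First 10% of train and validation (assumed slicing strategy).
        "train": load_dataset(name, config, split=subset_spec("train", 10)),
        "validation": load_dataset(name, config, split=subset_spec("validation", 10)),
        # The test split is used in full (6,440 samples per the table).
        "test": load_dataset(name, config, split="test"),
    }


print(subset_spec("train", 10))
```

Calling `load_subsets()` downloads the data, so it is left out of the example run above.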
## RAG Architecture
The RAG pipeline uses [Kotaemon](https://github.com/Cinnamon/kotaemon)
for document Q&A:
| Component | Implementation |
|-----------|----------------|
| LLM | GPT-4o-mini (or Llama-3.2-1B-LoRA via Ollama) |
| Embedding | text-embedding-3-small (OpenAI) |
| Reranker | GPT-4o-mini |
| Vector DB | Chroma |
| Document Parser | Docling |
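To make the roles of these components concrete, the toy sketch below walks through the retrieve-then-prompt flow that a RAG pipeline follows. It is not Kotaemon's implementation: a bag-of-words counter stands in for the embedding model, an in-memory list stands in for the vector database, there is no reranking step, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Number the retrieved chunks so the LLM can cite them in its answer."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}"
    )


chunks = [
    "LoRA freezes the pretrained weights and trains low-rank update matrices.",
    "The Transformer relies entirely on attention instead of recurrence.",
    "ROUGE measures n-gram overlap between a summary and a reference.",
]
query = "How does attention replace recurrence?"
top = retrieve(query, chunks, k=1)
print(build_prompt(query, top))
```

Numbering the retrieved chunks in the prompt is what allows answers to point back to their sources for fact-checking.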
## References
- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
(Hu et al., ICLR 2022)
- [A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents](https://arxiv.org/abs/1804.05685)
(Cohan et al., NAACL 2018)
## License
- **Code**: [MIT License](./LICENSE)
- **Llama 3.2**: [Llama 3.2 Community License](https://www.llama.com/llama-downloads)
- **Third-party**: [THIRD_PARTY_LICENSES.md](./THIRD_PARTY_LICENSES.md)