https://github.com/owen-liuyuxuan/localpdfsummarizer

A simple, fully local attempt at summarizing academic PDFs with a learning-based toolbox.

Host: GitHub
URL: https://github.com/owen-liuyuxuan/localpdfsummarizer
Owner: Owen-Liuyuxuan
Created: 2024-02-23T08:52:35.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-03-04T05:21:54.000Z (over 1 year ago)
Last Synced: 2025-04-17T01:35:36.583Z (6 months ago)
Language: Python
Size: 1.16 MB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

# PDF Summarizer

This project is a simple, fully local attempt at summarizing academic PDFs with a learning-based toolbox.

## Main Components and Functions

#### Components and Setup

This project contains the following parts:

1. PyMuPDF PDF parser to handle PDF files.
2. EfficientDet Layout detection model from [layoutparser](https://github.com/Layout-Parser/layout-parser?tab=readme-ov-file). Install with `pip install "layoutparser[effdet]"`.
3. Open-source EN/CN LLM [ChatGLM](https://github.com/THUDM/ChatGLM-6B). Install PyTorch with CUDA support and `transformers` (version < 4.37.0).
4. Streamlit library for web page creation.

After installing libraries for layoutparser and ChatGLM, run `pip3 install -r requirements.txt` to install other dependencies.
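
The `transformers` version bound above is easy to miss, so a quick check before starting the app can help. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `transformers_is_compatible`, `check_environment`) are hypothetical, and the comparison only looks at the leading `major.minor.patch` numbers.

```python
import re
from importlib import metadata

# Exclusive upper bound taken from the setup notes (version < 4.37.0).
MAX_TRANSFORMERS = (4, 37, 0)


def version_tuple(version):
    """Turn '4.36.2' (or '4.36.2.dev0') into a comparable (major, minor, patch) tuple."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad short versions such as '4.36' to three components.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def transformers_is_compatible(installed):
    """Return True when the given version string is below the upper bound."""
    return version_tuple(installed) < MAX_TRANSFORMERS


def check_environment():
    """Report whether the installed transformers package satisfies the bound."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if transformers_is_compatible(installed):
        return f"transformers {installed} is OK"
    return f"transformers {installed} is too new; install a version below 4.37.0"


if __name__ == "__main__":
    print(check_environment())
```

If the check reports a version that is too new, `pip install "transformers<4.37.0"` pins it below the bound.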

Start the local server with:

`python3 -m streamlit run web_ui.py --server.fileWatcherType none`

#### Functions

1. Accept a PDF by file upload or by link.
2. Extract all figures and tables from the PDF file.
3. ChatGLM tries to summarize the paper's main idea from the first few thousand characters of the text (the exact amount depends on parameters and GPU memory). It responds in English first, then in Chinese.
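
The summarization step can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in `web_ui.py`: the character budget `MAX_CHARS`, the prompt wording, and the helper names are all hypothetical, and the `THUDM/chatglm-6b` checkpoint is assumed from the ChatGLM-6B link above. The PyMuPDF and `transformers` imports are done lazily so the text helpers can be tried without a GPU.

```python
MAX_CHARS = 4000  # assumed budget; tune it to your parameters and GPU memory


def extract_text(pdf_bytes):
    """Concatenate the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def truncate(text, max_chars=MAX_CHARS):
    """Keep the leading part of the text, cutting at the last space if there is one."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    cut = head.rfind(" ")
    return head[:cut] if cut > 0 else head


def build_prompts(paper_text):
    """Build one English and one Chinese request over the same excerpt."""
    excerpt = truncate(paper_text)
    base = "Summarize the main idea of the following paper excerpt."
    return [
        f"{base} Answer in English.\n\n{excerpt}",
        f"{base} Answer in Chinese.\n\n{excerpt}",
    ]


def load_chatglm(name="THUDM/chatglm-6b"):
    """Load ChatGLM as shown in its README (requires a CUDA GPU)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModel.from_pretrained(name, trust_remote_code=True).half().cuda()
    return model.eval(), tokenizer


def summarize(paper_text, model, tokenizer):
    """Ask for the English summary first, then the Chinese one."""
    answers = []
    for prompt in build_prompts(paper_text):
        # Each request starts from an empty history so the answers stay independent.
        response, _history = model.chat(tokenizer, prompt, history=[])
        answers.append(response)
    return answers
```

A typical call would be `summarize(extract_text(pdf_bytes), *load_chatglm())`. Truncating by characters rather than tokens is a simplification that keeps the sketch independent of the tokenizer.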

## Examples

Take the ChatGLM paper as an example.

![image](docs/input_example.png)
![image](docs/output_example.png)
