https://github.com/owen-liuyuxuan/localpdfsummarizer
a purely simple local attempt for summarizing academic PDF with learning-based tool-box.
- Host: GitHub
- URL: https://github.com/owen-liuyuxuan/localpdfsummarizer
- Owner: Owen-Liuyuxuan
- Created: 2024-02-23T08:52:35.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-04T05:21:54.000Z (over 1 year ago)
- Last Synced: 2025-04-17T01:35:36.583Z (6 months ago)
- Language: Python
- Size: 1.16 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# PDF Summarizer
This project is a simple, fully local attempt at summarizing academic PDFs with a learning-based toolbox.
## Main Components and Functions
#### Components and Setup
This project contains the following parts:
1. PyMuPDF PDF parser to handle PDF files.
2. EfficientDet layout detection model from [layoutparser](https://github.com/Layout-Parser/layout-parser?tab=readme-ov-file). Install with `pip install "layoutparser[effdet]"`.
3. Open-source EN/CN LLM [ChatGLM](https://github.com/THUDM/ChatGLM-6B). Install PyTorch with CUDA support and transformers (version<4.37.0).
4. Streamlit library for web page creation.

After installing the libraries for layoutparser and ChatGLM, run `pip3 install -r requirements.txt` to install the other dependencies.
Start the local server with:
`python3 -m streamlit run web_ui.py --server.fileWatcherType none`
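
Since the ChatGLM step expects transformers below 4.37.0, a quick check before launching can catch a mismatched install. The following is a minimal sketch using only the Python standard library; `version_tuple` and `transformers_ok` are hypothetical helper names and are not part of this project.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.36.2' into (4, 36, 2).

    Only the first three dot-separated fields are read, and suffixes such as
    'rc1' or '.dev0' are ignored, so pre-releases count as their base version.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions like '4.37' to three fields so comparisons are fair.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_ok(installed: str, upper: tuple = (4, 37, 0)) -> bool:
    """Return True when the installed version is strictly below `upper`."""
    return version_tuple(installed) < upper


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if transformers_ok(installed) else "too new, need <4.37.0"
        print(f"transformers {installed}: {status}")
```

Running the script prints the installed version and whether it satisfies the constraint. It does not change the environment.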
#### Functions
1. Accept PDF upload / link upload.
2. Extract all figures and tables in the PDF file.
3. ChatGLM tries to summarize the paper's main idea from the first few thousand characters of the text (the exact amount depends on parameters and GPU memory). It responds in English first, then in Chinese.
## Examples
Take the ChatGLM paper as an example.
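
As a rough illustration of the summarization step listed under Functions (item 3), the sketch below extracts text with PyMuPDF, keeps only the first few thousand characters, and asks a chat model for an English summary and then a Chinese one. It is not taken from this project's code. `MAX_CHARS`, the prompt wording, and the function names are assumptions made for illustration, and the model is passed in as a plain callable so that the logic does not depend on a particular ChatGLM version.

```python
from typing import Callable, Tuple

# Assumed character budget; the right value depends on parameters and GPU memory.
MAX_CHARS = 4000


def extract_text(pdf_path: str) -> str:
    """Read the plain text of every page with PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def truncate_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse runs of whitespace, then keep the first `max_chars` characters."""
    cleaned = " ".join(text.split())
    return cleaned[:max_chars]


def summarize(
    text: str,
    chat_fn: Callable[[str], str],
    max_chars: int = MAX_CHARS,
) -> Tuple[str, str]:
    """Ask for an English summary first, then a Chinese one, of the same excerpt."""
    excerpt = truncate_text(text, max_chars)
    # Hypothetical prompts; the project's actual wording is not shown in this README.
    english = chat_fn(f"Summarize the main idea of this paper in English:\n{excerpt}")
    chinese = chat_fn(f"Summarize the main idea of this paper in Chinese:\n{excerpt}")
    return english, chinese


# With a ChatGLM-6B model and tokenizer loaded as in the ChatGLM README,
# chat_fn could be something like:
#   chat_fn = lambda prompt: model.chat(tokenizer, prompt, history=[])[0]
```

Passing the model in as `chat_fn` keeps the truncation and prompt logic testable without a GPU. A stub function that returns a fixed string is enough to exercise `summarize`.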

