https://github.com/lfoppiano/structure-vision
Viewer for the structure extracted by Grobid on PDF documents
documents grobid hamburger-to-cow pdf streamlit structure
- Host: GitHub
- URL: https://github.com/lfoppiano/structure-vision
- Owner: lfoppiano
- License: apache-2.0
- Created: 2023-12-14T00:27:52.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-04T11:54:32.000Z (about 1 year ago)
- Last Synced: 2024-12-04T12:38:27.623Z (about 1 year ago)
- Topics: documents, grobid, hamburger-to-cow, pdf, streamlit, structure
- Language: Python
- Homepage: https://structure-vision.streamlit.app/
- Size: 501 KB
- Stars: 42
- Watchers: 2
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# structure-vision
A Streamlit application that visualises the structures extracted by [Grobid](https://github.com/kermitt2/grobid) for which PDF coordinates are available.
It uses the Streamlit component [streamlit-pdf-viewer](https://github.com/lfoppiano/streamlit-pdf-viewer) we've been developing.
This application allows you to visualise the following components:
- authors
- affiliations
- title
- section headings
- paragraphs or sentences
- callout references in text
- figures
- formulas
- bibliographic references
In addition, the sidebar contains other options, mostly for testing.

**Demo**: https://structure-vision.streamlit.app/

This tool was built as a test application for [streamlit-pdf-viewer](https://github.com/lfoppiano/streamlit-pdf-viewer), a new streamlit component for visualising enhanced PDF documents.
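
As a rough illustration of how the components listed above can be drawn on a page: Grobid reports the position of an element in a `coords` attribute, a `;`-separated list of boxes, each written as `page,x,y,width,height`. The sketch below turns such a string into a list of box dictionaries. The function name `coords_to_annotations` is hypothetical, and the dictionary keys (`page`, `x`, `y`, `width`, `height`, `color`) are an assumption modelled on the annotation format described for streamlit-pdf-viewer; the application itself may build its annotations differently.

```python
from typing import Dict, List, Union

Annotation = Dict[str, Union[int, float, str]]


def coords_to_annotations(coords: str, color: str = "blue") -> List[Annotation]:
    """Convert a Grobid `coords` string into a list of box dictionaries.

    Hypothetical helper: the output keys are assumed, not taken from this project.
    """
    annotations: List[Annotation] = []
    for box in coords.split(";"):
        parts = box.strip().split(",")
        if len(parts) < 5:
            continue  # skip empty or malformed boxes
        page, x, y, width, height = parts[:5]
        annotations.append(
            {
                "page": int(page),
                "x": float(x),
                "y": float(y),
                "width": float(width),
                "height": float(height),
                "color": color,
            }
        )
    return annotations


# Example value in the shape Grobid uses for a two-line element on page 1.
example = "1,53.80,194.57,58.71,7.47;1,53.80,204.03,30.00,7.47"
boxes = coords_to_annotations(example, color="red")
print(boxes[0])
```

One box is produced per line of text, so an element that spans several lines yields several annotations sharing the same colour.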
## Getting started
```shell
pip install -r requirements.txt
streamlit run streamlit_app.py
```
The Grobid URL should be provided via an environment variable or via a `.env` file created in the root folder of the project.
Here is an example of how to configure the project to use a local Grobid instance:
```
GROBID_URL=http://localhost:8070
```
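
For illustration, a minimal sketch of how such a setting could be read on the Python side. It only looks at the process environment (it does not parse the `.env` file itself), and the helper names, the fallback to `http://localhost:8070`, and the trailing-slash handling are assumptions rather than the application's actual code. `/api/processFulltextDocument` is Grobid's full-text processing service.

```python
import os
from typing import Mapping, Optional

# Assumed fallback, matching the local instance shown in the example above.
DEFAULT_GROBID_URL = "http://localhost:8070"


def resolve_grobid_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Grobid base URL from GROBID_URL, without a trailing slash."""
    env = os.environ if env is None else env
    url = env.get("GROBID_URL", "").strip() or DEFAULT_GROBID_URL
    return url.rstrip("/")


def fulltext_endpoint(base_url: str) -> str:
    """Build the URL of Grobid's full-text processing service."""
    return f"{base_url}/api/processFulltextDocument"


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(fulltext_endpoint(resolve_grobid_url({"GROBID_URL": "http://localhost:8070/"})))
```

Stripping the trailing slash in one place avoids building URLs with a double `//` when the variable is set with or without it.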
## Developer notes
To install the Streamlit PDF viewer directly from GitHub:
```shell
pip install -e "git+https://github.com/lfoppiano/streamlit-pdf-viewer#egg=streamlit-pdf-viewer"
```