https://github.com/majd-kontar/pdf-highlight-extractor
extract-text highlight pdf python
- Host: GitHub
- URL: https://github.com/majd-kontar/pdf-highlight-extractor
- Owner: majd-kontar
- Created: 2022-04-16T08:43:58.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-04-16T10:08:56.000Z (about 4 years ago)
- Last Synced: 2025-02-28T09:32:40.749Z (over 1 year ago)
- Topics: extract-text, highlight, pdf, python
- Language: Python
- Homepage:
- Size: 30.3 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PDF-Highlight-Extractor

## Description
This repo lets researchers extract the text they have highlighted across multiple PDF articles and writes it to a single docx file. Each PDF's name is used as a header, and the highlights from that PDF are listed under it as bullet points.
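The output layout described above can be illustrated with a minimal sketch. It renders plain text rather than a docx file, and the function name `render_outline` and the shape of its input (a mapping from PDF path to a list of highlight strings) are assumptions made for illustration, not the repo's actual code.

```python
from pathlib import Path


def render_outline(highlights_by_pdf):
    """Render highlights as a header per PDF followed by bullet points.

    `highlights_by_pdf` is a hypothetical mapping of PDF path -> list of
    highlighted strings; the real tool writes a docx file instead of text.
    """
    lines = []
    for pdf_path, highlights in highlights_by_pdf.items():
        # Header: the PDF's file name (assumed here to include the extension).
        lines.append(Path(pdf_path).name)
        # One bullet per highlight, in the order given.
        for text in highlights:
            lines.append(f"  - {text}")
    return "\n".join(lines)
```

Calling `render_outline({"papers/a.pdf": ["first highlight"]})` yields the header `a.pdf` followed by one indented bullet.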
## Getting Started
Install the requirements:
`pip install -r requirements.txt`
## Usage
1. Run `main.py`.
2. Enter the path to the folder containing the PDFs you have highlighted (make sure that all the PDFs are in the same folder).
3. Enter the output file name (the name of the docx file to be generated).

Example: entering `output` saves the docx file as `output.docx` in the same directory as the PDFs.
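As a rough sketch of how the two inputs above relate to the files on disk, the snippet below collects the PDFs in a folder and builds the output path. The helper names `find_pdfs` and `output_path` are hypothetical, and it is an assumption that only PDFs directly inside the folder (not in subfolders) are picked up; `main.py` may handle this differently.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_path(folder, name):
    """Build the docx path inside the PDF folder from the entered name."""
    return Path(folder) / f"{name}.docx"
```

With these helpers, `output_path("papers", "output")` points at `papers/output.docx`, matching the example above.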
## References
https://stackoverflow.com/a/63686095