https://github.com/majd-kontar/pdf-highlight-extractor

Host: GitHub
URL: https://github.com/majd-kontar/pdf-highlight-extractor
Owner: majd-kontar
Created: 2022-04-16T08:43:58.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2022-04-16T10:08:56.000Z (about 4 years ago)
Last Synced: 2025-02-28T09:32:40.749Z (over 1 year ago)
Topics: extract-text, highlight, pdf, python
Language: Python
Homepage:
Size: 30.3 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# PDF-Highlight-Extractor

![img.png](img.png)

## Description

This repo lets researchers extract the text they have highlighted across multiple PDF articles and writes it to a single
docx file. Each PDF's name becomes a header, with the highlights from that PDF listed as bullet points beneath it.
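
As a rough illustration of the idea (not the repository's actual code), the extraction could be sketched as below. PyMuPDF (`fitz`) and python-docx are assumptions here; check `requirements.txt` for what the project really depends on. The names `clean`, `extract_highlights`, and `write_docx` are hypothetical.

```python
import re


def clean(text):
    """Collapse line breaks and repeated spaces inside a highlighted passage."""
    return re.sub(r"\s+", " ", text).strip()


def extract_highlights(pdf_path):
    """Return the highlighted passages of one PDF (assumes PyMuPDF is installed)."""
    import fitz  # PyMuPDF; an assumption, not confirmed by the README

    highlights = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                # Skip notes, underlines, and other non-highlight annotations.
                if annot.type[0] != fitz.PDF_ANNOT_HIGHLIGHT:
                    continue
                # Approximation: take the text inside the annotation's bounding
                # box, which may include a few neighbouring words.
                text = clean(page.get_text("text", clip=annot.rect))
                if text:
                    highlights.append(text)
    return highlights


def write_docx(highlights_by_pdf, output_path):
    """Write one header per PDF with its highlights as bullets (assumes python-docx)."""
    from docx import Document

    document = Document()
    for pdf_name, highlights in highlights_by_pdf.items():
        document.add_heading(pdf_name, level=1)
        for text in highlights:
            document.add_paragraph(text, style="List Bullet")
    document.save(str(output_path))
```

Clipping to the bounding box is the simplest approach; a more precise version would probably walk the highlight's individual quads line by line.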

## Getting Started
Install the requirements:

`pip install -r requirements.txt`

## Usage

Run `main.py`.

Enter the path to the folder containing the PDFs you have highlighted (make sure all the PDFs are in the same
folder).

Enter the output file name (the name of the docx file to be generated).

Example: entering `output` saves the output docx file in the PDFs' directory as `output.docx`.
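
The way the two inputs map to files on disk can be sketched as follows. This is a minimal illustration using only the standard library; `find_pdfs` and `output_path` are hypothetical helpers, not functions from `main.py`, and the sketch assumes the `.docx` extension is always appended to the name you enter.

```python
from pathlib import Path


def find_pdfs(folder):
    """List the PDF files directly inside the folder, in sorted order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")


def output_path(folder, name):
    """Place the generated docx next to the PDFs, e.g. 'output' -> 'output.docx'."""
    return Path(folder) / f"{name}.docx"
```

For example, `output_path("papers", "output")` points at `output.docx` inside the `papers` folder, matching the behaviour described above.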

## References

https://stackoverflow.com/a/63686095
