https://github.com/nlpatvcu/pdf2txt

Converts a pdf document to text.

Host: GitHub
URL: https://github.com/nlpatvcu/pdf2txt
Owner: NLPatVCU
License: apache-2.0
Created: 2021-01-15T17:38:07.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2022-04-15T19:06:03.000Z (about 4 years ago)
Last Synced: 2025-01-17T10:24:46.345Z (over 1 year ago)
Language: Java
Size: 65.4 KB
Stars: 1
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# PDF2TXT

PDF2TXT can be used to either convert a single .pdf file to a .txt file or all .pdf files in a given directory to .txt files.

![alt text](https://nlp.cs.vcu.edu/images/Edit_NanomedicineDatabase.png "Nanoinformatics")

Installation
============

From within a Python 3 virtual environment, clone the repository to install PDF2TXT:

```bash
git clone https://github.com/NLPatVCU/PDF2TXT.git
```

You also need to install the Haystack framework and pymilvus, the Python client for Milvus.

```bash
pip3 install pymilvus==1.0.0
pip3 install farm-haystack==1.0.0
```
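
To confirm that the pinned packages are present in the active environment, a check along these lines may help. It is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper names `installed_version` and `check_requirements` are hypothetical and are not part of PDF2TXT.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
REQUIRED = {"pymilvus": "1.0.0", "farm-haystack": "1.0.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED):
    """Map each unsatisfied package to an (expected, found) tuple."""
    problems = {}
    for package, expected in required.items():
        found = installed_version(package)
        if found != expected:
            problems[package] = (expected, found)
    return problems


if __name__ == "__main__":
    # Print one line per pin that is missing or at a different version.
    for package, (expected, found) in check_requirements().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `check_requirements()` means both pins are satisfied.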

If you run into difficulties installing Haystack, see its repository: https://github.com/deepset-ai/haystack

Use
===

To convert a single file, run the following, with the path to the .pdf file after `-f`:

```bash
python3 pdf2txt.py -f
```

To convert every .pdf file in a directory, run the following, with the directory path after `-d`:

```bash
python3 pdf2txt.py -d
```
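
Directory mode amounts to finding each .pdf file and choosing a matching .txt name for it. The sketch below illustrates that pairing only; it is not taken from `pdf2txt.py`. The function name `pdf_to_txt_pairs` is hypothetical, and the sketch assumes a non-recursive scan, a case-insensitive `.pdf` match, and output written next to each input.

```python
from pathlib import Path


def pdf_to_txt_pairs(directory):
    """Pair every .pdf file directly inside `directory` with a sibling .txt path."""
    root = Path(directory)
    pairs = []
    for entry in sorted(root.iterdir()):
        # Skip subdirectories and anything that is not a PDF (assumed rule).
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            pairs.append((entry, entry.with_suffix(".txt")))
    return pairs


if __name__ == "__main__":
    for pdf_path, txt_path in pdf_to_txt_pairs("."):
        print(f"{pdf_path} -> {txt_path}")
```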

To write the output files to a specific directory, append `-o` followed by the directory path to either command:

```bash
-o
```
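
The three flags above could be declared with `argparse` roughly as follows. This is a sketch under stated assumptions, not the parser that `pdf2txt.py` actually uses: the destination names `file`, `directory`, and `output` are hypothetical, and treating `-f` and `-d` as mutually exclusive is an assumption drawn from the "either ... or" wording of the description.

```python
import argparse


def build_parser():
    """Build a parser that accepts -f or -d, plus an optional -o."""
    parser = argparse.ArgumentParser(
        prog="pdf2txt.py",
        description="Convert one PDF file, or every PDF in a directory, to text.",
    )
    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="file", help="path to a single .pdf file")
    source.add_argument("-d", dest="directory", help="directory containing .pdf files")
    # Optional output directory; None means no explicit location was requested.
    parser.add_argument("-o", dest="output", default=None, help="directory for .txt output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, passing both `-f` and `-d` makes `argparse` exit with a usage error rather than guessing which input to use.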

License
=======

This package is licensed under the GNU General Public License.

Acknowledgments
===============

- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/)     ![alt text](https://nlp.cs.vcu.edu/images/vcu_head_logo "VCU")

- [Nanoinformatics Vertically Integrated Projects](https://rampages.us/nanoinformatics/)
