https://github.com/nlpatvcu/pdf2txt

Converts a pdf document to text.

# PDF2TXT

PDF2TXT converts either a single `.pdf` file to a `.txt` file or every `.pdf` file in a given directory to `.txt` files.

![Nanoinformatics](https://nlp.cs.vcu.edu/images/Edit_NanomedicineDatabase.png "Nanoinformatics")

Installation
============
Run the following commands from within a Python 3 virtual environment.

To install PDF2TXT, clone the repository:
```bash
git clone https://github.com/NLPatVCU/PDF2TXT.git
```
You will also need to install the Haystack framework (`farm-haystack`) and the Milvus client library (`pymilvus`):
```bash
pip3 install pymilvus==1.0.0
pip3 install farm-haystack==1.0.0
```
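
After installing, you may want to confirm that the pinned versions are the ones actually present in the virtual environment. The script below is a hypothetical convenience check, not part of PDF2TXT; it relies only on the standard library's `importlib.metadata` (Python 3.8+), and the helper name `installed_versions` is illustrative.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip3 commands above.
PINNED = {"pymilvus": "1.0.0", "farm-haystack": "1.0.0"}


def installed_versions(names) -> dict[str, str | None]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: dict[str, str | None] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per pinned package, flagging any version mismatch.
    for name, installed in installed_versions(PINNED).items():
        wanted = PINNED[name]
        status = "ok" if installed == wanted else f"expected {wanted}"
        print(f"{name}: {installed or 'not installed'} ({status})")
```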
If you run into installation problems, see the Haystack repository: https://github.com/deepset-ai/haystack

Use
===

To convert a single file, run:
```bash
python3 pdf2txt.py -f
```

To convert an entire directory, run:
```bash
python3 pdf2txt.py -d
```
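
Both modes come down to building a list of `.pdf` inputs. The sketch below shows one way to do that with `pathlib`; `collect_pdfs` is a hypothetical helper, not code taken from `pdf2txt.py`, and the non-recursive directory scan and case-insensitive extension check are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def collect_pdfs(path: Path) -> list[Path]:
    """Return the .pdf files to convert for a single-file or a directory run.

    Hypothetical helper: a directory is scanned non-recursively and the
    extension check is case-insensitive; both choices are assumptions.
    """
    path = Path(path)
    if path.is_dir():
        # Directory mode: every regular file ending in .pdf, in a stable order.
        return sorted(
            p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a .pdf file: {path}")
    # Single-file mode: just the one input.
    return [path]
```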
To write the output files to a specific directory, append `-o` to either command:
```bash
-o
```
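
Each input then needs a matching `.txt` destination. The sketch below maps a PDF path to its output path; `txt_path` is a hypothetical name, and the assumption that output lands next to the input when `-o` is not given is not confirmed by this README. A caller would typically create the output directory first with `Path.mkdir(parents=True, exist_ok=True)`.

```python
from __future__ import annotations

from pathlib import Path


def txt_path(pdf: Path, out_dir: Path | None = None) -> Path:
    """Map an input .pdf path to the .txt path its text would be written to."""
    pdf = Path(pdf)
    if out_dir is None:
        # Assumption: without -o, the .txt file sits next to the source PDF.
        return pdf.with_suffix(".txt")
    # With -o: keep the file stem, swap the extension, place it in out_dir.
    return Path(out_dir) / (pdf.stem + ".txt")
```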
License
=======
This package is licensed under the GNU General Public License.

Acknowledgments
===============
- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/) ![VCU logo](https://nlp.cs.vcu.edu/images/vcu_head_logo "VCU")
- [Nanoinformatics Vertically Integrated Projects](https://rampages.us/nanoinformatics/)