https://github.com/py-pdf/pdf-crawler

The goal of this project is to collect a large dataset of PDF documents.

# pdf-crawler

The goal of pdf-crawler is to download PDF files from web pages for testing
PyPDF2.

## Install

```
pip install -r requirements.txt
```

## Usage

The project is organized as mostly isolated scripts, e.g.

```
python crawl.py
```

starts downloading PDF documents.
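
The scripts themselves are not reproduced here. As a rough illustration of the general idea behind collecting PDFs from web pages, the sketch below finds links that point at `.pdf` files in a page's HTML and checks whether downloaded bytes start with the PDF header. It uses only the Python standard library, and the names `extract_pdf_links` and `looks_like_pdf` are hypothetical helpers chosen for this example, not functions from `crawl.py`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_pdf_links(html, base_url):
    """Return absolute URLs of links whose path ends in .pdf.

    Hypothetical helper for illustration; not part of pdf-crawler.
    """
    collector = _LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            links.append(absolute)
    return links


def looks_like_pdf(data):
    """Check for the PDF header bytes at the start of downloaded content."""
    return data.startswith(b"%PDF-")
```

Matching on the file extension is only a heuristic, since servers can serve PDFs from URLs without a `.pdf` suffix, which is why the sketch also includes a check on the downloaded bytes.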