https://github.com/bfontaine/sq
:page_facing_up: Bulk PDFs downloader
cli pdf ruby tool
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/bfontaine/sq
- Owner: bfontaine
- License: mit
- Created: 2014-01-10T12:04:24.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2015-10-14T10:25:28.000Z (over 10 years ago)
- Last Synced: 2025-04-10T06:20:51.472Z (about 1 year ago)
- Topics: cli, pdf, ruby, tool
- Language: Ruby
- Homepage:
- Size: 338 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# sq
**sq** is a web scraping tool for PDFs. Give it a URL and an optional regex,
and it’ll download all PDFs linked from that page.
## Install
```
gem install sq
```
## Usage
From the command line:
```
$ sq [-o <directory>] [-F <format>] <url> [<regex>]
```
Available options:
- `-F`: output format (see below), default is `%s.pdf`
- `-o`: choose the output directory
- `-V`: be more verbose
- `--formats`: list available formats
The regex is case-sensitive and is matched against the whole URL.
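As a rough illustration of that matching rule, here is a minimal sketch in plain Ruby. `filter_urls` is a hypothetical helper written for this example, not part of sq, and it assumes that "matched against the whole URL" means the regex is searched for anywhere in the full URL string rather than only in the file name.

```ruby
# Hypothetical helper (not part of sq): keep the URLs that a pattern matches.
def filter_urls(urls, pattern)
  regex = Regexp.new(pattern) # no /i flag, so matching is case-sensitive
  # The regex is tested against the full URL, so it can match the path too.
  urls.select { |url| regex.match?(url) }
end

urls = [
  'http://example.com/fiches/01.pdf',
  'http://example.com/Fiches/02.pdf',   # capital F: not matched
  'http://example.com/slides/intro.pdf'
]

matched = filter_urls(urls, 'fiches/\d+')
puts matched
```

Under these assumptions only the first URL is kept: the second differs in case, and the third does not contain `fiches/` followed by a digit.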
### Examples
```sh
# Get all PDFs from a Web page
sq http://liafa.fr/~yunes/cours/interfaces/
# Use a regexp to get only those you want
sq http://liafa.fr/~yunes/cours/interfaces/ 'fiches/\d+'
# Be more verbose
sq -V http://liafa.fr/~yunes/cours/interfaces/ 'fiches/\d+'
# Add a filename format
sq -V http://liafa.fr/~yunes/cours/interfaces/ 'fiches/\d+' -F 'class-%Z.pdf'
```
### Formats
The output format is applied to each PDF to build its filename. It’s a string
containing zero or more placeholders, each of which is replaced by the
corresponding value for that PDF.
```
%n - PDF number, starting at 0
%N - PDF number, starting at 1
%z - same as %n, but zero-padded
%Z - same as %N, but zero-padded
%c - total number of PDFs
%s - name of the PDF, extracted from its URI, without `.pdf`
%S - name of the PDF, extracted from the link text
%_ - same as %S, but spaces are replaced with underscores
%- - same as %S, but spaces are replaced with hyphens
%% - literal %
```
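To make the placeholder table concrete, here is a minimal sketch of how such a format string could be expanded. It is not sq's actual code: `expand_format` and its keyword arguments are hypothetical names, and the zero-padding width (the number of digits in the total count) is an assumption, since the padding rule is not specified above.

```ruby
# Hypothetical sketch (not sq's implementation): expand a filename format.
# index is the 0-based position of the PDF, count the total number of PDFs.
def expand_format(format, index:, count:, uri_name:, link_text:)
  width = count.to_s.length # assumed padding width: digits in the total count
  values = {
    '%n' => index.to_s,
    '%N' => (index + 1).to_s,
    '%z' => index.to_s.rjust(width, '0'),
    '%Z' => (index + 1).to_s.rjust(width, '0'),
    '%c' => count.to_s,
    '%s' => uri_name,
    '%S' => link_text,
    '%_' => link_text.tr(' ', '_'),
    '%-' => link_text.tr(' ', '-'),
    '%%' => '%'
  }
  # Single left-to-right pass; sequences not listed above are left untouched.
  format.gsub(/%[nNzZcsS_\-%]/) { |token| values.fetch(token) }
end

puts expand_format('class-%Z.pdf',
                   index: 0, count: 12,
                   uri_name: 'fiche1', link_text: 'Fiche 1')
```

With 12 PDFs in total, the first one would be named `class-01.pdf` under this padding assumption.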
## API
In a Ruby file:
```ruby
require 'sq'
urls = SQ.query('http://example.com', /important/i)
```
## Tests
```
$ git clone https://github.com/bfontaine/sq.git
$ cd sq
$ bundle install
$ rake test
```
Running the tests generates a `coverage/index.html` file, which you can open in a Web browser.