https://github.com/ppwwyyxx/SoPaper
Automatically Search and Download Papers
- Host: GitHub
- URL: https://github.com/ppwwyyxx/SoPaper
- Owner: ppwwyyxx
- License: other
- Created: 2014-03-13T04:31:19.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2023-05-10T14:59:23.000Z (over 1 year ago)
- Last Synced: 2024-03-14T18:36:22.911Z (11 months ago)
- Language: Python
- Homepage: https://pypi.python.org/pypi/sopaper/
- Size: 3.62 MB
- Stars: 189
- Watchers: 17
- Forks: 42
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
## SoPaper, So Easy
This is a project designed for researchers to conveniently access the papers they need. Given a paper title, the command line tool ``sopaper`` can __automatically search for and download__ the paper
from the Internet.
The downloaded paper is also saved with a readable file name
(I originally wrote this tool because I was tired of seeing file names that are random strings).
It mainly supports searching for papers in computer science.

## How to Use
Install command line dependencies:
* [pdftk](https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) command line executable.
  + Using pdftk on OS X 10.11 might lead to hangs. See [here](http://stackoverflow.com/questions/32505951/pdftk-server-on-os-x-10-11) for more info.
* poppler-utils (optional)

Install the Python package:

``pip install --user sopaper``

Usage:
```bash
$ sopaper --help                                                       # list all options
$ sopaper "Distinctive image features from scale-invariant keypoints"   # search by title
$ sopaper "https://arxiv.org/abs/1606.06160"                            # or give a paper URL
```
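Before running ``sopaper``, it can help to confirm that the external executables are on your `PATH`. The snippet below is a minimal sketch using only Python's standard library. The tool list (`pdftk` required, `ps2pdf` optional for compression) is taken from this README, and the helper name `check_tools` is hypothetical, not part of SoPaper.

```python
import shutil

# Executables mentioned in this README. The required/optional split is an
# assumption based on the dependency list (pdftk required, compression optional).
REQUIRED_TOOLS = ["pdftk"]
OPTIONAL_TOOLS = ["ps2pdf"]


def check_tools(required=REQUIRED_TOOLS, optional=OPTIONAL_TOOLS):
    """Return (missing_required, missing_optional) by looking each name up on PATH."""
    # shutil.which returns None when the executable cannot be found on PATH.
    missing_required = [t for t in required if shutil.which(t) is None]
    missing_optional = [t for t in optional if shutil.which(t) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    req, opt = check_tools()
    if req:
        print("Missing required tools:", ", ".join(req))
    if opt:
        print("Missing optional tools:", ", ".join(opt))
    if not req and not opt:
        print("All external tools found.")
```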
NOTE: If you are not on a university network, you may need to set a proxy through the `http_proxy` and `https_proxy` environment variables
to be able to download from certain sites (such as dl.acm.org).

## Features
The ``searcher`` module performs a fuzzy search and analyses the results from:
* Google Scholar

The ``fetcher`` module then further analyses the results and downloads papers from the following possible sources:
* direct pdf link
* [dl.acm.org](http://dl.acm.org/)
* [ieeexplore.ieee.org](http://ieeexplore.ieee.org)
* [arxiv.org](http://arxiv.org)

``Searcher`` and ``Fetcher`` are __extensible__ to support more websites.
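The README does not document the extension interface, so the following is only a sketch of how an extensible fetcher layer could be organized: each fetcher declares which URLs it handles, and a registry dispatches to the first match. All names here (`Fetcher`, `register`, `pick_fetcher`, `ArxivFetcher`, `DirectPdfFetcher`) are hypothetical and are not SoPaper's actual API.

```python
from urllib.parse import urlparse

# Registry of fetcher classes, tried in registration order (hypothetical design).
_FETCHERS = []


def register(cls):
    """Class decorator that adds a fetcher to the registry."""
    _FETCHERS.append(cls)
    return cls


class Fetcher:
    """Base class: subclasses decide whether they can handle a URL."""

    @classmethod
    def can_handle(cls, url: str) -> bool:
        raise NotImplementedError

    def pdf_url(self, url: str) -> str:
        """Map a search-result URL to a downloadable PDF URL."""
        raise NotImplementedError


@register
class ArxivFetcher(Fetcher):
    @classmethod
    def can_handle(cls, url):
        host = urlparse(url).netloc.lower()
        return host == "arxiv.org" or host.endswith(".arxiv.org")

    def pdf_url(self, url):
        # https://arxiv.org/abs/<id> -> https://arxiv.org/pdf/<id>
        return url.replace("/abs/", "/pdf/", 1)


@register
class DirectPdfFetcher(Fetcher):
    @classmethod
    def can_handle(cls, url):
        return urlparse(url).path.lower().endswith(".pdf")

    def pdf_url(self, url):
        return url  # already a direct PDF link


def pick_fetcher(url):
    """Return an instance of the first registered fetcher accepting the URL, or None."""
    for cls in _FETCHERS:
        if cls.can_handle(url):
            return cls()
    return None
```

In this design, supporting another site means adding one more decorated subclass, and registration order decides which fetcher wins when several match.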
The command line tool will directly download the paper with a __clean filename__.
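SoPaper's exact naming rule is not described here. As a minimal sketch, assuming a "clean filename" means the paper title with file-system-unsafe characters removed and whitespace collapsed, the conversion could look like this; `title_to_filename` is a hypothetical helper, not SoPaper's code.

```python
import re

# Characters that are invalid or awkward in file names on common file systems.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def title_to_filename(title: str, ext: str = ".pdf", max_len: int = 200) -> str:
    """Turn a paper title into a readable, file-system-safe file name."""
    name = _UNSAFE.sub(" ", title)            # replace unsafe characters with spaces
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    # Truncate, then drop trailing dots/spaces (invalid at the end on Windows).
    name = name[:max_len].rstrip(". ") or "untitled"
    return name + ext
```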
All downloaded papers are __compressed__ using `ps2pdf` (provided by Ghostscript), if it is available.

## TODO
* Fetcher dedup: when both the arXiv abs and pdf links appear in the search results, the page is downloaded twice (maybe add a cache for requests)
* Don't trust arXiv links from Google Scholar
* Is the title correctly updated for dlacm?
* Extract the title from BibTeX (more accurate?)
* Fetcher for other sites
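For the dedup item, one possible approach (swapped in here instead of the suggested request cache) is to normalize arXiv abs and pdf URLs of the same paper to a single key and drop duplicates before fetching. This is a sketch assuming URLs of the form `arxiv.org/abs/<id>` and `arxiv.org/pdf/<id>[.pdf]`; `canonical_key` and `dedup_results` are hypothetical names.

```python
import re

# Matches arXiv abs/pdf URLs and captures the identifier (version suffix such as
# "v2" is kept, so different versions stay distinct).
_ARXIV = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(.+?)(?:\.pdf)?/?(?:[?#].*)?$"
)


def canonical_key(url: str) -> str:
    """Map arXiv abs and pdf URLs of the same paper to one key; other URLs pass through."""
    m = _ARXIV.match(url)
    return "arxiv:" + m.group(1) if m else url


def dedup_results(urls):
    """Keep the first URL for each canonical key, preserving result order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```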