{"id":13586144,"url":"https://github.com/Tishacy/SciDownl","last_synced_at":"2025-04-07T14:33:39.168Z","repository":{"id":34606632,"uuid":"180743987","full_name":"Tishacy/SciDownl","owner":"Tishacy","description":"An unofficial api for downloading papers from SciHub via DOI, PMID, title","archived":false,"fork":false,"pushed_at":"2024-02-11T13:59:28.000Z","size":259,"stargazers_count":167,"open_issues_count":19,"forks_count":40,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-04-14T12:10:30.690Z","etag":null,"topics":["doi","downloader","paper","pdf","pmid","scihub"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tishacy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-11T08:02:29.000Z","updated_at":"2024-06-19T09:58:04.234Z","dependencies_parsed_at":"2024-06-19T09:58:02.281Z","dependency_job_id":"e220d70a-2080-485b-9b6c-c7d9e4f8f2fb","html_url":"https://github.com/Tishacy/SciDownl","commit_stats":{"total_commits":52,"total_committers":2,"mean_commits":26.0,"dds":0.05769230769230771,"last_synced_commit":"f607273fd24e1a1609221febfe9bcf23cc0fa3fc"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tishacy%2FSciDownl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tishacy%2FSciDownl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tishacy%2FSciDownl/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/Tishacy%2FSciDownl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tishacy","download_url":"https://codeload.github.com/Tishacy/SciDownl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247670463,"owners_count":20976574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doi","downloader","paper","pdf","pmid","scihub"],"created_at":"2024-08-01T15:05:21.176Z","updated_at":"2025-04-07T14:33:38.899Z","avatar_url":"https://github.com/Tishacy.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003ch1\u003eSciDownl\u003c/h1\u003e\n\nAn unofficial API for downloading papers from SciHub.\n\n- Supports downloading by DOI, PMID, or title.\n- Makes it easy to update to the newest SciHub domains.\n- Ready for changes: encapsulates possible future changes of SciHub as configuration.\n- Supports proxies.\n\n# Quick Usage\n\n```bash\n# Download with a DOI; the filename is the paper's title.\n$ scidownl download --doi https://doi.org/10.1145/3375633\n\n# Download with a PMID and a user-defined filepath\n$ scidownl download --pmid 31395057 --out ./paper/paper-1.pdf\n\n# Download with a title\n$ scidownl download --title \"ImageNet Classification with Deep Convolutional Neural Networks\" --out ./paper/paper-1.pdf\n\n# Download with a proxy: SCHEME=PROXY_ADDRESS\n$ scidownl download --pmid 31395057 --out ./paper/paper-1.pdf --proxy http=socks5://127.0.0.1:7890\n```\n\n# Installation\n\n## Install with pip\n\nSciDownl can be 
easily installed with pip.\n\n```bash\n$ pip3 install -U scidownl\n```\n\n## Install from source code\n\n```bash\n$ git clone https://github.com/Tishacy/SciDownl.git\n$ cd SciDownl \u0026\u0026 python3 setup.py install\n```\n\n# Usage\n\n## Command line tool\n\n```bash\n$ scidownl -h\nUsage: scidownl [OPTIONS] COMMAND [ARGS]...\n\n  Command line tool to download pdfs from Scihub.\n\nOptions:\n  -h, --help  Show this message and exit.\n\nCommands:\n  config         Get global configs.\n  domain.list    List available SciHub domains in local db.\n  domain.update  Update available SciHub domains and save them to local db.\n  download       Download paper(s) by DOI or PMID.\n```\n\n### 1. Update available SciHub domains\n\n```bash\n$ scidownl domain.update --help\nUsage: scidownl domain.update [OPTIONS]\n\n  Update available SciHub domains and save them to local db.\n\nOptions:\n  -m, --mode TEXT  update mode, could be 'crawl' or 'search', default mode is\n                   'crawl'.\n  -h, --help       Show this message and exit.\n```\n\nThere are two update modes, which you can select with the `-m` or `--mode` option:\n\n-   `crawl`: [Default] Crawls a website that tracks SciHub domains in real time (the SciHub domain source) to get the available SciHub domains. The URL of the SciHub domain source is configured in the global config file, in the section `[scihub.domain.updater.crawl]` under the key `scihub_domain_source`. 
You can use `scidownl config --location` to show the location of the global config file, then edit it.\n\n    ```ini\n    ; Global config file: global.ini\n    ; ...\n    [scihub.domain.updater.crawl]\n    scihub_domain_source = http://tool.yovisun.com/scihub\n    ; ...\n    ```\n\n    An example of using `crawl` mode:\n\n    ```bash\n    $ scidownl domain.update --mode crawl\n    [INFO] | 2022/03/07 21:07:50 | Found 6 valid SciHub domains in total: ['http://sci-hub.ru', 'http://sci-hub.se', 'https://sci-hub.ru', 'https://sci-hub.st', 'http://sci-hub.st', 'https://sci-hub.se']\n    [INFO] | 2022/03/07 21:07:50 | Saved 6 SciHub domains to local db.\n    ```\n\n-   `search`: Generates combinations according to the rules of SciHub domains and searches for available SciHub domains. This takes longer than `crawl` mode.\n\n    An example of using `search` mode:\n\n    ```bash\n    $ scidownl domain.update --mode search\n    [INFO] | 2022/03/07 21:08:44 | # Search valid SciHub domains from 1352 urls\n    [INFO] | 2022/03/07 21:08:48 | # Found a SciHub domain url: https://sci-hub.ru\n    [INFO] | 2022/03/07 21:08:48 | # Found a SciHub domain url: https://sci-hub.st\n    ...\n    [INFO] | 2022/03/07 21:09:04 | Found 6 valid SciHub domains in total: ['https://sci-hub.ru', 'https://sci-hub.st', ...]\n    [INFO] | 2022/03/07 21:09:04 | Saved 6 SciHub domains to local db.\n    ```\n\n### 2. List all saved SciHub domains\n\nSciDownl uses [SQLite](https://www.sqlite.org/) as the local database to store all updated SciHub domains. 
You can list all saved SciHub domains with the command `domain.list`.\n\n```bash\n$ scidownl domain.list\n+--------------------+----------------+---------------+\n| Url                |   SuccessTimes |   FailedTimes |\n|--------------------+----------------+---------------|\n| http://sci-hub.ru  |              0 |             0 |\n| https://sci-hub.ru |              0 |             0 |\n| https://sci-hub.st |              0 |             0 |\n| http://sci-hub.st  |              0 |             0 |\n| https://sci-hub.se |              0 |             0 |\n| http://sci-hub.se  |              0 |             0 |\n+--------------------+----------------+---------------+\n```\n\nIn addition to the easy-to-understand Url column, the `SuccessTimes` column is used to record the number of successful paper downloads using this Url, and the `FailedTimes` column is used to record the number of failed paper downloads using this Url. These two columns are used to calculate the priority of choosing a SciHub domain when downloading papers.\n\n### 3. Download papers\n\n```\n$ scidownl download --help\nUsage: scidownl download [OPTIONS]\n\n  Download paper(s) by DOI or PMID.\n\nOptions:\n  -d, --doi TEXT         DOI string. Specifying multiple DOIs is supported,\n                         e.g., --doi FIRST_DOI --doi SECOND_DOI ...\n  -p, --pmid INTEGER     PMID numbers. Specifying multiple PMIDs is supported,\n                         e.g., --pmid FIRST_PMID --pmid SECOND_PMID ...\n  -t, --title TEXT       Title string. Specifying multiple titles is\n                         supported, e.g., --title FIRST_TITLE --title\n                         SECOND_TITLE ...\n  -o, --out TEXT         Output directory or file path, which could be an\n                         absolute path or a relative path. 
Output directory\n                         examples: /absolute/path/to/download/,\n                         ./relative/path/to/download/, Output file examples:\n                         /absolute/dir/paper.pdf, ../relative/dir/paper.pdf.\n                         If --out is not specified, paper will be downloaded\n                         to the current directory with the file name of the\n                         paper's title. If multiple DOIs or multiple PMIDs are\n                         provided, the --out option is always considered as\n                         the output directory, rather than the output file\n                         path.\n  -u, --scihub-url TEXT  Scihub domain url. If not specified, automatically\n                         choose one from local saved domains. It's recommended\n                         to leave this option empty.\n  -h, --help             Show this message and exit.\n```\n\n#### Download papers with DOI(s), PMID(s) or TITLE(s)\n\nUse option `-d` or `--doi` to download papers by DOI, option `-p` or `--pmid` to download papers by PMID, and option `-t` or `--title` to download papers by title. 
You can specify each option multiple times, and even mix them.\n\n```bash\n# with a single DOI\n$ scidownl download --doi https://doi.org/10.1145/3375633\n\n# with multiple DOIs\n$ scidownl download --doi https://doi.org/10.1145/3375633 --doi https://doi.org/10.1145/2785956.2787496\n\n# with a single PMID\n$ scidownl download --pmid 31395057\n\n# with multiple PMIDs\n$ scidownl download --pmid 31395057 --pmid 24686414\n\n# with a single title\n$ scidownl download --title \"ImageNet Classification with Deep Convolutional Neural Networks\"\n\n# with multiple titles\n$ scidownl download --title \"ImageNet Classification with Deep Convolutional Neural Networks\" --title \"Aggregated residual transformations for deep neural networks\"\n\n# with a mix of DOIs and PMIDs\n$ scidownl download --doi https://doi.org/10.1145/3375633 --pmid 31395057 --pmid 24686414\n```\n\n#### Customize the output location of papers\n\nBy default, the downloaded paper is named after the paper's title. With option `-o` or `--out`, you can customize the output location of downloaded papers. The value can be an absolute or a relative path, and either a directory or a file path.\n\n-   Output the paper to a directory:\n\n    ```bash\n    $ scidownl download --pmid 31395057 --out /absolute/path/of/a/directory/\n    # NOTE that the '/' at the end of the directory path is required, otherwise the last segment will be treated as the filename rather than a directory.\n    \n    $ scidownl download --pmid 31395057 --out ../relative/path/of/a/directory/\n    # The '/' at the end of the directory path is required too.\n    ```\n\n-   Output the paper to a file path:\n\n    ```bash\n    $ scidownl download --pmid 31395057 --out /absolute/dir/paper.pdf\n    $ scidownl download --pmid 31395057 --out ../relative/dir/paper.pdf\n    $ scidownl download --pmid 31395057 --out relative/dir/paper.pdf\n    $ scidownl download --pmid 31395057 --out paper  # will be downloaded as ./paper.pdf\n    ```\n\n**NOTE** 
that if more than one paper is to be downloaded, the value of the `--out` option is always treated as a directory, rather than a file path.\n\n```bash\n$ scidownl download --pmid 31395057 --pmid 24686414 --out paper\n# will be downloaded to ./paper/ directory:\n#  ./paper/\u003cpaper-title-1\u003e.pdf\n#  ./paper/\u003cpaper-title-2\u003e.pdf\n```\n\nIf some directories in the path do not exist, SciDownl will create them for you :smile:.\n\n#### Use a specific SciHub url\n\nWith option `-u` or `--scihub-url`, you can use a specific SciHub url, rather than letting SciDownl automatically choose one from the locally saved SciHub domains. It's recommended to let SciDownl choose a SciHub url, so you don't need this option in normal use.\n\n```bash\n$ scidownl download --pmid 31395057 --scihub-url http://sci-hub.se\n```\n\n## Module use\n\nYou can use the `scihub_download` function to download papers.\n\n```python\nfrom scidownl import scihub_download\n\npaper = \"https://doi.org/10.1145/3375633\"\npaper_type = \"doi\"\nout = \"./paper/one_paper.pdf\"\nproxies = {\n    'http': 'socks5://127.0.0.1:7890'\n}\nscihub_download(paper, paper_type=paper_type, out=out, proxies=proxies)\n```\n\nMore examples can be found in [examples](./example/simple.py).\n\n# LICENSE\n\nCopyright (c) 2022 tishacy.\n\nLicensed under the [MIT License](https://github.com/Tishacy/SciDownl/blob/v1.0/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTishacy%2FSciDownl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTishacy%2FSciDownl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTishacy%2FSciDownl/lists"}