Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/crmne/googlescholarscraper
A scraper for Google Scholar, written in Python
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/crmne/googlescholarscraper
- Owner: crmne
- License: bsd-2-clause
- Created: 2017-09-25T22:06:48.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-08-14T21:15:27.000Z (over 1 year ago)
- Last Synced: 2024-10-16T05:14:47.561Z (3 months ago)
- Language: Python
- Homepage:
- Size: 175 KB
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
GoogleScholarScraper
====================

GoogleScholarScraper is a [Scrapy][] project that implements a scraper for Google Scholar.

Features
--------

* Extracts Authors, Title, Year, Journal, and Url.
* Exports to CSV, JSON and BibTeX.
* Cookie and referer support for higher query volumes.
* Optimistically tries the next page in case of server errors.
* Supports the full Google Scholar query syntax for authors, title, exclusions, inclusions, etc. Check out those [search tips].

Installation
------------

```bash
poetry install
```

Usage
-----

```bash
poetry shell
export QUERY="your query here"
export START=900 # optional: to start at page 90
make <name>.csv # or
make <name>.json # or
make <name>.bib
```

Example
-------

```bash
export QUERY="author:einstein quantum theory"
unset START # makes sure it starts from the beginning
make einstein_quantum.bib
```

Development
-----------

Before coding away, set up the environment and the pre-commit hooks:
```bash
poetry install
poetry shell
pre-commit install
```

License
-------

GoogleScholarScraper is released under the standard [BSD 2-Clause "Simplified" License](http://opensource.org/licenses/BSD-2-Clause).

[Scrapy]: https://scrapy.org/
[search tips]: http://www.otago.ac.nz/library/pdf/Google_Scholar_Tips.pdf
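
Export format sketch
--------------------

To make the Features list concrete, here is a minimal sketch of how one scraped record with the five extracted fields could be serialized to the three export formats. It is not the project's actual exporter code. The lowercase field names, the `to_csv`, `to_json` and `to_bibtex` helpers, the `@article` entry type, the citation key, and the sample values are all assumptions made for illustration.

```python
import csv
import io
import json

# Field order follows the Features list; the lowercase names are an assumption.
FIELDS = ["authors", "title", "year", "journal", "url"]

# Hypothetical record with placeholder values (not real scraper output).
record = {
    "authors": "A. Author",
    "title": "An Example Title",
    "year": "2020",
    "journal": "Journal of Examples",
    "url": "https://example.org/paper",
}


def to_csv(records):
    """Render records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Render records as a JSON array."""
    return json.dumps(records, indent=2)


def to_bibtex(record, key):
    """Render one record as a BibTeX entry; @article is an assumed entry type."""
    lines = [f"@article{{{key},"]
    for field in FIELDS:
        # BibTeX uses the singular field name "author".
        name = "author" if field == "authors" else field
        lines.append(f"  {name} = {{{record[field]}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_csv([record]))
    print(to_json([record]))
    print(to_bibtex(record, "example2020"))
```

Scrapy ships its own CSV and JSON feed exporters, so a real project would likely only need custom code for BibTeX; the helpers above merely show what each output could look like.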