https://github.com/ryshaal/google-scholar-scraping
Google Scholar Scraper
article-extractor scraping-websites
- Host: GitHub
- URL: https://github.com/ryshaal/google-scholar-scraping
- Owner: ryshaal
- Created: 2024-09-26T17:53:30.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-27T09:41:36.000Z (almost 2 years ago)
- Last Synced: 2025-03-16T11:43:33.260Z (over 1 year ago)
- Topics: article-extractor, scraping-websites
- Language: Python
- Homepage:
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Google Scholar Scraper
This Python script allows you to scrape article data from Google Scholar based on a query provided in a text file. The script extracts information such as the article's title, authors, publication year, journal name, volume, citation count, and PDF links (if available). The results are saved as RIS files for easy citation management, and PDFs can be downloaded when available.
## Features
- Fetch articles from Google Scholar based on a query
- Extract information including:
  - Title
  - Authors
  - Year of publication
  - Journal name and volume
  - PDF link (if available)
  - Citation count
- Save article metadata as `.ris` files
- Download PDFs of articles when available
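
To illustrate the `.ris` output mentioned above, here is a minimal sketch of how one article's metadata could be formatted as a RIS record. It is not taken from `gscholar.py`: the function name `to_ris` and the dictionary keys (`title`, `authors`, `year`, `journal`, `volume`, `pdf_link`, `citations`) are assumptions made for illustration, and putting the citation count in a notes field (`N1`) is one possible choice, since RIS has no dedicated tag for it.

```python
def to_ris(article: dict) -> str:
    """Format one article's metadata as a single RIS record (hypothetical helper)."""
    lines = ["TY  - JOUR"]  # record type: journal article
    if article.get("title"):
        lines.append(f"TI  - {article['title']}")
    for author in article.get("authors", []):
        lines.append(f"AU  - {author}")  # RIS uses one AU line per author
    # Optional single-valued fields, written only when a value is present.
    for tag, key in (("PY", "year"), ("JO", "journal"),
                     ("VL", "volume"), ("UR", "pdf_link")):
        value = article.get(key)
        if value:
            lines.append(f"{tag}  - {value}")
    # Assumption: the citation count is stored as a free-text note.
    if article.get("citations") is not None:
        lines.append(f"N1  - Cited by {article['citations']}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, not real scraped data.
    print(to_ris({
        "title": "An Example Article",
        "authors": ["A. Author", "B. Author"],
        "year": 2020,
        "journal": "Example Journal",
        "volume": "12",
        "citations": 3,
    }))
```

Each line follows the RIS convention of a two-character tag, two spaces, a hyphen, and a space, which is what reference managers expect when importing the file.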
## How to Use
1. Clone this repository:
```bash
git clone https://github.com/ryshaal/Google-Scholar-Scraping/
```
2. Navigate to the project directory:
```bash
cd Google-Scholar-Scraping
```
3. Create a text file for your query:
In the `input_query` folder, create a file named `query.txt` and enter your search query.
4. Run the script:
```bash
python gscholar.py
```
5. Output:
The script will display the article information in the terminal and save the metadata as a `.ris` file in the `output` folder. If a PDF link is available, the script will download the PDF to the same folder.
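
Article titles often contain characters that are not valid in file names, so the output step needs some way to derive a safe name for the `.ris` and `.pdf` files. The sketch below shows one way to do that; `safe_filename` and `output_paths` are hypothetical helpers, and the naming scheme is an assumption rather than what `gscholar.py` actually does.

```python
import re
from pathlib import Path


def safe_filename(title: str, max_length: int = 80) -> str:
    """Turn an article title into a file-system-friendly stem."""
    # Drop everything except word characters, whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    # Collapse runs of whitespace into single underscores.
    cleaned = re.sub(r"\s+", "_", cleaned)
    return cleaned[:max_length] or "untitled"


def output_paths(title: str, output_dir: str = "output"):
    """Return the (.ris, .pdf) paths for one article inside the output folder."""
    stem = safe_filename(title)
    folder = Path(output_dir)
    return folder / f"{stem}.ris", folder / f"{stem}.pdf"


if __name__ == "__main__":
    ris_path, pdf_path = output_paths("Deep Learning: A Review?")
    print(ris_path)
    print(pdf_path)
```

Keeping both files under the same stem makes it easy to match a downloaded PDF to its metadata record.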
**Example Query**
In `input_query/query.txt`, you might add a search query like:
```plaintext
machine learning applications in education
```
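
As a rough sketch of how a query like this could be turned into a Google Scholar search URL, the example below reads the file, normalizes whitespace, and URL-encodes the result. The helper names (`normalize_query`, `read_query`, `build_search_url`) are hypothetical, and the use of the `start` parameter for paging is an assumption about how one might page through results, not a description of `gscholar.py`.

```python
from pathlib import Path
from urllib.parse import urlencode

SCHOLAR_URL = "https://scholar.google.com/scholar"


def normalize_query(text: str) -> str:
    """Collapse line breaks and repeated spaces into single spaces."""
    return " ".join(text.split())


def read_query(path: str = "input_query/query.txt") -> str:
    """Read the search query from the text file and normalize it."""
    return normalize_query(Path(path).read_text(encoding="utf-8"))


def build_search_url(query: str, start: int = 0) -> str:
    """Build a search URL; `start` is the offset of the first result."""
    return f"{SCHOLAR_URL}?{urlencode({'q': query, 'start': start})}"


if __name__ == "__main__":
    print(build_search_url("machine learning applications in education"))
```

`urlencode` takes care of escaping spaces and special characters, so the query text can be written in the file exactly as you would type it into the search box.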
## Requirements
- Python 3.x
- `requests`
- `beautifulsoup4`
You can install the required packages by running:
```bash
pip install -r requirements.txt