Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rpidanny/google-scholar
A minimal TypeScript library for fetching and parsing Google Scholar pages.
google-scholar research research-paper typescript
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/rpidanny/google-scholar
- Owner: rpidanny
- License: mit
- Created: 2024-06-07T19:06:36.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-10-04T06:35:04.000Z (4 months ago)
- Last Synced: 2024-12-29T14:29:42.427Z (22 days ago)
- Topics: google-scholar, research, research-paper, typescript
- Language: HTML
- Homepage:
- Size: 628 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Google Scholar 👩🏻‍🏫
A minimal TypeScript library for fetching and parsing Google Scholar pages. This library allows users to search Google Scholar using any keywords or a direct Google Scholar URL.
## Installation
```bash
npm i @rpidanny/google-scholar
```

## Usage
### Basic Usage
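For basic usage, the library provides the helper functions `search`, `parseUrl`, and `iteratePages`. `search` returns the first page of results for a set of search options, `parseUrl` parses the page at a given Google Scholar URL, and `iteratePages` invokes a callback for each available page of results.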
```typescript
import { iteratePages, parseUrl, search } from '@rpidanny/google-scholar'

const searchOpts = {
  keywords: 'crispr cas9',
  yearLow: 2000, // [Optional] paper published after
  yearHigh: 2024, // [Optional] paper published before
  authors: ['JA Doudna', 'E Charpentier'], // [Optional] Papers from authors
}

// Get the 1st page content for a search option
const pageContent = await search(searchOpts)
console.log(JSON.stringify(pageContent, null, 2))

// Parse page using url
const pageContent2 = await parseUrl('https://scholar.google.com/scholar?q=crispr+cas9&hl=en')
console.log(JSON.stringify(pageContent2, null, 2))

// Iterate over all available pages
await iteratePages(searchOpts, pageContent => JSON.stringify(pageContent, null, 2))
```

### Advanced Usage
For more advanced usage and more control over the querying process, you can use the `GoogleScholar` class directly.
```ts
import { GoogleScholar } from '@rpidanny/google-scholar'
import { WebClient } from './web-client' // Make sure to implement the IWebClient interface

const webClient = new WebClient()
const googleScholar = new GoogleScholar(webClient)

async function searchGoogleScholar(keywords: string) {
  const results = await googleScholar.search({ keywords })
  console.log(results)
}

searchGoogleScholar('crispr cas9')
```

> Tip: Use [Odysseus](https://github.com/rpidanny/odysseus) as the WebClient to handle Google CAPTCHAs. It opens pages in a browser so that a human can solve the CAPTCHA and the scraping process can continue.
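
The `WebClient` imported from `./web-client` in the advanced example is left for you to write. As a minimal sketch, assume `IWebClient` boils down to a single method that takes a URL and resolves to the page HTML. The method name `getContent`, the `AssumedWebClient` interface, and the `FetchLike` type are assumptions made for illustration and are not taken from the library, so check the exported `IWebClient` type for the real contract. Under that assumption, a client built on a `fetch`-style function might look like this:

```typescript
// Hypothetical shape: the real IWebClient exported by the library may differ.
interface AssumedWebClient {
  getContent(url: string): Promise<string>
}

// Minimal subset of the fetch API used here, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>

class FetchWebClient implements AssumedWebClient {
  private readonly fetchFn: FetchLike
  private readonly userAgent: string

  constructor(fetchFn: FetchLike, userAgent: string = 'Mozilla/5.0') {
    this.fetchFn = fetchFn
    this.userAgent = userAgent
  }

  // Fetch a page and return its HTML, failing loudly on non-2xx responses
  async getContent(url: string): Promise<string> {
    const res = await this.fetchFn(url, { headers: { 'User-Agent': this.userAgent } })
    if (!res.ok) {
      throw new Error(`Request to ${url} failed with status ${res.status}`)
    }
    return res.text()
  }
}
```

On Node 18 or later, the global `fetch` can be passed in directly, as in `new FetchWebClient(fetch)`. Injecting the fetch function keeps the client easy to test without touching the network. A plain HTTP client like this one does nothing about CAPTCHAs.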
### Example Response
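
The JSON in this section suggests the following TypeScript shape for a page of results. The type names (`Author`, `Paper`, `PageContent`) and the `summarize` helper are illustrative assumptions inferred from this single sample and are not the library's exported types. Which fields can be `null` or missing, beyond `prevUrl`, is a guess.

```typescript
// Types inferred from the sample response; names are illustrative.
interface Author {
  name: string
  url: string
}

interface Paper {
  title: string
  url: string
  description: string
  source: { type: string; url: string }
  authors: Author[]
  citation: { count: number; url: string }
  relatedArticlesUrl: string
}

interface PageContent {
  papers: Paper[]
  totalPapers: number
  nextUrl: string | null // assumed nullable on the last page
  prevUrl: string | null // null on the first page, as in the sample
}

// One line per paper (title, authors, citation count), most cited first
function summarize(page: PageContent): string[] {
  return [...page.papers]
    .sort((a, b) => b.citation.count - a.citation.count)
    .map((p) => {
      const names = p.authors.map((a) => a.name).join(', ')
      return `${p.title} (${names}): ${p.citation.count} citations`
    })
}
```

Applied to the sample response, `summarize` would return a single line: `CRISPR–Cas9 structures and mechanisms (F Jiang, JA Doudna): 2020 citations`.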
```json
{
  "papers": [
    {
      "title": "CRISPR–Cas9 structures and mechanisms",
      "url": "https://www.annualreviews.org/doi/abs/10.1146/annurev-biophys-062215-010822",
      "description": "… and Cas9 orthologs have contributed greatly to our understanding of CRISPR–Cas9 mechanisms. In this review, we briefly explain the biology underlying CRISPR–Cas9 technology …",
      "source": {
        "type": "pdf",
        "url": "https://www.annualreviews.org/doi/pdf/10.1146/annurev-biophys-062215-010822"
      },
      "authors": [
        {
          "name": "F Jiang",
          "url": "https://scholar.google.com/citations?user=gt-dzeEAAAAJ&hl=en&oi=sra"
        },
        {
          "name": "JA Doudna",
          "url": "https://scholar.google.com/citations?user=YO5XSXwAAAAJ&hl=en&oi=sra"
        }
      ],
      "citation": {
        "count": 2020,
        "url": "https://scholar.google.com/scholar?cites=2456688039791281496&as_sdt=2005&sciodt=0,5&hl=en"
      },
      "relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:WMW1j8HoFyIJ:scholar.google.com/&scioq=crispr+cas9&hl=en&as_sdt=0,5"
    }
  ],
  "totalPapers": 594000,
  "nextUrl": "https://scholar.google.com/scholar?start=10&q=crispr+cas9&hl=en&as_sdt=0,5",
  "prevUrl": null
}
```

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.