https://github.com/scrapegraphai/scrapebiblio
- Host: GitHub
- URL: https://github.com/scrapegraphai/scrapebiblio
- Owner: ScrapeGraphAI
- License: mit
- Created: 2024-09-08T17:00:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-06T12:39:38.000Z (about 1 year ago)
- Last Synced: 2025-04-12T03:53:49.103Z (10 months ago)
- Language: Python
- Size: 165 MB
- Stars: 5
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: SECURITY.md
README
# ScrapeBiblio: PDF Reference Extraction and Verification Library
## Powered by ScrapeGraphAI

ScrapeBiblio is a Python library that extracts references from PDF files, verifies them against several scholarly databases, and converts PDF content to Markdown.
## News 📰
- ScrapeGraphAI now has its own APIs! Check them out [here](https://scrapegraphai.com)!
## Features
- Extract text from PDF files
- Extract references using OpenAI's GPT models
- Verify references using Semantic Scholar, CORE, and BASE databases
- Convert PDF content to Markdown format
- Integration with ScrapeGraph for additional reference checking
## Installation
Install ScrapeBiblio using pip:
```bash
pip install scrapebiblio
```
## Configuration
Create a `.env` file in your project root with the following content:
```plaintext
OPENAI_API_KEY=your_openai_api_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
CORE_API_KEY=your_core_api_key
BASE_API_KEY=your_base_api_key
```
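Before running the library, it can help to confirm that the keys were actually loaded. The following is a minimal sketch, not part of ScrapeBiblio: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and treating all four keys as expected is an assumption (some of them may be optional in practice).

```python
import os
from typing import List, Mapping

# Variable names taken from the .env example above.
# Assumption: all four are expected to be set; the library may not need every one.
EXPECTED_KEYS = (
    "OPENAI_API_KEY",
    "SEMANTIC_SCHOLAR_API_KEY",
    "CORE_API_KEY",
    "BASE_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the expected variable names that are unset or empty in env."""
    return [name for name in EXPECTED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Call `load_dotenv()` first if the keys live in a `.env` file, so that `os.environ` is populated before the check runs.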
## Usage
Here's a basic example of how to use ScrapeBiblio:
```python
import os

from dotenv import load_dotenv
from scrapebiblio.core.find_reference import process_pdf

# Load the API keys defined in the .env file into the environment.
load_dotenv()

pdf_path = 'path/to/your/pdf/file.pdf'
output_path = 'references.md'

openai_api_key = os.getenv('OPENAI_API_KEY')
semantic_scholar_api_key = os.getenv('SEMANTIC_SCHOLAR_API_KEY')
core_api_key = os.getenv('CORE_API_KEY')
base_api_key = os.getenv('BASE_API_KEY')

# Process the PDF and write the extracted references to output_path.
process_pdf(
    pdf_path, output_path, openai_api_key, semantic_scholar_api_key,
    core_api_key=core_api_key, base_api_key=base_api_key,
)
```
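To apply the same call to a whole folder of PDFs, one possible wrapper is sketched below. `batch_process` and its `process` parameter are hypothetical names, not part of ScrapeBiblio; `process` is any callable that takes a PDF path and an output path, so the sketch makes no assumptions about the library's internals.

```python
from pathlib import Path
from typing import Callable, Dict


def batch_process(
    pdf_dir: str,
    out_dir: str,
    process: Callable[[str, str], None],
) -> Dict[str, str]:
    """Call process(pdf_path, output_path) once per PDF found in pdf_dir.

    Returns a mapping from each PDF file name to the output path used for it.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: Dict[str, str] = {}
    # Sort for a deterministic processing order.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = out / f"{pdf.stem}.md"
        process(str(pdf), str(target))
        written[pdf.name] = str(target)
    return written
```

With the variables from the basic example, `process` could be `lambda pdf, out: process_pdf(pdf, out, openai_api_key, semantic_scholar_api_key, core_api_key=core_api_key, base_api_key=base_api_key)`.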
## Advanced Usage
ScrapeBiblio offers additional functionalities:
1. Convert PDF to Markdown:
```python
from scrapebiblio.core.convert_to_md import convert_to_md

# pdf_path, output_path, and openai_api_key are defined as in the basic example above.
convert_to_md(pdf_path, output_path, openai_api_key)
```
2. Check references with ScrapeGraph:
```python
from scrapebiblio.utils.api.reference_utils import check_reference_with_scrapegraph
result = check_reference_with_scrapegraph("Reference Title")
```
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for more details.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.