https://github.com/scrapegraphai/scrapebiblio


Host: GitHub
URL: https://github.com/scrapegraphai/scrapebiblio
Owner: ScrapeGraphAI
License: mit
Created: 2024-09-08T17:00:28.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-06T12:39:38.000Z (about 1 year ago)
Last Synced: 2025-04-12T03:53:49.103Z (10 months ago)
Language: Python
Size: 165 MB
Stars: 5
Watchers: 1
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: SECURITY.md


# ScrapeBiblio: PDF Reference Extraction and Verification Library

## Powered by ScrapeGraphAI

![ScrapeBiblio Logo](docs/scrapebiblio.png)

[![Downloads](https://static.pepy.tech/badge/scrapebiblio)](https://pepy.tech/project/scrapebiblio)

ScrapeBiblio is a Python library that extracts references from PDF files, verifies them against academic databases (Semantic Scholar, CORE, and BASE), and converts PDF content to Markdown.

## News 📰

- ScrapeGraphAI now has its own APIs! Check them out [here](https://scrapegraphai.com)!

## Features

- Extract text from PDF files

- Extract references using OpenAI's GPT models

- Verify references using Semantic Scholar, CORE, and BASE databases

- Convert PDF content to Markdown format

- Integration with ScrapeGraph for additional reference checking
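ScrapeBiblio queries these databases internally, and its exact verification logic is not described here. As an illustration only, the sketch below shows one way a reference title could be checked against the Semantic Scholar Graph API's `paper/search` endpoint using only the standard library, with `difflib` fuzzy matching to tolerate punctuation and casing differences. The helper names (`build_search_url`, `normalize`, `best_match`, `verify_title`) and the 0.9 similarity threshold are hypothetical choices, not part of ScrapeBiblio's API.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Optional

# Semantic Scholar Graph API paper search endpoint
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(title: str, limit: int = 5) -> str:
    """Build a search URL that asks for the title and year of the top matches."""
    params = urllib.parse.urlencode({"query": title, "limit": limit, "fields": "title,year"})
    return f"{SEARCH_URL}?{params}"


def normalize(title: str) -> str:
    """Lowercase and replace punctuation with spaces so formatting differences are ignored."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def best_match(title: str, candidates: list, threshold: float = 0.9) -> Optional[dict]:
    """Return the candidate most similar to `title`, or None if none reaches `threshold`."""
    target = normalize(title)
    best, best_score = None, 0.0
    for paper in candidates:
        score = SequenceMatcher(None, target, normalize(paper.get("title") or "")).ratio()
        if score > best_score:
            best, best_score = paper, score
    return best if best_score >= threshold else None


def verify_title(title: str, api_key: Optional[str] = None) -> Optional[dict]:
    """Search Semantic Scholar for `title` and return the matching paper, if any."""
    request = urllib.request.Request(build_search_url(title))
    if api_key:
        # The API key is optional; Semantic Scholar reads it from the x-api-key header
        request.add_header("x-api-key", api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    # Results are listed under "data"; fall back to an empty list if the key is absent
    return best_match(title, data.get("data", []))
```

Calling `verify_title("Some Paper Title", os.getenv("SEMANTIC_SCHOLAR_API_KEY"))` would return the matching record (with its `title` and `year`) or `None`. Keeping the matching step separate from the network call makes the threshold easy to tune and to test offline.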

## Installation

Install ScrapeBiblio using pip:

```bash
pip install scrapebiblio
```

## Configuration

Create a `.env` file in your project root with the following content:

```plaintext
OPENAI_API_KEY=your_openai_api_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
CORE_API_KEY=your_core_api_key
BASE_API_KEY=your_base_api_key
```
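It can help to fail fast when a key is missing instead of discovering it mid-run. The sketch below is not part of ScrapeBiblio: `missing_keys` is a hypothetical helper that checks the four variable names from the template above. Which keys are strictly required depends on the features you use, so this sketch simply reports every one that is unset or empty.

```python
import os
from typing import Iterable, List, Mapping

# Variable names from the .env template above
KEYS = ("OPENAI_API_KEY", "SEMANTIC_SCHOLAR_API_KEY", "CORE_API_KEY", "BASE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ, keys: Iterable[str] = KEYS) -> List[str]:
    """Return the names in `keys` that are unset or empty in `env`."""
    return [key for key in keys if not env.get(key)]
```

Call it after `load_dotenv()` and stop early if the returned list is non-empty, for example with `raise SystemExit(f"Missing keys: {missing_keys()}")`.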

## Usage

The following basic example runs ScrapeBiblio on a PDF and writes the extracted references to `references.md`:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package
from scrapebiblio.core.find_reference import process_pdf

# Load the API keys from .env into environment variables
load_dotenv()

pdf_path = 'path/to/your/pdf/file.pdf'  # PDF to scan for references
output_path = 'references.md'  # Markdown file that receives the results

openai_api_key = os.getenv('OPENAI_API_KEY')
semantic_scholar_api_key = os.getenv('SEMANTIC_SCHOLAR_API_KEY')
core_api_key = os.getenv('CORE_API_KEY')
base_api_key = os.getenv('BASE_API_KEY')

# Extract the references from the PDF and verify them; output goes to output_path
process_pdf(pdf_path, output_path, openai_api_key, semantic_scholar_api_key,
            core_api_key=core_api_key, base_api_key=base_api_key)
```
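To process several papers, the basic example can be wrapped in a loop. The sketch below assumes `process_pdf` takes the PDF path and the output path as its first two arguments, as in the example above; `process_folder` and the `<name>_references.md` naming scheme are hypothetical. The processing function is passed in as `process`, so the loop itself can be tested without making API calls.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(pdf_dir: str, out_dir: str, process: Callable[..., object],
                   *args, **kwargs) -> List[Path]:
    """Call process(pdf_path, output_path, *args, **kwargs) for each PDF in pdf_dir.

    Each PDF gets its own output path, out_dir/<name>_references.md.
    Returns the output paths in the order the PDFs were processed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    outputs = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):  # sorted for a stable order
        target = out / f"{pdf.stem}_references.md"
        process(str(pdf), str(target), *args, **kwargs)
        outputs.append(target)
    return outputs


# Example (with the variables from the basic example above):
# process_folder("papers", "reports", process_pdf, openai_api_key,
#                semantic_scholar_api_key, core_api_key=core_api_key,
#                base_api_key=base_api_key)
```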

## Advanced Usage

ScrapeBiblio also provides these additional functions:

1. Convert PDF to Markdown:

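```python
import os

from scrapebiblio.core.convert_to_md import convert_to_md

pdf_path = 'path/to/your/pdf/file.pdf'
output_path = 'output.md'  # placeholder name for the Markdown file to write
openai_api_key = os.getenv('OPENAI_API_KEY')  # assumes load_dotenv() has run, as in Usage

convert_to_md(pdf_path, output_path, openai_api_key)
```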

2. Check references with ScrapeGraph:

```python
from scrapebiblio.utils.api.reference_utils import check_reference_with_scrapegraph

# Look up a single reference by its title
result = check_reference_with_scrapegraph("Reference Title")
```

## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for more details.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
