Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scrapegraphai/scrapeschema-demo
ScrapeSchema: AI-Powered Entity and Schema Generation from documents
automatic-ontology extract-pdf-data json ontologies ontology-engineering pdf schema
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/scrapegraphai/scrapeschema-demo
- Owner: ScrapeGraphAI
- License: agpl-3.0
- Created: 2024-08-24T16:57:23.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-30T17:22:27.000Z (4 months ago)
- Last Synced: 2024-09-15T03:57:51.961Z (3 months ago)
- Topics: automatic-ontology, extract-pdf-data, json, ontologies, ontology-engineering, pdf, schema
- Language: Python
- Homepage:
- Size: 531 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ScrapeSchema
ScrapeSchema is a Python-based tool designed to extract entities and their associated schema from PDF files. This tool is particularly useful for those who need to analyze and organize the structure of data embedded within PDFs, enabling efficient data extraction for further processing or analysis.
## Features
- **Entity Extraction**: Automatically identifies and extracts entities from PDF files.
- **Schema Generation**: Constructs a schema based on the structure of the extracted entities.
- **Visualization**: Leverages Graphviz to visualize the extracted schema.
## Official Streamlit demo:
[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapeschema.streamlit.app)
## Quick Start
### Prerequisites
Before you begin, ensure you have the following installed on your system:
- **Python**: Make sure Python 3.9+ is installed.
- **Graphviz**: This tool is necessary for visualizing the extracted schema.
#### macOS Installation
To install Graphviz on macOS with Homebrew, use the following command:
```bash
brew install graphviz
```
#### Linux Installation
To install Graphviz on Debian- or Ubuntu-based Linux distributions, use the following command:
```bash
sudo apt install graphviz
```
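The prerequisites above can be checked from Python before launching the app. The following is a minimal sketch and not part of ScrapeSchema: the helper name `check_prerequisites` is hypothetical, and it assumes that Graphviz exposes its usual `dot` executable on the `PATH`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # The README asks for Python 3.9 or newer.
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_python + tuple(sys.version_info[:2]))
        )
    # Assumption: Graphviz is detected via its `dot` binary.
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("dot") is None:
        problems.append("Graphviz `dot` executable not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all prerequisites found")
```

Returning a list rather than raising on the first failure lets the script report every missing prerequisite in a single run.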
### Usage
After installing the prerequisites, clone the repository, install the Python dependencies, and launch the Streamlit app to start extracting entities and their schema from PDFs. Here's a basic example:
```bash
git clone https://github.com/ScrapeGraphAI/ScrapeSchema
cd ./ScrapeSchema
pip install -r requirements.txt
streamlit run main.py
```
## Output
```json
{
  "ROOT": {
    "portfolio": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "series": {
          "type": "string"
        },
        "fees": {
          "type": "object",
          "properties": {
            "salesCharges": {
              "type": "string"
            },
            "fundExpenses": {
              "type": "object",
              "properties": {
                "managementExpenseRatio": {
                  "type": "string"
                },
                "tradingExpenseRatio": {
                  "type": "string"
                },
                "totalExpenses": {
                  "type": "string"
                }
              }
            },
            "trailingCommissions": {
              "type": "string"
            }
          }
        },
        "withdrawalRights": {
          "type": "object",
          "properties": {
            "timeLimit": {
              "type": "string"
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          }
        },
        "contactInformation": {
          "type": "object",
          "properties": {
            "companyName": {
              "type": "string"
            },
            "address": {
              "type": "string"
            },
            "phone": {
              "type": "string"
            },
            "email": {
              "type": "string"
            },
            "website": {
              "type": "string"
            }
          }
        },
        "yearByYearReturns": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "year": {
                "type": "string"
              },
              "return": {
                "type": "string"
              }
            }
          }
        },
        "bestWorstReturns": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": {
                "type": "string"
              },
              "return": {
                "type": "string"
              },
              "date": {
                "type": "string"
              },
              "investmentValue": {
                "type": "string"
              }
            }
          }
        },
        "averageReturn": {
          "type": "string"
        },
        "targetInvestors": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "taxInformation": {
          "type": "string"
        }
      }
    }
  }
}
```
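Because the output nests `properties` and `items` in the style of JSON Schema, it can be traversed recursively. The sketch below is illustrative only and is not ScrapeSchema's own code: `iter_fields` and `to_dot` are hypothetical helpers, and `sample` is a trimmed copy of the value stored under `"ROOT"` in the output above. It lists each field path with its type and builds Graphviz DOT text for the hierarchy using plain string formatting, so no Graphviz binding is required to run it. It also assumes that property names contain no `.` characters.

```python
def iter_fields(path, node):
    """Yield (path, type) for a schema node and everything nested under it."""
    node_type = node.get("type", "unknown")
    yield path, node_type
    if node_type == "object":
        for child, child_node in node.get("properties", {}).items():
            yield from iter_fields(f"{path}.{child}", child_node)
    elif node_type == "array":
        # Array elements are described once, under "items".
        yield from iter_fields(f"{path}[]", node.get("items", {}))


def to_dot(root_name, node):
    """Render the schema tree as Graphviz DOT text (one node per field)."""
    lines = ["digraph schema {"]
    for path, node_type in iter_fields(root_name, node):
        # Derive the parent path from the child path.
        if path.endswith("[]"):
            parent = path[:-2]
        elif "." in path:
            parent = path.rsplit(".", 1)[0]
        else:
            parent = None  # the root has no parent
        label = path.rsplit(".", 1)[-1]
        lines.append(f'  "{path}" [label="{label}: {node_type}"];')
        if parent is not None:
            lines.append(f'  "{parent}" -> "{path}";')
    lines.append("}")
    return "\n".join(lines)


# Trimmed, hand-copied subset of the output above (illustrative only).
sample = {
    "portfolio": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "targetInvestors": {"type": "array", "items": {"type": "string"}},
            "fees": {
                "type": "object",
                "properties": {"salesCharges": {"type": "string"}},
            },
        },
    }
}

for root_name, root_node in sample.items():
    for path, node_type in iter_fields(root_name, root_node):
        print(f"{path}: {node_type}")
    print(to_dot(root_name, root_node))
```

The DOT text can be saved to a file and rendered with the `dot` command-line tool, for example `dot -Tpng schema.dot -o schema.png`.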