{"id":18745648,"url":"https://github.com/scrapegraphai/scrapeschema-demo","last_synced_at":"2025-07-24T14:36:00.512Z","repository":{"id":254986046,"uuid":"847035105","full_name":"ScrapeGraphAI/ScrapeSchema-demo","owner":"ScrapeGraphAI","description":"ScrapeSchema: AI-Powered Entity and Schema Generation from documents","archived":false,"fork":false,"pushed_at":"2024-08-30T17:22:27.000Z","size":544,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-12T11:59:57.119Z","etag":null,"topics":["automatic-ontology","extract-pdf-data","json","ontologies","ontology-engineering","pdf","schema"],"latest_commit_sha":null,"homepage":"https://scrapegraphai.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ScrapeGraphAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-24T16:57:23.000Z","updated_at":"2025-03-28T20:44:18.000Z","dependencies_parsed_at":"2024-09-13T00:05:14.823Z","dependency_job_id":null,"html_url":"https://github.com/ScrapeGraphAI/ScrapeSchema-demo","commit_stats":null,"previous_names":["scrapegraphai/scrapeschema"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2FScrapeSchema-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2FScrapeSchema-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2FScrapeSchema-demo/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2FScrapeSchema-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ScrapeGraphAI","download_url":"https://codeload.github.com/ScrapeGraphAI/ScrapeSchema-demo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248636347,"owners_count":21137433,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automatic-ontology","extract-pdf-data","json","ontologies","ontology-engineering","pdf","schema"],"created_at":"2024-11-07T16:19:00.830Z","updated_at":"2025-04-12T21:32:55.261Z","avatar_url":"https://github.com/ScrapeGraphAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ScrapeSchema\n\nScrapeSchema is a Python-based tool designed to extract entities and their associated schema from PDF files. 
This tool is particularly useful for those who need to analyze and organize the structure of data embedded within PDFs, enabling efficient data extraction for further processing or analysis.\n\n## Features\n\n- **Entity Extraction**: Automatically identifies and extracts entities from PDF files.\n- **Schema Generation**: Constructs a schema based on the structure of the extracted entities.\n- **Visualization**: Leverages Graphviz to visualize the extracted schema.\n\n## Official Streamlit demo:\n\n[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapeschema.streamlit.app)\n\n## Quick Start\n\n### Prerequisites\n\nBefore you begin, ensure you have the following installed on your system:\n\n- **Python**: Make sure Python 3.9+ is installed.\n- **Graphviz**: This tool is necessary for visualizing the extracted schema.\n\n#### macOS Installation\n\nTo install Graphviz on macOS with Homebrew, use the following command:\n\n```bash\nbrew install graphviz\n```\n\n#### Linux Installation\n\nTo install Graphviz on Debian- or Ubuntu-based Linux distributions, use the following command:\n\n```bash\nsudo apt install graphviz\n```\n\n### Usage\n\nAfter installing the prerequisites, clone the repository, install the Python dependencies, and launch the Streamlit app to start extracting entities and their schema from PDFs:\n\n```bash\ngit clone https://github.com/ScrapeGraphAI/ScrapeSchema\ncd ./ScrapeSchema\npip install -r requirements.txt\nstreamlit run main.py\n```\n\n## Output\n```json\n{\n  \"ROOT\": {\n    \"portfolio\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"name\": {\n          \"type\": \"string\"\n        },\n        \"series\": {\n          \"type\": \"string\"\n        },\n        \"fees\": {\n          \"type\": \"object\",\n          \"properties\": {\n            \"salesCharges\": {\n              \"type\": \"string\"\n            },\n            \"fundExpenses\": {\n              \"type\": \"object\",\n              \"properties\": {\n                \"managementExpenseRatio\": {\n                  
\"type\": \"string\"\n                },\n                \"tradingExpenseRatio\": {\n                  \"type\": \"string\"\n                },\n                \"totalExpenses\": {\n                  \"type\": \"string\"\n                }\n              }\n            },\n            \"trailingCommissions\": {\n              \"type\": \"string\"\n            }\n          }\n        },\n        \"withdrawalRights\": {\n          \"type\": \"object\",\n          \"properties\": {\n            \"timeLimit\": {\n              \"type\": \"string\"\n            },\n            \"conditions\": {\n              \"type\": \"array\",\n              \"items\": {\n                \"type\": \"string\"\n              }\n            }\n          }\n        },\n        \"contactInformation\": {\n          \"type\": \"object\",\n          \"properties\": {\n            \"companyName\": {\n              \"type\": \"string\"\n            },\n            \"address\": {\n              \"type\": \"string\"\n            },\n            \"phone\": {\n              \"type\": \"string\"\n            },\n            \"email\": {\n              \"type\": \"string\"\n            },\n            \"website\": {\n              \"type\": \"string\"\n            }\n          }\n        },\n        \"yearByYearReturns\": {\n          \"type\": \"array\",\n          \"items\": {\n            \"type\": \"object\",\n            \"properties\": {\n              \"year\": {\n                \"type\": \"string\"\n              },\n              \"return\": {\n                \"type\": \"string\"\n              }\n            }\n          }\n        },\n        \"bestWorstReturns\": {\n          \"type\": \"array\",\n          \"items\": {\n            \"type\": \"object\",\n            \"properties\": {\n              \"type\": {\n                \"type\": \"string\"\n              },\n              \"return\": {\n                \"type\": \"string\"\n              },\n              \"date\": {\n       
         \"type\": \"string\"\n              },\n              \"investmentValue\": {\n                \"type\": \"string\"\n              }\n            }\n          }\n        },\n        \"averageReturn\": {\n          \"type\": \"string\"\n        },\n        \"targetInvestors\": {\n          \"type\": \"array\",\n          \"items\": {\n            \"type\": \"string\"\n          }\n        },\n        \"taxInformation\": {\n          \"type\": \"string\"\n        }\n      }\n    }\n  }\n}\n```\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://i.ibb.co/7RPpsjV/temp.png\" alt=\"example\"\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapegraphai%2Fscrapeschema-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapegraphai%2Fscrapeschema-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapegraphai%2Fscrapeschema-demo/lists"}