https://github.com/mazzasaverio/pipeline-docs-data-extractor
(Let's build a) Robust pipeline for extracting structured data from various documents
- Host: GitHub
- URL: https://github.com/mazzasaverio/pipeline-docs-data-extractor
- Owner: mazzasaverio
- Created: 2024-01-04T18:13:54.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-05-04T14:06:37.000Z (almost 2 years ago)
- Last Synced: 2024-05-04T20:53:38.538Z (almost 2 years ago)
- Topics: airflow, data-engineer, data-engineering, etl-pipeline, large-language-models, pdf-text-extraction, unstructured
- Language: Python
- Homepage:
- Size: 88.9 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md