https://github.com/fer-aguirre/pdf-2-ner
Web application for information extraction and named entity recognition for PDF files (work-in-progress).
named-entity-recognition nlp pdf2text streamlit text-analysis
- Host: GitHub
- URL: https://github.com/fer-aguirre/pdf-2-ner
- Owner: fer-aguirre
- License: mit
- Created: 2023-01-20T22:53:34.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-25T21:14:15.000Z (over 2 years ago)
- Last Synced: 2025-07-09T17:05:57.251Z (4 months ago)
- Topics: named-entity-recognition, nlp, pdf2text, streamlit, text-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 294 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PDF 2 NER
A web application that converts scanned PDF files to text and applies Named Entity Recognition (NER) to extract entities from Spanish-language text.
Created by: Fer Aguirre
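The text-to-entities step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the OCR step is skipped (each page is assumed to already be a plain string), and `naive_ner` is a hypothetical placeholder that tags runs of capitalized words instead of calling a trained Spanish NER model. The names `Entity`, `naive_ner`, and `extract_entities` are invented for this example.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, label); hypothetical type for this sketch

# Runs of capitalized words, allowing Spanish accented letters.
_CAPITALIZED_RUN = re.compile(
    r"[A-ZÁÉÍÓÚÑ][a-záéíóúñü]+(?: [A-ZÁÉÍÓÚÑ][a-záéíóúñü]+)*"
)


def naive_ner(text: str) -> List[Entity]:
    """Placeholder recognizer (assumption): every capitalized run becomes MISC.

    A real application would call a trained Spanish NER model here.
    """
    return [(match.group(0), "MISC") for match in _CAPITALIZED_RUN.finditer(text)]


def extract_entities(
    pages: Iterable[str],
    ner: Callable[[str], List[Entity]] = naive_ner,
) -> Counter:
    """Count entities across pages of already-extracted text."""
    counts: Counter = Counter()
    for page_text in pages:
        # Collapse the line breaks and repeated spaces typical of OCR output.
        cleaned = " ".join(page_text.split())
        counts.update(ner(cleaned))
    return counts


# Sample pages standing in for OCR output (made up for this example).
pages = [
    "Frida Kahlo firmó el informe\nen Coyoacán.",
    "Frida Kahlo viajó a Guadalajara.",
]
entity_counts = extract_entities(pages)
for (entity, label), count in entity_counts.most_common():
    print(f"{entity}\t{label}\t{count}")
```

Because the recognizer is passed in as the `ner` argument, the placeholder can be swapped for a real model without changing the counting logic. Note that the capitalization heuristic also matches ordinary sentence-initial words, which is one reason a trained model is needed in practice.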
## Directory Structure
```
├── app.py
├── assets
│   └── pdfs
├── config.ini
├── config.ini.secret
├── data
│   ├── processed
│   └── raw
├── docs
│   ├── data-dictionary.md
│   ├── explore-data.md
│   ├── references
│   └── reports
├── LICENSE
├── notebooks
│   ├── 0.0-testing-nlp-models.ipynb
│   ├── 1.0-scraping-data.ipynb
│   └── 2.0-analyzing-data.ipynb
├── outputs
│   ├── figures
│   └── tables
├── pdf_2_ner
│   ├── data
│   ├── __init__.py
│   └── utils
├── Pipfile
├── Pipfile.lock
├── README.md
└── setup.py
```
---
## License
This project is released under the [MIT License](/LICENSE).