https://github.com/fer-aguirre/pdf-2-ner

Web application for information extraction and named entity recognition for PDF files (work-in-progress).

named-entity-recognition nlp pdf2text streamlit text-analysis


# PDF 2 NER
A web application that converts scanned PDF files to text and applies Named Entity Recognition (NER) to extract entities from Spanish-language text.

Created by: Fer Aguirre
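
The NER step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the PDF text has already been extracted, and that the recognizer is a spaCy-style pipeline (a callable returning an object whose `.ents` items carry `.text` and `.label_`). The function names `entities_by_label` and `extract_entities` are hypothetical.

```python
from collections import defaultdict


def entities_by_label(doc):
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        text = ent.text.strip()
        # Skip empty strings and entities already recorded under this label.
        if text and text not in grouped[ent.label_]:
            grouped[ent.label_].append(text)
    return dict(grouped)


def extract_entities(text, nlp):
    """Run a spaCy-style pipeline over already-extracted text.

    `nlp` is an assumption: any callable that returns an object
    exposing `.ents`, such as a loaded spaCy language model.
    """
    return entities_by_label(nlp(text))
```

If spaCy and its Spanish model are installed, `nlp` could be obtained with `spacy.load("es_core_news_sm")` and passed to `extract_entities`. Whether this project uses that model is not stated here.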

## Directory Structure
```
├── app.py
├── assets
│   └── pdfs
├── config.ini
├── config.ini.secret
├── data
│   ├── processed
│   └── raw
├── docs
│   ├── data-dictionary.md
│   ├── explore-data.md
│   ├── references
│   └── reports
├── LICENSE
├── notebooks
│   ├── 0.0-testing-nlp-models.ipynb
│   ├── 1.0-scraping-data.ipynb
│   └── 2.0-analyzing-data.ipynb
├── outputs
│   ├── figures
│   └── tables
├── pdf_2_ner
│   ├── data
│   ├── __init__.py
│   └── utils
├── Pipfile
├── Pipfile.lock
├── README.md
└── setup.py
```
---

## License

This project is released under the [MIT License](/LICENSE).