https://github.com/fer-aguirre/pdf-2-ner

Web application for information extraction and named entity recognition for PDF files (work-in-progress).
named-entity-recognition nlp pdf2text streamlit text-analysis

Host: GitHub
URL: https://github.com/fer-aguirre/pdf-2-ner
Owner: fer-aguirre
License: MIT
Created: 2023-01-20T22:53:34.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-07-25T21:14:15.000Z (almost 3 years ago)
Last Synced: 2025-07-09T17:05:57.251Z (about 1 year ago)
Topics: named-entity-recognition, nlp, pdf2text, streamlit, text-analysis
Language: Jupyter Notebook
Homepage:
Size: 294 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# PDF 2 NER
Web application that converts scanned PDF files to text and applies Named Entity Recognition (NER) to extract entities from Spanish-language text.

Created by: Fer Aguirre
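
The pipeline described above (scanned PDF, then text, then entities) could look roughly like the sketch below. This is only an illustration, not the project's actual code: the README does not say which OCR or NER libraries are used, so `pdf2image`, `pytesseract`, spaCy and its `es_core_news_sm` model are all assumptions, and the function names `ocr_pdf`, `extract_entities` and `run` are hypothetical.

```python
from collections import defaultdict


def ocr_pdf(pdf_path, lang="spa"):
    """OCR a scanned PDF page by page and return the text as one string.

    Assumption: pdf2image (needs poppler) and pytesseract (needs the
    tesseract binary plus Spanish language data) are installed.
    """
    # Imported lazily so the NER helper below works without the OCR stack.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def extract_entities(text, nlp):
    """Run a spaCy-style pipeline on `text` and group unique entities by label."""
    grouped = defaultdict(list)
    for ent in nlp(text).ents:
        # Keep the first occurrence of each entity text per label.
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


def run(pdf_path):
    """Hypothetical end-to-end helper: OCR a PDF, then extract entities."""
    import spacy  # assumption: spaCy with the Spanish model is installed

    nlp = spacy.load("es_core_news_sm")
    return extract_entities(ocr_pdf(pdf_path), nlp)
```

Under these assumptions, `run("assets/pdfs/example.pdf")` (a hypothetical file name) would return a dictionary mapping each entity label to the list of entity strings found in the document. Passing the pipeline into `extract_entities` as an argument keeps the model choice swappable.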

## Directory Structure
```
├── app.py
├── assets
│   └── pdfs
├── config.ini
├── config.ini.secret
├── data
│   ├── processed
│   └── raw
├── docs
│   ├── data-dictionary.md
│   ├── explore-data.md
│   ├── references
│   └── reports
├── LICENSE
├── notebooks
│   ├── 0.0-testing-nlp-models.ipynb
│   ├── 1.0-scraping-data.ipynb
│   └── 2.0-analyzing-data.ipynb
├── outputs
│   ├── figures
│   └── tables
├── pdf_2_ner
│   ├── data
│   ├── __init__.py
│   └── utils
├── Pipfile
├── Pipfile.lock
├── README.md
└── setup.py
```
---

## License

This project is released under the [MIT License](/LICENSE).
