https://github.com/AVoss84/pdf_extract
Text classification based on PDF inputs
- Host: GitHub
- URL: https://github.com/AVoss84/pdf_extract
- Owner: AVoss84
- Created: 2022-12-27T12:41:00.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-04-30T13:31:21.000Z (almost 2 years ago)
- Last Synced: 2024-08-02T15:54:01.377Z (6 months ago)
- Topics: classification, fastapi, nlp, python, streamlit
- Language: Jupyter Notebook
- Homepage:
- Size: 886 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Text classification based on PDF input data
## Package structure
```
.
├── environment.yml
├── logs
├── main.py
├── README.md
├── requirements.txt
├── src
│ ├── __init__.py
│ ├── notebooks
│ │ ├── fasttext_classifier.ipynb
│ │ └── naivebayes_classifier.ipynb
│ ├── pdf_extract
│ │ ├── config
│ │ ├── data
│ │ ├── resources
│ │ ├── services
│ │ └── utils
│ ├── setup.py
│ └── templates
└── stream_app.py
```

## Package installation

Create a conda virtual environment with the required packages:
```bash
conda env create -f environment.yml
conda activate env_pdf
```

Download the large spaCy language models and install the package in editable mode:
```bash
python -m spacy download en_core_web_lg
python -m spacy download de_core_news_lg # install large word embeddings
pip install -e src
```

Start the REST API locally:
```bash
uvicorn main:app --reload --port 5000  # check out the Swagger docs at http://127.0.0.1:5000/docs
```

Start the Streamlit app locally:
```bash
streamlit run stream_app.py
```
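
Once the REST API is running via the `uvicorn` command above, its routes can also be inspected programmatically instead of through the Swagger page. The following is a minimal sketch, not part of the package: it assumes the server listens on port 5000 as in the command above and that FastAPI's default schema path (`/openapi.json`) has not been changed. The helper names `list_routes` and `fetch_schema` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the API runs on the port used in the uvicorn command above and
# serves its OpenAPI schema at FastAPI's default path, /openapi.json.
BASE_URL = "http://127.0.0.1:5000"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (HTTP method, path) pairs found in an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from a running server (hypothetical helper)."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)
```

With the server running, `list_routes(fetch_schema())` returns the method and path of every exposed endpoint, which is a quick way to find the classification routes without guessing their names.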