{"id":18744543,"url":"https://github.com/avoss84/pdf_extract","last_synced_at":"2026-05-05T15:34:25.775Z","repository":{"id":65544871,"uuid":"582636757","full_name":"AVoss84/pdf_extract","owner":"AVoss84","description":"Text classification based on PDF inputs","archived":false,"fork":false,"pushed_at":"2023-04-30T13:31:21.000Z","size":907,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-28T20:12:51.468Z","etag":null,"topics":["classification","fastapi","nlp","python","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AVoss84.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-12-27T12:41:00.000Z","updated_at":"2023-08-02T22:50:36.000Z","dependencies_parsed_at":"2024-01-24T13:08:27.604Z","dependency_job_id":null,"html_url":"https://github.com/AVoss84/pdf_extract","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AVoss84%2Fpdf_extract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AVoss84%2Fpdf_extract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AVoss84%2Fpdf_extract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AVoss84%2Fpdf_extract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AVoss84","download_url":"https://codeload.github.com/AVoss84/pdf_extract/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239627267,"owners_count":19670844,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","fastapi","nlp","python","streamlit"],"created_at":"2024-11-07T16:15:13.480Z","updated_at":"2025-11-22T15:30:16.793Z","avatar_url":"https://github.com/AVoss84.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text classification based on PDF input data\r\n\r\n## Package structure\r\n\r\n```\r\n.\r\n├── environment.yml\r\n├── logs\r\n├── main.py\r\n├── README.md\r\n├── requirements.txt\r\n├── src\r\n│   ├── __init__.py\r\n│   ├── notebooks\r\n│   │   ├── fasttext_classifier.ipynb\r\n│   │   └── naivebayes_classifier.ipynb\r\n│   ├── pdf_extract\r\n│   │   ├── config\r\n│   │   ├── data\r\n│   │   ├── resources\r\n│   │   ├── services\r\n│   │   └── utils\r\n│   ├── setup.py\r\n│   └── templates\r\n└── stream_app.py\r\n```\r\n\r\n## Package installation\r\n\r\nCreate a conda virtual environment with the required packages:\r\n```bash\r\nconda env create -f environment.yml\r\nconda activate env_pdf\r\n```\r\n\r\nDownload the large English and German spaCy models (which include word embeddings), then install the package in editable mode:\r\n```bash\r\npython -m spacy download en_core_web_lg\r\npython -m spacy download de_core_news_lg\r\npip install -e src\r\n```\r\n\r\nStart the REST API locally:\r\n```bash\r\nuvicorn main:app --reload --port 5000         # Swagger docs: http://127.0.0.1:5000/docs\r\n```\r\n\r\nStart the Streamlit app locally:\r\n```bash\r\nstreamlit run stream_app.py\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favoss84%2Fpdf_extract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Favoss84%2Fpdf_extract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favoss84%2Fpdf_extract/lists"}