{"id":30774201,"url":"https://github.com/meetptl04/pretox-classifier","last_synced_at":"2026-04-17T05:02:43.225Z","repository":{"id":312596737,"uuid":"1039084404","full_name":"meetptl04/pretox-classifier","owner":"meetptl04","description":"BioBERT-based text classifier to predict PRETOX-related text (PRETOX_REL vs NO_PRETOX_REL).","archived":false,"fork":false,"pushed_at":"2025-08-31T18:37:18.000Z","size":536,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-31T20:33:43.549Z","etag":null,"topics":["biobert","biomedical","deep-learning","huggingface","machine-learning","nlp","pytorch","streamlit","text-classification","transformer-model"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meetptl04.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-16T12:55:25.000Z","updated_at":"2025-08-31T18:42:26.000Z","dependencies_parsed_at":"2025-09-29T07:01:15.468Z","dependency_job_id":null,"html_url":"https://github.com/meetptl04/pretox-classifier","commit_stats":null,"previous_names":["meetptl04/pretox-classifier"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/meetptl04/pretox-classifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meetptl04%2Fpretox-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/meetptl04%2Fpretox-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meetptl04%2Fpretox-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meetptl04%2Fpretox-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/meetptl04","download_url":"https://codeload.github.com/meetptl04/pretox-classifier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meetptl04%2Fpretox-classifier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31915900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"online","status_checked_at":"2026-04-17T02:00:06.879Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biobert","biomedical","deep-learning","huggingface","machine-learning","nlp","pytorch","streamlit","text-classification","transformer-model"],"created_at":"2025-09-05T02:51:02.871Z","updated_at":"2026-04-17T05:02:43.219Z","avatar_url":"https://github.com/meetptl04.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PRETOX Text Classifier with 
BioBERT\n![Python](https://img.shields.io/badge/Python-3.9+-blue?style=flat-square\u0026logo=python)\n![Streamlit](https://img.shields.io/badge/Streamlit-App-red?style=flat-square\u0026logo=streamlit)\n![BioBERT](https://img.shields.io/badge/Model-BioBERT-green?style=flat-square\u0026logo=semanticweb)\n![Accuracy](https://img.shields.io/badge/Accuracy-94%25-brightgreen?style=flat-square)\n![License](https://img.shields.io/badge/License-MIT-yellow?style=flat-square)\n\nA high-performance biomedical text classifier built with **BioBERT** to determine if a given text is related to **PRETOX** (Preclinical Toxicology). This tool is designed to help researchers efficiently sift through vast amounts of literature to find toxicology-relevant information.\n\n---\n\n## Key Features\n\n- **High Accuracy**: Achieves **94% accuracy** on the test dataset, ensuring reliable classification.\n- **Domain-Specific AI**: Utilizes **BioBERT**, a language model pre-trained on biomedical text, for nuanced understanding of scientific terminology.\n- **Interactive Interface**: A simple and intuitive web application built with Streamlit allows for easy, code-free use by researchers and domain experts.\n- **Reproducible**: The entire training and evaluation pipeline is available in a Jupyter Notebook for full transparency and custom experimentation.\n\n---\n\n## Live Demo\n\nHere is a preview of the interactive Streamlit application. 
Users can input text directly or use the provided examples to get an instant classification.\n\n![Streamlit App Demo](images/streamlit_app_image.png)\n\n---\n\n## Model Performance\n\nThe model demonstrates strong performance in distinguishing between PRETOX-related and non-related texts.\n\n**Final Test Accuracy**: **94%**\n\n### Classification Report\n\n| Class             | Precision | Recall | F1-Score |\n| ----------------- | :-------: | :----: | :------: |\n| `0 (NO_PRETOX_REL)` |   0.97    |  0.93  |   0.95   |\n| `1 (PRETOX_REL)`  |   0.91    |  0.96  |   0.93   |\n\n### Training \u0026 Validation Curves\n\nThe training history shows stable learning and good generalization from training to validation data.\n\n![Accuracy and Loss Curves](images/accuracy.png)\n\n---\n\n## Getting Started\n\nFollow these steps to set up and run the project locally.\n\n### 1. Clone the Repository\n\n```bash\ngit clone https://github.com/meetptl04/pretox-classifier.git\ncd pretox-classifier\n```\n\n### 2. Set Up a Virtual Environment\n\nIt is recommended to use a virtual environment to manage dependencies.\n\n  - **Linux / macOS:**\n    ```bash\n    python3 -m venv venv\n    source venv/bin/activate\n    ```\n  - **Windows:**\n    ```bat\n    python -m venv venv\n    venv\\Scripts\\activate\n    ```\n\n### 3. Install Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 4. Run the Streamlit App\n\n```bash\nstreamlit run app.py\n```\n\nNavigate to the local URL provided in your terminal to start using the application.\n\n-----\n\n## Model \u0026 Dataset\n\n  - **Model**: The classifier is a fine-tuned version of [**BioBERT (v1.1)**](https://huggingface.co/dmis-lab/biobert-v1.1), optimized for sequence classification. 
To review or retrain the model, please see the `NLP_BioBert_PRETOX_REL.ipynb` notebook.\n  - **Dataset**: The model was trained on the [**pretoxtm-dataset**](https://huggingface.co/datasets/javicorvi/pretoxtm-dataset) from Hugging Face, which contains biomedical text excerpts labeled for toxicology relevance.\n\n-----\n\n## Project Structure\n\n```\nBioBert_app/\n├── data/                    # Dataset files\n├── model/                   # Trained model checkpoints\n├── images/                  # Project images and screenshots\n│   ├── accuracy.png\n│   └── streamlit_app_image.png\n├── venv/                    # Virtual environment (ignored by git)\n├── app.py                   # Streamlit application source code\n├── requirements.txt         # Required Python packages\n└── NLP_BioBert_PRETOX_REL.ipynb # Notebook with training \u0026 evaluation code\n```\n\n-----\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeetptl04%2Fpretox-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeetptl04%2Fpretox-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeetptl04%2Fpretox-classifier/lists"}