# Safety Classification Project

This project uses machine learning to classify safety comments into high-priority and standard-priority categories. It fine-tunes [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased), a transformer-based deep learning model, adapting it to the task of safety observation classification.

## Project Structure

- `safety_classification.py`: Script for training the model, evaluating its performance, and generating results for the entire dataset.
- `classify_new_inputs.py`: Interactive script for classifying new safety comments using the trained model.
- `requirements.txt`: List of Python packages required for the project.

## Usage: Training on Your Own Data

### Training the Model and Generating Results
1. Prepare your data:

   - Create a CSV file with at least two columns: one for the safety comments and one for the priority labels.
   - Name your columns exactly as they are in the original dataset, or update the column names in `safety_classification.py`.
   - Move your CSV file to the `data/` directory.
   - Rename it to `data.csv`, or update the file path in `safety_classification.py`.

2. Run the training and classification script:

   ```
   python3 safety_classification.py
   ```

3. This script will:

   - Train the model on your dataset
   - Evaluate its performance
   - Save the model and tokenizer in the `saved_model` and `saved_tokenizer` directories, respectively
   - Classify all comments in the dataset
   - Save the results (including classifications) to `results/output_data.csv`

### Interactively Classifying New Inputs

1. After training the model, you can use it to classify new safety comments interactively:

   ```
   python3 classify_new_inputs.py
   ```

2. This script will:

   - Load the trained model and tokenizer
   - Prompt you to enter safety comments
   - Display the classification (High Priority or Standard Priority) and confidence score for each entered comment

3. Enter your safety comments one at a time and press `Enter` to see the classification result.

4. Type `quit` to exit the program.

## Results on Our Training Data

```
Evaluation Results:
Accuracy: 0.9910
F1 Score: 0.9833
Precision: 0.9925
Recall: 0.9742
```

## Notes

- The scripts are set to use the CPU for computations. If you have a GPU and want to use it, set `USE_CPU` to `False` in both scripts:

  ```python
  # Configurable variables
  USE_CPU = False
  ```

- This project uses the DistilBERT base model (uncased) for sequence classification. DistilBERT, developed by Hugging Face, is a smaller, faster version of BERT. It is a Transformers model pretrained on the same corpus as BERT in a self-supervised fashion, using the BERT base model as a teacher. It is designed for tasks that use the whole sentence to make decisions, such as sequence classification, token classification, or question answering.
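`classify_new_inputs.py` reports a confidence score alongside each label. For a two-class sequence-classification model like this one, a common way to obtain such a score is to apply a softmax to the model's two output logits and take the highest probability. The sketch below illustrates that idea in plain Python. It is not taken from the project's scripts, and the `LABELS` mapping (index 1 = High Priority) and the example logits are hypothetical.

```python
import math

# Hypothetical label mapping: assumes class index 1 is "High Priority".
# The real index-to-label mapping is set by the training script and may differ.
LABELS = {0: "Standard Priority", 1: "High Priority"}


def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: list[float]) -> tuple[str, float]:
    """Return the most likely label and its probability (the confidence score)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits for one comment; in practice these come from the model.
    label, confidence = label_with_confidence([-1.2, 2.3])
    print(f"{label} ({confidence:.2%})")
```

With the example logits `[-1.2, 2.3]`, the second class dominates, so the sketch reports High Priority with a confidence of roughly 97%.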
## Troubleshooting

If you encounter any issues:

- Ensure all dependencies are correctly installed (for example, with `pip install -r requirements.txt`).
- Make sure you've run `safety_classification.py` before trying to use `classify_new_inputs.py`.
- Training the model takes a while and is demanding on your machine's RAM. If this is a problem, decrease the `MAX_STEPS` variable or use a different model.

## Contributors

- Reid Ammer (ammer.5@osu.edu)
- Brian Tan (tan.1220@osu.edu)