https://github.com/mirzaazwad/tymbert
TYMBert is our submission for NCIM 2025, a spam classifier that makes use of knowledge distillation to compress the model while preserving accuracy
bert huggingface-transformers knowledge-distillation machine-learning matplotlib numpy pandas python3 scikit-learn tiny-bert torch
- Host: GitHub
- URL: https://github.com/mirzaazwad/tymbert
- Owner: mirzaazwad
- License: mit
- Created: 2025-02-16T23:15:15.000Z (4 days ago)
- Default Branch: master
- Last Pushed: 2025-02-16T23:35:33.000Z (4 days ago)
- Last Synced: 2025-02-17T00:20:34.022Z (4 days ago)
- Topics: bert, huggingface-transformers, knowledge-distillation, machine-learning, matplotlib, numpy, pandas, python3, scikit-learn, tiny-bert, torch
- Language: Jupyter Notebook
- Homepage:
- Size: 3.72 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Tymbert
## Table of Contents
- [Introduction](#introduction)
- [Prerequisites](#prerequisites)
- [Installation Steps](#installation-steps)
- [Updating the Environment](#updating-the-environment)
- [Deactivating and Removing the Environment](#deactivating-and-removing-the-environment)
- [Jupyter Use](#jupyter-use)
- [Datasets](#datasets)
- [Additional References](#additional-references)
- [License](#license)
## Introduction
TYMBert is our submission for NCIM 2025: a spam classifier that uses knowledge distillation to compress the model while preserving accuracy.
This repository provides a Conda environment configuration file (`environment.yml`) for setting up the `tymbert` environment. Follow the steps below to install and configure it correctly on your system.
## Prerequisites
- Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/products/distribution)
- Ensure Conda is added to your system's PATH
## Installation Steps
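
Before the first step, it can help to confirm that Conda is actually reachable from the shell, since the prerequisites depend on it being on the PATH. The snippet below is a minimal sketch; `check_conda` is a hypothetical helper name, not something this repository provides.

```bash
# Hypothetical helper (not part of this repository): report whether `conda`
# can be found on the PATH and, if so, print its version.
check_conda() {
  if command -v conda >/dev/null 2>&1; then
    conda --version
  else
    echo "conda not found on PATH; revisit the prerequisites" >&2
    return 1
  fi
}
```

Calling `check_conda` in a fresh terminal should print a version string when the prerequisites are met, and an error message otherwise.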
1. **Clone the Repository**
```bash
git clone https://github.com/mirzaazwad/TYMBert.git
cd TYMBert
```
2. **Create the Conda Environment**
Run the following command to create the `tymbert` environment from the `environment.yml` file:
```bash
conda env create -f environment.yml
```
3. **Update the Environment Prefix**
The `environment.yml` file may contain an absolute path under the `prefix` field, which may not match your system's Conda installation directory. To fix this:
- Open the `environment.yml` file in a text editor
- Locate the `prefix:` field at the bottom of the file (if present)
- Change it to your own Conda environment path, which can be found using:
```bash
conda info --envs
```
- Alternatively, create the environment without using the prefix by running:
```bash
conda env create --name tymbert --file environment.yml
```
4. **Activate the Environment**
```bash
conda activate tymbert
```
5. **Verify Installation**
Check that the necessary dependencies are installed:
```bash
conda list
```
## Updating the Environment
If you make changes to `environment.yml` and need to update the existing environment:
```bash
conda env update --name tymbert --file environment.yml --prune
```
## Deactivating and Removing the Environment
To deactivate the environment:
```bash
conda deactivate
```
To remove the environment completely:
```bash
conda env remove --name tymbert
```
## Jupyter Use
Once the environment is set up, select it as the kernel in Jupyter Notebook or in VS Code with the Jupyter extension.
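
If the `tymbert` environment does not show up in the kernel picker, it may need to be registered as a Jupyter kernel first. The sketch below assumes that `ipykernel` is installed in the environment, which this README does not confirm; `register_kernel` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper (not part of this repository): register the currently
# active Conda environment as a Jupyter kernel for the current user.
# Assumes `ipykernel` is available in that environment.
register_kernel() {
  env_name="${1:-tymbert}"
  python -m ipykernel install --user \
    --name "$env_name" \
    --display-name "Python ($env_name)"
}
```

A typical use would be to run `conda activate tymbert`, then `register_kernel tymbert`, and finally `jupyter kernelspec list` to check that the new kernel is listed.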
## Datasets
| Dataset Name | Description | Link |
| --------------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------- |
| SPStudy | A dataset for spam research, containing various studies and data points. | [GitHub - SPStudy](https://github.com/smspamresearch/spstudy/tree/main) |
| SMS Spam Collection Dataset | A dataset containing SMS messages labeled as spam or ham. | [Kaggle - SMS Spam Collection](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset) |
## Additional References
The quantization logic and code were developed with the help of [GitHub - BERT-Quantization](https://github.com/srimoyee1212/BERT-Quantization/tree/main) by srimoyee1212.
## License
This project is licensed under the MIT License. See the `LICENSE` file for details.