https://github.com/awais-124/fine-tuning-distilbert
Repo contains Python code for fine-tuning DistilBERT. Semester project for an AI course.
- Host: GitHub
- URL: https://github.com/awais-124/fine-tuning-distilbert
- Owner: awais-124
- Created: 2025-01-31T16:56:31.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-01-31T17:02:20.000Z (8 months ago)
- Last Synced: 2025-01-31T18:19:46.121Z (8 months ago)
- Topics: classification, data-science, distilbert, fine-tuning, hyperparameter-tuning, preprocessing, sentiment-analysis, transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 1.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
# Fine-Tuning DistilBERT for Sentiment Analysis
## Overview
This repository contains a Jupyter Notebook for fine-tuning DistilBERT on a sentiment analysis dataset. The model is trained with TensorFlow and Hugging Face's `transformers` library to classify tweets into sentiment categories.

## Dataset
The dataset used for training is `Tweets.csv`, which contains airline-related tweets labeled with sentiment categories (`positive`, `neutral`, `negative`).

## Steps Covered
### 1. Data Preprocessing
- Load dataset (`Tweets.csv`)
- Check for missing values and class balance
- Convert text to lowercase
- Remove unnecessary columns
- Visualize word frequency using a Word Cloud (see the sketch after this list)
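A minimal sketch of what this preprocessing step could look like. The column names `text` and `airline_sentiment`, and the `wordcloud` package (not listed in the requirements below), are assumptions rather than details taken from the notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud  # extra dependency: pip install wordcloud

# Load the airline tweets dataset; the column names used below
# ("text", "airline_sentiment") are assumptions about the CSV layout.
df = pd.read_csv("Tweets.csv")

# Check for missing values and class balance
print(df.isnull().sum())
print(df["airline_sentiment"].value_counts())

# Keep only the columns needed for classification and lowercase the text
df = df[["text", "airline_sentiment"]].copy()
df["text"] = df["text"].str.lower()

# Visualize word frequency with a word cloud
cloud = WordCloud(width=800, height=400).generate(" ".join(df["text"]))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```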

### 2. Tokenization
- Convert text into tokenized inputs (`input_ids`, `attention_mask`)
- Use Hugging Face `DistilBertTokenizer`
- Ensure proper padding and truncation (see the sketch after this list)
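A sketch of the tokenization step, reusing `df` from the preprocessing sketch above. The `distilbert-base-uncased` checkpoint and the 128-token maximum length are assumptions, not values taken from the notebook:

```python
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize the tweets with padding and truncation so every example has the
# same fixed length; max_length=128 is an assumed value.
encodings = tokenizer(
    df["text"].tolist(),
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="tf",
)
print(encodings["input_ids"].shape, encodings["attention_mask"].shape)
```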

### 3. Feature Mapping
- Map tokenized inputs to a TensorFlow dataset format
- Prepare training and testing sets (see the sketch after this list)
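A sketch of how the tokenized inputs could be wrapped in a `tf.data.Dataset`. It reuses `df` and `encodings` from the sketches above; the label-to-id mapping, the 80/20 split, and the batch size of 16 are assumptions:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Encode string labels as integer ids (the mapping order is an assumption).
label_ids = df["airline_sentiment"].map(
    {"negative": 0, "neutral": 1, "positive": 2}
).to_numpy()

# Split row indices so inputs and labels stay aligned across the split.
train_idx, test_idx = train_test_split(
    np.arange(len(label_ids)), test_size=0.2, random_state=42
)

def make_dataset(idx, batch_size=16):
    features = {
        "input_ids": tf.gather(encodings["input_ids"], idx),
        "attention_mask": tf.gather(encodings["attention_mask"], idx),
    }
    return tf.data.Dataset.from_tensor_slices((features, label_ids[idx])).batch(batch_size)

train_ds = make_dataset(train_idx)
test_ds = make_dataset(test_idx)
```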

### 4. Model Training
- Load `DistilBertForSequenceClassification`
- Define loss function and optimizer
- Train the model using TensorFlow/Keras (see the sketch after this list)
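A sketch of the training step using `TFDistilBertForSequenceClassification`, the TensorFlow variant of the sequence-classification model. The learning rate and epoch count are assumptions, not the notebook's actual hyperparameters:

```python
import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

# Load the pretrained encoder with a fresh 3-way classification head.
model = TFDistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)

# Sparse categorical cross-entropy over logits; a small learning rate is
# typical for fine-tuning (5e-5 here is an assumed value).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(train_ds, validation_data=test_ds, epochs=2)
```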

### 5. Evaluation
- Predict sentiment on test data
- Compute accuracy, precision, recall, and F1-score
- Generate a classification report (see the sketch after this list)
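A sketch of the evaluation step using scikit-learn's `classification_report`, which prints per-class precision, recall, and F1-score along with overall accuracy. It reuses `model` and `test_ds` from the sketches above:

```python
import numpy as np
from sklearn.metrics import classification_report

# Run the fine-tuned model over the test set and collect predictions.
y_true, y_pred = [], []
for features, labels in test_ds:
    logits = model(features, training=False).logits
    y_pred.extend(np.argmax(logits.numpy(), axis=-1))
    y_true.extend(labels.numpy())

# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(
    y_true, y_pred, target_names=["negative", "neutral", "positive"]
))
```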

## Requirements
To run this notebook, install the following dependencies:

```bash
pip install numpy pandas matplotlib seaborn nltk tensorflow transformers scikit-learn tqdm plotly
```

## Running the Notebook
1. Clone this repository:

```bash
git clone https://github.com/awais-124/fine-tuning-distilbert.git
cd fine-tuning-distilbert
```

2. Run the Jupyter Notebook:
```bash
jupyter notebook CODE.ipynb
```