https://github.com/furk4neg3/ibm_transformer-classification-nlp

An NLP project that uses transformers to classify documents by topic. Developed for IBM’s Generative AI Engineering with LLMs Specialization course, this repository demonstrates applying transformers to organize large text archives into searchable categories.

# Applying Transformers for Document Classification

This repository contains a project on using transformer-based models to classify text documents by topic, focusing on organizing archival news content for an improved search engine experience. Completed as part of IBM's Generative AI Engineering with LLMs Specialization course, this project explores text preprocessing, dataset handling, and model training using `torchtext` and transformer architectures.

## Overview

In this project, we use transformers to categorize historical news articles. The goal is to enable a content search engine to deliver relevant articles based on topic, streamlining the user experience and enhancing content management efficiency. Key aspects of this project include:
- Utilizing the `torchtext` library for text classification
- Preprocessing text data for model training
- Training a transformer model to predict article categories

## Table of Contents

1. [Project Context](#project-context)
2. [Transformer Model Application](#transformer-model-application)
3. [Data Preparation and Preprocessing](#data-preparation-and-preprocessing)
4. [Requirements](#requirements)
5. [References](#references)

## Project Context

Imagine an archive of historical documents at a newspaper or magazine company. Categorizing these documents automatically enhances accessibility and content delivery. This project builds an automated classification model for this purpose.

## Transformer Model Application

Transformer architectures perform well on natural language processing tasks, including text classification. This project implements a transformer-based model that assigns each article to a topic category.
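
As a rough illustration of what such a model can look like, here is a minimal sketch of a transformer-encoder classifier in PyTorch. It is not the course's or this repository's exact implementation: the class name `TopicClassifier`, the learned positional embeddings, the mean pooling, and every size (`d_model`, `nhead`, `num_layers`, the vocabulary size, and the four output classes) are assumptions chosen for the example.

```python
import math

import torch
from torch import nn


class TopicClassifier(nn.Module):
    """Hypothetical transformer-encoder classifier; names and sizes are illustrative."""

    def __init__(self, vocab_size, num_classes, d_model=64, nhead=4,
                 num_layers=2, max_len=256, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model, padding_idx=pad_idx)
        # Learned positional embeddings (an assumption; sinusoidal encodings also work).
        self.position = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(
            layer, num_layers=num_layers, enable_nested_tensor=False
        )
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        # token_ids: integer tensor of shape (batch, seq_len)
        pad_mask = token_ids.eq(self.pad_idx)  # True at padding positions
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embedding(token_ids) * math.sqrt(self.d_model)
        x = x + self.position(positions)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over the non-padding tokens to get one vector per document.
        keep = (~pad_mask).unsqueeze(-1).to(x.dtype)
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.classifier(pooled)  # (batch, num_classes) logits


# Two toy "documents" as padded id sequences; 0 is the padding id.
model = TopicClassifier(vocab_size=100, num_classes=4)
model.eval()
batch = torch.tensor([[5, 8, 13, 0, 0], [7, 2, 9, 4, 6]])
with torch.no_grad():
    logits = model(batch)
print(logits.shape)  # torch.Size([2, 4])
```

Each row of `logits` holds one score per topic; `logits.argmax(dim=1)` gives the predicted category index. For training, these logits would typically be passed to `nn.CrossEntropyLoss` together with the integer labels.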

## Data Preparation and Preprocessing

Using `torchtext`, this project demonstrates how to:
- Load and preprocess text data for classification tasks
- Tokenize and organize data efficiently for model training
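
The steps above can be sketched without any dependencies. The example below is a plain-Python stand-in for what `torchtext` provides through `get_tokenizer` and `build_vocab_from_iterator`; it is not the repository's code. The helper names (`tokenize`, `build_vocab`, `encode_batch`), the `<pad>` and `<unk>` tokens, and the sample headlines are all hypothetical.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, min_freq=1):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # Reserve id 0 for padding and id 1 for unknown tokens.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def encode_batch(texts, vocab, max_len=8):
    """Convert texts to equal-length id lists, truncating or padding as needed."""
    batch = []
    for text in texts:
        ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
        batch.append(ids + [vocab[PAD]] * (max_len - len(ids)))
    return batch


articles = ["Stocks rally as markets open", "Team wins the final match"]
vocab = build_vocab(articles)
encoded = encode_batch(articles + ["Markets watch the match"], vocab, max_len=6)
print(encoded[2])  # [4, 1, 10, 5, 0, 0]
```

In the last sequence, "watch" is not in the vocabulary and maps to the unknown id 1, and the two trailing zeros are padding. Fixed-length id lists like these can be converted to a tensor and fed to a model in batches.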

## Requirements

- Python 3.7+
- PyTorch
- `torchtext`

## References

- [IBM AI Engineering Professional Certificate](https://www.coursera.org/professional-certificates/ai-engineer?)
- [Generative AI Engineering with LLMs Specialization](https://www.coursera.org/specializations/generative-ai-engineering-with-llms)
- [Generative AI Language Modeling with Transformers](https://www.coursera.org/learn/generative-ai-language-modeling-with-transformers?specialization=generative-ai-engineering-with-llms)