Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/furk4neg3/ibm_transformer-classification-nlp
An NLP project using transformers to classify documents by topic. Developed for IBM’s Generative AI Engineering with LLMs Specialization course, this repository demonstrates applying transformers for organizing large text archives into searchable categories
- Host: GitHub
- URL: https://github.com/furk4neg3/ibm_transformer-classification-nlp
- Owner: furk4neg3
- Created: 2024-11-10T14:28:05.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-10T14:31:19.000Z (about 2 months ago)
- Last Synced: 2024-11-10T15:29:13.373Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 832 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Applying Transformers for Document Classification
This repository contains a project on using transformer-based models to classify text documents by topic, focusing on organizing archival news content for an improved search engine experience. Completed as part of IBM's Generative AI Engineering with LLMs Specialization course, this project explores text preprocessing, dataset handling, and model training using `torchtext` and transformer architectures.
## Overview
In this project, we use transformers to categorize historical news articles. The goal is to enable a content search engine to deliver relevant articles based on topic, streamlining the user experience and enhancing content management efficiency. Key aspects of this project include:
- Utilizing the `torchtext` library for text classification
- Preprocessing text data for model training
- Training a transformer model to predict article categories
## Table of Contents
1. [Project Context](#project-context)
2. [Transformer Model Application](#transformer-model-application)
3. [Data Preparation and Preprocessing](#data-preparation-and-preprocessing)
4. [Requirements](#requirements)
5. [References](#references)
## Project Context
Imagine an archive of historical documents at a newspaper or magazine company. Categorizing these documents automatically enhances accessibility and content delivery. This project builds an automated classification model for this purpose.
## Transformer Model Application
The transformer architecture is well suited to natural language processing tasks, including text classification, because self-attention lets each token draw on context from the whole document. We implement a transformer-based model that assigns each article to a topic category.
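As a rough illustration of what such a model can look like, here is a minimal sketch in plain PyTorch. It is not the notebook's exact code: the class name `TransformerClassifier`, the padding index `PAD_ID`, the learned positional embeddings, the mean pooling step, and all hyperparameters are assumptions chosen for the example. The input is a batch of token ids, as produced by the preprocessing step.

```python
import torch
from torch import nn

PAD_ID = 0  # assumed padding index; the notebook may use a different one


class TransformerClassifier(nn.Module):
    """Embedding -> transformer encoder -> masked mean pooling -> linear head."""

    def __init__(self, vocab_size, num_classes, d_model=64, nhead=4,
                 num_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)
        # Learned positional embeddings (an assumption; sinusoidal also works).
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            dropout=0.1, batch_first=True)
        # Nested tensors are disabled so the output keeps the padded length.
        self.encoder = nn.TransformerEncoder(
            layer, num_layers=num_layers, enable_nested_tensor=False)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor, padded with PAD_ID.
        pad_mask = token_ids.eq(PAD_ID)  # True where the position is padding
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        # Average only over real tokens so padding does not dilute the result.
        keep = (~pad_mask).unsqueeze(-1).to(x.dtype)
        pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)  # (batch, num_classes) logits


# Toy usage with made-up token ids and four hypothetical topic classes.
model = TransformerClassifier(vocab_size=1000, num_classes=4)
batch = torch.tensor([[5, 17, 42, 8], [9, 3, PAD_ID, PAD_ID]])
logits = model(batch)
print(logits.shape)  # torch.Size([2, 4])
```

During training, the logits would typically be passed to `nn.CrossEntropyLoss` together with the integer topic labels.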
## Data Preparation and Preprocessing
Using `torchtext`, this project demonstrates how to:
- Load and preprocess text data for classification tasks
- Tokenize and organize data efficiently for model training
## Requirements
- Python 3.7+
- PyTorch
- Torchtext
## References
- [IBM AI Engineering Professional Certificate](https://www.coursera.org/professional-certificates/ai-engineer?)
- [Generative AI Engineering with LLMs Specialization](https://www.coursera.org/specializations/generative-ai-engineering-with-llms)
- [Generative AI Language Modeling with Transformers](https://www.coursera.org/learn/generative-ai-language-modeling-with-transformers?specialization=generative-ai-engineering-with-llms)