https://github.com/kmock930/natural-language-processing

This project contains code and written coursework for the course CSI5386 at the University of Ottawa (taught by Professor Diana Inkpen).




Host: GitHub
URL: https://github.com/kmock930/natural-language-processing
Owner: kmock930
Created: 2025-01-13T21:34:16.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-05-02T00:56:04.000Z (5 months ago)
Last Synced: 2025-05-02T01:34:50.300Z (5 months ago)
Topics: bert, bigram-modeling, corpus-linguistics, distilbert, fasttext-embeddings, glove-embeddings, hugging-face-transformers, large-language-models, lemmatizer, logistic-regression, macro-micro-f1, natural-language-processing, paraphrase-minilm, pos-tagging, roberta-large, sbert, stopwords, text-embedding-ada-002, universal-sentence-encoder, word-tokenizer
Language: Jupyter Notebook
Homepage:
Size: 39.5 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Natural Language Processing Work

## [Assignment 1](./Assignment%201/README.md) - Corpus analysis and sentence embeddings

![Assignment 1 Preview](asm1-preview.png)

## [Assignment 2](./Assignment%202/README.md) - Machine-Generated Text Detection

![Assignment 2 Preview](./Assignment%202/models_comparison.png)

## [Seminar Research](./Seminar%20Paper/Paper%20Presentation%20-%20Group%202.pdf) - Depression Detection

With the rising popularity of social media comes a risk of negative effects such as cyberbullying, which can cause mental health distress for some users. We therefore explored depression detection with the **DORIS framework** proposed by Lan X., Cheng Y., Sheng L., Gao C., and Li Y. This work also forms the basis for our project, which aims to build an NLP-based model for suicide detection.

## [Project](./Project/README.md)

### Summary of Our Work

* [Project's Proposal](./Project/CSI5386_Natural_Language_Processing_Project_Proposal.pdf)

* [Presenting from the NLP's Perspective](./Project/Project%20Presentation%20-%20NLP%20Aspects.pdf)

Our project analyzes suicidal intent in posts from popular social media platforms and trains several models to find the best one for suicide detection. Here are the models that we used.

* [Our Report](./Project/CSI5386_NLP_Project_Report___Kelvin__Jenifer__Sabrina.pdf)

![Summary of Models](./Project/models_comparison.png)

### Baseline Model

![Project - Baseline Model](./Project/NLP%20Training/Results/baseline_auc_curve.png)

### Fine-Tuning a Deep Learning based Transformer - DistilBERT

![Project - Deep Learning based Fine-Tuning DistilBERT Model's Results](./Project/NLP%20Training/Results/Fine-tuned%20DistilBERT%20accuracy_fold_2.png)

### Added Custom Layers on top of Fine-Tuned DistilBERT

![Project - Deep Learning based Custom Layers](./Project/NLP%20Training/Results/Custom%20Layers_accuracy_fold_5.png)

![Project - Deep Learning based model resulting AUC](./Project/NLP%20Training/Results/model_2_deep_learning_auc_curve.png)

### LLM-based Model

![Project - LLM-Based Model - ROC-AUC](./Project/NLP%20Training/Results/model_3_deep_learning_auc_curve.png)

![Project - LLM-based Model - Confusion Matrix](./Project/NLP%20Training/Results/deepseek_confusion_matrix.png)
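
The result figures above report ROC-AUC curves and a confusion matrix. As a reminder of what these two metrics measure, here is a minimal pure-Python sketch. It is not taken from the project's notebooks: the function names `confusion_matrix` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = positive class)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not project data).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

# Hypothetical decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(cm)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy illustration; for real evaluation a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.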

## Execution Guide

* [**TMUX**](tmux.md) for keeping long-running executions alive in a detached session
