https://github.com/abhiram-kandiyana/error-explainer
Explanation of Programming Errors using Open-source LLMs
- Host: GitHub
- URL: https://github.com/abhiram-kandiyana/error-explainer
- Owner: Abhiram-kandiyana
- License: gpl-3.0
- Created: 2024-10-05T22:35:01.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-10-08T12:52:52.000Z (7 months ago)
- Last Synced: 2025-01-31T15:42:01.032Z (4 months ago)
- Topics: code-llama, few-shot-prompting, finetuning-llms, low-rank-adaptation, text-alignment
- Language: Jupyter Notebook
- Homepage:
- Size: 946 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Error-Explainer
Debugging efficiency is critically affected by the quality of error messages. Compiler messages are often vague and sometimes misleading, leaving the programmer with little idea of where the error actually originates; beginners in particular can spend hours finding and fixing a bug for this reason. This project explores the application of large language models (LLMs) to improve debugging productivity through clearer and more actionable error explanations. Building on the pretrained Code-LLaMA model, known for its strength on programming and coding tasks, we investigate two primary approaches to improving explanation quality: fine-tuning and prompting strategies.

**Keywords**: LLMs, few-shot prompting, Text Alignment, Code-Llama, Low Rank Adaptation (LoRA)
# Overview
Our study introduces a custom error-and-alignment dataset tailored to refine the baseline capabilities of the Code-LLaMA model, aiming to produce more relevant and accurate error explanations. We assess the effectiveness of each strategy using Perplexity (PPL), BERT-score, and a comprehensive human evaluation. These metrics capture the clarity, relevance, and actionability of the explanations, contributing to an empirical understanding of how LLMs can be optimized to support software developers in debugging tasks. Our findings shed light on the comparative strengths and limitations of fine-tuning versus prompting for error explanation, and also propose a framework based on few-shot prompting strategies for further improving automated debugging assistance tools in software development environments.
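For the fine-tuning arm, Low-Rank Adaptation (LoRA) keeps the base Code-LLaMA weights frozen and trains only small low-rank update matrices, which makes adapting a 7B-parameter model tractable on modest hardware. The sketch below shows how such an adapter could be attached with the `peft` library; the rank, target modules, hyperparameters, and dataset file are assumptions for illustration, not this project's actual configuration or data.

```python
# Minimal LoRA fine-tuning sketch with transformers + peft + datasets.
# Hyperparameters, target modules, and the dataset path are placeholders,
# not the configuration used in this repository.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "codellama/CodeLlama-7b-hf"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# LoRA adapter: only the low-rank matrices are trainable; the base model stays frozen.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Hypothetical JSONL file: one {"text": "<error + aligned explanation>"} per line.
dataset = load_dataset("json", data_files="error_explanations.jsonl", split="train")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: labels mirror the inputs
    return out

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="codellama-lora-error-explainer",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    train_dataset=tokenized,
)
trainer.train()
```

Because only the adapter weights are updated, the resulting explainer can be stored as a few megabytes of LoRA weights on top of the shared base checkpoint.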
# Results
## Test Results
| Strategy                    | BERT-score | Perplexity (PPL) | Human Eval (avg. of 3 examiners) |
|-----------------------------|------------|------------------|----------------------------------|
| Zero-shot                   | 0.82       | 9.6              | 28.67                            |
| Fine-Tuning                 | 0.78       | 9.4              | 40.33                            |
| Random-Prompting 4-shot     | 0.88       | 4.9              | 32.67                            |
| Same-Class-Prompting 4-shot | 0.89       | 4.0              | 46.67                            |
| Manual-Prompting 4-shot     | 0.39       | 2.7              | 18.33                            |
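For reference, the two automatic metrics above can be computed along the following lines. This is a minimal sketch using the `bert-score` package and a loss-based perplexity estimate; the function names are illustrative, not the project's own evaluation code.

```python
# Sketch of the automatic metrics: BERT-score between generated and reference
# explanations, and perplexity as exp(mean token negative log-likelihood).
import math

import torch
from bert_score import score as bert_score

def mean_bertscore_f1(candidates: list[str], references: list[str]) -> float:
    """Average BERT-score F1 over candidate/reference explanation pairs."""
    _, _, f1 = bert_score(candidates, references, lang="en")
    return f1.mean().item()

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of `text` under a causal LM (lower means less surprising)."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())
```

BERT-score measures semantic overlap with a reference explanation and perplexity measures how fluent the text is to the model; neither directly captures clarity or actionability, which the human evaluation below assesses.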
## Human Evaluation Results

| Examiner    | Zero-shot | Fine-Tuning | Random-Prompting 4-shot | Same-Class-Prompting 4-shot | Manual-Prompting 4-shot | GPT-4 (baseline) |
|-------------|-----------|-------------|-------------------------|-----------------------------|-------------------------|------------------|
| 1 | 21 | 44 | 31 | 46 | 17 | 51 |
| 2 | 28 | 34 | 35 | 47 | 17 | 49 |
| 3 | 37 | 43 | 32 | 47 | 21 | 53 |
| **Average** | 28.67 | 40.33 | 32.67 | 46.67 | 18.33 | 51 |

**NOTE**: To read more about the results, metrics, and methodology, please see "./project report.pdf".