https://github.com/rakshit-vasava/hcl-software

A project utilizing Large Language Models (LLMs) to detect software vulnerabilities and recommend contextual fixes.

haystack hugging-face langchain openai-api owasp python

# HCL-Software: Enhancing Software Security with LLMs

## 🚀 Executive Summary
This was my capstone project with HCL Software. The project uses **Large Language Models (LLMs)** to recommend fixes for software vulnerabilities, either as code lines or as contextual information, improving both security and remediation efficiency.

Traditional manual patching is time-consuming and error-prone. We introduce an accelerated approach using LLMs to provide accurate, context-specific recommendations, speeding up remediation while ensuring transparency through cited sources.

## 🔑 Key Outcomes
- Successfully built prototype models (QA, GPT-2, and ChatOpenAI).
- Models provide remediation advice and recommendations for code fixes.
- Our solution has commercial potential, reducing operational risks, enhancing security, and promoting safer digital environments.

## 🛠 Methods and Tools

### Datasets Used:
- **OWASP Cheat Sheets** (integrated from the OWASP GitHub repository).
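
As a rough illustration of preparing this dataset, the sketch below splits the Markdown text of one cheat sheet into heading-level sections that can later be indexed or retrieved. The function name `split_markdown_sections` and the record fields (`source`, `title`, `content`) are assumptions made for this example, not the project's actual preprocessing code.

```python
import re


def split_markdown_sections(text: str, source: str) -> list[dict]:
    """Split Markdown text into one record per heading (hypothetical helper).

    Limitation: a line starting with '#' inside a fenced code block is also
    treated as a heading; a fuller parser would skip fenced regions.
    """
    sections = []
    title, body = "Introduction", []

    def flush():
        # Store the section collected so far, skipping empty ones.
        content = "\n".join(body).strip()
        if content:
            sections.append({"source": source, "title": title, "content": content})

    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()  # a new heading closes the previous section
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()  # close the final section
    return sections
```

Each record keeps its source file name, which makes it possible to cite where a recommendation came from later on.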

### Analytical Models:
1. **QA Model** (Haystack, InMemoryDocumentStore, BM25 algorithm).
2. **GPT-2 Model** (Fine-tuned GPT-2, Hugging Face Transformers).
3. **ChatOpenAI Model** (LangChain, RAG, OpenAI API).

### Tools and Platforms:
- **Python**, **Google Colab**, **Hugging Face**, **Haystack**, **OpenAI API**, **GitHub**, **Google Drive**, **Visual Studio Code**.

## ⚙️ Results and Conclusions
Our models successfully identified software vulnerabilities and provided actionable advice:
- **QA Model**: Found vulnerability causes and offered patch suggestions.
- **GPT-2 Model**: Produced longer, more detailed responses after fine-tuning on the combined dataset.
- **ChatOpenAI Model**: Delivered the most comprehensive and contextually relevant recommendations.

## 🌍 Business & Social Impact
- **Business**: Reduces the time and cost of securing software, increasing trust and reliability.
- **Social**: Contributes to a safer digital environment, protecting users from sophisticated cyber threats.

## 📄 Screenshots of Results

### QA Model Output:
- What is Cross-Site Scripting?

[Screenshot: QA model output]

### GPT-2 Model Output:
- What is Cross-Site Scripting?

[Screenshot: GPT-2 model output]

### ChatOpenAI Model Output:
- What is Cross-Site Scripting? And how to solve it?

[Screenshot: ChatOpenAI model output]

- What is SQL Injection? And how to solve it?

[Screenshot: ChatOpenAI model output]

## 📋 How to Reproduce Results

### 1. QA Model
- **Step 1**: Install Haystack, then create a DocumentStore, a Retriever, and a Reader.
- **Step 2**: Load the datasets ('causes', 'risks', and 'recommendations') into the DocumentStore.
- **Step 3**: Query the system using prompts like "What is Cross-Site Scripting?"
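
To show what the retrieval step does conceptually, here is a minimal pure-Python sketch of BM25 ranking. It is a stand-in for Haystack's BM25 retriever, not Haystack's API: the `BM25Index` class, the `k1` and `b` defaults, and the example documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Toy in-memory BM25 index (illustrative, not the project's code)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        if not documents:
            raise ValueError("at least one document is required")
        self.documents = list(documents)
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in self.documents]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        # Fall back to 1.0 if every document is empty, to avoid dividing by zero.
        self.avg_length = sum(self.lengths) / len(self.documents) or 1.0
        # Number of documents that contain each term.
        self.doc_freq = Counter()
        for counts in self.term_counts:
            self.doc_freq.update(counts.keys())

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        total = len(self.documents)
        return math.log((total - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, doc_index):
        counts = self.term_counts[doc_index]
        # Length normalisation: longer documents are penalised slightly.
        norm = 1 - self.b + self.b * self.lengths[doc_index] / self.avg_length
        result = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            result += self.idf(term) * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return result

    def top_k(self, query, k=1):
        order = sorted(range(len(self.documents)),
                       key=lambda i: self.score(query, i), reverse=True)
        return [self.documents[i] for i in order[:k]]


# Made-up snippets standing in for the 'causes' and 'recommendations' data.
documents = [
    "Cross-site scripting (XSS) lets attackers inject scripts into pages viewed by other users.",
    "SQL injection occurs when untrusted input is concatenated into a database query.",
    "Use parameterized queries to prevent SQL injection.",
]
index = BM25Index(documents)
print(index.top_k("What is cross-site scripting?", k=1))
```

In the QA model, a Reader would then extract an answer span from the passages the retriever returns; that part is not reproduced here.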

### 2. GPT-2 Model
- **Step 1**: Fine-tune GPT-2 with the combined dataset (HCL + OWASP).
- **Step 2**: Set up a text generation pipeline using Hugging Face.
- **Step 3**: Query the system using prompts like "What is Cross-Site Scripting?"
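
The generation step can be sketched roughly as follows, using the Hugging Face `pipeline` helper. The `Question:`/`Answer:` template, the helper names, and the default `gpt2` checkpoint are assumptions; the fine-tuned checkpoint from Step 1 would replace `model_name`, and the fine-tuning run itself is not shown.

```python
def format_training_example(question: str, answer: str) -> str:
    """Join a question/answer pair into one training string (assumed template)."""
    return f"Question: {question.strip()}\nAnswer: {answer.strip()}"


def build_prompt(question: str) -> str:
    """Build an inference prompt that stops where the model should continue."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(question: str, model_name: str = "gpt2",
                    max_new_tokens: int = 80) -> str:
    """Generate a continuation for the prompt with a text-generation pipeline."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    prompt = build_prompt(question)
    outputs = generator(prompt, max_new_tokens=max_new_tokens,
                        num_return_sequences=1)
    # The pipeline returns the prompt plus the continuation by default,
    # so drop the prompt to keep only the generated answer.
    return outputs[0]["generated_text"][len(prompt):].strip()
```

Calling `generate_answer("What is Cross-Site Scripting?")` downloads the model weights on first use. Using the same template for training and inference keeps the prompt in the format the fine-tuned model has seen.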

### 3. ChatOpenAI Model
- **Step 1**: Create a vector database with Chroma and build a retrieval-augmented generation (RAG) pipeline with LangChain.
- **Step 2**: Query the model with prompts like "What is SQL injection? And how do I solve it?"
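
The retrieve-then-generate flow behind these two steps can be outlined without any external service. In the sketch below, a word-overlap ranking stands in for Chroma's embedding search and the `llm` argument stands in for the OpenAI chat model. Every function name, the prompt wording, and the sample chunks are assumptions for illustration, not LangChain's API or the project's code.

```python
import re
from typing import Callable


def word_set(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Rank chunks by word overlap with the question (stand-in for vector search)."""
    query_terms = word_set(question)
    ranked = sorted(chunks,
                    key=lambda c: len(query_terms & word_set(c["content"])),
                    reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, retrieved: list[dict]) -> str:
    """Put the retrieved chunks in a numbered context block ahead of the question."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['content']}"
        for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the numbered context. "
        "Cite the sources you use by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_with_sources(question: str, chunks: list[dict],
                        llm: Callable[[str], str], k: int = 2) -> dict:
    """Retrieve context, query the model, and return the answer with its sources."""
    retrieved = retrieve(question, chunks, k)
    prompt = build_rag_prompt(question, retrieved)
    return {"answer": llm(prompt), "sources": [c["source"] for c in retrieved]}


# Made-up chunks; real ones would come from the indexed cheat sheets.
chunks = [
    {"source": "sql_injection.md", "content": "Use parameterized queries to prevent SQL injection."},
    {"source": "xss.md", "content": "Encode output to prevent cross-site scripting."},
]
result = answer_with_sources("What is SQL injection? And how do I solve it?",
                             chunks, llm=lambda prompt: "(model reply)", k=1)
print(result["sources"])
```

Returning the source names alongside the answer is one way to support the cited-sources transparency described in the summary.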

## 🛠 Tools to Run the Project
- Python, Hugging Face, Haystack, OpenAI API
- Google Colab for processing

## 🏗 Future Work
- Integrate the model into a chatbot within coding environments to offer instant remediation advice to developers.
- Continuously train the model on new vulnerabilities.