# Generative AI Detection

![demo image](https://github.com/AkashKobal/Generative-AI-Detection/blob/main/Screenshot%20(366).png)

## Overview

This project is a web application built using **ReactJS** for the frontend and **Python** for the backend. It aims to detect whether a given text or prompt is authored by a human or generated by an AI model. The application participates in the **Voight-Kampff Generative AI Authorship Verification 2024** challenge.

## Table of Contents

- [Features](#features)
- [Modules and Libraries](#modules-and-libraries)
- [Algorithm and Use Case](#algorithm-and-use-case)
- [Summary](#summary)
- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
- [Evaluation Metrics](#evaluation-metrics)
- [Contributing](#contributing)
- [License](#license)

## Features

- **AI vs Human Detection**: Detects whether a text is written by a human or generated by an AI.
- **Machine Learning Models**: Utilizes various machine learning models for classification.
- **ReactJS Frontend**: User-friendly interface built with ReactJS.
- **Python Backend**: Backend processing with Python to handle AI detection logic.

## Modules and Libraries:
### 1. Flask:
A lightweight Python web framework used to build web applications. It provides features for routing HTTP requests and generating responses.

**Flask-CORS:** A Flask extension that handles Cross-Origin Resource Sharing (CORS), enabling the server to respond to requests from different origins.

### 2. PyTorch:

A deep learning library used to load and run models. It is used here to run the GPT-2 model and perform computations such as calculating Perplexity.

### 3. Transformers (from Hugging Face):

The GPT2LMHeadModel and GPT2TokenizerFast classes from the transformers library are used to load the GPT-2 language model and tokenizer.
GPT-2 is a pre-trained language model that can generate and analyze text.

### 4. Regular Expressions (re):

Used for text processing tasks such as extracting valid characters or splitting sentences into lines.
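
As a rough illustration of this kind of preprocessing, the sketch below splits a text into sentence-like lines and drops lines that contain no letters. The pattern and the helper name `split_into_lines` are assumptions made for this example, not the project's actual code.

```python
import re


def split_into_lines(text: str) -> list[str]:
    """Split text into sentence-like lines (hypothetical helper)."""
    # Split on whitespace that follows '.', '?' or '!', and on newline runs.
    parts = re.split(r"(?<=[.?!])\s+|\n+", text)
    # Keep only the pieces that contain at least one letter.
    return [p.strip() for p in parts if re.search(r"[A-Za-z]", p)]
```

Each resulting line can then be scored separately, which is what a per-line metric such as Burstiness needs.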

### 5. OrderedDict (from collections):

Used to maintain the order of results in a dictionary, making it easier to track and display the analysis steps.
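
A minimal sketch of how results might be collected in insertion order is shown below. The function name `build_results` and the key names are assumptions for illustration; the project's actual keys may differ.

```python
from collections import OrderedDict


def build_results(perplexity: float, burstiness: float, label: str) -> OrderedDict:
    """Collect analysis results in the order they were computed (hypothetical helper)."""
    results = OrderedDict()
    results["Perplexity"] = perplexity   # computed first
    results["Burstiness"] = burstiness   # computed second
    results["label"] = label             # final decision last
    return results
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` mainly makes the intent explicit.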

## Algorithm and Use Case:
The core functionality of the code revolves around using the GPT-2 model to calculate Perplexity and Burstiness of a given text. Here is how it works:

### 1. Perplexity:

A measure of how well a language model predicts a given text. Lower perplexity means the text is more predictable to the model, which makes it more likely to be AI-generated, while higher perplexity suggests human authorship.
The algorithm uses the GPT-2 model to evaluate the negative log-likelihood (NLL) of each token in the input text; Perplexity is the exponential of the average NLL.
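
The relationship between NLL and Perplexity can be sketched in plain Python. In the project the per-token log-probabilities come from GPT-2; here they are passed in directly, and the function name `perplexity` is an assumption for this example.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity as exp(mean negative log-likelihood) over tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Negative log-likelihood of each token (natural log).
    nlls = [-lp for lp in token_log_probs]
    # Average the NLLs, then exponentiate.
    return math.exp(sum(nlls) / len(nlls))
```

For example, if the model assigns probability 0.25 to every token, the perplexity is 4: the model is as uncertain as if it were choosing uniformly among four options at each step.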

### 2. Burstiness:

This measures the variation in Perplexity across different lines in the input text. Human writing tends to vary more from line to line (higher burstiness), while AI-generated text is usually more uniform.
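
One way to quantify variation across lines is the standard deviation of the per-line Perplexity scores, as in the sketch below. This choice is an assumption made for illustration; the project may define Burstiness differently (for example, as the maximum per-line Perplexity), and the function name `burstiness` is hypothetical.

```python
from statistics import pstdev


def burstiness(line_perplexities: list[float]) -> float:
    """Population standard deviation of per-line perplexities (illustrative definition)."""
    # With fewer than two lines there is no variation to measure.
    if len(line_perplexities) < 2:
        return 0.0
    return pstdev(line_perplexities)
```

A text whose lines all score the same yields 0, and the value grows as the per-line scores spread out.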

### 3. Thresholding for AI vs. Human:

The code uses a threshold-based decision system to categorize the text:

- If the Perplexity score is below 60, the text is likely AI-generated.
- If it falls between 60 and 80, the text is deemed "most likely AI," but more text is needed for a better judgment.
- If the Perplexity is above 80, the text is likely human-written.
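
The thresholds above can be sketched as a small function. The cut-offs 60 and 80 come from the description; how the exact boundary values are handled, the wording of the middle label, and the function name `classify` are assumptions for this example.

```python
def classify(perplexity: float) -> str:
    """Map a perplexity score to a label using the 60/80 thresholds."""
    if perplexity < 60:
        return "AI-generated"
    # Assumption: scores of exactly 60 and 80 fall into the middle band.
    if perplexity <= 80:
        return "Most likely AI-generated (more text needed)"
    return "Human-written"
```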

### 4. API Usage:

The code exposes a POST route ('/') where a user can submit a JSON payload with a text field. The system processes the text using the GPT2PPL class, calculates Perplexity and Burstiness, and then returns a response with a label ("AI-generated" or "Human-written") and the computed values.
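
A client request to this route might look like the sketch below, using only the standard library. The URL `http://localhost:5000/` assumes Flask's default port, the JSON key `text` is inferred from the description, and `build_detection_request` is a hypothetical helper; check `app.py` for the actual values.

```python
import json
import urllib.request


def build_detection_request(
    text: str, url: str = "http://localhost:5000/"
) -> urllib.request.Request:
    """Build a POST request carrying the text as a JSON payload (hypothetical helper)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires the backend to be running:
# with urllib.request.urlopen(build_detection_request("Some text")) as resp:
#     print(json.load(resp))
```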

### 5. Use Case:
The code can be used in scenarios where there is a need to determine whether a given text was written by a human or generated by an AI model. This can be useful for:

**Content authenticity checks:** To identify whether an article, blog post, or essay was written by a human or AI.

**AI detection in education:** To detect if students have submitted AI-generated text as their own work.

**Content moderation:** To flag AI-generated content in social media or online forums.

## Summary:
**Modules used:** Flask, Flask-CORS, PyTorch, transformers, `re` (regular expressions), and OrderedDict.

**Algorithms used:** Perplexity (text predictability), Burstiness (variation in sentence predictability), and thresholding for labeling the text as AI or human-written.

**Use case:** AI vs. human text detection, content authenticity verification.

## Installation

To set up the project locally, follow these steps:

1. **Clone the repository**:
```bash
git clone https://github.com/AkashKobal/Generative-AI-Detection.git
cd Generative-AI-Detection
```

2. **Install frontend dependencies**:
```bash
cd frontend
npm install
```

3. **Install backend dependencies**:
```bash
cd ../server
pip install -r requirements.txt
```

4. **Start the backend server**:
```bash
python app.py
```

5. **Start the frontend server** (in a separate terminal, from the repository root):
```bash
cd frontend
npm start
```

## Usage

1. Open your browser and go to `http://localhost:3000`.
2. Upload a pair of texts (one human-written and one AI-generated).
3. Click on the **"Analyse"** button to see the results.

## Data

- The dataset used in this project consists of a collection of human-written and AI-generated texts.
- Texts are analyzed to calculate metrics like **Perplexity** and **Burstiness** to determine their likely origin (AI or human).

## Evaluation Metrics

- **Perplexity**: Measures how well the language model predicts the next token in a text. Lower perplexity suggests AI generation, while higher perplexity suggests human authorship.
- **Burstiness**: Measures the variation in perplexity across different lines of text. Lower burstiness (more uniform text) often indicates AI-generated text.

## Contributing

Contributions are welcome! If you have suggestions for improvements or new features, please fork the repository and submit a pull request.

## License

This project is licensed under the **MIT License**. See the [LICENSE](https://github.com/AkashKobal/Generative-AI-Detection/blob/main/LICENSE) file for details.