https://github.com/vubacktracking/bert-faiss-qa-system

Q&A System using BERT and Faiss Vector Database

Host: GitHub
URL: https://github.com/vubacktracking/bert-faiss-qa-system
Owner: VuBacktracking
Created: 2024-04-09T03:13:31.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-05-21T08:33:19.000Z (about 1 year ago)
Last Synced: 2025-04-02T02:01:59.978Z (2 months ago)
Topics: bert, distilbert, faiss, faiss-vector-database, qa-system, vector-database
Language: Python
Homepage:
Size: 899 KB
Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Q&A System using BERT and Faiss Vector Database

---

### Table of Contents

- [Q\&A System using BERT and Faiss Vector Database](#qa-system-using-bert-and-faiss-vector-database)
  - [Table of Contents](#table-of-contents)
  - [Overview](#overview)
  - [Features](#features)
  - [Installation](#installation)
    - [Requirements](#requirements)
    - [Setup](#setup)
  - [Usage](#usage)
    - [Streamlit Web App Interface](#streamlit-web-app-interface)
  - [How it Works](#how-it-works)
  - [Demo](#demo)
    - [Extractive Q\&A](#extractive-qa)
    - [Closed Generative Q\&A](#closed-generative-qa)
  - [Acknowledgments](#acknowledgments)

---

## Overview

This project is a Question & Answer system implemented using DistilBERT for text representation and Faiss (Facebook AI Similarity Search) for efficient similarity search in a vector database. The system is designed to provide accurate and relevant answers to user queries by searching through a large collection of documents.

[Image: workflow]

## Features

- **DistilBERT-based Text Representation**: Utilizes the DistilBERT model to convert questions and documents into dense vector representations.

- **Faiss Vector Database**: Stores the vector representations of the documents for fast similarity search.

- **Efficient Retrieval**: Finds the most relevant documents to a given question by performing efficient similarity searches in the Faiss vector database.
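
The README does not say how DistilBERT's per-token outputs are reduced to a single vector per question or document. One common choice is masked mean pooling, sketched below in plain Python on toy numbers. The function name `mean_pool` and the 3-dimensional vectors are illustrative assumptions, not code from this repository.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for a model's last hidden state: 3 tokens, the last one is padding.
hidden = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_vector = mean_pool(hidden, mask)
print(sentence_vector)
```

With a real model the same averaging would be applied to the encoder's hidden states, giving one fixed-size vector per text regardless of its length.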

---

## Installation

### Requirements

- Python 3.x
- PyTorch
- Transformers
- Faiss
- Streamlit (for the web-based interface)

### Setup

1. Clone the repository:

```bash
git clone https://github.com/VuBacktracking/bert-faiss-qa-system.git
```

2. Install the dependencies:

```bash
pip install -r requirements.txt
```

3. Download and fine-tune the DistilBERT model:

```bash
python3 trainer.py
```

**Note**: You can find my trained model at https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2

4. Build the Faiss vector database:

```bash
python3 faiss_index.py
```
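
For intuition about what the index built in step 4 does at query time: an exact ("flat") L2 index compares the query against every stored vector and returns the closest ones. The sketch below mimics that behaviour in plain Python. It is not the contents of `faiss_index.py`, and the class name `FlatL2Index` is hypothetical. Faiss provides this kind of exact search as `faiss.IndexFlatL2`; which index type this repository actually uses is not stated here.

```python
class FlatL2Index:
    """Toy exact nearest-neighbour index using squared L2 distance."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vectors):
        """Store document vectors; their position becomes their id."""
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
            self.vectors.append(list(vec))

    def search(self, query, k):
        """Return (distances, ids) of the k stored vectors closest to query."""
        scored = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), idx)
            for idx, vec in enumerate(self.vectors)
        ]
        scored.sort()
        top = scored[:k]
        return [d for d, _ in top], [i for _, i in top]


index = FlatL2Index(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # three toy "document" vectors

distances, ids = index.search([0.9, 1.2], k=2)
print(ids)
```

The returned ids are positions in the order the vectors were added, so the original documents need to be kept in the same order to map results back to text.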

[Image: workflow]

---

## Usage

### Streamlit Web App Interface

```bash
streamlit run app.py
```

Open your web browser and navigate to `http://localhost:8501/` to use the web-based Q&A system.

---

## How it Works

1. **DistilBERT Embeddings**:
   - The preprocessed document text is converted into vector embeddings using the DistilBERT model.

2. **Faiss Indexing**:
   - The DistilBERT embeddings of the documents are indexed in the Faiss vector database.

3. **Query Processing**:
   - When a user inputs a question, the question is converted into a DistilBERT embedding.
   - Faiss is used to find the stored embeddings most similar to the question embedding, i.e. the most relevant documents.

4. **Answer Extraction**:
   - The relevant documents are ranked, and the most relevant answer passages are extracted and presented to the user.
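
The answer-extraction step is not spelled out in detail. Assuming it uses an extractive reader of the kind fine-tuned on SQuAD, which scores every token as a possible answer start and as a possible answer end, span selection can be sketched as follows. The function name `best_span`, the `max_answer_len` limit, and the toy tokens and logits are illustrative assumptions, not values produced by this project's model.

```python
def best_span(start_logits, end_logits, max_answer_len=10):
    """Return (start, end, score) maximising start_logits[s] + end_logits[e], s <= e."""
    best = None
    for s, s_score in enumerate(start_logits):
        # Only consider spans of at most max_answer_len tokens that start at s.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Toy retrieved passage and made-up per-token scores.
tokens = ["faiss", "was", "developed", "by", "facebook", "ai", "research"]
start_logits = [0.1, 0.0, 0.2, 0.3, 4.0, 1.0, 0.5]
end_logits = [0.0, 0.1, 0.2, 0.1, 1.5, 2.0, 3.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

When several passages are retrieved, the same scoring can be run on each one and the highest-scoring span kept as the final answer.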

---

## Demo

### Extractive Q&A

[Image: workflow]

### Closed Generative Q&A

[Image: workflow]

---

## Acknowledgments

- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [Facebook AI Similarity Search (Faiss)](https://github.com/facebookresearch/faiss)
