https://github.com/tech-c-p/conversai
ConversAI is an innovative conversational AI framework designed for intelligent text extraction and querying across various document formats and web content, leveraging advanced natural language processing techniques.
- Host: GitHub
- URL: https://github.com/tech-c-p/conversai
- Owner: Tech-C-P
- License: mit
- Created: 2024-10-16T06:27:10.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-16T07:35:44.000Z (3 months ago)
- Last Synced: 2024-10-19T06:09:26.628Z (3 months ago)
- Topics: beautifulsoup, chatbot, genai, gradio, groq, langchain, large-language-models, llama3, mlops, nlp, ocr, pymupdf, python
- Language: Python
- Homepage: https://huggingface.co/spaces/techconsptr/ConversAI, https://convers-ai-test.vercel.app/
- Size: 1.02 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ConversAI
**[ConversAI](https://convers-ai-test.vercel.app/)** is an innovative conversational AI framework designed to empower users with intelligent interactions
across various document formats and web content. Utilizing advanced natural language processing (NLP) techniques,
ConversAI enables seamless text extraction and querying capabilities, making it an invaluable tool for researchers,
students, professionals, and anyone who regularly interacts with text-based information.

## Explore Our Chatbot Solutions
For chatbot solutions tailored to your specific needs, please visit the [ConversAI Website](https://convers-ai-test.vercel.app/).
If you're interested in enhancing your customer engagement with our chatbot technology, please reach out to us via email or visit our website for more information.

### Demo Video
https://github-production-user-asset-6210df.s3.amazonaws.com/45705878/376930530-b0cd1b42-a823-4577-8394-dee7d00ac12c.mp4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20241016%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241016T070830Z&X-Amz-Expires=300&X-Amz-Signature=645688e7c16270ca684c8c31fc729322d9aa9e0264416402f0881d2bacdd9b78&X-Amz-SignedHeaders=host

## Introduction
In an era characterized by information overload, efficient data processing is crucial. ConversAI addresses this
challenge by leveraging state-of-the-art technologies to transform unstructured data into actionable insights. Whether
extracting meaningful information from PDFs, fetching transcripts from YouTube videos, or gathering data from multiple
web pages, ConversAI provides a user-friendly interface that simplifies these complex tasks.

## Architecture Overview
![ConversAI architecture diagram](demo/conversai-architecture-diagram.png)
### Key Technologies Used:
- **Gradio**: For building the interactive user interface.
- **LangChain**: For chaining various data retrieval and processing tasks.
- **EasyOCR**: For optical character recognition in scanned PDFs.
- **PyMuPDF**: For handling searchable PDF documents.
- **BeautifulSoup**: For web scraping and HTML parsing.

With its modular design, ConversAI is not just a tool but a platform that can be extended and customized to fit diverse
user requirements.

## Features
| Feature | Description |
|------------------------------|-----------------------------------------------------------------------------|
| **Text Extraction** | Extracts text from both searchable and scanned PDFs. |
| **Web Crawler** | Gathers and processes text from websites efficiently. |
| **YouTube Transcript Retrieval** | Fetches and processes video transcripts for easy querying. |
| **Conversational Interface** | Interactive Gradio UI for seamless user interaction. |
| **Configurable** | Easily customizable to fit specific use cases. |
| **Advanced Document Retrieval** | Utilizes sophisticated algorithms for improved accuracy and relevance. |
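
The Text Extraction row distinguishes PDFs that carry an embedded text layer from scanned, image-only ones. As a rough illustration of how such routing could work (not the project's actual code), the sketch below keeps a page's embedded text when there is enough of it and otherwise delegates to an OCR callback. The names `extract_pages`, `ocr_page`, and `MIN_CHARS` are hypothetical, and the threshold is an arbitrary assumption. In a real setup the text layers would presumably come from PyMuPDF and the callback would wrap EasyOCR.

```python
from typing import Callable, List

# Hypothetical threshold: pages with fewer characters are treated as scanned.
MIN_CHARS = 20


def extract_pages(
    text_layers: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = MIN_CHARS,
) -> List[str]:
    """Return one string per page, using OCR only where the text layer is too thin."""
    pages = []
    for index, text in enumerate(text_layers):
        cleaned = text.strip()
        if len(cleaned) >= min_chars:
            # Searchable page: the embedded text layer is good enough.
            pages.append(cleaned)
        else:
            # Scanned page: hand the page index to the OCR callback.
            pages.append(ocr_page(index).strip())
    return pages


if __name__ == "__main__":
    layers = ["This page has an embedded text layer.", "   "]
    print(extract_pages(layers, lambda i: f"(OCR text for page {i})"))
```

Passing the OCR step in as a callback keeps the routing logic testable without a GPU or any OCR model installed.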
## Benefits
- **Time-Saving**: Automates text extraction, allowing users to focus on analysis rather than manual data entry.
- **Accuracy**: High-precision text extraction capabilities ensure that users receive reliable data.
- **User-Friendly**: Intuitive interface makes it accessible for users with varying technical skills.
- **Scalable**: Modular architecture allows for easy integration of additional features and functionalities.

## Getting Started
### Prerequisites
Before running ConversAI, ensure you have the following dependencies installed:
```bash
apt-get update && apt-get upgrade -y
apt-get install poppler-utils -y
```

Additionally, you need to set up your environment variables for the Groq API:
1. Sign up on [Groq](https://www.groq.com) and obtain your API key.
2. Set the `GROQ_API_KEY` in your environment variables.

### Installation
1. Clone the repository:
```bash
git clone https://github.com/Tech-C-P/ConversAI.git
cd ConversAI
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```

### Running the Application
To launch the application, run the following command:
```bash
python app.py
```

The Gradio interface will open in your default web browser.
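
Since the application depends on the `GROQ_API_KEY` variable described in the prerequisites, it can help to confirm the variable is set before launching. The following is a minimal sketch of such a pre-launch check. The helper name `missing_settings` is hypothetical and is not part of the repository.

```python
import os


def missing_settings(env, required=("GROQ_API_KEY",)):
    """Return the names of required variables that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Set before running app.py:", ", ".join(missing))
    else:
        print("Environment looks ready; run: python app.py")
```

The check takes the environment as a plain mapping, so it can be exercised with a dictionary instead of the real process environment.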
### Important Configuration Note
If no GPU is available, modify the `config.ini` file as follows:
- Under the `[EMBEDDINGS]` section, change:
```ini
device = cuda
```
to:
```ini
device = cpu
```
- Under the `[EASYOCR]` section, change:
```ini
gpu = true
```
to:
```ini
gpu = false
```

These adjustments let the application run on CPU resources alone.
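
The same two changes can be applied programmatically. The sketch below uses Python's standard `configparser` module, with the section and key names taken from the note above. The file location `src/settings/config.ini` is an assumption based on the directory tree in this README, and `force_cpu` is a hypothetical helper name.

```python
import configparser
from pathlib import Path

# Assumed location, inferred from the directory structure in this README.
CONFIG_PATH = Path("src/settings/config.ini")


def force_cpu(config: configparser.ConfigParser) -> configparser.ConfigParser:
    """Set the two GPU-related options to their CPU values, adding sections if absent."""
    for section, key, value in (
        ("EMBEDDINGS", "device", "cpu"),
        ("EASYOCR", "gpu", "false"),
    ):
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, key, value)
    return config


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        parser = configparser.ConfigParser()
        parser.read(CONFIG_PATH)
        with CONFIG_PATH.open("w") as handle:
            force_cpu(parser).write(handle)
        print(f"Updated {CONFIG_PATH}")
    else:
        print(f"{CONFIG_PATH} not found; run this from the repository root")
```

Note that `configparser` discards comments when it rewrites a file, so editing `config.ini` by hand may be preferable if the file is annotated.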
### Important Note
After using the interface, be sure to click the "Clear" button to reset the fields. This is crucial because session
management has not been implemented in this version, and failing to clear inputs may lead to unintended data persistence
during subsequent interactions.

## Directory Structure
Here's a comprehensive view of the project's directory tree:
```
ConversAI/
├── app.py                      # Main application file
├── requirements.txt            # Required Python packages
├── src/                        # Source code directory
│   ├── components/             # Component modules
│   │   ├── loaders/            # Data loaders
│   │   │   ├── pdfLoader.py
│   │   │   ├── websiteCrawler.py
│   │   │   └── youtubeLoader.py
│   │   ├── rag/                # Retrieval-Augmented Generation components
│   │   │   └── RAG.py
│   │   ├── utils/              # Utility functions and classes
│   │   │   ├── functions.py
│   │   │   ├── exceptions.py
│   │   │   └── logging.py
│   │   └── vectors/            # Vector storage and processing
│   │       └── vectorstore.py
│   ├── pipelines/              # Pipeline logic for data processing
│   │   └── completePipeline.py
│   └── settings/               # Configuration files
│       ├── config.ini
│       └── params.yaml
└── README.md                   # Project documentation
```

## Demo
### Text Input Interface
![Text Input](demo/TextInterface.png)
### PDF Upload Interface
![PDF Upload](demo/PDFInterface.png)
### Web Crawler Interface
![Web Crawler](demo/WebsiteInterface.png)
## Conclusion
ConversAI is more than just a tool; it's a comprehensive solution for managing and extracting insights from a multitude
of document formats and web sources. With its powerful capabilities and user-friendly interface, ConversAI is poised to
make information retrieval and processing easier and more efficient than ever before.
---
## Author Information
| Author | Email | GitHub | LinkedIn | Twitter | Portfolio |
|------------------|-------------------------------------|-----------------------------------------------------|-------------------------------------------------------|---------------------------------------------------|------------------------------------------------------|
| **Rauhan Ahmed** | [email protected] | [RauhanAhmed](https://github.com/RauhanAhmed) | [Rauhan Ahmed](https://www.linkedin.com/in/rauhan-ahmed) | [@ahmed_rauh46040](https://x.com/ahmed_rauh46040) | [rauhanahmed.org](https://rauhanahmed.org) |
| **Ishwor Subedi**| [email protected] | [ishworrsubedii](https://github.com/ishworrsubedii) | [Ishwor Subedi](https://www.linkedin.com/in/ishworrsubedii) | [@ishworr__](https://x.com/ishworr_) | [ishwor-subedi.com.np](https://ishwor-subedi.com.np) |
| **Faham** | [email protected] | [Faham](https://github.com/iamfaham) | [Faham](https://www.linkedin.com/in/iamfaham) | [@faham_twitter](https://x.com/iamfaham) | [iamfaham.com](https://iamfaham.netlify.app) |

## Acknowledgments
I developed this project while working as an AI Engineer
at [Tech Consulting Partners](https://www.techconsultingpartners.com). I built ConversAI from scratch, implementing
advanced document retrieval methods, reranking techniques, hybrid search methodologies, multiple integrations with large
language models (LLMs), and many other complex functionalities.

The backend includes user management features, data storage solutions (including S3 storage management),
database management, and vector databases. The deployment strategy relies on APIs, Docker containers, CI/CD
practices, model monitoring, and cloud platform deployment.

This open-source prototype serves as a stepping stone towards a more comprehensive project aimed at public good,
showcasing the potential of advanced AI technologies in everyday applications. I extend my heartfelt gratitude
to Tech Consulting Partners for entrusting me with this initiative and for their support throughout the
development process.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
We hope you enjoy using ConversAI! For any questions or feedback, please reach out via the project repository or email.