https://github.com/smpy2002/curioveda---powered-by-ai
AI-powered chatbot for scraped-article analysis and question answering. (Click the link below to explore it.)
- Host: GitHub
- URL: https://github.com/smpy2002/curioveda---powered-by-ai
- Owner: SMPY2002
- License: mit
- Created: 2024-12-31T13:37:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-11T11:29:22.000Z (over 1 year ago)
- Last Synced: 2025-02-23T15:15:35.456Z (over 1 year ago)
- Topics: chatbot, langchain, nlp, rag
- Language: Python
- Homepage: https://curioveda---powered-by-ai.streamlit.app/
- Size: 722 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CurioVeda - RAG-Based Chatbot for Efficient Analysis of Articles, News, Reports
## Overview
This is my **final-year major project**: a chatbot that answers user queries based on articles, reports, and news scraped from user-provided URLs. **Multilingual support** and **graphical analysis** of tabular data are both in progress.
The primary objective is to speed up article analysis by extracting key insights quickly and accurately. The project combines web scraping, natural language processing (NLP), and data visualization.
---
## Key Features
1. **Efficient Web Scraping**:
- Scrapes content from user-provided URLs, including JavaScript-heavy websites, using Selenium.
- Extracts text from images in articles using OCR with Tesseract.
2. **Data Preprocessing**:
- Basic preprocessing of scraped data for optimal analysis.
- Recursive text splitting with LangChain for manageable chunk sizes.
3. **Generative AI for Querying**:
- Uses Google Generative AI embeddings to convert text into fixed-length vectors.
- Stores processed content in a **FAISS Vector Store** for efficient similarity searches.
- Employs a LLaMA text-generation model to produce context-relevant answers from the similarity search results.
4. **Multilingual Support** (in progress):
- Provides responses in multiple languages, making the chatbot accessible to diverse users.
5. **Graphical Analysis** (in progress):
- Analyzes tabular data and generates visualizations that present insights in an intuitive format.
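
The retrieval step in feature 3 can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the `embed` function is a toy word-count stand-in for the Google Generative AI embeddings, the brute-force `top_k` stands in for a FAISS similarity search, and the sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Invented example chunks, standing in for text scraped from an article.
chunks = [
    "The central bank raised interest rates by 25 basis points.",
    "The football final was postponed because of heavy rain.",
    "Analysts expect inflation to cool after the rate decision.",
]
best = top_k("What happened to interest rates?", chunks, k=1)
print(best[0])
```

In a real pipeline the retrieved chunks would then be passed to the text-generation model as context. A vector index such as FAISS serves the same purpose as `top_k` but scales to far more chunks than a linear scan.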
---
## Thought Process Behind the Project
This project was designed with the following considerations:
1. Enable efficient and comprehensive data extraction from user-provided URLs, including:
- Dynamic content (JavaScript-heavy websites).
- Embedded image-based text using OCR techniques.
2. Improve response quality by preprocessing the scraped text and using LangChain for effective text chunking.
3. Utilize powerful AI models (e.g., Google Generative AI embeddings and LLaMA) for robust similarity search and context-aware responses.
4. Empower users with multilingual interactions and graphical insights for tabular data, making the chatbot a versatile tool for analysis.
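
The idea behind recursive text chunking (consideration 2) can be sketched in plain Python. This is a simplified illustration of the technique, not LangChain's implementation and not the project's code: the function name `recursive_split` and the separator order are assumptions, and chunk overlap, which LangChain's splitter supports, is left out.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring paragraph breaks, then line breaks, then spaces."""
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


# Invented sample text with three paragraphs of different lengths.
sample = (
    "First paragraph about markets.\n\n"
    "Second paragraph, which is noticeably longer than the first one.\n\n"
    "Third."
)
for chunk in recursive_split(sample, chunk_size=40):
    print(repr(chunk))
```

Trying coarse separators first keeps paragraphs and sentences intact where possible, so each chunk that gets embedded is more likely to be a self-contained unit of meaning.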
---
## Tech Stack
### **Languages and Frameworks**:
- Python
- Streamlit (for hosting and front-end interface)
### **Libraries**:
- **Web Scraping**: Selenium, BeautifulSoup
- **OCR**: Tesseract
- **NLP**: LangChain, LLaMA, FAISS Vector Store, Google Generative AI
- **Data Visualization**: Matplotlib, Seaborn, Plotly
### **Tools**:
- GitHub (Version Control)
- Streamlit Cloud (Hosting)
---
## Setup Instructions
1. **Clone the Repository**:
```bash
git clone https://github.com/SMPY2002/CurioVeda---Powered-by-AI.git
cd CurioVeda---Powered-by-AI
```
2. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run the Application**:
```bash
streamlit run app.py
```
4. **Usage**:
- Input the URLs containing articles/reports/news.
- Query the chatbot in your preferred language.
- View graphical insights for any tabular data provided.
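
When a query is submitted, a RAG chatbot typically combines the retrieved chunks with the question before calling the text-generation model. The sketch below shows one way that could look. The `build_prompt` function, its `language` parameter, and the template wording are hypothetical and are not taken from the project.

```python
def build_prompt(question: str, context_chunks: list[str], language: str = "English") -> str:
    """Combine retrieved chunks and the user question into a single prompt."""
    # Number the chunks so the model (and the reader) can tell them apart.
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Respond in {language}.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Invented example question and context chunk.
prompt = build_prompt(
    "What did the bank decide?",
    ["The central bank raised rates."],
    language="Hindi",
)
print(prompt)
```

Restricting the model to the supplied context is a common way to keep answers grounded in the scraped article rather than in the model's general knowledge.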
---
## Future Scope
- Add support for more advanced AI models and embedding techniques.
- Expand multilingual capabilities to include more languages.
- Integrate real-time streaming data analysis.
- Enhance graphical analysis features to include predictive insights.
- Optimize the backend for faster query responses and lower resource usage.
---
## License
This project is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute this project as per the license terms.
---
## Contact
For any queries or suggestions, please reach out via:
- Email:
- LinkedIn: [Shivam Pandey](https://www.linkedin.com/in/shivam-pandey1405)