https://github.com/codewithcharan/pharmaquery
Pharmaceutical Insight Retrieval System designed to help users gain meaningful insights from research papers and documents in the pharmaceutical domain.
chromadb embeddings google-generative-ai langchain pharmaceutical-science python3 research retrieval-augmented-generation semantic-search sentence-transformers
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/codewithcharan/pharmaquery
- Owner: CodeWithCharan
- License: mit
- Created: 2025-01-10T20:57:47.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T18:20:53.000Z (8 months ago)
- Last Synced: 2025-04-20T16:56:19.398Z (6 months ago)
- Topics: chromadb, embeddings, google-generative-ai, langchain, pharmaceutical-science, python3, research, retrieval-augmented-generation, semantic-search, sentence-transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 7.71 MB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PharmaQuery
## Overview
PharmaQuery is an advanced Pharmaceutical Insight Retrieval System designed to help users gain meaningful insights from research papers and documents in the pharmaceutical domain.
## PharmaQuery Architecture
## Demo
https://github.com/user-attachments/assets/c12ee305-86fe-4f71-9219-57c7f438f291
## Features
- **Natural Language Querying**: Ask complex questions about the pharmaceutical industry and get concise, accurate answers.
- **Custom Database**: Upload your own research documents to enhance the retrieval system's knowledge base.
- **Similarity Search**: Retrieves the most relevant documents for your query using AI embeddings.
- **Streamlit Interface**: User-friendly interface for queries and document uploads.
## Technologies Used
- **Programming Language**: [Python 3.10+](https://www.python.org/downloads/release/python-31011/)
- **Framework**: [LangChain](https://www.langchain.com/)
- **Database**: [ChromaDB](https://www.trychroma.com/)
- **Models**:
  - Embeddings: [Google Gemini API (embedding-001)](https://ai.google.dev/gemini-api/docs/embeddings)
  - Chat: [Google Gemini API (gemini-1.5-pro)](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-pro)
- **PDF Processing**: [PyPDFLoader](https://python.langchain.com/docs/integrations/document_loaders/pypdfloader/)
- **Document Splitter**: [SentenceTransformersTokenTextSplitter](https://python.langchain.com/api_reference/text_splitters/sentence_transformers/langchain_text_splitters.sentence_transformers.SentenceTransformersTokenTextSplitter.html)
## Requirements
1. **Clone the Repository**:
```bash
git clone https://github.com/CodeWithCharan/PharmaQuery.git
cd PharmaQuery
```
2. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set Up Environment Variables**:
Create a `.env` file in the project root directory with the following variable:
```bash
GOOGLE_API_KEY="your_google_gemini_api_key"
```
`Note:` Replace `your_google_gemini_api_key` with your actual key.
4. **Run the Application**:
```bash
streamlit run app.py
```
5. **Use the Application**:
- Enter your query in the main interface.
- Optionally, upload research papers in the sidebar to enhance the database.
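
When a query is entered, the Similarity Search feature ranks stored document chunks by how close their embeddings are to the query's embedding. The sketch below illustrates that idea only and is not the repository's code. The application uses Gemini embeddings and ChromaDB, whereas here a toy bag-of-words vector stands in for the embedding model so the example runs without an API key. The names `embed`, `cosine_similarity`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical document chunks, as if split from uploaded papers.
chunks = [
    "aspirin inhibits platelet aggregation",
    "metformin lowers blood glucose in type 2 diabetes",
    "streamlit renders the upload sidebar",
]
results = top_k("how does metformin affect blood glucose", chunks, k=1)
print(results)
```

In a real pipeline the retrieved chunks would be passed to the chat model as context for the answer. A learned embedding model also matches on meaning instead of exact word overlap, which this toy version cannot do.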