{"id":23286348,"url":"https://github.com/ansh-info/paperbrain","last_synced_at":"2026-04-07T16:32:15.575Z","repository":{"id":268797898,"uuid":"899214416","full_name":"ansh-info/PaperBrain","owner":"ansh-info","description":"An intelligent research assistant that combines vector search and LLMs to help you interact with your research papers through natural language queries and receive structured, context-aware responses.","archived":false,"fork":false,"pushed_at":"2024-12-19T00:55:37.000Z","size":1578,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-12T21:36:29.298Z","etag":null,"topics":["artificial-intelligence","docker","json","large-language-models","llama3","llm","machine-learning","meta","mistral","nomic-embed-text","ollama","ollama-api","openai","prompt-engineering","qdrant","query","rag","terminal","vector","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ansh-info.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-05T20:40:45.000Z","updated_at":"2024-12-19T00:55:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"5d3c68fa-47ca-42f8-bd77-93ea8cb94fd1","html_url":"https://github.com/ansh-info/PaperBrain","commit_stats":null,"previous_names":["ansh-info/paperbrain"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ansh-info%2FPaperBrain","tags_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/ansh-info%2FPaperBrain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ansh-info%2FPaperBrain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ansh-info%2FPaperBrain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ansh-info","download_url":"https://codeload.github.com/ansh-info/PaperBrain/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247503133,"owners_count":20949383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","docker","json","large-language-models","llama3","llm","machine-learning","meta","mistral","nomic-embed-text","ollama","ollama-api","openai","prompt-engineering","qdrant","query","rag","terminal","vector","vector-database"],"created_at":"2024-12-20T02:11:34.090Z","updated_at":"2026-04-07T16:32:15.521Z","avatar_url":"https://github.com/ansh-info.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PaperBrain\n\nPaperBrain is an intelligent research paper Q\u0026A system that combines vector search and large language models to provide context-aware answers to research-related questions. 
It processes academic papers, understands their content, and generates structured, informative responses with proper citations and context.\n\n#### AI Response\n\n![AI Response](images/ai_response.png)\n\n## ✨ Key Features\n\n### Core Capabilities\n\n- **Smart Vector Search**: Utilizes Qdrant for semantic similarity search of research papers\n- **Intelligent Analysis**: Leverages LLaMA 3.2 for generating comprehensive, context-aware answers\n- **Structured Responses**: Provides organized output with:\n  - Main answer summary\n  - Key points from papers\n  - Paper citations and references\n  - Analysis limitations\n- **Duplicate Detection**: Intelligent tracking of shown papers to avoid repetition\n\n#### Paper Citations\n\n![Paper Citations](images/ai_papercitations.png)\n\n#### Paper Key Points\n\n![Paper Key Points](images/ai_keypoint.png)\n\n### Advanced Features\n\n- **Analytics Dashboard**: Track system usage, search patterns, and relevance metrics\n- **Conversation History**: Maintain records of previous queries and responses\n- **Relevance Scoring**: Clear explanation of paper matching with detailed relevance metrics\n- **Interactive Commands**: System controls for analytics, history, and paper tracking\n\n## 🛠️ Technology Stack\n\n- **Vector Store**: Qdrant for efficient similarity search\n- **Embeddings**: Nomic Embed Text for paper vectorization\n- **LLM Integration**: LLaMA 3.2 (1B parameter model) via Ollama\n- **Infrastructure**: Docker containerization\n- **Backend**: Async Python with modern libraries\n- **API Layer**: Async HTTP with HTTPX\n\n#### Qdrant Database\n\n![Qdrant Database](images/qdrant.png)\n\n#### Markdown to Vectors\n\n![Markdown to Vectors](images/vector.png)\n\n## 📋 Prerequisites\n\n```bash\n# System requirements\n- Python 3.9+\n- Docker\n- 4GB+ RAM for LLM operations\n- Disk space for paper storage\n```\n\n## Installation\n\n1. 
Clone the repository:\n\n```bash\ngit clone https://github.com/ansh-info/PaperBrain.git\ncd PaperBrain\n```\n\n2. Create a virtual environment:\n\n```bash\n# Using conda\nconda create --name PaperBrain python=3.11\nconda activate PaperBrain\n\n# Using venv\npython -m venv env\nsource env/bin/activate  # On Windows: .\\env\\Scripts\\activate\n```\n\n3. Install dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n4. Start required services:\n\n```bash\ndocker-compose up -d\n```\n\n5. Pull required models:\n\n```bash\n# If you want other models\ndocker exec ollama ollama pull llama3.2:1b\ndocker exec -it ollama ollama pull mistral\ndocker exec -it ollama ollama pull nomic-embed-text\n```\n\n## 💻 Usage\n\n### Paper Ingestion\n\n```bash\npython src/vector.py\n```\n\n- Place your markdown files in the `markdowns/` directory\n- System automatically processes and indexes papers\n- Handles duplicate detection and tracking\n\n### Query Interface\n\n```bash\npython src/llmquery.py     # Run src/query.py instead to query the Qdrant database without the LLM\n```\n\n### Available Commands\n\n- `quit` or `q`: Exit the program\n- `analytics`: Display system usage statistics\n- `clear`: Reset paper history\n- `history`: View recent questions and responses\n\n### Example Query Flow\n\n```\n\u003e What are the main approaches for discovering governing equations from data?\n\nThe system will provide:\n1. Main Answer: Comprehensive summary\n2. Key Points: Important findings\n3. Paper Citations: Relevant sources\n4. Limitations: Gaps in current knowledge\n5. 
Relevance Scores: Why papers were selected\n```\n\n#### Relevant Papers\n\n![Relevant Papers](images/ai_relevantpapers.png)\n\n## 📁 Project Structure\n\n```\nresearch-lens/\n├── docker-compose.yml\n├── requirements.txt\n├── README.md\n├── vector.py      # Paper ingestion and processing\n├── llmquery.py          # Main Q\u0026A interface\n├── query.py          # Query the Qdrant database without the LLM\n├── markdowns/     # Paper storage directory\n└── processed_papers.json # Paper tracking database\n```\n\n## ⚙️ Configuration\n\nEnvironment variables for system configuration:\n\n```bash\nQDRANT_HOST=localhost    # Qdrant server host\nQDRANT_PORT=6333        # Qdrant server port\nOLLAMA_HOST=localhost   # Ollama server host\nOLLAMA_PORT=11434      # Ollama server port\n```\n\n## 🔄 Processing Pipeline\n\n1. **Paper Ingestion**:\n\n   - Reads markdown files from the `markdowns/` directory\n   - Generates embeddings using Nomic Embed Text\n   - Stores vectors and metadata in Qdrant\n   - Tracks processed papers to avoid duplicates\n\n2. **Query Processing**:\n\n   - Converts user query to vector\n   - Performs similarity search\n   - Retrieves relevant papers\n   - Generates structured LLM response\n\n3. **Response Generation**:\n   - Formats context for LLM\n   - Generates structured response\n   - Provides relevance explanations\n   - Maintains conversation history\n\n## 🎯 Future Roadmap\n\n- [ ] Export functionality (PDF, markdown)\n- [ ] Advanced paper filtering options\n- [ ] Citation network visualization\n- [ ] Multi-language support\n- [ ] Batch processing capabilities\n- [ ] API interface for integration\n- [ ] Enhanced analytics dashboard\n- [ ] Custom prompt templates\n\n## 🤝 Contributing\n\nContributions are welcome! Please:\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. 
Open a Pull Request\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- Qdrant team for vector database\n- Ollama project for LLM interface\n- Nomic AI for embedding model\n- LLaMA team for the base model\n- The Markdowns were fetched using [literatureSurvey](https://github.com/VirtualPatientEngine/literatureSurvey)\n\n## 💡 Citation\n\nIf you use this project in your research, please cite:\n\n```bibtex\n@software{PaperBrain_2024,\n  author = {Ansh Kumar and Apoorva Gupta},\n  title = {PaperBrain: Intelligent Research Paper Q\u0026A System},\n  year = {2024},\n  url = {https://github.com/ansh-info/PaperBrain}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fansh-info%2Fpaperbrain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fansh-info%2Fpaperbrain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fansh-info%2Fpaperbrain/lists"}