https://github.com/aryanxxvii/codera
Hybrid-RAG on Code Files using LangChain, ChromaDB with Code Llama 7B
- Host: GitHub
- URL: https://github.com/aryanxxvii/codera
- Owner: aryanxxvii
- Created: 2024-09-04T22:39:46.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-26T03:33:00.000Z (11 days ago)
- Last Synced: 2025-01-26T04:21:37.671Z (11 days ago)
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Codera: Hybrid RAG for Code QnA
## Project Overview
Codera is a Hybrid Retrieval Augmented Generation (RAG) system built to help query code files efficiently. It uses LangChain for retrieval logic and Chroma as the vector store for document embeddings. Codera combines different search techniques to retrieve relevant code snippets based on queries.

## Architecture
- **LangChain:** Handles the retrieval logic.
- **Chroma Vector-Store:** Stores document embeddings for vector-based search.
- **Hybrid Search System:**
  - **HyDE (Hypothetical Document Embeddings) Generation:** Generates a hypothetical answer to the query, giving the search more context to match against.
  - **Vector Retriever:** Fetches documents based on vector similarity.
  - **BM25 Keyword Retriever:** Performs keyword-based search to supplement the vector search.
- **Code Llama 7B:** A large language model available on Hugging Face, used to process and respond to code-related queries.

The system also uses structured output for language detection to make sure the right approach is used based on the code’s language.
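
The hybrid search components listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Codera's actual code: the function names are hypothetical, `embed` is a toy bag-of-words stand-in for a real embedding model backed by Chroma, `generate_hypothetical` stands in for a call to Code Llama, and reciprocal rank fusion is one assumed way to merge the two rankings (the README does not say how results are combined).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
            tf = counts[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query, docs, generate_hypothetical, top_k=2, rrf_k=60):
    """Merge vector and keyword rankings with reciprocal rank fusion."""
    if not docs:
        return []
    # HyDE step: embed a hypothetical answer rather than the raw query.
    query_vec = embed(generate_hypothetical(query))
    vector_rank = rank([cosine(query_vec, embed(d)) for d in docs])
    # Keyword step: BM25 runs on the original query text.
    keyword_rank = rank(bm25_scores(query, docs))
    fused = Counter()
    for ranking in (vector_rank, keyword_rank):
        for position, doc_index in enumerate(ranking):
            fused[doc_index] += 1.0 / (rrf_k + position + 1)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [docs[i] for i in best]


if __name__ == "__main__":
    snippets = [
        "def read_config(path): return json.load(open(path))",
        "def connect_db(url): return create_engine(url)",
        "class Cache: def get(self, key): return self.store[key]",
    ]
    # Stub in place of the LLM call that would write a hypothetical answer.
    fake_llm = lambda q: "def read_config(path): json.load"
    for hit in hybrid_search("how is the config file read", snippets, fake_llm):
        print(hit)
```

In a LangChain-based setup, the same roles would typically be filled by a Chroma-backed vector retriever and a BM25 retriever; the sketch only shows how the two signals complement each other.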
## Why This Architecture?
- **Hybrid Retrieval:** Using both vector-based and keyword-based retrieval improves the relevance of search results.
- **Efficient Querying:** Chroma helps retrieve documents quickly, even with larger codebases.
- **Code-Specific LLM:** Code Llama 7B is optimized for working with code, making it a good fit for handling code-related queries.

## Use Case
Codera is useful for querying large codebases, understanding code, or finding specific functions and methods without manually searching through files.