Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/siddhesh-agarwal/straindb-rag
A RAG model based on StrainsDB by Kenneth Reitz.
langchain openai python rag-model streamlit
- Host: GitHub
- URL: https://github.com/siddhesh-agarwal/straindb-rag
- Owner: Siddhesh-Agarwal
- Created: 2024-03-30T15:20:58.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-20T07:41:12.000Z (3 months ago)
- Last Synced: 2024-11-20T09:02:22.370Z (3 months ago)
- Topics: langchain, openai, python, rag-model, streamlit
- Language: Jupyter Notebook
- Homepage: https://straindb.streamlit.app/
- Size: 69.9 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# StrainDB RAG
A RAG model based on [StrainsDB](https://strainsdb.org/). The database for the project was provided by [Kenneth Reitz](https://github.com/kennethreitz).
___
## Steps involved
### Extracting and Validating the Data
- The SQLite database consisted of 19 tables, from which the `strains_strain` table was extracted.
  - **Solution**: extract the data using the SQLite module and create a `Strain` class (inheriting from `pydantic.BaseModel`) for data validation.
- Some of the table's cells held lists stored as strings, which needed to be converted back into lists.
  - **Solution**: `eval("[1, 2, 3]")` returns `[1, 2, 3]`.
- The data needed to be saved.
  - **Solution**: dump the data in JSON format.
### Data Tokenization
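The extraction steps above (reading `strains_strain`, decoding the stringified lists, and dumping JSON) can be sketched roughly as follows; the resulting JSON is what the embedding step consumes. This is a minimal, dependency-free sketch and not the repository's actual code: the column names `name` and `effects` and the helper names are hypothetical, the pydantic validation step is left out, and `ast.literal_eval` is swapped in for `eval` because it only accepts Python literals and so cannot execute arbitrary code stored in a cell.

```python
import ast
import json
import sqlite3


def parse_list_cell(cell: str) -> list:
    """Turn a stringified list such as "[1, 2, 3]" back into a Python list."""
    value = ast.literal_eval(cell)  # accepts literals only, unlike eval()
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value


def load_strains(conn: sqlite3.Connection, list_columns: set) -> list:
    """Read every row of strains_strain and decode the list-valued columns."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = conn.execute("SELECT * FROM strains_strain").fetchall()
    strains = []
    for row in rows:
        record = dict(row)
        for column in list_columns:
            record[column] = parse_list_cell(record[column])
        strains.append(record)
    return strains


# Tiny in-memory stand-in for the real database (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strains_strain (name TEXT, effects TEXT)")
conn.execute(
    "INSERT INTO strains_strain VALUES (?, ?)",
    ("Example Strain", "['relaxed', 'happy']"),
)

strains = load_strains(conn, list_columns={"effects"})
dumped = json.dumps(strains, indent=2)  # the JSON dump step
print(dumped)
```

In the real project each `record` would additionally be passed through the `Strain` model before being dumped, so that malformed rows are rejected early.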
- The data was embedded with OpenAI embeddings and stored in ChromaDB (persistent client).
  - **Solution**: the `OpenAIEmbeddings` class from the `langchain_openai` package.
### Retrieval
- The data was retrieved from ChromaDB using the `Chroma` class.
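The embedding and retrieval steps could look roughly like the sketch below. It is an illustration under several assumptions, not the repository's code: it assumes the `langchain_chroma` and `langchain_openai` packages are installed and `OPENAI_API_KEY` is set, the record fields (`name`, `description`, `effects`), the `chroma_db` directory, and all helper names are hypothetical, and the project may import `Chroma` from a different LangChain package.

```python
def to_texts_and_metadata(strains):
    """Flatten each record into one text string plus a small metadata dict."""
    texts, metadatas = [], []
    for strain in strains:
        effects = ", ".join(strain.get("effects", []))
        description = strain.get("description", "")
        texts.append(f"{strain['name']}: {description} Effects: {effects}")
        metadatas.append({"name": strain["name"]})
    return texts, metadatas


def build_vectorstore(strains, persist_directory="chroma_db"):
    """Embed the records and store them in a persistent Chroma collection."""
    # Imported lazily so the pure-Python helper above works without them.
    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    texts, metadatas = to_texts_and_metadata(strains)
    return Chroma.from_texts(
        texts,
        embedding=OpenAIEmbeddings(),  # reads OPENAI_API_KEY from the environment
        metadatas=metadatas,
        persist_directory=persist_directory,
    )


def retrieve(vectorstore, query, k=3):
    """Return the k stored documents most similar to the query."""
    return vectorstore.similarity_search(query, k=k)


# Only the flattening helper is exercised here; it needs no API key.
sample = [
    {
        "name": "Example Strain",
        "description": "A placeholder.",
        "effects": ["relaxed", "happy"],
    }
]
texts, metadatas = to_texts_and_metadata(sample)
print(texts[0])
```

With credentials in place, `retrieve(build_vectorstore(strains), "something relaxing")` would return the closest matching documents, which a RAG prompt can then pass to the language model as context.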