https://github.com/fabriciocarraro/ai_rag_pdf_search_in_multiple_documents_using_gemma_2_2b_on_colab
- Host: GitHub
- URL: https://github.com/fabriciocarraro/ai_rag_pdf_search_in_multiple_documents_using_gemma_2_2b_on_colab
- Owner: fabriciocarraro
- Created: 2024-07-29T17:04:06.000Z
- Default Branch: main
- Last Pushed: 2024-08-02T22:15:59.000Z
- Last Synced: 2024-08-03T00:17:39.414Z
- Language: Jupyter Notebook
- Size: 3.91 MB
- Stars: 5
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI RAG - PDF Search in multiple documents using Gemma 2 2B on Colab
This project demonstrates a pipeline for extracting, processing, and querying text data from PDF documents on Google Colab using natural language processing (NLP) techniques and Google's open-source model Gemma 2 2B. The system allows users to input a query, which is then answered based on the content of the PDFs.
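To make "answered based on the content of the PDFs" concrete, here is a minimal sketch of the query flow: fetch the most relevant chunks, place them in a prompt, and pass that prompt to the model. The function names, the prompt wording, and the `top_k` default are illustrative assumptions, not taken from the notebook; `retrieve` and `generate` are placeholders for the similarity search and the Gemma 2 2B call.

```python
def build_prompt(query, chunks):
    """Combine retrieved chunks and the user's question into one prompt string."""
    # Number each chunk so the context block stays readable.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer_query(query, retrieve, generate, top_k=3):
    """Retrieve the `top_k` most relevant chunks, then answer from them.

    `retrieve(query, top_k)` must return a list of text chunks.
    `generate(prompt)` must return the model's reply as a string.
    Both are hypothetical callables supplied by the caller.
    """
    chunks = retrieve(query, top_k)
    return generate(build_prompt(query, chunks))
```

Passing the retrieval and generation steps in as callables keeps the flow easy to test with stubs before a real index or model is loaded.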
## Features
- PDF Text Extraction: Extracts text from PDFs using PyPDF2.
- Text Chunking: Splits extracted text into manageable chunks.
- Embedding Generation: Uses SentenceTransformer to convert text chunks into embeddings.
- FAISS Indexing: Builds a FAISS index for efficient similarity search.
- Query Matching: Finds the most similar text chunks to a user query.
- Response Generation: Uses a transformer model to generate responses based on the most relevant chunks.
## Parameters
Upload the PDFs to a folder named `PDFs` inside `/content` on Google Colab (that is, `/content/PDFs`). If you run the code locally, change this path to the folder that holds your PDFs.
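The folder setting above and the steps listed under Features could fit together roughly as follows. This is a sketch under stated assumptions, not the notebook's code: only the `/content/PDFs` path and the use of PyPDF2, SentenceTransformer, and FAISS come from this README. The helper names, the word-based chunking with `chunk_size=500` and `overlap=50`, the `all-MiniLM-L6-v2` embedding model, and the flat L2 index are illustrative choices.

```python
from pathlib import Path

# Folder named in this README; change it when running outside Colab.
PDF_DIR = Path("/content/PDFs")


def list_pdfs(folder):
    """Return the PDF files in `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def extract_text(pdf_path):
    """Concatenate the text of every page using PyPDF2."""
    from PyPDF2 import PdfReader  # lazy import; needs `pip install PyPDF2`

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words (assumed strategy).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_index(chunks, model_name="all-MiniLM-L6-v2"):
    """Embed `chunks` (non-empty list) and store them in an exact FAISS L2 index."""
    import faiss  # lazy import; needs `pip install faiss-cpu`
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # model name is an assumption
    embeddings = model.encode(chunks, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return model, index


def search(query, model, index, chunks, top_k=3):
    """Return up to `top_k` chunks whose embeddings are closest to the query."""
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, top_k)
    # FAISS pads with -1 when the index holds fewer than top_k vectors.
    return [chunks[i] for i in ids[0] if i != -1]
```

A typical run would call `list_pdfs(PDF_DIR)`, pass each file through `extract_text` and `chunk_text`, collect all chunks in one list, then call `build_index(chunks)` once and `search(...)` per question. The third-party imports sit inside the functions so the path and chunking helpers can be tried without installing anything.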