{"id":23410950,"url":"https://github.com/zezs/implementing-retrival-augmented-generation","last_synced_at":"2026-05-10T16:05:14.892Z","repository":{"id":250058116,"uuid":"832712570","full_name":"zezs/Implementing-Retrival-Augmented-Generation","owner":"zezs","description":"Learning and Implementing INGESTION, RETRIVAL-AUGMENTED-GENERATION. LLMS | PINECONE | LANGCHAIN | LANGSMITH |  ","archived":false,"fork":false,"pushed_at":"2024-07-29T19:58:58.000Z","size":1564,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T03:43:22.846Z","etag":null,"topics":["chains","embeddings","langchain-python","langsmith","llms","openai","pineconedb","prompt-engineering","python","vectordb"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zezs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-23T15:12:28.000Z","updated_at":"2024-07-29T19:59:01.000Z","dependencies_parsed_at":"2024-07-27T14:29:22.265Z","dependency_job_id":"63b0dddd-88ac-4225-9db7-051e343c7a71","html_url":"https://github.com/zezs/Implementing-Retrival-Augmented-Generation","commit_stats":null,"previous_names":["zezs/implementing-retrival-augmented-generation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zezs/Implementing-Retrival-Augmented-Generation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zezs%2FImplementing-Retrival-Augmented-Generation","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/zezs%2FImplementing-Retrival-Augmented-Generation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zezs%2FImplementing-Retrival-Augmented-Generation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zezs%2FImplementing-Retrival-Augmented-Generation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zezs","download_url":"https://codeload.github.com/zezs/Implementing-Retrival-Augmented-Generation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zezs%2FImplementing-Retrival-Augmented-Generation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269447899,"owners_count":24418754,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-08T02:00:09.200Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chains","embeddings","langchain-python","langsmith","llms","openai","pineconedb","prompt-engineering","python","vectordb"],"created_at":"2024-12-22T17:53:43.860Z","updated_at":"2026-05-10T16:05:09.858Z","avatar_url":"https://github.com/zezs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Overview\n\n![image](https://github.com/user-attachments/assets/43eed5cc-7322-4b0f-9ea1-cfae47199d49)\n\nLoad -\u003e split -\u003e embed -\u003e store 
in vector DB\n- Load the Medium blog (TextLoader)\n- Split the blog into smaller chunks (TextSplitter)\n- Embed the chunks to get vectors (OpenAIEmbeddings)\n- Store the embeddings in the Pinecone vector store (PineconeVectorStore)\n\n\n![RAG steps](https://github.com/user-attachments/assets/3b972ad9-38c5-4761-a845-8ecde675a2f7)\nSource: AUTHOR\n\n\n## Why are TextLoaders needed?\n- LLMs take text as input\n- But what if we want to process text from a WhatsApp message, Google Drive, a Notion notebook, any PDF online, etc.?\n- All of the sources mentioned above are basically text\n- But they come in different formats and have different semantic meanings\n- So document loaders are class implementations that process and load data from these different sources and make it digestible by LLMs\n\n![image](https://github.com/user-attachments/assets/bd817abe-c80b-4e7b-823c-7903c98f8e8a)\n\nDescription: some of the document loaders provided by LangChain (source: LangChain official docs)\n\n## TextSplitters\n![image](https://github.com/user-attachments/assets/2312d8d4-55d6-427f-b004-31be56403d86)\n\nSource: LangChain official docs\n\n## Embeddings\nsentences (text) ----\u003e [encoder / embedding model] ----\u003e O O O O O (vector space)\n\n### This is what embeddings in vector space look like\nVector spaces can be visualized in 2D or 3D for simplicity, but Pinecone primarily operates in high-dimensional spaces to effectively handle the complexities of modern machine learning data.\n![image](https://github.com/user-attachments/assets/3c0d4c53-8b1b-436c-90b9-ef3ef091e685)\n\nSOURCE: AUTHOR\n\n- If embeddings are placed close together, they have similar semantic meaning and are related to each other\n- The red cluster is the query, which is converted from text into an embedding\n- The red cluster is then placed in the vector space\n- The vectors near the red vector (the question) could hold the answer to the query\n- The relevant vectors are then found using cosine or Euclidean distance\n- The shorter the 
distance, the more relevant the information is to the question\n- Finally, the relevant vectors with significant semantic meaning are converted back into text (context)\n- The text chunks are then added to the prompt\n- BEFORE RETRIEVAL: PROMPT -\u003e Query\n- AFTER RETRIEVAL: PROMPT -\u003e Query + Context (chunks)\n\n## LANGSMITH\nUnderstanding the workflow and viewing logs with LangSmith\n![image](https://github.com/user-attachments/assets/31896887-95f6-4ac3-8832-5afccc6f3af4)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzezs%2Fimplementing-retrival-augmented-generation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzezs%2Fimplementing-retrival-augmented-generation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzezs%2Fimplementing-retrival-augmented-generation/lists"}