{"id":26506626,"url":"https://github.com/thijse/memoryvectorstore","last_synced_at":"2025-03-20T22:55:55.285Z","repository":{"id":192450152,"uuid":"686741179","full_name":"thijse/MemoryVectorStore","owner":"thijse","description":"Sample of implementing a simple in-memory vector store","archived":false,"fork":false,"pushed_at":"2023-12-02T20:49:51.000Z","size":3288,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-12-02T21:26:57.051Z","etag":null,"topics":["chatgpt-api","csharp","embeddings","large-language-models","openai","vector","vectorstore"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thijse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-09-03T19:37:53.000Z","updated_at":"2023-12-01T14:47:07.000Z","dependencies_parsed_at":"2023-11-29T16:43:25.312Z","dependency_job_id":null,"html_url":"https://github.com/thijse/MemoryVectorStore","commit_stats":null,"previous_names":["thijse/memoryvectorstore"],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thijse%2FMemoryVectorStore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thijse%2FMemoryVectorStore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thijse%2FMemoryVectorStore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thijse%2FMemoryVectorStore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thijse","download_u
rl":"https://codeload.github.com/thijse/MemoryVectorStore/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244706526,"owners_count":20496571,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt-api","csharp","embeddings","large-language-models","openai","vector","vectorstore"],"created_at":"2025-03-20T22:55:54.369Z","updated_at":"2025-03-20T22:55:55.276Z","avatar_url":"https://github.com/thijse.png","language":"C#","readme":"[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n# Memory Vector Store\nSample of implementing a simple in-memory vector store\n\nThe repository contains three main projects: \n- Memory Vector Store project, which focuses on storing vectors in memory;\n- Chunk Creator project, which extracts vectors from PDF files;\n- Sample Search project, which demonstrates how to perform similarity searches using the stored vectors. 
Each project has its own set of code and resources, allowing you to explore and understand the implementation details.\n\n## Code example\n\nFirst we need to split the original PDF into chunks and build the embedding vectors:\n\n```cs\n// OpenAI service that we are going to use for embedding\n_openAiService = new OpenAIService(new OpenAiOptions { ApiKey = apiKey });\n\n// Set up a MemoryVector database, to be filled with chunks of documents,\n// each including an embedding vector of 1536 dimensions.\n// Also included is a callback that embeds any text item into a vector\n_vectorCollection = new MemoryVectorDB.VectorDB\u003cChunk\u003e(1536, ChunkEmbedingAsync);\n\n// Get the text from the PDF\n_document = PdfTextExtractor.GetText(documentPath);\n\n// Generate chunks of 200 words with an overlap of 100 words\n_chunkGenerator = new ChunkGenerator(200, 100, _document);\n\n// Loop through the chunks\nforeach (var chunk in _chunkGenerator.GetChunk())\n{\n    // Add the source reference to the chunk\n    chunk.Source = documentPath;\n\n    // Add the chunk to the vector store\n    await _vectorCollection.AddAsync(chunk);\n\n    // We remove the text from the chunk to save memory:\n    // we just need the vector, start index, length and source,\n    // so we can recover the chunk from the original document later\n    chunk.Text = null!;\n}\n```\n\nNow we can find the chunks that best match our query:\n\n```cs\n// First we make a vector of the query, as we have done for the chunks of the documents\nvar queryVector = query.GetVector();\n\n// Next find the 10 closest vectors to the query vector\nvar bestMatches = _vectorCollection.FindNearestSorted(queryVector, 10);\n\n// And here they are\nforeach (var item in bestMatches)\n{\n    ShowMatch(item.Value, queryVector);\n}\n```\n\nNote that `FindNearestSorted` is just a brute-force comparison of the (normalized) dot products, i.e. the cosine similarity, between the query vector and all chunk vectors. 
For larger vector stores, a database should be used that implements an index for efficient nearest-neighbour searches, for example [using something like this library](https://github.com/curiosity-ai/hnsw-sharp).\n\nFinally we want a conversational model to interpret the chunks and answer the question:\n\n```cs\n// Format the query to post to the LLM:\nqueryBuilder.AppendLine(\n    $\"Answer the following query: {query}. Only use the content below to construct the answer, \" +\n    \"use the page numbers as reference. If no content is shown below or if it is not applicable, \" +\n    \"answer: \\\"Sorry, I have no data on that\\\"\\n\\n\");\n\n// Insert the best matches into the same query\nforeach (var match in bestMatches)\n{\n    var chunk = match.Value;\n    // Recover the chunk text from the original document; fall back to an empty string if no document is loaded\n    var text = _document?.Text.Substring(chunk.StartCharNo, chunk.CharLength) ?? \"\";\n    queryBuilder.AppendLine(text + \"\\n\");\n}\n\n// Ask OpenAI's ChatCompletion API to answer the query\nvar completionResult = await _openAiService.ChatCompletion.CreateCompletion(new ChatCompletionCreateRequest\n{\n    Messages = new List\u003cChatMessage\u003e\n    {\n        ChatMessage.FromSystem(\"You are an AI assistant. The assistant is helpful, factual and friendly.\"),\n        ChatMessage.FromUser(queryBuilder.ToString()),\n    },\n    Model = Models.Gpt_3_5_Turbo,\n});\n```\n\nA query on the (free) book \"Robinson-Crusoe-in-Levels.PDF\"\n\n\u003eWhat has the book to say about Canibals and hiding bodies?\n\nthus results in an answer similar to the following (the response is not fully deterministic):\n\n\u003eAccording to the book, \"Robinson Crusoe,\" there are several references to cannibals and hiding bodies. On page 39, the protagonist and his companion, named Friday, come across the bodies of dead cannibals.\n\u003eInitially, Friday wants to eat the bodies, but the protagonist shows him that it is not appropriate. 
They proceed to bury the cannibals and collect their body parts, preparing a large fire to burn them.\n\u003e\n\u003eLater on page 39, after hiding the bodies in the forest, the protagonist takes Friday to his secret cave for safety. The protagonist describes Friday as a young, strong man with dark skin and long black hair.\n\u003e\n\u003eAnother mention of cannibals is on page 43, when the protagonist and Friday observe cannibals eating a prisoner and plan to rescue the remaining prisoner, who is European. They shoot at the cannibals and manage to save the prisoner.\n\u003e\n\u003eIn relation to hiding bodies, on page 39, the protagonist and Friday hide the bodies of the cannibals in the forest near the beach before moving deeper into the forest. It was crucial for them to hide the bodies to avoid attracting attention.\n\n## Acknowledgements\n\nThis code is strongly inspired by the blog post titled [\"Vector Search with C#: A Practical Approach for Small Datasets\"](https://crispycode.net/vector-search-with-c-a-practical-approach-for-small-datasets/).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthijse%2Fmemoryvectorstore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthijse%2Fmemoryvectorstore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthijse%2Fmemoryvectorstore/lists"}