{"id":41126006,"url":"https://github.com/dalager/raggedbooks","last_synced_at":"2026-01-22T18:02:53.042Z","repository":{"id":271140848,"uuid":"912389828","full_name":"dalager/raggedbooks","owner":"dalager","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-28T21:02:03.000Z","size":1269,"stargazers_count":0,"open_issues_count":5,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-28T22:19:55.190Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dalager.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-05T12:52:14.000Z","updated_at":"2025-01-28T21:02:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"7691f53e-18fe-46b2-a1a2-7e8b82828b6f","html_url":"https://github.com/dalager/raggedbooks","commit_stats":null,"previous_names":["dalager/raggedbooks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dalager/raggedbooks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dalager%2Fraggedbooks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dalager%2Fraggedbooks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dalager%2Fraggedbooks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dalager%2Fraggedbooks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dalager","download_url":"https://codeload.github.com/dalager/raggedbooks/tar
.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dalager%2Fraggedbooks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28667881,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T17:07:18.858Z","status":"ssl_error","status_checked_at":"2026-01-22T17:05:02.040Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-22T18:02:21.374Z","updated_at":"2026-01-22T18:02:53.004Z","avatar_url":"https://github.com/dalager.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAGged books\n\nThis is a small POC project for searching a local book collection with semantic search.\n\nIt demonstrates:\n\n1. Extracting text from PDF files and creating a vector store with embeddings\n2. Searching the vector store with a query and returning the most relevant results\n3. Using the RAG (Retrieval Augmented Generation) technique to provide grounding for the question given to an LLM\n4. 
Creating a simple desktop application for running the RAG tool, as the contents need to stay on the local machine.\n\nOther more general tools and RAG frameworks exist that can do this, but this is an example of how to apply it to a specific use case.\n\n(**Note**: I have bought these books and the content never leaves my computer. The embeddings are generated and stored locally on the same computer and are not shared or exposed to the internet.)\n\n![Screenshot of the MAUI app](screenshot.png)\n\n## Requirements\n\n- .NET 9 (for running the CLI and the MAUI app)\n- Docker (for running Ollama and QDrant)\n- [.NET MAUI SDK](https://dotnet.microsoft.com/en-us/apps/maui) (for building the desktop app, install with `dotnet workload install maui-desktop`)\n\nOllama and QDrant must be started in Docker containers with the following command:\n\n```bash\ndocker compose up -d\n```\n\nIf you are NOT running the GUI app but only the CLI, also execute these commands while the containers are running:\n\n```powershell\ndocker compose exec ollama sh -c 'ollama pull mxbai-embed-large'\ndocker compose exec ollama sh -c 'ollama pull qwen2:0.5b'\n```\n\n### Loading the Embedding and ChatCompletion models\n\nThe project uses two models, one for embeddings and one for chat completion.\n\nAt the moment, the configuration defaults to running both models locally and using the [`mxbai-embed-large`](https://ollama.com/library/mxbai-embed-large) and [`qwen2:0.5b`](https://ollama.com/library/qwen2:0.5b) models.\n\nThis strikes a balance between speed and quality, but you can change the models in the `Appsettings.json` file.\n\nWhen starting the desktop application, the configured models will be pulled in the background, but you can also do this manually:\n\n```bash\ndocker compose exec ollama sh -c 'ollama pull mxbai-embed-large'\ndocker compose exec ollama sh -c 'ollama pull qwen2:0.5b'\n```\n\nIf you want to use different models, you can change the `EmbeddingModel` in the `Appsettings.json` file -- note 
that if you change your embedding model, you will have to re-import the books to get the correct embeddings and also change the vector dimension in `Appsettings.json`, as dimensions differ from model to model.\n\nYou can also use a cloud-hosted chat model; see `Appsettings.json`.\n\n## CLI Usage\n\n### Importing books\n\nTo import books from the `data` folder, run the following command:\n\n```bash\ndotnet run import-folder \"../../data\"\n```\n\nIt will\n\n- Extract the text and chapter structure from the books\n- Create embedding vectors for the chunked data\n- Store the chunks with embeddings in the QDrant vector database\n\nAdding the `-delete` flag to the command will delete the existing books from the QDrant store before importing the new ones.\n\n### Searching books\n\nTo search books, run the following command:\n\n```bash\ndotnet run search \"what is an ADR?\"\n```\n\nIt will give you the top result, with the book title, chapter, and page of the best match for the search query.\n\n#### Show matching content\n\nIf you want to see the matching content, add the `-content` flag:\n\n```bash\ndotnet run search \"How do I define coupling vs cohesion?\" -content\n```\n\nIt will output something like this, where the `Key` property is the UUID of the chunk in the QDrant vector store:\n\n```plaintext\nSearch score: 0,8235746622085571\nKey: e96efea5-6cd2-4ef2-bf5e-168cfce20dd1\nBook: buildingmicroservices2ndedition\nChapter: Types of Coupling\nPage: 65\nContent:\nCoupling and cohesion are strongly related and, at some level at least, are arguably\nthe same in that both concepts describe the relationship between things. Cohesion\napplies to the relationship between things inside a boundary (a microservice in our\ncontext), whereas coupling describes the relationship between things across a bound‐\nary. There is no absolute best way to organize our code; coupling and cohesion are\njust one way to articulate the various trade-offs we make around where we group\ncode, and why. 
All we can strive to do is to find the right balance between these two\nideas, one that makes the most sense for your given context and the problems you are\ncurrently facing.\nRemember, the world isn't static—it's possible that as your system requirements\nchange, you'll find reasons to revisit your decisions. Sometimes parts of your system\nmay be going through so much change that stability might be impossible. We'll look\nat an example of this in Chapter 3 when I share the experiences of the product devel‐\nopment team behind Snap CI.\nTypes of Coupling\nYou could infer from the preceding overview above that all coupling is bad. That isn't\nstrictly true. Ultimately, some coupling in our system will be unavoidable. What we\nwant to do is reduce how much coupling we have.\n```\n\n(The matching book is \"Building Microservices\" by Sam Newman, \u003chttps://samnewman.io/books/building_microservices_2nd_edition/\u003e)\n\n#### Open the page in the browser\n\nIf you want to open the referenced page in the book, add the `-open` flag:\n\n```bash\ndotnet run search \"Should I mock a third party REST api during development?\" -open\n```\n\nIt will open the PDF file in Chrome with an appended `#page=123` anchor, which should take you to the correct page.\n\nThis requires the Chrome executable path to be set in the `Appsettings.json` file.\n\n### RAG (Retrieval Augmented Generation)\n\nUsing the RAG technique, you can send the content blocks from the search results to a chat model as grounding context (or as a summarization scope, if you prefer to think of it that way).\n\nYou just need to add the `-rag` flag to the search command:\n\n```bash\ndotnet run search \"Should I mock a third party REST api during development?\" -rag\n```\n\nThis will stitch together the content blocks from the 5 best search results and send them to the chat model together with your question.\n\nThe answer will be printed to the console as Markdown.\n\n## Running the desktop app\n\nThe 
app in the screenshot above is a [.NET MAUI app](https://dotnet.microsoft.com/en-us/apps/maui), built with the current cross-platform UI framework from Microsoft. It should be possible to run it on Windows and macOS, but my current workstation runs Windows, so I might need some help getting it running on macOS.\n\nYou can run it from within your IDE or with the following command:\n\n```bash\ndotnet run --project .\\RaggedBooks.MauiClient\\RaggedBooks.MauiClient.csproj --framework net9.0-windows10.0.19041.0\n```\n\nOn macOS, it might be possible to run it with this framework:\n\n```bash\ndotnet run --project ./RaggedBooks.MauiClient/RaggedBooks.MauiClient.csproj --framework net9.0-maccatalyst\n```\n\nHowever, it has not been tested on macOS yet, and the \"open pdf in browser\" feature will probably not work yet (see [issue #3](https://github.com/dalager/raggedbooks/issues/8)).\n\nOn startup, the app will prompt Ollama to pull the configured embedding and chat models if they are not already present.\n\nIt will, however, need the Docker Compose services to be running.\n\n## The required services\n\nThe easiest way to get the required services up and running is to use the provided docker-compose file.\n\n```bash\ndocker compose up -d\n```\n\nOtherwise, you can do it manually:\n\n### QDrant\n\n```bash\ndocker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest\n```\n\nThe QDrant dashboard is then available at \u003chttp://localhost:6333/dashboard#/welcome\u003e\n\n### Ollama\n\nDownload Ollama or use the Docker image from \u003chttps://ollama.com/\u003e and pull the configured models (by default `mxbai-embed-large` and `qwen2:0.5b`).\n\nWith Docker: \u003chttps://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image\u003e\n\n### Cloud models\n\nIf you want to use a cloud-hosted model and have a private deployment of the models in a geographically acceptable location (GDPR might be a thing), you can use an OpenAI model or another service that connects to Semantic Kernel.\n\nI have tried with an Azure OpenAI service and deployed a chat model there, and it works 
well.\n\n1. Create an Azure resource group\n2. Add an Azure OpenAI Service\n3. Deploy a `gpt-4o` or similar chat completion model to the service\n\nNote that this approach is not the preferred way to use this tool, as it exposes the content to the cloud, and the whole point of this experiment is to keep the content local.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdalager%2Fraggedbooks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdalager%2Fraggedbooks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdalager%2Fraggedbooks/lists"}