{"id":32721002,"url":"https://github.com/kiquetal/bootdev-rag-course","last_synced_at":"2026-05-12T23:36:31.396Z","repository":{"id":319329176,"uuid":"1078379559","full_name":"kiquetal/bootdev-rag-course","owner":"kiquetal","description":"Companion files for the course \"Learn Retrieval Augmented Generation\" from boot.dev","archived":false,"fork":false,"pushed_at":"2025-11-01T15:13:29.000Z","size":50,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-02T20:08:56.217Z","etag":null,"topics":["bootdotdev","python","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kiquetal.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-17T16:30:23.000Z","updated_at":"2025-11-01T15:13:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"354c7219-109d-4819-86e8-b860028595e7","html_url":"https://github.com/kiquetal/bootdev-rag-course","commit_stats":null,"previous_names":["kiquetal/bootdev-rag-course"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kiquetal/bootdev-rag-course","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiquetal%2Fbootdev-rag-course","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiquetal%2Fbootdev-rag-course/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiquetal%2Fbo
otdev-rag-course/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiquetal%2Fbootdev-rag-course/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kiquetal","download_url":"https://codeload.github.com/kiquetal/bootdev-rag-course/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kiquetal%2Fbootdev-rag-course/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32961785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-12T23:30:32.555Z","status":"ssl_error","status_checked_at":"2026-05-12T23:30:18.191Z","response_time":102,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bootdotdev","python","rag"],"created_at":"2025-11-02T20:01:27.947Z","updated_at":"2026-05-12T23:36:31.392Z","avatar_url":"https://github.com/kiquetal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Course Learn Retrieval Augmented Generation\n\nThis repository contains the Hoopla project used throughout the Boot.dev RAG course. 
It includes a simple keyword search CLI and a minimal inverted index.\n\nQuick start:\n- Run a search: `python -m hoopla.cli.keyword_search_cli search \"your query\"`\n- Save an index with pickle (stdlib): see hoopla/README.md for a short example.\n\n## Features\n\n### Text Processing\n- Tokenization for improved search relevance\n- Stopword filtering to improve search quality\n- Stemming using NLTK's PorterStemmer\n- Enhanced partial token matching\n\n### TF-IDF Implementation\n- Term Frequency (TF): Measures how frequently a term appears in a document\n  - Implemented with a `Counter` (from `collections`) for efficient token counting\n  - Normalizes text by removing punctuation and applying stemming\n  - Excludes stopwords for better relevance\n\n- Inverse Document Frequency (IDF): Measures how rare, and therefore how informative, a term is across all documents\n  - Calculated using the formula: log((N + 1) / (df + 1))\n  - N is the total number of documents\n  - df is the document frequency (number of documents containing the term)\n  - The +1 in the numerator and denominator smooths the score and avoids division by zero when a term appears in no document\n\n- TF-IDF Score: Combines TF and IDF to rank document relevance\n  - Higher scores indicate terms that are both frequent in a document and rare across all documents\n  - Used to identify distinctive terms in documents\n\n### Command Line Interface\n- Search documents: `uv run cli/keyword_search_cli.py search \"query\"`\n- Calculate term frequency: `uv run cli/keyword_search_cli.py tf doc_id term`\n- Get IDF score: `uv run cli/keyword_search_cli.py idf term`\n- Calculate TF-IDF: `uv run cli/keyword_search_cli.py tfidf doc_id term`\n\n### Inverted Index\n- Minimal `InvertedIndex` class for efficient search\n- Supports saving the index to disk and loading it back with Python's pickle module\n- Document mapping for fast document retrieval\n\n## Project Setup\n\n### Setting up with uv (Recommended)\n\nuv is a fast Python package installer and virtual environment manager. Here's how to get started:\n\n1. 
**Install uv:**\n```bash\npip install uv\n```\n\n2. **Create and activate a virtual environment:**\n```bash\nuv venv\nsource .venv/bin/activate  # On Unix/macOS\n# OR\n.venv\\Scripts\\activate  # On Windows\n```\n\n3. **Install dependencies:**\n```bash\nuv pip install -r requirements.txt\n```\n\n4. **Add new dependencies:**\n```bash\nuv pip install package_name\nuv pip freeze \u003e requirements.txt\n```\n\n### Project Standards\n\n1. **Dependency Management:**\n   - Use `requirements.txt` for package dependencies\n   - Use `uv.lock` for dependency locking (automatically managed by uv)\n   - Always update `requirements.txt` after adding new packages\n\n2. **Virtual Environment:**\n   - Never commit the `.venv` directory\n   - Always develop inside a virtual environment\n   - Use one virtual environment per project\n\n3. **Package Structure:**\n   - Keep code in the `hoopla/` directory\n   - Keep tests in the `tests/` directory\n   - Keep configuration in the project root\n\n4. **Code Style:**\n   - Follow PEP 8 guidelines\n   - Use type hints\n   - Document functions and classes\n   - Keep functions focused and small\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkiquetal%2Fbootdev-rag-course","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkiquetal%2Fbootdev-rag-course","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkiquetal%2Fbootdev-rag-course/lists"}