{"id":13584863,"url":"https://github.com/fynnfluegge/codeqai","last_synced_at":"2025-10-23T02:44:15.941Z","repository":{"id":196641731,"uuid":"692884559","full_name":"fynnfluegge/codeqai","owner":"fynnfluegge","description":"Local first semantic code search and chat | Leverage custom copilots with fine-tuning datasets from code in Alpaca, Conversational, Completion and Instruction format","archived":false,"fork":false,"pushed_at":"2025-02-16T10:33:29.000Z","size":575,"stargazers_count":475,"open_issues_count":11,"forks_count":49,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-05-11T00:32:27.811Z","etag":null,"topics":["codellama","faiss","gpt","huggingface","langchain","llama2","llamacpp","llm","ollama","openai","sentence-transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fynnfluegge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-17T21:46:07.000Z","updated_at":"2025-05-05T02:28:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"33c3cc4e-78cb-413c-ac7d-3c08962ba7f0","html_url":"https://github.com/fynnfluegge/codeqai","commit_stats":{"total_commits":78,"total_committers":6,"mean_commits":13.0,"dds":"0.20512820512820518","last_synced_commit":"5d49c3e987d0b318ec341e48650f8a8ffcfb8afb"},"previous_names":["fynnfluegge/codeqai"],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fynnfluegge%2Fcodeqai","tags_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/fynnfluegge%2Fcodeqai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fynnfluegge%2Fcodeqai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fynnfluegge%2Fcodeqai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fynnfluegge","download_url":"https://codeload.github.com/fynnfluegge/codeqai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253659533,"owners_count":21943653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codellama","faiss","gpt","huggingface","langchain","llama2","llamacpp","llm","ollama","openai","sentence-transformers"],"created_at":"2024-08-01T15:04:34.400Z","updated_at":"2025-10-23T02:44:10.902Z","avatar_url":"https://github.com/fynnfluegge.png","language":"Python","funding_links":[],"categories":["Python","openai"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# codeqai\n\n[![Build](https://github.com/fynnfluegge/codeqai/actions/workflows/build.yaml/badge.svg?branch=main)](https://github.com/fynnfluegge/codeqai/actions/workflows/build.yaml)\n[![Publish](https://github.com/fynnfluegge/codeqai/actions/workflows/publish.yaml/badge.svg)](https://github.com/fynnfluegge/codeqai/actions/workflows/publish.yaml)\n[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\nGenerate datasets from code for finetuning, search your codebase 
semantically or chat with your code from the CLI. Keep the vector database quickly up to date with the latest code changes.\n100% local support without any data leaks.  \nBuilt with [langchain](https://github.com/langchain-ai/langchain), [treesitter](https://github.com/tree-sitter/tree-sitter), [sentence-transformers](https://github.com/UKPLab/sentence-transformers), [instructor-embedding](https://github.com/xlang-ai/instructor-embedding),\n[faiss](https://github.com/facebookresearch/faiss), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://github.com/jmorganca/ollama), [Streamlit](https://github.com/streamlit/streamlit).\n\n\u003c/div\u003e\n\n## ✨ Features\n\n- 🗒️ \u0026nbsp;Finetuning dataset generation\n  - export in Alpaca, conversational, instruction or completion format \n- 🔎 \u0026nbsp;Semantic code search\n- 💬 \u0026nbsp;GPT-like chat with your codebase\n- ⚙️ \u0026nbsp;Synchronize vector store and latest code changes with ease\n- 💻 \u0026nbsp;100% local embeddings and llms\n  - sentence-transformers, instructor-embeddings, llama.cpp, Ollama\n- 🌐 \u0026nbsp;OpenAI, Azure OpenAI and Anthropic\n- 🌳 \u0026nbsp;Treesitter integration\n\n\u003e [!NOTE]  \n\u003e Results are better if the code is well documented. 
You might consider [doc-comments-ai](https://github.com/fynnfluegge/doc-comments.ai) for code documentation generation.\n\n## 🚀 Usage\n\n#### Export finetuning dataset from codebase in conversational format:\n```\ncodeqai dataset\n```\nExport in different format like Alpaca with:\n```\ncodeqai dataset --format alpaca\n```\nExport dataset with model distillation\n```\ncodeqai dataset --distillation doc\n```\n\n#### Start semantic search:\n\n```\ncodeqai search\n```\n\n\u003cdiv align=\"center\"\u003e\n  \n\u003cimg src=\"https://github.com/fynnfluegge/codeqai/assets/16321871/142576f6-a2d4-41b9-a353-d82da78bc3b8\" width=\"800\"\u003e\n\n\u003c/div\u003e\n\n#### Start chat dialog:\n\n```\ncodeqai chat\n```\n\n\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"https://github.com/fynnfluegge/codeqai/assets/16321871/84209b30-1940-4aa5-a9e2-03d699217adf\" width=\"800\"\u003e\n\n\u003c/div\u003e\n\n#### Synchronize vector store with current git checkout:\n\n```\ncodeqai sync\n```\n\n#### Start Streamlit app:\n\n```\ncodeqai app\n```\n\n\u003cdiv align=\"center\"\u003e\n  \n  \u003cimg src=\"https://github.com/fynnfluegge/codeqai/assets/16321871/3a9105f1-066a-4cbd-a096-c8a7bd2068d3\" width=\"800\"\u003e\n  \n\u003c/div\u003e\n\n\u003e [!NOTE]\n\u003e At first usage, the repository will be indexed with the configured embeddings model which might take a while.\n\n## 📋 Requirements\n\n- Python \u003e=3.9,\u003c3.12\n\n## 📦 Installation\n\nInstall in an isolated environment with `pipx`:\n\n```\npipx install codeqai\n```\n\n⚠ Make sure pipx is using Python \u003e=3.9,\u003c3.12.  \nTo specify the Python version explicitly with pipx, activate the desired Python version (e.g. 
with `pyenv shell 3.X.X`) and install with:\n\n```\npipx install codeqai --python $(which python)\n```\n\nIf you are still facing issues using pipx, you can also install directly from PyPI with:\n\n```\npip install codeqai\n```\n\nHowever, it is recommended to use pipx to benefit from isolated environments for the dependencies.  \nVisit the [Troubleshooting](https://github.com/fynnfluegge/codeqai?tab=readme-ov-file#-troubleshooting) section for solutions to known issues during installation.\n\n\u003e [!NOTE]  \n\u003e Some packages are not installed by default. On first use you are asked to install `faiss-cpu` or `faiss-gpu`. Faiss-gpu is recommended if the hardware supports CUDA 7.5+.\n\u003e If local embeddings and llms are used, you are additionally asked to install sentence-transformers, instructor or llama.cpp.\n\n## 🔧 Configuration\n\nAt first usage or by running\n\n```\ncodeqai configure\n```\n\nthe configuration process is initiated, where the embeddings and llms can be chosen.\n\n\u003e [!IMPORTANT]  \n\u003e If you want to change the embeddings model in the configuration later, delete the cached files in `~/.cache/codeqai`.\n\u003e Afterwards the vector store files are created again with the newly configured embeddings model. 
This is necessary since the similarity search does not work if the models differ.\n\n## 🌐 Remote models\n\nIf remote models are used, the following environment variables are required.\nIf the required environment variables are already set, they are used; otherwise you are prompted to enter them, and they are then stored in `~/.config/codeqai/.env`.\n\n### OpenAI\n\n```bash\nexport OPENAI_API_KEY=\"your OpenAI api key\"\n```\n\n### Azure OpenAI\n\n```bash\nexport OPENAI_API_TYPE=\"azure\"\nexport AZURE_OPENAI_ENDPOINT=\"https://\u003cyour-endpoint\u003e.openai.azure.com/\"\nexport OPENAI_API_KEY=\"your Azure OpenAI api key\"\nexport OPENAI_API_VERSION=\"2023-05-15\"\n```\n\n### Anthropic\n\n```bash\nexport ANTHROPIC_API_KEY=\"your Anthropic api key\"\n```\n\n\u003e [!NOTE]  \n\u003e To change the environment variables later, edit `~/.config/codeqai/.env` manually.\n\n## 📚 Supported Languages\n\n- [x] Python\n- [x] Typescript\n- [x] Javascript\n- [x] Java\n- [x] Rust\n- [x] Kotlin\n- [x] Go\n- [x] C++\n- [x] C\n- [x] C#\n- [x] Ruby\n\n## 💡 How it works\n\nThe entire git repo is parsed with treesitter to extract all methods along with their documentation, which are saved to a local FAISS vector database using either sentence-transformers, instructor-embeddings or OpenAI's text-embedding-ada-002.  \nThe vector database is saved to a file on your system and is loaded again on subsequent runs.\nAfterwards it is possible to do semantic search on the codebase based on the embeddings model.  
\nTo chat with the codebase locally llama.cpp or Ollama is used by specifying the desired model.\nFor synchronization of recent changes in the repository, the git commit hashes of each file along with the vector Ids are saved to a cache.\nWhen synchronizing the vector database with the latest git state, the cached commit hashes are compared to the current git hash of each file in the repository.\nIf the git commit hashes differ, the related vectors are deleted from the database and inserted again after recreating the vector embeddings.\nUsing llama.cpp the specified model needs to be available on the system in advance.\nUsing Ollama the Ollama container with the desired model needs to be running locally in advance on port 11434.\nAlso OpenAI or Azure-OpenAI can be used for remote chat models.\n\n## ？FAQ\n\n### Where do I get models for llama.cpp?\n\nInstall the `huggingface-cli` and download your desired model from the model hub.\nFor example\n\n```\nhuggingface-cli download TheBloke/CodeLlama-13B-Python-GGUF codellama-13b-python.Q5_K_M.gguf\n```\n\nwill download the `codellama-13b-python.Q5_K_M` model. 
After the download has finished the absolute path of the model `.gguf` file is printed to the console.\n\n\u003e [!IMPORTANT]  \n\u003e `llama.cpp` compatible models must be in the `.gguf` format.\n\n## 🛟 Troubleshooting\n\n- ### During installation with `pipx`\n\n  ```\n  pip failed to build package: tiktoken\n\n  Some possibly relevant errors from pip install:\n    error: subprocess-exited-with-error\n    error: can't find Rust compiler\n  ```\n\n  Make sure the rust compiler is installed on your system from [here](https://www.rust-lang.org/tools/install).\n\n- ### During installation of `faiss`\n  ```\n  × Building wheel for faiss-cpu (pyproject.toml) did not run successfully.\n  │ exit code: 1\n  ╰─\u003e [12 lines of output]\n      running bdist_wheel\n      ...\n  note: This error originates from a subprocess, and is likely not a problem with pip.\n  ERROR: Failed building wheel for faiss-cpu\n  Failed to build faiss-cpu\n  ERROR: Could not build wheels for faiss-cpu, which is required to install pyproject.toml-based projects\n  ```\n  Make sure to have codeqai installed with Python \u003c3.12. There is no faiss wheel available yet for Python 3.12.\n\n## 🌟 Contributing\n\nIf you are missing a feature or facing a bug don't hesitate to open an issue or raise a PR.\nAny kind of contribution is highly appreciated!\n\nTo build and run the project in development mode make sure to have `conda`, `conda-lock` or `poetry` installed.\n\nBy using `conda` run:\n\n```\nconda env create -f environment.yml -n codeqai\n```\n\nor by using `conda-lock` run:\n\n```\nconda-lock install --name codeqai conda-\u003cYOUR_PLATFORM\u003e.lock\n```\n\nActivate the environment and install dependencies with:\n\n```\nconda activate codeqai \u0026\u0026 poetry install\n```\n\nBy using `poetry` run:\n\n```\npoetry install \u0026\u0026 poetry shell\n```\n\nRun e.g. 
`codeqai chat` within development environment with:\n\n```\npoetry run codeqai chat\n```\n\nRun tests with:\n\n```\npoetry run pytest -s -vv\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffynnfluegge%2Fcodeqai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffynnfluegge%2Fcodeqai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffynnfluegge%2Fcodeqai/lists"}