<div align="center">
<h1>
ModelCache
</h1>
</div>

<div align="center">
<h4 align="center">
    <p>
        <a href="https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/main/README_CN.md">中文</a> |
        <b>English</b>
    </p>
</h4>
</div>

## Contents

- [News](#news)
- [Introduction](#introduction)
- [Architecture](#architecture)
- [Quick start](#quick-start)
  - [Dependencies](#dependencies)
  - [Start service](#start-service)
    - [Start demo](#start-demo)
    - [Service startup with Docker Compose](#service-startup-with-docker-compose)
    - [Start normal service](#start-normal-service)
- [Visit the service](#visit-the-service)
  - [Write cache](#write-cache)
  - [Query cache](#query-cache)
  - [Clear cache](#clear-cache)
- [Function comparison](#function-comparison)
- [Features](#features)
- [Todo List](#todo-list)
  - [Adapter](#adapter)
  - [Embedding model & inference](#embedding-model--inference)
  - [Scalar Storage](#scalar-storage)
  - [Vector Storage](#vector-storage)
  - [Ranking](#ranking)
  - [Service](#service)
- [Acknowledgements](#acknowledgements)
- [Contributing](#contributing)

## News

- 🔥🔥 [2024.10.22] Added tasks for 1024 Developer Day.
- 🔥🔥 [2024.04.09] Added Redis Search to store and retrieve embeddings in multi-tenant scenarios. This reduces the interaction time between the cache and vector databases to 10 ms.
- 🔥🔥 [2023.12.10] Integrated LLM embedding frameworks such as llmEmb, ONNX, PaddleNLP, and FastText, along with the image embedding framework timm, to bolster embedding functionality.
- 🔥🔥 [2023.11.20] Integrated local storage such as SQLite and FAISS, enabling quick and convenient tests.
- [2023.08.26] codefuse-ModelCache...

## Introduction

Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves user experience.

This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange techniques related to semantic caching for large models.

## Architecture

![modelcache modules](docs/modelcache_modules_20240409.png)

## Quick start

You can find the start scripts in `flask4modelcache.py` and `flask4modelcache_demo.py`.

- `flask4modelcache_demo.py`: A quick-test service that embeds SQLite and FAISS. No database configuration is required.
- `flask4modelcache.py`: The standard service, which requires MySQL and Milvus configuration.

### Dependencies

- Python 3.8 or above
- Package installation:

  ```shell
  pip install -r requirements.txt
  ```

### Start service

#### Start demo

1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
2. Start the backend service:

   ```shell
   cd CodeFuse-ModelCache
   python flask4modelcache_demo.py
   ```

#### Service startup with Docker Compose

1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
2. Configure the Docker network. This only needs to be done once:

   ```shell
   cd CodeFuse-ModelCache
   docker network create modelcache
   ```

3. Run Docker Compose:

   ```shell
   # First run, or whenever the Dockerfile has changed
   docker-compose up --build

   # Subsequent runs with an unchanged Dockerfile
   docker-compose up
   ```

#### Start normal service

Before you start the standard service, complete these steps:

1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
2. Install the vector database Milvus.
3. Configure database access in:
   - `modelcache/config/milvus_config.ini`
   - `modelcache/config/mysql_config.ini`
4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put it in `model/text2vec-base-chinese`.
5. Start the backend service:

   ```bash
   python flask4modelcache.py
   ```

## Visit the service

The service provides three core RESTful API functionalities: cache writing, cache querying, and cache clearing.

### Write cache

```python
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'insert'
scope = {"model": "CODEGPT-1008"}
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."},
                        {"role": "user", "content": "Who are you?"}],
              "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
data = {'type': request_type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=data)
```

### Query cache

```python
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'query'
scope = {"model": "CODEGPT-1008"}
query = [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."},
         {"role": "user", "content": "Who are you?"}]
data = {'type': request_type, 'scope': scope, 'query': query}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=data)
```

### Clear cache

```python
import requests

url = 'http://127.0.0.1:5000/modelcache'
request_type = 'remove'
scope = {"model": "CODEGPT-1008"}
remove_type = 'truncate_by_model'
data = {'type': request_type, 'scope': scope, 'remove_type': remove_type}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=data)
```
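The three request bodies above share one shape: a `type`, a `scope` naming the model, and an operation-specific field. A small set of helper functions makes that structure explicit. This is a sketch derived from the examples above, not part of the ModelCache codebase:

```python
import json

URL = "http://127.0.0.1:5000/modelcache"  # default service address used in the examples above

def insert_payload(model: str, query_messages: list, answer: str) -> dict:
    """Body for cache writing (type 'insert')."""
    return {
        "type": "insert",
        "scope": {"model": model},
        "chat_info": [{"query": query_messages, "answer": answer}],
    }

def query_payload(model: str, query_messages: list) -> dict:
    """Body for cache querying (type 'query')."""
    return {"type": "query", "scope": {"model": model}, "query": query_messages}

def clear_payload(model: str) -> dict:
    """Body for cache clearing (type 'remove')."""
    return {
        "type": "remove",
        "scope": {"model": model},
        "remove_type": "truncate_by_model",
    }

body = query_payload("CODEGPT-1008", [{"role": "user", "content": "Who are you?"}])
print(json.dumps(body))
```

Each helper returns a plain dict that can be passed directly to `requests.post(URL, json=...)`.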
## Function comparison

We've implemented several key updates to our repository: we resolved Hugging Face network issues and improved inference speed by introducing local embedding capabilities; we redesigned the relational-database interaction module to work around SQLAlchemy limitations and allow more flexible operations; we added multi-tenancy support, since LLM products typically serve multiple users and models; and we made initial adjustments for better compatibility with system commands and multi-turn dialogues.

| Module | Function | ModelCache | GPTCache |
|---|---|:---:|:---:|
| Basic interface | Data query interface | ☑ | ☑ |
| | Data writing interface | ☑ | ☑ |
| Embedding | Embedding model configuration | ☑ | ☑ |
| | Large model embedding layer | ☑ | |
| | BERT model long text processing | ☑ | |
| Large model invocation | Decoupling from large models | ☑ | |
| | Local loading of embedding model | ☑ | |
| Data isolation | Model data isolation | ☑ | ☑ |
| | Hyperparameter isolation | | |
| Databases | MySQL | ☑ | ☑ |
| | Milvus | ☑ | ☑ |
| | OceanBase | ☑ | |
| Session management | Single-turn dialogue | ☑ | ☑ |
| | System commands | ☑ | |
| | Multi-turn dialogue | ☑ | |
| Data management | Data persistence | ☑ | ☑ |
| | One-click cache clearance | ☑ | |
| Tenant management | Support for multi-tenancy | ☑ | |
| | Milvus multi-collection capability | ☑ | |
| Other | Long-short dialogue distinction | ☑ | |
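The data query and data writing interfaces compared above together implement a cache-aside loop: embed the incoming query, look for a semantically similar cached query, and only call the LLM on a miss. A self-contained sketch with a toy bigram embedding and an arbitrary threshold (ModelCache itself uses real embedding models such as text2vec, with FAISS or Milvus for retrieval):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy character-bigram "embedding"; stands in for a real model like text2vec.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Minimal semantic cache: return a stored answer when a new query
    is similar enough to a cached one (threshold value is illustrative)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, query, answer)

    def get(self, query: str):
        if not self.entries:
            return None
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]))
        if cosine(q, best[0]) >= self.threshold:
            return best[2]
        return None  # cache miss: caller invokes the LLM, then put()s the result

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), query, answer))

cache = SemanticCache()
cache.put("How do I reverse a list in Python?", "Use list.reverse() or slicing [::-1].")
print(cache.get("How to reverse a list in Python?"))  # near-duplicate wording still hits
```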
## Features

In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.

- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
- The embedding module converts text into semantic vector representations, transforming user queries into vectors.
- The similarity module ranks and evaluates the similarity of recalled vectors.
- The data_manager module manages the databases.

To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:

- [x] Architectural adjustment (lightweight integration):
  - Embedded into LLM products using a Redis-like caching mode
  - Provides semantic caching without interfering with LLM calls, security audits, and other functions
  - Compatible with all LLM services
- [x] Multiple model loading:
  - Supports local embedding model loading, which resolves Hugging Face network connectivity issues
  - Supports loading embedding layers from various pre-trained models
- [x] Data isolation:
  - Environment isolation: reads different database configurations based on the environment, isolating development, staging, and production environments
  - Multi-tenant data isolation: dynamically creates collections based on models, addressing data separation in multi-model/multi-service scenarios within LLM products
- [x] System instruction support: adopted a concatenation approach to resolve issues with system instructions in the prompt paradigm.
- [x] Long and short text differentiation: long texts make similarity assessment harder, so ModelCache distinguishes long from short texts and allows separate threshold configurations for each.
- [x] Milvus performance optimization: adjusted the Milvus consistency level to "Session" for better performance.
- [x] Data management:
  - One-click cache clearing for easy data management after model upgrades
  - Recall of hit queries for subsequent data analysis and model iteration
  - Asynchronous log write-back for data analysis and statistics
  - Added model and data-statistics fields to enhance features

## Todo List

### Adapter

- [ ] Register adapter for Milvus: based on the "model" parameter in the scope, initialize the corresponding collection and perform the load operation.

### Embedding model & inference

- [ ] Inference optimization: speed up embedding inference, compatible with inference engines such as FasterTransformer, TurboTransformers, and ByteTransformer.
- [ ] Compatibility with Hugging Face and ModelScope models, offering more ways to load models.

### Scalar Storage

- [ ] Support MongoDB
- [ ] Support Elasticsearch

### Vector Storage

- [ ] Adapt FAISS storage in multimodal scenarios.

### Ranking

- [ ] Add a ranking model to refine the order of data after embedding recall.

### Service

- [ ] Support FastAPI.
- [ ] Add a visual interface for a more direct user experience.

## Acknowledgements

This project references the following open-source projects. We would like to express our gratitude to these projects and their developers for their contributions and research.

[GPTCache](https://github.com/zilliztech/GPTCache)
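The long and short text differentiation listed under Features boils down to selecting a similarity threshold by query length, since short queries flip meaning with small wording changes. A minimal sketch; the cutoff and threshold values here are hypothetical, not ModelCache's configured defaults:

```python
def is_cache_hit(similarity: float, query: str,
                 short_threshold: float = 0.95,
                 long_threshold: float = 0.90,
                 length_cutoff: int = 64) -> bool:
    """Apply a stricter similarity threshold to short queries.
    All three parameter values are hypothetical placeholders."""
    threshold = short_threshold if len(query) <= length_cutoff else long_threshold
    return similarity >= threshold
```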
## Contributing

ModelCache is a captivating and invaluable project. Whether you are an experienced developer or a novice just starting out, your contributions are warmly welcomed. Your involvement, be it through raising issues, providing suggestions, writing code, or creating documentation and examples, will enhance the project's quality and make a significant contribution to the open-source community.