{"id":28826457,"url":"https://github.com/elibutters/cascadeinference","last_synced_at":"2026-04-15T18:01:51.397Z","repository":{"id":299497299,"uuid":"1003195221","full_name":"elibutters/CascadeInference","owner":"elibutters","description":"Cascade based inference for LLMs","archived":false,"fork":false,"pushed_at":"2025-06-17T00:12:35.000Z","size":27,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-04T15:16:31.565Z","etag":null,"topics":["cascade","cascade-inference","chatgpt","claude","gemini","google","openai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elibutters.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-16T19:16:14.000Z","updated_at":"2025-06-17T00:12:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"391c4202-8062-4e4b-88c6-fd2dd05f6b38","html_url":"https://github.com/elibutters/CascadeInference","commit_stats":null,"previous_names":["elibutters/cascadeinference"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/elibutters/CascadeInference","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elibutters%2FCascadeInference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elibutters%2FCascadeInference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elibutters%2FCascadeInference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/elibutters%2FCascadeInference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elibutters","download_url":"https://codeload.github.com/elibutters/CascadeInference/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elibutters%2FCascadeInference/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31853279,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cascade","cascade-inference","chatgpt","claude","gemini","google","openai"],"created_at":"2025-06-19T03:04:27.724Z","updated_at":"2026-04-15T18:01:51.378Z","avatar_url":"https://github.com/elibutters.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cascade Inference\n\nCascade based inference for large language models.\n\n## Installation\n\n```bash\npip install cascade-inference\n\n# To use semantic agreement, install the optional dependencies:\npip install cascade-inference[semantic]\n```\n\n## Basic Usage\n\n\u003e **💡 Pro-Tip:** It is highly recommended to use Level 1 client models from the same or similar model families (e.g., all Llama-based, all Qwen-based). 
This improves the reliability of the `semantic` agreement strategy. If you mix models from different families (like Llama and Gemini), consider lowering the `threshold` in the agreement strategy to account for stylistic differences.\n\nUsing the library is as simple as a standard OpenAI API call.\n\n```python\nfrom openai import OpenAI\nimport cascade\nimport os\n\n# Setup your clients\nclient = OpenAI(\n    base_url=\"https://openrouter.ai/api/v1\",\n    api_key=os.environ.get(\"OPENROUTER_API_KEY\"),\n)\n\n# Call the create function directly\nresponse = cascade.chat.completions.create(\n    # Provide the ensemble of fast clients\n    level1_clients=[\n        (client, \"meta-llama/llama-3.1-8b-instruct\"),\n        (client, \"google/gemini-flash-1.5\")\n    ],\n    # Provide the single, powerful client for escalation\n    level2_client=(client, \"openai/gpt-4o\"),\n    agreement_strategy=\"semantic\", # or \"strict\"\n    messages=[\n        {\"role\": \"user\", \"content\": \"What are the key differences between HBM3e and GDDR7 memory?\"}\n    ]\n)\n\n# The response object looks just like a standard OpenAI response\nprint(response.choices[0].message.content)\n```\n\n## Advanced Configuration\n\nFor more control, you can pass a dictionary to the `agreement_strategy` parameter. This allows you to fine-tune the agreement logic.\n\n### 1. Changing the Semantic Similarity Threshold\n\nYou can adjust how strictly the semantic comparison is applied. The `threshold` is a value between 0 and 1, where 1 is a perfect match. The default is `0.9`.\n\n```python\nresponse = cascade.chat.completions.create(\n    # ... clients and messages ...\n    agreement_strategy={\n        \"name\": \"semantic\",\n        \"threshold\": 0.95  # Require a 95% similarity match\n    },\n    # ...\n)\n```\n\n### 2. Using a Different Embedding Model\n\nThe default model is `sentence-transformers/all-MiniLM-L6-v2`, which is fast and lightweight. 
You can specify any other model compatible with the [**`FastEmbed`** library](https://qdrant.github.io/fastembed/examples/Supported_Models/).\n\nSome other excellent choices from the supported models list include:\n*   `nomic-ai/nomic-embed-text-v1.5`\n*   `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`: For multilingual use cases.\n\nThe library will automatically download and cache the new model on the first run.\n\n```python\nresponse = cascade.chat.completions.create(\n    # ... clients and messages ...\n    agreement_strategy={\n        \"name\": \"semantic\",\n        \"model_name\": \"BAAI/bge-base-en-v1.5\", # A larger, more powerful model\n        \"threshold\": 0.85 # It's good practice to adjust the threshold for a new model\n    },\n    # ...\n)\n```\n\n### 3. Using a Remote Embedding Model\n\nIf local embedding is too slow, you can use the `remote_semantic` strategy. This feature is optimized for the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index) and is the recommended way to perform remote comparisons.\n\n**Usage:**\nYou must provide a Hugging Face API key, which you can get for free from your account settings: [**huggingface.co/settings/tokens**](https://huggingface.co/settings/tokens).\n\nThe key can be passed directly via the `api_key` parameter or set as the `HUGGING_FACE_HUB_TOKEN` environment variable.\n\nThe default model is `sentence-transformers/all-mpnet-base-v2`, but you can easily use other models from the [**`sentence-transformers`**](https://huggingface.co/sentence-transformers) family on the Hub. We recommend the following models for the remote strategy:\n\n*   **Default \u0026 High-Quality:** `sentence-transformers/all-mpnet-base-v2`\n*   **Lightweight \u0026 Fast:** `sentence-transformers/all-MiniLM-L6-v2`\n*   **Multilingual:** `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`\n\n```python\nresponse = cascade.chat.completions.create(\n    # ... 
clients and messages ...\n    agreement_strategy={\n        \"name\": \"remote_semantic\",\n        \"model_name\": \"sentence-transformers/paraphrase-multilingual-mpnet-base-v2\", # A multilingual model\n        \"threshold\": 0.95,\n        \"api_key\": \"hf_YourHuggingFaceToken\" # Optional; can also be set via the HUGGING_FACE_HUB_TOKEN environment variable\n    },\n    # ...\n)\n```\n\nYou can also point the strategy at a different API provider by overriding `api_url`. If that provider expects a different payload structure, you may need to fork and adapt the `RemoteSemanticAgreement` class.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felibutters%2Fcascadeinference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felibutters%2Fcascadeinference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felibutters%2Fcascadeinference/lists"}