{"id":19283032,"url":"https://github.com/chuloai/andromeda-chain","last_synced_at":"2025-04-22T01:32:23.435Z","repository":{"id":172063262,"uuid":"648534167","full_name":"ChuloAI/andromeda-chain","owner":"ChuloAI","description":"Serving hugging face guidance behind a server","archived":false,"fork":false,"pushed_at":"2023-06-14T20:23:08.000Z","size":56,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T02:17:05.579Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChuloAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-02T07:38:35.000Z","updated_at":"2024-05-05T10:53:21.000Z","dependencies_parsed_at":"2023-07-09T15:33:21.602Z","dependency_job_id":null,"html_url":"https://github.com/ChuloAI/andromeda-chain","commit_stats":null,"previous_names":["chuloai/andromeda-chain","chuloai/guidance_server","chuloai/andromeda"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChuloAI%2Fandromeda-chain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChuloAI%2Fandromeda-chain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChuloAI%2Fandromeda-chain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChuloAI%2Fandromeda-chain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChuloAI","download_url":"https://codeload.github.com/ChuloAI/andromeda-chain/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250162104,"owners_count":21385044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T21:29:28.589Z","updated_at":"2025-04-22T01:32:23.427Z","avatar_url":"https://github.com/ChuloAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Andromeda-Chain - mastering all the chains\n\nThis repository contains both a server and a client package.\n\nThe server is (not very creatively) called `guidance_server`.\nThe client is called `AndromedaChain`.\n\nWhy this package/library?\nThe [guidance](https://github.com/microsoft/guidance) library is awesome, but it has limited support through HTTP APIs such as the Oobabooga UI.\nSo I rolled my own HTTP server, which lets me quickly prototype apps that use guidance templates.\n\nI originally created Oasis with a variant of this idea: https://github.com/paolorechia/oasis\n\n## Installation\n\n### AndromedaChain\n\n```bash\npip install andromeda-chain\n```\n\n### Guidance Server\nServes the guidance library with local models behind an HTTP server.\n\nSupported loading methods:\n1. Hugging Face (16-bit, 8-bit, 4-bit)\n2. GPTQ, with or without CPU offloading\n3. 
Experimental LLaMA CPP support based on the work of https://github.com/Maximilian-Winter\n\nThe server configuration is passed through environment variables, which are typically set in the docker-compose file:\n\n```yaml\n    GENERAL_BASE_IMAGE: GPU\n    # CPP Model Example:\n    # GENERAL_MODEL_PATH: /models/open-llama-7B-open-instruct.ggmlv3.q4_0.bin\n    # GENERAL_TOKENIZER_PATH: /models/VMware_open-llama-7b-open-instruct\n    # GENERAL_LOADING_METHOD: CPP\n\n    # GPTQ Model Example:\n    GENERAL_MODEL_PATH: /models/vicuna-7B-1.1-GPTQ-4bit-128g\n    GENERAL_LOADING_METHOD: GPTQ\n\n    # HF Model Example:\n    # GENERAL_MODEL_PATH: /models/VMware_open-llama-7b-open-instruct\n    # GENERAL_LOADING_METHOD: HUGGING_FACE\n\n    # Guidance Settings\n    GUIDANCE_AFTER_ROLE: \"|\u003e\"\n    GUIDANCE_BEFORE_ROLE: \"\u003c|\"\n\n    # Tokenizer Settings\n    TK_BOOL_USE_FAST: false\n\n    # HuggingFace\n    HF_BOOL_USE_8_BIT: true\n    HF_BOOL_USE_4_BIT: false\n    HF_DEVICE_MAP: auto\n\n    # GPTQ\n    GPTQ_INT_WBITS: 4\n    GPTQ_INT_GROUP_SIZE: 128\n    GPTQ_INT_PRE_LOADED_LAYERS: 20\n    GPTQ_DEVICE: \"cuda\"\n    GPTQ_BOOL_CPU_OFFLOADING: false\n\n    # LLaMA CPP\n    CPP_INT_N_GPU_LAYERS: 300\n    CPP_INT_N_THREADS: 12\n    CPP_BOOL_CACHING: false\n```\n\n#### Docker Container\n\nRequirements:\n1. docker-engine\n2. docker-compose v2\n\nIf using a GPU, you also need:\n\n3. 
nvidia-docker: https://github.com/NVIDIA/nvidia-docker\n\n##### Pulling the image\nYou can find the image tags on [Docker Hub](https://hub.docker.com/repository/docker/paolorechia/guidance_server/general).\nThe easiest way to pull an image is to reuse the docker-compose file:\n\n```bash\ndocker-compose -f docker-compose.gpu.yaml up\n```\n\nOr use the CPU version:\n\n```bash\ndocker-compose -f docker-compose.cpu.yaml up\n```\n\nNote that you still need to set up the model (see the Usage section).\n\n*LICENSE NOTE*: The GPU image is based on `nvidia/cuda:12.1.1-runtime-ubuntu22.04`, which is subject to the proprietary NVIDIA license.\nWhile the software in the Andromeda repository is open source, some layers of the Docker image are not.\n\n#### Building\nUse the appropriate build script:\n\n```bash\n./build_gpu.sh\n```\n\nOr:\n\n```bash\n./build_cpu.sh\n```\n\n## Usage\n\n1. Download the LLM you want to use from Hugging Face.\n2. Create a `models` directory locally and save the model there.\n3. Set the `GENERAL_MODEL_PATH` environment variable (and `GENERAL_LOADING_METHOD`, to match the model format) in `docker-compose.gpu.yaml` or `docker-compose.cpu.yaml`, depending on which one you want to use.\n4. Start the server.\n5. 
Use the Andromeda package to query the server.\n\n### Using the Andromeda Package\n\n```python\nfrom andromeda_chain import AndromedaChain, AndromedaPrompt, AndromedaResponse\n\n# Client for the guidance server, which must already be running (see Usage above)\nchain = AndromedaChain()\n\n# Guidance template: `gen` stores the generated text in the `expert_names` variable\nprompt = AndromedaPrompt(\n    name=\"hello\",\n    prompt_template=\"\"\"Howdy: {{gen 'expert_names' temperature=0 max_tokens=300}}\"\"\",\n    input_vars=[],\n    output_vars=[\"expert_names\"]\n)\n\nresponse: AndromedaResponse = chain.run_guidance_prompt(prompt)\n# Use the response\nprint(response.expanded_generation)\nprint(response.result_vars)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuloai%2Fandromeda-chain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchuloai%2Fandromeda-chain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuloai%2Fandromeda-chain/lists"}