{"id":20162615,"url":"https://github.com/codeamt/ragrayagent","last_synced_at":"2025-07-18T05:08:58.411Z","repository":{"id":227240603,"uuid":"770687473","full_name":"codeamt/RagRayAgent","owner":"codeamt","description":null,"archived":false,"fork":false,"pushed_at":"2024-03-12T10:12:29.000Z","size":115,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-08T00:05:15.850Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codeamt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-12T01:25:29.000Z","updated_at":"2024-03-12T10:12:01.000Z","dependencies_parsed_at":"2024-03-12T11:05:56.766Z","dependency_job_id":null,"html_url":"https://github.com/codeamt/RagRayAgent","commit_stats":null,"previous_names":["codeamt/rag-agent-on-ray"],"tags_count":0,"template":false,"template_full_name":"codeamt/rag-ray-langchain-tf-template","purl":"pkg:github/codeamt/RagRayAgent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeamt%2FRagRayAgent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeamt%2FRagRayAgent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeamt%2FRagRayAgent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeamt%2FRagRayAgent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codeamt","download_url":"https://codeload.github.com/codeamt/RagRayAgent/tar.gz
/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeamt%2FRagRayAgent/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265703637,"owners_count":23814044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T00:25:56.567Z","updated_at":"2025-07-18T05:08:58.386Z","avatar_url":"https://github.com/codeamt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RagRayAgent\n\n\n# Overview\n\nThis repository does the following:\n\n1. Fine tuning an LLM\n2. Populate a vector database with and embedding model, so able to query your context similarty in the vector database\n3. Fine tune with Ray framework\n4. Use CPU and GPU for fine tuning and serving\n5. 
Serve your fine tuned LLM as REST API.\n\n\n# Configurations\nPlease set the API keys accordingly save the content below in `llm_agent/.env`.\n\n```bash\nOPENAI_API_KEY=\nANYSCALE_API_KEY=\nOPENAI_API_BASE=\"https://api.endpoints.anyscale.com/v1\"\nANYSCALE_API_BASE=\"https://api.endpoints.anyscale.com/v1\"\nDB_CONNECTION_STRING=\"postgresql://testUser:testPassword@localhost:15432/testDB\"\nEMBEDDING_INDEX_DIR=/tmp/embedding_index_sql\nVECTOR_TABLE_NAME=document\nVECTOR_TABLE_DUMP_OUTPUT_PATH=/tmp/vector.document.dump.sql\nRAYDOCS_ROOT=/tmp/raydocs\nNUM_CPUS=14\nNUM_GPUS=1\nNUM_CHUNKS=5\nCHUNK_SIZE=500\nCHUNK_OVERLAP=50\nEMBEDDING_MODEL_NAME=\"thenlper/gte-base\"\nLLM_MODEL_NAME=meta-llama/Llama-2-70b-chat-hf\n\n#How much data should be fed for fine tuning\n#give a floating number between \u003e0.001 and 1 (1 included, which means use all the data for fine tuning)\nUSE_THIS_PORTION_OF_DATA=0.05\n\n```\n\n\n# Makefile Commands for Project Setup\n\n```bash\nmake scrape # Scrap the web page\n\nmake vectordb # Configure Postgres Vector DB\n\nmake postgres-client # Install Postgres Client\n\nThen, in a seperate terminal\nmake port-forward-postgres # Port Forward DB\n\nmake vector-support # Enable Vector Support\n\nmake vector-table # Create Vector Table\n\nmake embedding-table # Get Vector Table\n\n# result:\n               List of relations\n Schema |      Name       |   Type   |  Owner\n--------+-----------------+----------+----------\n public | document        | table    | testUser\n public | document_id_seq | sequence | testUser\n(2 rows)\n\nmake pods-preview # Get Pods\n\nmake install-pip-deps # Install Pip Dependencies\n```\n\n\n## Finetuning \n\nOnce Setup, the following commands enable finetuning on a ray cluster:\n\n```bash\nmake ray-cluster # Start Ray Cluster\n\nmake profile-ray-cluster # Profile Cluster\n\nmake finetune # Finetune LLM\n```\nAt the end you will see something like below:\n\n```bash\nThe default batch size for map_batches is 
rollout_fragment_length * num_envs.\n```\nwhich indicates that LLM fine tuning is done, vector db is populated, and a query is sent to LLM with the context identified by your vector DB.\n\n**Note:** My machine has 16 CPUs and 1 GPU, so I set up `NUM_CPUS` and `NUM_GPUs` accordingly. These numbers may differ according to your machine. The principle here is that you can not set up a number larger than existing resources (CPU and GPU).\n\nPleae note that we are using `thenlper/gte-base` as an embedding model, this is a relatively small model, you might like to change it. `LLM_MODEL_NAME` is  to `meta-llama/Llama-2-70b-chat-hf`, which is good for this setup, but again you might like to change it.\n\n\n#### Serving\n\n```bash\nmake dev-deploy\n\nmake test-query\n```\nShould yield: \n\n```bash\nb'\"{\\\\\"question\\\\\": \\\\\"What is the default batch size for map_batches?\\\\\", \\\\\"sources\\\\\": [\\\\\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\\\\", \\\\\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\\\\", \\\\\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\\\\", \\\\\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\\\\", \\\\\"https://docs.ray.io/en/master/rllib/rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala\\\\\"], \\\\\"answer\\\\\": \\\\\" The default batch size for map_batches is rollout_fragment_length * num_envs.\\\\\", \\\\\"llm\\\\\": \\\\\"meta-llama/Llama-2-70b-chat-hf\\\\\"}\"'\n\n```\n\n## TODO\n* Spot Instance/Fleet Provisioning for Cost Effective Training\n* CUDA devcontainer configurations\n* Dockerfiles\n* Terraform Configuration for 3-Tier Cloud Deployment\n* linting, testing\n* Github Push/Pull Actions 
+ CI/CD Building\n* Intergrating Other DB Backends\n* Quantization\n\n\n# References\n[1](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1) A Comprehensive Guide for Building RAG-based LLM Applications (Part 1). Any Scale Blog.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeamt%2Fragrayagent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodeamt%2Fragrayagent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeamt%2Fragrayagent/lists"}
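
## Appendix: Chunking Illustration

The `CHUNK_SIZE` and `CHUNK_OVERLAP` settings in `.env` control how scraped documents are split into overlapping pieces before they are embedded and inserted into the vector table. As a rough illustration of what those two parameters mean (this is a minimal fixed-size character chunker, not the repository's actual splitter, which is not shown in this README):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where each chunk shares its first chunk_overlap characters
    with the tail of the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # start of each new chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# With the defaults above, a 1200-character document yields chunks
# starting at offsets 0, 450, and 900.
chunks = chunk_text("x" * 1200)
```

Overlap keeps sentences that straddle a chunk boundary fully visible in at least one chunk, at the cost of storing some text twice.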