{"id":31760430,"url":"https://github.com/managedkaos/self-hosting-gen-ai","last_synced_at":"2025-10-09T21:29:27.181Z","repository":{"id":255498071,"uuid":"850200473","full_name":"managedkaos/self-hosting-gen-ai","owner":"managedkaos","description":"From Laptop to Cloud: Self-Hosting Gen AI for Privacy and Performance","archived":false,"fork":false,"pushed_at":"2025-02-27T05:19:56.000Z","size":583,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-11T08:45:54.833Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/managedkaos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-31T05:58:40.000Z","updated_at":"2025-02-27T05:19:59.000Z","dependencies_parsed_at":"2024-09-05T20:16:48.034Z","dependency_job_id":"786eba33-cd00-47a7-bca5-4a3c2fead2d1","html_url":"https://github.com/managedkaos/self-hosting-gen-ai","commit_stats":null,"previous_names":["managedkaos/self-hosting-gen-ai"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/managedkaos/self-hosting-gen-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/managedkaos%2Fself-hosting-gen-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/managedkaos%2Fself-hosting-gen-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/managedkaos%2Fself-hosting-gen-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/managedkaos%2Fself-hosting-gen-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/managedkaos","download_url":"https://codeload.github.com/managedkaos/self-hosting-gen-ai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/managedkaos%2Fself-hosting-gen-ai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002049,"owners_count":26083285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-09T21:29:26.255Z","updated_at":"2025-10-09T21:29:27.174Z","avatar_url":"https://github.com/managedkaos.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# From Laptop to Cloud: Self-Hosting Gen AI for Privacy and Performance\n\n**What’s the history of generative artificial intelligence and how does it work?**\n\n- TLDR…\n  - Math\n  - Front-end application for human interface\n  - Back-end application for model management\n  - Back-end providers have EULA which may include access to requests and responses\n\n**OK. 
As technologists, how can we own the use and management of generative AI infrastructure today and in the future?**\n\n- [Let's get started](#A-Process-for-Self-Hosting-Generative-AI)\n- [Take a peek at the glossary when you need to ...](documents/0-GLOSSARY.md)\n\n## The Rise of Generative Artificial Intelligence\n\n|Role|Input/Response|\n| --- | --- |\n|**User**| help me complete the following sentence: \"In the fall of 2022, OpenAI introduced ...\"|\n|**System**| \"In the fall of 2022, OpenAI introduced ChatGPT, a powerful language model that quickly gained popularity for its ability to generate human-like text and assist with a wide range of tasks, from casual conversation to technical problem-solving.\"|\n|**User**| that's you, right?|\n|**System**| Yes, that's me! I'm ChatGPT, a language model developed by OpenAI, introduced in the fall of 2022. My purpose is to assist with a wide variety of tasks, including answering questions, providing explanations, generating creative content, and more.|\n\n### 1950 - 2015\n\n- [What is the history of artificial intelligence (AI)? (Tableau)](https://www.tableau.com/data-insights/ai/history)\n\n  \u003e Birth of AI: 1950-1956: Alan Turing published his work “Computer Machinery and Intelligence” which eventually became The Turing Test, which experts used to measure computer intelligence. The term “artificial intelligence” was coined and came into popular use.\n\n- [A Short History Of ChatGPT: How We Got To Where We Are Today (Forbes, May 19, 2023)](https://www.forbes.com/sites/bernardmarr/2023/05/19/a-short-history-of-chatgpt-how-we-got-to-where-we-are-today/)\n\n  \u003e OpenAI was founded in December 2015 by Sam Altman, Greg Brockman, Elon Musk, Ilya Sutskever, Wojciech Zaremba, and John Schulman. 
The founding team combined their diverse expertise in technology entrepreneurship, machine learning, and software engineering to create an organization focused on advancing artificial intelligence in a way that benefits humanity.\n\n\n### 2022\n- [Introducing ChatGPT (OpenAI, November 30, 2022)](https://openai.com/index/chatgpt/)\n\n  \u003e We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.\n  \u003e\n  \u003e ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.\n  \u003e\n  \u003e We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at chatgpt.com.\n\n- [The advent of OpenAI’s ChatGPT may be the most important news event of 2022 (Fortune, December 12, 2022)](https://fortune.com/2022/12/12/openai-chatgpt-biggest-news-event-of-2022/)\n\n  \u003e Russia's invasion of Ukraine tops most lists of important news events in 2022, and President Volodymyr Zelensky is not only Time's Person of the Year, but also everyone else's. But the last couple of weeks have convinced me the honor may belong to something else: Open AI's ChatGPT.\n\n- [A new AI chatbot might do your homework for you. 
But it's still not an A+ student (NPR, December 19, 2022)](https://www.npr.org/2022/12/19/1143912956/chatgpt-ai-chatbot-homework-academia)\n\n  \u003e After the developer OpenAI released the text-based system to the public last month, some educators have been sounding the alarm about the potential that such AI systems have to transform academia, for better and worse.\n  \u003e\n  \u003e \"AI has basically ruined homework,\" said Ethan Mollick, a professor at the University of Pennsylvania's Wharton School of Business.\n\n### 2023\n\n- [On ChatGPT’s one-year anniversary, it has more than 1.7 billion users—here’s what it may do next (CNBC, November 30, 2023)](https://www.cnbc.com/2023/11/30/chatgpts-one-year-anniversary-how-the-viral-ai-chatbot-has-changed.html)\n\n  \u003e Over the past year, people have used ChatGPT for all sorts of tasks, from writing emails to sprucing up resumes to even creating a six-figure business.\n  \u003e\n  \u003e As the ways people utilize the AI chatbot have changed, the technology itself has evolved too.\n\n- [The Year of ChatGPT and Living Generatively (Wired, December 1, 2023)](https://www.wired.com/story/plaintext-chatgpt-year-of-living-generatively/)\n\n  \u003e In November last year, OpenAI launched a \"low key research preview\" called ChatGPT. What happened next transformed the tech industry-and perhaps humanity's future.\n\n### 2024\n\n- [24 Top AI Statistics And Trends In 2024 (Forbes, Jun 15, 2024)](https://www.forbes.com/advisor/business/ai-statistics/)\n\n  \u003e AI Business Impacts: 43% of businesses are concerned about technology dependence\n  \u003e Forty-three percent of businesses are concerned about technology dependence, and an additional **35% worry about having the technical skills to use AI effectively. 
These concerns highlight the challenges that organizations face while adopting AI technologies.**\n\n## Privacy Concerns for Individuals and Businesses\n\n- [How your data is used to improve model performance (OpenAI)](https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance)\n\n  \u003e When you use our services for individuals such as ChatGPT or DALL•E, we may use your content to train our models.\n  \u003e\n  \u003e We don’t use content from our business offerings such as ChatGPT Team, ChatGPT Enterprise, and our API Platform to train our models. Please see our Enterprise Privacy page for information on how we handle business data.\n\n- [The tricky truth about how generative AI uses your data (Vox, July 27, 2023)](https://www.vox.com/technology/2023/7/27/23808499/ai-openai-google-meta-data-privacy-nope)\n\n# A Process for Self-Hosting Generative AI\n\n1. Tools\n2. Research \u0026 Experiment\n3. Deploy\n\n## 1. Tools\n\n- [An Open-Source Foundation Model](./1-MODELS.md)\n\n  \u003e A publicly available, pre-trained, large language model that can be hosted on our own compute platform\n\n- [Ollama](https://ollama.com/)\n\n  \u003e An open-source, software framework  that makes it easier to run large language models (LLMs) on a local computer through the use of Modelfiles.\n\n- [Open WebUI](https://docs.openwebui.com/)\n\n  \u003e An extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline.\n\n- Compute platform\n\n  \u003e The CPU or GPU enabled systems where models, Ollama, and Open WebUI will run\n\n## 2. Research and Experiment\n\n- [Ollama Model Library](https://ollama.com/library)\n\n  \u003e A collection of open-source FMs that are compatible with the Ollama hosting environment.\n\n  _Keep in mind the number of parameters used to train the model and how this affects the model's size in GB.  
Generally speaking, the more parameters a model has, the larger the model will be.  Consider ~5 GB for 8 billion parameters._\n\n### Model Playgrounds\n- [Cloudflare Workers AI LLM Playground](https://playground.ai.cloudflare.com/)\n\n  \u003e Free, no registration required\n  \u003e\n  \u003e Explore different Text Generation models.\n  \u003e\n  \u003e [Documentation](https://developers.cloudflare.com/workers-ai/models/) is useful for learning more about models and their capabilities.\n\n  ![Cloudflare Workers AI LLM Playground](images/cloudflare-workers-ai-playground-SCR-20240904-mjtv.png)\n\n- [Caylent AI Battleground](https://battleground.caylent.com/chat)\n\n  \u003e Free, requires registration\n  \u003e\n  \u003e Compare models by examining their performance\n\n- Others?\n\n**Experiment with a variety of prompts including:**\n\n- Explain and demonstrate well-known algorithms and theorems that can be expressed as code.\n\n  ```\n  Describe the Fibonacci sequence and use Python code for a practical example.\n  ```\n\n  ```\n  Explain the Pythagorean theorem and provide a demonstration using Python.\n  ```\n\n  ```\n  Describe how Shakespeare's most influential plays have been adapted into films.\n  ```\n\n- Analyze or process transcripts and other human-generated content.\n\n  - Summarize online meeting transcripts\n  - Suggest the content for emails, articles, etc. given the context and audience\n\n**Note and Experiment with the controls including:**\n\n- System Message\n\n  \u003e Provides a role for the model and controls how the AI system handles the output when it encounters an error or an issue that prevents it from generating a response.\n\n- Maximum Output Length (Tokens)\n\n  \u003e Controls the maximum number of tokens that the AI system can generate in a single response. 
A lower limit keeps responses short but can truncate them; a higher limit leaves more room for long responses.\n  \u003e\n  \u003e _Note: If the response is truncated, use `continue` as the next prompt and optionally increase Maximum Output Length.  The model should continue responding to the initial prompt._\n\n- Temperature\n\n  \u003e Controls the randomness and/or creativity of the output. Lower temperatures provide less random output.  Higher temperatures provide output that is more random.  The temperature is related to the concept of [Boltzmann temperature in statistical mechanics](https://en.wikipedia.org/wiki/Boltzmann_distribution), where a higher temperature corresponds to more random and exploratory sampling from the probability distribution.\n\n## 3. Deploy\n\n- Pros and Upsides\n- Cons and Downsides\n\n### Ollama\n\n[Official installation instructions](https://ollama.com/download)\n\n- macOS: [Use the Homebrew package manager for easy installation.](https://formulae.brew.sh/formula/ollama)\n- Linux: One-line install `curl -fsSL https://ollama.com/install.sh | sh`\n- Windows: Currently in preview; we suggest using [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) and then following the Linux installation method\n\n### Open Web UI\n\n[Official installation instructions](https://docs.openwebui.com/getting-started/)\n\n- Use [Docker](https://www.docker.com/) to run the Open WebUI image as a container.\n\n  Supports managing the process as a service.\n\n  ```\n  if [ ! 
-d $(OPEN_WEBUI_HOME)/data ]; then mkdir -p $(OPEN_WEBUI_HOME)/data; fi\n\n\tdocker run --detach \\\n\t\t--network=\"host\" \\\n\t\t--volume $(OPEN_WEBUI_HOME)/data:/app/backend/data \\\n\t\t--env PORT=9595 \\\n\t\t--env OLLAMA_BASE_URL=http://localhost:11434 \\\n\t\t--restart always \\\n\t\t--name open-webui \\\n\t\tghcr.io/open-webui/open-webui:main || \\\n\tprintf \"http://localhost:9595\\n\\n\"\n\n  ```\n\n### Installing Ollama and Open Web UI on a Laptop _(With Demonstration)_\n\n#### Hardware and Operating System Requirements\n- CPU/GPU, RAM, and Disk\n- macOS\n- Linux\n- Windows (Using [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install))\n\n### Installing Ollama and Open Web UI on a Cloud Server _(With Demonstration)_\n\n#### Hardware and Operating System Requirements\n- CPU/GPU, RAM, and Disk\n- Linux using Amazon Machine Image ([Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver Info](https://aws.amazon.com/marketplace/pp/prodview-64e4rx3h733ru?sr=0-1\u0026ref_=beagle\u0026applicationId=AWSMPContessa))\n- Costs\n\n### Advanced Hosting in the Cloud _(With Demonstration)_\n\n#### Hardware and Operating System Requirements\n- CPU/GPU, RAM, and Disk\n- Linux using Amazon Machine Image ([Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver Info](https://aws.amazon.com/marketplace/pp/prodview-64e4rx3h733ru?sr=0-1\u0026ref_=beagle\u0026applicationId=AWSMPContessa))\n- Multiple Servers (For high availability and advanced model support)\n- Hosted Zone\n- SSL Certificates\n- Costs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanagedkaos%2Fself-hosting-gen-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanagedkaos%2Fself-hosting-gen-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanagedkaos%2Fself-hosting-gen-ai/lists"}