# Machine Setting

> 🌐 [English](README_EN.md) | **한국어**

Portable AI development environment system. One command to set up Python + AI/ML packages + optional Node.js/Java on any machine, with automatic GPU/CPU detection and cross-machine sync.

**Supported platforms**: Linux (x86_64, NVIDIA CUDA) + macOS (Apple Silicon M1+, MPS) + Cloud/Container (Docker, K8s, AWS/GCP/Azure)
**Supported shells**: bash + zsh

---

## Table of Contents

- [Quick Start](#quick-start)
- [How It Works](#how-it-works)
- [Installation Flow](#installation-flow)
- [Installed Components](#installed-components)
- [Daily Usage](#daily-usage)
- [CLI Options](#cli-options)
- [Profiles](#profiles)
- [GPU Support](#gpu-support)
- [Pre-flight Check](#pre-flight-check)
- [Reinstallation](#reinstallation)
- [Uninstall](#uninstall)
- [Health Check & Recovery](#health-check--recovery)
- [Cross-Machine Sync](#cross-machine-sync)
- [Disk Health & Monitoring](#disk-health--monitoring)
- [Shell Integration Details](#shell-integration-details)
- [Directory Structure](#directory-structure)
- [State & Configuration Files](#state--configuration-files)
- [Troubleshooting](#troubleshooting)
- [Security](#security)

---

## Quick Start

```bash
# New machine setup (Linux or macOS)
git clone https://github.com/pathcosmos/machine-setting.git ~/machine_setting
cd ~/machine_setting && ./setup.sh

# Cloud/container setup (user-space only, no sudo required)
./setup.sh --cloud

# Activate AI environment
aienv
```

---

## How It Works

### Overview

`setup.sh` runs as a 7-stage pipeline, and each stage's status is tracked by a **checkpoint system**. If installation fails midway, completed stages are skipped and the run resumes from the point of failure.

### Execution Flow

```
Run ./setup.sh
│
├─ 1) Pre-flight Check (interactive mode)
│     Scans the current system state and shows which actions are needed
│     The user can select or deselect items to install
│
├─ 2) Check the previous installation state
│     Reads earlier progress from ~/.machine_setting/install.state
│     → Previous failure found: shows a Resume / Reset / Cancel menu
│     → All stages complete: shows a Reinstall / Cancel menu
│
└─ 3) Run the 7-stage installation pipeline
      Records a checkpoint at each stage → automatic rollback on failure
```

### Checkpoint System

All installation state is recorded in `~/.machine_setting/install.state`:

```
STAGE_1_HARDWARE=done
STAGE_2_NVIDIA=done
STAGE_3_PYTHON=done
STAGE_4_VENV=in_progress   ← failed at this stage
STAGE_5_NODE=pending
STAGE_6_JAVA=pending
STAGE_7_SHELL=pending
```

When a stage fails:
1. The stage's status is recorded as `failed`
2. **Automatic rollback** runs (removes whatever that stage installed)
3. The next run can resume from the point of failure
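
As a rough illustration of how a resume point could be derived from the state file, here is a minimal sketch. It assumes the `KEY=value` layout shown above; `first_unfinished_stage` is a hypothetical helper name, not a function taken from `setup.sh`, and the real script may track state differently.

```bash
# Hypothetical helper: print the first STAGE_* entry whose value is not "done".
# Assumes one KEY=value pair per line, as in the install.state sample above.
first_unfinished_stage() {
    local state_file="$1" key value
    # "|| [ -n "$key" ]" keeps the last line even without a trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            STAGE_*) ;;          # a stage entry: inspect it below
            *) continue ;;       # anything else is ignored
        esac
        # Drop anything after the first space (e.g. a trailing annotation).
        value="${value%% *}"
        if [ "$value" != "done" ]; then
            printf '%s\n' "$key"
            return 0
        fi
    done < "$state_file"
    return 1                     # every stage is marked done
}
```

Calling `first_unfinished_stage ~/.machine_setting/install.state` on the sample above would print `STAGE_4_VENV`. A non-zero return status means there is nothing left to resume.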

---

## Installation Flow

### [1/7] Hardware Detection

Automatically detects the system hardware and saves the result to `~/.machine_setting_profile`.

| Detected item | Linux | macOS |
|-----------|-------|-------|
| GPU | `lspci` + `nvidia-smi` (fallback) | `system_profiler` (Apple Silicon) |
| CUDA version | `nvcc --version` / `nvidia-smi` | N/A (uses MPS) |
| CPU/RAM | `/proc/cpuinfo`, `/proc/meminfo` | `sysctl` |
| NGC container | torch NV version check + presence of `/opt/nvidia` | N/A |
| Cloud/Container | Docker, K8s, cgroup, cloud VM vendor, sudo availability | N/A |

> **Container GPU detection:** Even in containers without `lspci`, the GPU is detected correctly through the `nvidia-smi` fallback.

The best-fitting profile is selected automatically from the detection results:
- Apple Silicon → `mac-apple-silicon`
- NGC container → `ngc-container`
- Cloud/Container environment → `cloud-server`
- NVIDIA GPU → `gpu-workstation`
- RAM ≥ 32GB (no GPU) → `cpu-server`
- RAM ≥ 8GB → `laptop`
- Otherwise → `minimal`
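
The priority order above can be sketched as a simple decision chain. This is only an illustration: `select_profile` and its yes/no arguments are hypothetical, and the actual detection script may evaluate these conditions differently.

```bash
# Hypothetical helper mirroring the priority order listed above.
# Arguments (all assumptions made for this sketch):
#   $1 apple_silicon (yes/no)   $2 ngc (yes/no)   $3 cloud (yes/no)
#   $4 nvidia_gpu (yes/no)      $5 ram_gb (integer)
select_profile() {
    local apple="$1" ngc="$2" cloud="$3" gpu="$4" ram_gb="$5"
    if   [ "$apple" = yes ];   then echo "mac-apple-silicon"
    elif [ "$ngc"   = yes ];   then echo "ngc-container"
    elif [ "$cloud" = yes ];   then echo "cloud-server"
    elif [ "$gpu"   = yes ];   then echo "gpu-workstation"
    elif [ "$ram_gb" -ge 32 ]; then echo "cpu-server"
    elif [ "$ram_gb" -ge 8 ];  then echo "laptop"
    else                            echo "minimal"
    fi
}
```

Because the checks run top to bottom, an NGC container with an NVIDIA GPU resolves to `ngc-container` rather than `gpu-workstation`.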

### [2/7] NVIDIA GPU Stack (Linux only)

Automatically installs the system-level NVIDIA GPU software stack. `scripts/install-nvidia.sh` runs 9 sub-stages.

**Automatically skipped when:** non-Linux OS, no NVIDIA GPU detected, NGC container (already installed), or `INSTALL_NVIDIA=false`

**Automatic GPU tier classification:**

| Tier | Example GPUs | Behavior |
|------|----------|------|
| Consumer | GeForce RTX 3090/4090 | Base install (driver + CUDA + cuDNN + NCCL) |
| Professional | RTX A6000, L40 | Base install |
| Datacenter | A100, H100, H200, B200 | Base install + enterprise tools enabled automatically |

**Installed components:**

| Component | Description | Setting |
|----------|------|------|
| NVIDIA Driver | `ubuntu-drivers` automatic recommendation or a manually specified version | `NVIDIA_DRIVER_VERSION` |
| CUDA Toolkit | `cuda-toolkit` metapackage, `/usr/local/cuda` symlink | `NVIDIA_CUDA_VERSION` |
| cuDNN 9.x | DNN acceleration library (`cudnn9-cuda-XX`) | Automatic |
| NCCL | Multi-GPU collective communication (skipped on single-GPU systems) | Automatic |
| Container Toolkit | Docker GPU support (skipped if Docker is not installed) | `NVIDIA_CONTAINER_TOOLKIT` |
| Enterprise Tools | DCGM, Fabric Manager, GDS, nvidia-peermem | `NVIDIA_ENTERPRISE` |
| System Utilities | numactl, hwloc, nvtop, lm-sensors, build-essential, cmake | `NVIDIA_SYSTEM_TOOLS` |
| Kernel Tuning | sysctl (vm.max_map_count, shmmax, etc.), memlock limits, CPU governor | `NVIDIA_KERNEL_TUNING` |

**Open vs Proprietary Kernel Modules:**
- `NVIDIA_OPEN_KERNEL=auto` (default): Turing+ (RTX 20xx or newer) → open, older GPUs → proprietary
- `NVIDIA_OPEN_KERNEL=true`: force open kernel modules
- `NVIDIA_OPEN_KERNEL=false`: force proprietary kernel modules

**Secure Boot:** MOK (Machine Owner Key) enrollment instructions are displayed automatically.

**Kernel tuning details:**
- `vm.max_map_count=1048576` (large memory mappings)
- `shmmax`/`shmall` computed dynamically from RAM size
- `memlock unlimited` (memory locking for GPU workloads)
- TCP buffer tuning (for distributed training)
- CPU governor → performance

### [3/7] Python Setup

Installs Python through [uv](https://github.com/astral-sh/uv) (default: 3.12).

- Installs uv automatically if it is missing (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- Installs a managed Python with `uv python install 3.12`
- Does not affect the system Python

### [4/7] AI Environment (Virtual Environment + Packages)

Creates the venv, then installs packages group by group.

**venv location options:**

| Mode | Path | Purpose |
|------|------|------|
| global (default) | `~/ai-env` | Shared across all projects |
| local | `./.venv` | For the current project only |
| custom | User-specified path | For example, a specific partition |

**Package groups:**

| Group | File | Contents |
|------|------|------|
| core | `requirements-core.txt` | transformers, accelerate, peft, wandb, numpy, mlflow, tensorboard, optuna, etc. |
| data | `requirements-data.txt` | pandas, polars, duckdb, SQLAlchemy, psycopg2, pypdf, openpyxl, etc. |
| web | `requirements-web.txt` | fastapi, uvicorn, httpx, gradio, cryptography, etc. |
| gpu | `requirements-gpu.txt` | torch+CUDA, triton, bitsandbytes, deepspeed, vllm, pynvml, nvitop, etc. |
| mps | `requirements-mps.txt` | torch (with Apple Silicon MPS support) |
| cpu | `requirements-cpu.txt` | torch CPU-only build |

Exactly one of the GPU/MPS/CPU package groups is selected automatically, based on the hardware detected in [1/7].

**Disk requirements:** at least 15GB of free space is recommended (when GPU packages are included)

#### core: Core AI/ML (`requirements-core.txt`, ~160 packages)

| Category | Packages | Description |
|----------|--------|------|
| **LLM Provider** | `anthropic`, `openai`, `google-generativeai` + 7 Google API packages | Claude, GPT, Gemini APIs |
| **LangChain** | `langchain`, `langchain-community`, `langchain-core`, `langchain-huggingface`, `langgraph`, `langgraph-checkpoint`, `langgraph-prebuilt`, `langgraph-sdk`, `langsmith` | Agents/chains/graphs |
| **HuggingFace** | `transformers`, `datasets`, `tokenizers`, `huggingface_hub`, `accelerate`, `peft`, `trl`, `sentence-transformers`, `safetensors`, `hf-xet`, `sentencepiece` | Model training/inference/fine-tuning |
| **Classical ML** | `scikit-learn`, `scipy`, `xgboost`, `lightgbm`, `numba`, `llvmlite` | Classical machine learning |
| **Vector/Embedding** | `chromadb`, `faiss-cpu` | Vector DB/similarity search |
| **Visualization** | `matplotlib`, `seaborn`, `contourpy`, `cycler`, `fonttools`, `kiwisolver` | Charts/plots |
| **Experiment Tracking** | `wandb`, `mlflow`, `tensorboard`, `optuna` | Experiment tracking/hyperparameter optimization |
| **Scientific Computing** | `sympy`, `mpmath`, `networkx`, `shapely`, `h5py` | Math/graphs/geospatial/HDF5 |
| **Audio** | `pydub`, `ffmpy` | Audio processing/conversion |
| **Testing** | `pytest`, `pytest-asyncio` | Unit/async testing |
| **NLP/Text** | `regex`, `python-bidi`, `Markdown`, `markdown-it-py`, `rich`, `Pygments` | Text processing/rendering |
| **Utilities** | `pydantic`, `pydantic-settings`, `python-dotenv`, `click`, `typer`, `loguru`, `structlog`, `tqdm`, `coloredlogs`, `humanfriendly`, `prettytable`, `py-cpuinfo`, `tenacity`, `backoff` | Configuration/CLI/logging/retries |
| **Async** | `anyio`, `aiofiles`, `aiohappyeyeballs`, `aiosignal`, `frozenlist`, `multidict`, `yarl` | Async I/O |
| **Serialization** | `typing_extensions`, `marshmallow`, `dataclasses-json`, `jsonschema`, `jsonpatch`, `PyYAML`, `ruamel.yaml` | Schemas/serialization (YAML comment preservation) |
| **gRPC/Protobuf** | `grpcio`, `grpcio-status`, `proto-plus`, `protobuf` | RPC communication |
| **Monitoring** | `opentelemetry-api`, `opentelemetry-sdk`, `opentelemetry-exporter-otlp-proto-grpc` + 3 more, `posthog` | Telemetry/analytics |
| **Infra** | `kubernetes`, `GitPython`, `APScheduler`, `psutil`, `watchdog` | K8s/Git/scheduling |
| **Build** | `packaging`, `setuptools`, `wheel`, `build`, `ninja` | Package builds |
#### gpu: NVIDIA GPU only (`requirements-gpu.txt`, ~15 packages)

| Category | Packages | Description |
|----------|--------|------|
| **PyTorch** | `torch`, `torchaudio`, `torchvision`, `triton` | GPU builds installed from the CUDA index URL |
| **CUDA Bindings** | `cuda-bindings`, `cuda-pathfinder` | CUDA Python bindings |
| **GPU Acceleration** | `flash-attn`, `bitsandbytes`, `onnxruntime-gpu` | FlashAttention/quantization/GPU inference |
| **Distributed Training** | `deepspeed`, `pytorch-lightning`, `torchmetrics` | Multi-GPU training |
| **GPU Monitoring** | `pynvml`, `gpustat`, `nvitop` | GPU utilization/temperature monitoring |
| **Inference Optimization** | `vllm`, `optimum` | Fast LLM serving/model optimization |

> **Note:** `nvidia-*` runtime libraries are resolved automatically as `torch` dependencies, so they are not listed. Pinning them can break compatibility across CUDA versions.

#### data: Data processing (`requirements-data.txt`, ~40 packages)

| Category | Packages | Description |
|----------|--------|------|
| **DataFrame** | `pandas`, `pyarrow`, `numpy`, `polars`, `duckdb`, `connectorx` | Data analysis/query engines |
| **DB Drivers** | `SQLAlchemy`, `alembic`, `psycopg2-binary`, `PyMySQL`, `oracledb`, `cx-Oracle`, `clickhouse-connect`, `clickhouse-driver` | PostgreSQL/MySQL/Oracle/ClickHouse |
| **Document Processing** | `pdfplumber`, `pdfminer.six`, `pypdf`, `pypdfium2`, `python-docx`, `python-pptx`, `openpyxl`, `xlsxwriter`, `lxml` | PDF/Word/Excel/PPT parsing |
| **Images** | `pillow`, `opencv-python`, `opencv-python-headless`, `scikit-image`, `ImageIO` | Image processing/CV |
| **OCR** | `easyocr`, `pytesseract` | Optical character recognition |
| **Scraping** | `beautifulsoup4` | HTML parsing/web scraping |
| **Statistics/Explainability** | `statsmodels`, `shap` | Statistical models/explainability |
| **Serialization** | `orjson`, `ormsgpack`, `jsonlines`, `ujson` | Fast JSON/MsgPack |
| **Messaging** | `aiokafka`, `kafka-python`, `paho-mqtt` | Kafka/MQTT streaming |
| **Cloud** | `boto3`, `botocore`, `s3transfer` | AWS S3 storage |

Optional (install as needed): `paddleocr`, `paddlepaddle`, `paddlex`, `opencv-contrib-python`, `ultralytics`. See the comments at the bottom of `requirements-data.txt`.

#### web: Web/API (`requirements-web.txt`, ~25 packages)

| Category | Packages | Description |
|----------|--------|------|
| **Frameworks** | `fastapi`, `starlette`, `uvicorn`, `uvloop`, `Flask`, `Werkzeug`, `gradio`, `gradio_client` | REST API/UI servers |
| **HTTP Clients** | `httpx`, `httpx-sse`, `httpcore`, `requests`, `requests-oauthlib`, `aiohttp` | Sync/async HTTP |
| **Server Utilities** | `h11`, `httptools`, `websockets`, `websocket-client`, `watchfiles`, `python-multipart` | High-performance server/WebSocket |
| **Templates** | `Jinja2`, `MarkupSafe`, `itsdangerous` | HTML rendering |
| **Auth/Security** | `PyJWT`, `bcrypt`, `cryptography`, `pyOpenSSL` | JWT/encryption/TLS |
| **Monitoring** | `prometheus_client`, `sentry-sdk` | Metrics/error tracking |

#### cpu: CPU only (`requirements-cpu.txt`, 4 packages)

| Packages | Description |
|--------|------|
| `torch`, `torchaudio`, `torchvision` | CPU build (PyTorch CPU index URL) |
| `onnxruntime` | CPU inference engine |

#### mps: Apple Silicon (`requirements-mps.txt`, 4 packages)

| Packages | Description |
|--------|------|
| `torch`, `torchaudio`, `torchvision` | Default PyPI build (includes MPS) |
| `onnxruntime` | CPU inference engine (no GPU build for macOS) |

**Example combination:** GPU workstation = `core` + `gpu` + `data` + `web` ≈ **240+ packages**

### [5/7] Node.js (optional)

Installs NVM (Node Version Manager), then Node.js LTS.

- Selected or deselected by default depending on the profile
- In interactive mode, asks whether to install
- Lazy loading: NVM is not loaded at shell startup; it loads the first time `node` or `npm` is run
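
The lazy-loading idea can be sketched with placeholder shell functions that load NVM on first use. This is a common pattern rather than the contents of `30-nvm.sh`; `_lazy_load_nvm` is a hypothetical name, and the real module may wrap additional commands.

```bash
# Sketch of a lazy loader: `node` and `npm` start out as shell functions
# that load NVM on first use and then hand off to the real commands.
export NVM_DIR="${NVM_DIR:-$HOME/.nvm}"

_lazy_load_nvm() {
    # Remove the placeholders so they no longer shadow the real commands.
    unset -f node npm
    # Source nvm.sh only if it exists and is non-empty.
    [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
}

node() { _lazy_load_nvm; node "$@"; }
npm()  { _lazy_load_nvm; npm "$@"; }
```

Shell startup only pays for defining three small functions; the cost of sourcing `nvm.sh` is deferred until `node` or `npm` is actually invoked.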

### [6/7] Java (optional)

Installs SDKMAN, then Java 21 (LTS).

- Selected or deselected by default depending on the profile
- Lazy loading: loads the first time `sdk` or `java` is run

### [7/7] Shell Integration

Adds a module-sourcing block to `.bashrc` and `.zshrc`.

```bash
# >>> machine_setting >>>
# Auto-source shell modules from machine_setting
for f in ~/machine_setting/shell/bashrc.d/[0-9]*.sh; do
    [ -r "$f" ] && source "$f"
done
# Source machine-local secrets (never committed)
[ -r "$HOME/.bashrc.local" ] && source "$HOME/.bashrc.local"
# <<< machine_setting <<<
```

This block loads the following shell modules in order:

| File | Role |
|------|------|
| `00-path.sh` | PATH setup (CUDA, Homebrew, uv, Maven) |
| `10-aliases.sh` | Common aliases (see the table below) |
| `20-env.sh` | Environment variable setup |
| `30-nvm.sh` | NVM lazy loader (loads on the first `node`/`npm` run) |
| `40-sdkman.sh` | SDKMAN lazy loader |
| `50-ai-env.sh` | `aienv` / `aienv-off` functions + background update check |

#### Shell aliases (`10-aliases.sh`)

| Alias | Command | Purpose |
|------|------|------|
| `py` | `python3` | Run Python |
| `pip` | `pip3` | Run pip |
| `ipy` | `ipython` | IPython |
| `gs` | `git status` | Git status |
| `gd` | `git diff` | Git changes |
| `gl` | `git log --oneline -20` | Last 20 commits |
| `gp` | `git pull --rebase` | Git pull |
| `ms` | `cd ~/machine_setting` | Go to the repository |
| `mss` | `make status` | Sync status |
| `msu` | `make update` | Update |
| `msp` | `make push` | Push |
| `gpustat` | `nvidia-smi --query-gpu=...` | GPU status (Linux: nvidia-smi, macOS: ioreg) |

---

## Installed Components

Summary of what is added to the system after installation:

### Files & Directories

| Path | Description | Removed by |
|------|------|-----------|
| `~/machine_setting/` | This repository itself | `rm -rf ~/machine_setting` |
| `~/ai-env/` | Python venv (global mode) | `make uninstall` |
| `~/.local/bin/uv` | uv package manager | Manual removal |
| `~/.local/share/uv/python/` | Python builds managed by uv | `make uninstall` |
| `~/.nvm/` | NVM + Node.js | `make uninstall` |
| `~/.sdkman/` | SDKMAN + Java | `make uninstall` |
| `~/.machine_setting/` | Install state/checkpoints/backups | `make uninstall` |
| `~/.machine_setting_profile` | Hardware detection results | `make uninstall` |
| `~/.bashrc.local` | User secrets (auto-generated template) | **Never removed** |
| `~/.zshrc.local` | Secrets for zsh (symlink to bashrc.local) | **Never removed** |

### NVIDIA System Files (installed in Stage 2)

| Path | Description | Removed by |
|------|------|-----------|
| NVIDIA driver | `nvidia-driver-XXX` package | `uninstall --component nvidia` |
| `/usr/local/cuda` | CUDA Toolkit symlink | `uninstall --component nvidia` |
| `cuda-toolkit` | CUDA development tools | `uninstall --component nvidia` |
| `cudnn9-cuda-*` | cuDNN 9.x library | `uninstall --component nvidia` |
| `libnccl2`, `libnccl-dev` | NCCL multi-GPU communication | `uninstall --component nvidia` |
| `nvidia-container-toolkit` | Docker GPU support | `uninstall --component nvidia` |
| `/etc/sysctl.d/99-machine-setting-gpu.conf` | GPU kernel parameters | `uninstall --component nvidia` |
| `/etc/security/limits.d/99-machine-setting-gpu.conf` | memlock/nproc limits | `uninstall --component nvidia` |
| numactl, hwloc, nvtop, lm-sensors | System utilities | Manual removal |

### Shell RC Modifications

A marker block (from `# >>> machine_setting >>>` to `# <<< machine_setting <<<`) is added to `.bashrc` and `.zshrc`. Uninstalling removes only this block and preserves the rest of the user's configuration.
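
Marker-delimited removal of this kind can be sketched with a `sed` range delete. The function name `strip_machine_setting_block` is hypothetical, and the uninstall script may implement the removal differently (for example, with a backup step first).

```bash
# Hypothetical helper: print an rc file with the machine_setting block removed.
# Only the lines between the two markers (inclusive) are dropped.
strip_machine_setting_block() {
    sed '/^# >>> machine_setting >>>$/,/^# <<< machine_setting <<<$/d' "$1"
}
```

For example, `strip_machine_setting_block ~/.bashrc > ~/.bashrc.new` writes a cleaned copy that can be reviewed before it replaces the original file.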

### Environment Variables (when activated)

| Variable | Value | Condition |
|------|------|------|
| `PATH` | Adds `~/.local/bin`, CUDA paths, etc. | Always |
| `CUDA_HOME` | `/usr/local/cuda` | Linux + CUDA |
| `LD_LIBRARY_PATH` | Adds CUDA lib64 | Linux + CUDA |
| `NVM_DIR` | `~/.nvm` | When Node is installed |
| `NVIDIA_TF32_OVERRIDE` | `1` | When `aienv` is activated (Ampere+ GPU) |

---

## Daily Usage

```bash
aienv # Activate venv + background update check
aienv-off # Deactivate

make check # Verify environment (GPU, packages)
make push # Export packages + commit + push to remote
make update # Pull changes + notify if packages changed
make status # Show sync status
make export # Export current venv to requirements files
make doctor # Full health check
make recover # Auto-recover broken components
```

### All Make Targets

| Target | Description |
|------|------|
| `make setup` | Full bootstrap install |
| `make plan` | Pre-flight check (install plan only) |
| `make preflight` | Pre-flight check, then install |
| `make dry-run` | Full-system dry-run diagnosis (7 stages) |
| `make check` | Verify the AI environment (GPU, packages) |
| `make update` | Pull from remote + report changes |
| `make push` | Export packages + commit + push |
| `make status` | Show sync status |
| `make export` | Export venv → requirements files |
| `make venv` | Create/update the venv |
| `make venv-local` | Create a project-local venv |
| `make detect` | Run hardware detection |
| `make secrets` | Scan for leaked secrets |
| `make doctor` | Health check |
| `make recover` | Automatic recovery |
| `make verify` | Verify package integrity |
| `make uninstall` | Interactive uninstall |
| `make uninstall-dry` | Uninstall preview |
| `make reset` | Reset state and start from scratch |
| `make cloud` | Cloud/container install (user-space only, no sudo required) |
| `make gpu-extras` | Install GPU extras only (system utilities + kernel tuning, requires sudo). Use when the driver and CUDA are already present |
| `make gpu-doctor` | GPU-specific health diagnosis |
| `make gpu-persist-fix` | Permanent GPU stability fix (requires sudo) |
| `make gpu-persist-check` | Check GPU stability fix status (no sudo required) |

### `aienv` Behavior in Detail

1. Sources `~/ai-env/bin/activate` (activates the venv)
2. Sets `NVIDIA_TF32_OVERRIDE=1` (~2x faster FP32 math on Ampere+ GPUs)
3. In cloud/container environments without nvcc, sets `DS_BUILD_OPS=0` automatically (disables DeepSpeed JIT compilation)
4. Starts a **background update check**:
   - Runs `git fetch origin main` every 24 hours
   - Prints an update notice if local and remote differ
   - Runs entirely in the background, so shell speed is unaffected
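
A throttled background check along these lines could look like the sketch below. The stamp-file approach, the function names, and the notice text are assumptions for illustration; `50-ai-env.sh` may record the last check time differently.

```bash
# Hypothetical helper: succeed when the last check is at least 24 hours old.
#   $1 = stamp file holding the epoch seconds of the last check
#   $2 = current epoch seconds (defaults to `date +%s`)
update_check_due() {
    local stamp_file="$1" now="${2:-$(date +%s)}" last=0
    [ -r "$stamp_file" ] && last="$(cat "$stamp_file")"
    [ $(( now - last )) -ge 86400 ]
}

# Hypothetical wrapper: fetch in a background subshell so the prompt is not blocked.
#   $1 = repository path, $2 = stamp file
background_update_check() {
    local repo="$1" stamp_file="$2"
    update_check_due "$stamp_file" || return 0
    (
        git -C "$repo" fetch --quiet origin main 2>/dev/null || exit 0
        date +%s > "$stamp_file"
        if [ "$(git -C "$repo" rev-parse HEAD)" != "$(git -C "$repo" rev-parse origin/main)" ]; then
            echo "machine_setting: updates available (run 'make update')"
        fi
    ) &
}
```

Passing the current time as an optional argument keeps `update_check_due` easy to exercise without waiting a day.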

### `make check` Example Output

```
=== AI Environment Check ===
Venv: /root/ai-env
Python: Python 3.12.13

Installed packages: 379

--- Core Packages ---
OK transformers 4.57.6
OK datasets 4.6.1
OK accelerate 1.12.0
...

--- GPU Stack ---
torch 2.10.0+cu128 (CUDA 12.8, 1 GPU(s), NVIDIA H100 PCIe)
cuDNN 91002 (enabled=True)
NCCL 2.27.5

--- GPU Packages ---
bitsandbytes 0.49.2
triton 3.6.0
vllm 0.17.1
deepspeed 0.18.8

--- GPU Functional Test ---
matmul (512x512): OK
cuDNN conv2d: OK
GPU memory: 39.4 GB
Compute cap: (9, 0)

--- Environment ---
Container: yes
nvcc: stub (runtime only, no JIT compile)
Display: headless (no libGL)
```

---

## CLI Options

### Interactive (default)

```bash
./setup.sh
```

Prompts for options at every stage (Python version, venv location, whether to install Node/Java). The pre-flight check runs first and shows the current state and the actions required.

### Non-interactive

```bash
# Specify everything
./setup.sh --python 3.12 --venv global --node --java

# Use a profile
./setup.sh --profile gpu-workstation
./setup.sh --profile mac-apple-silicon

# Selective install
./setup.sh --no-node --no-java --venv local

# Custom venv path
./setup.sh --venv /data/ai-env
```

### Dry-Run Diagnosis

```bash
# Full-system dry-run (diagnoses all 7 stages)
./setup.sh --dry-run
make dry-run

# Diagnose a specific stage only
./scripts/dry-run.sh --stage nvidia
./scripts/dry-run.sh --stage python

# Profile-based diagnosis
./scripts/dry-run.sh --profile gpu-workstation

# JSON output (for scripting)
./scripts/dry-run.sh --json
```

Dry-run analyzes all 7 stages without installing anything:
- Detects the current installation state and versions
- Produces an install/upgrade/skip action plan
- Checks conflicts and compatibility (CUDA↔PyTorch, Python↔venv, etc.)
- Estimates disk usage and installation time
- Returns exit code 1 if a blocking issue is found

### Pre-flight & Planning

```bash
# Show the install plan only (nothing is installed)
./setup.sh --plan
make plan

# Pre-flight check, then selective install
./setup.sh --preflight
make preflight

# Run directly (additional options)
./scripts/preflight.sh --check-only # Status check only (= --plan)
./scripts/preflight.sh --quiet # Non-interactive (writes the plan file, then exits)
./scripts/preflight.sh --profile gpu-workstation # Check against a specific profile
```

### Resume & Recovery

```bash
# Resume from the previous point of failure
./setup.sh --resume

# Reset state and start from scratch
./setup.sh --reset

# Start from a specific stage (earlier stages are marked complete)
./setup.sh --from 4 # From Stage 4 (venv)
./setup.sh --from 7 # Stage 7 (shell) only

# Health check
./setup.sh --doctor

# Automatic recovery
./setup.sh --recover
```

### Full Option Summary

| Flag | Description |
|------|------|
| `--python <version>` | Python version (default: 3.12) |
| `--venv <mode>` | `global` / `local` / `<path>` |
| `--node` / `--no-node` | Install/skip Node.js |
| `--java` / `--no-java` | Install/skip Java |
| `--profile <name>` | Use a profile |
| `--cloud` | Cloud/container mode (skips driver/CUDA/kernel tuning) |
| `--dry-run` | Full-system dry-run diagnosis (7 stages) |
| `--plan` | Run the pre-flight check only |
| `--preflight` | Pre-flight check, then install |
| `--resume` | Resume from the point of failure |
| `--reset` | Reset state and start from scratch |
| `--from <N>` | Start from Stage N (1-7) |
| `--doctor` | Health check |
| `--recover` | Automatic recovery |
| `--uninstall` | Uninstall (accepts additional flags) |
| `--gpu-doctor` | GPU-specific health diagnosis |
| `--gpu-persist-fix` | Permanent GPU stability fix (requires sudo) |

---

## Profiles

| Profile | Platform | GPU Backend | NVIDIA Stage | Node | Java | Packages |
|---------|----------|-------------|-------------|------|------|----------|
| gpu-enterprise | Linux | CUDA (Enterprise) | Full + DCGM/FM/GDS | No | No | core+data+web+gpu |
| ngc-container | NGC/Linux | CUDA (NV symlink) | Skip (pre-installed) | No | No | core+data+web+nv-link |
| gpu-workstation | Linux | CUDA | Full (consumer) | Yes | Yes | core+data+web+gpu |
| cloud-server | Cloud/Container | CUDA (provided by host) | Skip (user-space only) | Yes | Yes | core+data+web+gpu |
| mac-apple-silicon | macOS | MPS | Skip (N/A) | Yes | No | core+data+web+mps |
| cpu-server | Linux | None | Skip (no GPU) | Yes | Yes | core+data+web+cpu |
| laptop | Any | None | Skip (no GPU) | Yes | No | core+data+web+cpu |
| minimal | Any | None | Skip | No | No | core only |

### Machine-Specific Settings

Create `config/machine.conf` to override the default settings (the file is listed in `.gitignore`):

```bash
cp config/machine.conf.example config/machine.conf
# Edit: Python version, whether to install Node/Java, package groups, etc.
```

---

## GPU Support

| Platform | GPU | Backend | Auto-detection method |
|----------|-----|---------|---------------|
| NGC container | NVIDIA | CUDA (NV custom build symlink) | torch version check |
| Cloud/Container | NVIDIA | CUDA (uses host driver) | Docker/K8s/VM vendor detection + nvidia-smi |
| Linux | NVIDIA | CUDA (cu131, cu130, cu126, etc.) | lspci + nvcc |
| macOS arm64 | Apple Silicon | MPS (Metal) | uname -m |
| Any | None | CPU fallback | Automatic |

### NVIDIA System-Level Install (Stage 2)

Stage [2/7] automatically installs the system-level NVIDIA software. It can be run in the following ways:

```bash
# Automatic mode (default): detects the GPU, then installs the best-fitting configuration
./setup.sh

# Manual: run the NVIDIA script directly
./scripts/install-nvidia.sh # Fully automatic
./scripts/install-nvidia.sh --driver-only # Driver only
./scripts/install-nvidia.sh --no-driver # Skip the driver (CUDA/cuDNN/NCCL only)
./scripts/install-nvidia.sh --extras-only # Extras only (system utilities + kernel tuning; skips driver/CUDA). Same as make gpu-extras
./scripts/install-nvidia.sh --enterprise # Include enterprise tools
./scripts/install-nvidia.sh --dry-run # Preview the install (in-depth diagnosis)
./scripts/install-nvidia.sh --uninstall # Remove the entire NVIDIA stack

# Fine-grained selective install
./scripts/install-nvidia.sh --no-cuda # Skip CUDA (also skips cuDNN/NCCL)
./scripts/install-nvidia.sh --no-cudnn # Skip cuDNN only
./scripts/install-nvidia.sh --no-nccl # Skip NCCL only
./scripts/install-nvidia.sh --no-container-toolkit # Skip Docker GPU support
./scripts/install-nvidia.sh --no-system-tools # Skip system utilities
./scripts/install-nvidia.sh --no-kernel-tuning # Skip kernel/sysctl tuning

# Pin versions
./scripts/install-nvidia.sh --driver-version 570 # Specify the driver version
./scripts/install-nvidia.sh --cuda-version 13-0 # Specify the CUDA version
./scripts/install-nvidia.sh --open-kernel # Force open kernel modules
./scripts/install-nvidia.sh --proprietary # Force proprietary kernel modules
```

**NVIDIA configuration options** (`config/default.conf` or `config/machine.conf`):

| Setting | Default | Description |
|------|--------|------|
| `INSTALL_NVIDIA` | `true` | Enable/disable the entire NVIDIA stage |
| `NVIDIA_DRIVER_VERSION` | `""` (auto) | Driver version (if empty, the recommended version is chosen automatically) |
| `NVIDIA_CUDA_VERSION` | `"13-0"` (CUDA 13.0) | CUDA version. For older GPUs, set e.g. `"12-6"` in machine.conf |
| `NVIDIA_OPEN_KERNEL` | `auto` | Choose open or proprietary kernel modules |
| `NVIDIA_ENTERPRISE` | `false` | Enterprise tools (DCGM, FM, GDS, peermem) |
| `NVIDIA_NO_DRIVER` | `false` | Skip driver installation |
| `NVIDIA_CONTAINER_TOOLKIT` | `true` | Docker GPU support |
| `NVIDIA_SYSTEM_TOOLS` | `true` | Build tools, monitoring tools |
| `NVIDIA_KERNEL_TUNING` | `true` | Kernel/sysctl tuning |

### CUDA Version Matching (Python Packages)

The PyTorch index URL is selected automatically based on the detected CUDA version (`config/gpu-index-urls.conf`):

```
cu131=https://download.pytorch.org/whl/cu131
cu130=https://download.pytorch.org/whl/cu130
cu126=https://download.pytorch.org/whl/cu126
cu124=https://download.pytorch.org/whl/cu124
cu121=https://download.pytorch.org/whl/cu121
cpu=https://download.pytorch.org/whl/cpu
```

If the detected CUDA suffix is not in the list, it falls back automatically to the nearest lower version.
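
The nearest-lower fallback can be sketched as a scan over the known suffixes. `resolve_cuda_suffix` is a hypothetical name, the suffix list simply copies the entries shown above, and returning `cpu` when nothing lower exists is an assumption of this sketch rather than documented behavior.

```bash
# Hypothetical helper: pick the highest listed suffix that does not exceed
# the detected one. Suffixes are compared as integers (cu126 -> 126), which
# assumes the three-digit form used in the list above.
resolve_cuda_suffix() {
    local detected="${1#cu}" best="" s n
    for s in cu131 cu130 cu126 cu124 cu121; do
        n="${s#cu}"
        if [ "$n" -le "$detected" ] 2>/dev/null; then
            if [ -z "$best" ] || [ "$n" -gt "${best#cu}" ]; then
                best="$s"
            fi
        fi
    done
    # Assumption: with no suffix at or below the detected one, use the CPU index.
    echo "${best:-cpu}"
}
```

With this sketch, a detected `cu128` resolves to `cu126`, while `cu130` resolves to itself because it is already in the list.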

### NGC Container Mode

In environments such as NGC containers, where NV custom builds (torch, flash_attn, transformer_engine) are already installed system-wide, the packages are symlinked into the venv rather than downloaded again from PyPI:

```bash
# Auto-detect (selected automatically inside an NGC container)
./setup.sh

# Specify manually
./setup.sh --profile ngc-container
scripts/setup-venv.sh --nv-link
```

**Symlinked packages:** torch, torchvision, torchaudio, triton, flash_attn, transformer_engine

How it works:
1. Detect the system site-packages path (e.g. `/usr/local/lib/python3.12/dist-packages`)
2. Symlink the target package directories into the venv's site-packages
3. Link the `.dist-info` directories as well (so that pip recognizes the packages)

> **Python version mismatch detection:** If the system Python (e.g. 3.10) and the venv Python (e.g. 3.12) differ in version, NV linking is skipped automatically and a fallback installs the GPU packages directly with pip.

### Cloud/Container Mode

In environments where the host provides the GPU driver (Docker, Kubernetes, cloud VMs, and so on), only user-space tools are installed.

```bash
# Auto-detect (enabled automatically in container/cloud environments)
./setup.sh

# Specify explicitly
./setup.sh --cloud
make cloud
```

**Auto-detection conditions:**
- `/.dockerenv` or `/run/.containerenv` exists
- cgroup contains a `docker`/`containerd`/`kubepods` marker
- `KUBERNETES_SERVICE_HOST` environment variable is set
- DMI vendor is AWS/GCP/Azure/DigitalOcean/Vultr/Oracle
- The sudo command is missing or the user lacks sudo privileges
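
The file, cgroup, and environment checks could be combined roughly as follows. This sketch is not the project's detection code: `is_container_env` is a hypothetical name, reading `/proc/1/cgroup` is an assumption about where the cgroup markers are looked up, and the DMI vendor and sudo checks are left out.

```bash
# Hypothetical helper covering three of the conditions listed above.
# $1 is an optional root prefix, used here only to make the function testable.
is_container_env() {
    local root="${1:-}"
    [ -e "$root/.dockerenv" ] && return 0
    [ -e "$root/run/.containerenv" ] && return 0
    [ -n "${KUBERNETES_SERVICE_HOST:-}" ] && return 0
    # Assumption: cgroup markers are read from /proc/1/cgroup.
    if [ -r "$root/proc/1/cgroup" ] &&
       grep -Eq 'docker|containerd|kubepods' "$root/proc/1/cgroup"; then
        return 0
    fi
    return 1
}
```

Called without arguments, the function inspects the live filesystem, so `is_container_env && echo "container detected"` is enough for a quick manual check.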

**Skipped in Cloud mode:**
- NVIDIA driver/CUDA Toolkit installation
- Kernel tuning (sysctl, limits)
- System packages (apt-get)

**Installed in Cloud mode:**
- Hardware Detection (read-only; recognizes the host GPU)
- Python (uv, user-space)
- AI venv (GPU packages included; uses the host CUDA runtime)
- Node.js (nvm), Java (sdkman)
- Shell integration

**Additional handling for cloud environments:**
- **Headless containers**: if `libGL` is missing, only `opencv-python-headless` is installed instead of `opencv-python`
- **Runtime containers without nvcc**: an nvcc stub is generated automatically (keeps DeepSpeed imports working)
- **On `aienv` activation**: `DS_BUILD_OPS=0` is set automatically if nvcc is missing

### GPU Persistence & Stability

Permanently fixes the Xid 79 "GPU has fallen off the bus" problem caused by PCIe power management.

```bash
# Check status (no sudo required)
./scripts/gpu-persist-fix.sh --check
make gpu-persist-check

# Apply the permanent fix (6 items)
sudo ./scripts/gpu-persist-fix.sh
make gpu-persist-fix

# Preview changes
sudo ./scripts/gpu-persist-fix.sh --dry-run

# Revert changes
sudo ./scripts/gpu-persist-fix.sh --revert
```

**The six fixes applied:**

| # | Item | Description |
|---|------|-------------|
| 1 | GRUB | Disable PCIe ASPM and turn off GPU dynamic power management |
| 2 | udev | Force PCIe `power/control` to `on` for NVIDIA GPUs |
| 3 | modprobe | Set `NVreg_DynamicPowerManagement=0x00` |
| 4 | nvidia-persistenced | Enable the GPU persistence daemon |
| 5 | GPU watchdog | systemd timer that checks GPU state every 5 minutes |
| 6 | PCIe power service | Force `power/control=on` at boot |

> **Note:** The PCIe power setting takes effect immediately; the remaining fixes take full effect after a reboot.
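
To see what the `power/control` items refer to, the sketch below lists NVIDIA PCI devices whose runtime power management is not pinned to `on`. It reads the standard sysfs PCI attributes `vendor` and `power/control` (`0x10de` is NVIDIA's PCI vendor ID). The function name and the optional sysfs-root argument are assumptions for illustration; this is not the logic of `gpu-persist-fix.sh` itself.

```bash
# Sketch: print the PCI address of each NVIDIA device whose
# power/control is not "on". $1 overrides the sysfs root (default /sys).
nvidia_pcie_not_on() {
    local sysfs="${1:-/sys}" dev
    for dev in "$sysfs"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] || continue
        # Skip everything that is not an NVIDIA device
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        if [ "$(cat "$dev/power/control" 2>/dev/null)" != "on" ]; then
            echo "${dev##*/}"
        fi
    done
}
```

Empty output means every NVIDIA device found already has `power/control` set to `on`.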

---

## Pre-flight Check

Run `./setup.sh --plan` or `make plan` to inspect the current system state without installing anything.

```
╔══════════════════════════════════════════════════════╗
║               Pre-flight System Check                ║
╚══════════════════════════════════════════════════════╝

  System:  Ubuntu 22.04.5 LTS / AMD EPYC 7763 (128 cores) / 512GB RAM / 2847GB free
  GPU:     NVIDIA A100-SXM4-80GB / CUDA 12.6 (cu126)
  Profile: gpu-workstation

  #  Component           Current Status                     Proposed Action
────────────────────────────────────────────────────────────────────────────
* 1  Hardware Profile    not generated                      → INSTALL
                         Generate ~/.machine_setting_profile
* 2  NVIDIA GPU Stack    driver 535 / no CUDA               → INSTALL
                         Install CUDA toolkit, cuDNN, NCCL, system tools
  3  Python 3.12         3.12.8 installed + uv 0.5.14       (ok)
* 4  AI Environment      not created                        → INSTALL
                         Create ~/ai-env + install [core data web + GPU]
  5  Node.js             v22.12.0 (NVM)                     (ok)
* 6  Java 21             not installed                      → INSTALL
                         Install SDKMAN + Java 21
  7  Shell Integration   configured (.bashrc .zshrc)        (ok)
```

In interactive mode, you can toggle each item to install only the components you want.

---

## Reinstallation

### Full reinstall

```bash
# Option 1: reset the state, then reinstall
./setup.sh --reset

# Option 2: use make
make reset
```

This command resets the `~/.machine_setting/install.state` file and reruns every stage from the beginning. For components that are already installed (venv, Python, etc.), each stage checks whether they already exist and asks whether to recreate them.

### Reinstall specific stages only

```bash
# Reinstall from Stage 4 (venv); Stages 1-3 are skipped
./setup.sh --from 4

# Reinstall only Stage 7 (shell integration)
./setup.sh --from 7
```
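
The interplay between the state file and `--from` can be pictured with a small sketch. The on-disk format used here (one `STAGE_<n>=done` line per completed stage) and the helper names are assumptions for illustration only; the real `install.state` layout is defined by `scripts/lib-checkpoint.sh`.

```bash
# Hypothetical state format: one "STAGE_<n>=done" line per completed stage.
stage_done() {            # stage_done STATE_FILE N
    [ -f "$1" ] && grep -qx "STAGE_$2=done" "$1"
}

mark_done() {             # mark_done STATE_FILE N
    mkdir -p "$(dirname "$1")"
    echo "STAGE_$2=done" >> "$1"
}

# Print the stages that would run, given an optional --from value.
stages_to_run() {         # stages_to_run STATE_FILE [FROM]
    local state="$1" from="${2:-0}" n
    for n in 1 2 3 4 5 6 7; do
        if [ "$from" -gt 0 ]; then
            # --from N: rerun stages N..7 regardless of recorded state
            [ "$n" -ge "$from" ] && echo "$n"
        elif ! stage_done "$state" "$n"; then
            # Plain resume: run only stages not yet recorded as done
            echo "$n"
        fi
    done
    return 0
}
```

With stages 1 to 3 recorded as done, `stages_to_run "$state"` prints 4 through 7, and `stages_to_run "$state" 7` prints only 7.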

### Recreate only the venv

```bash
# Delete the existing venv and recreate it (reinstalls all packages)
rm -rf ~/ai-env
make venv

# Or run the script directly (all options)
scripts/setup-venv.sh --global --python 3.12
scripts/setup-venv.sh --local                     # project-local .venv
scripts/setup-venv.sh --path /custom/path         # custom path
scripts/setup-venv.sh --profile gpu-workstation   # select a profile
scripts/setup-venv.sh --nv-link                   # for NGC containers (symlinks system packages)
```

### Update packages only

```bash
# Fetch the latest requirements from the remote and update the venv
make update

# Manually reinstall packages into the venv
scripts/setup-venv.sh
```

---

## Uninstall

### Interactive mode (default)

```bash
make uninstall
# or
./scripts/uninstall.sh
```

Lists the installed components and lets you toggle which ones to remove:

```
=== Machine Setting Uninstall ===

Components found:
  [1] ✓ NVIDIA stack (driver 560.35.03, CUDA, cuDNN, tools)
  [2] ✓ AI Virtual Environment (~/ai-env, 12G)
  [3] ✓ Python via uv (1.8G)
  [4] ✓ NVM + Node.js (287M)
  [5]   Java/SDKMAN (not installed)
  [6] ✓ Shell integration (.bashrc .zshrc)
  [7] ✓ Config & state files

Toggle numbers to select/deselect, 'a' for all, Enter to proceed:
```

### Remove everything

```bash
# Remove all components (confirmation required: type 'UNINSTALL')
./scripts/uninstall.sh --all

# Keep config/state and remove only the runtimes
./scripts/uninstall.sh --all --keep-config
```

### Remove specific components only

```bash
# Remove only the venv and Node.js
./scripts/uninstall.sh --component venv,node

# Remove only the NVIDIA stack
./scripts/uninstall.sh --component nvidia

# Available components: nvidia, venv, python, node, java, shell, config
```

### Dry-run (removal preview)

```bash
make uninstall-dry
# or
./scripts/uninstall.sh --dry-run
```

### Complete removal

The `~/machine_setting` repository itself remains after uninstalling. To remove it completely:

```bash
./scripts/uninstall.sh --all
rm -rf ~/machine_setting
```

**Warning:** `~/.bashrc.local` and `~/.zshrc.local` are user secret files, so they are never deleted automatically.

---

## Health Check & Recovery

### Doctor (health check)

```bash
make doctor
# or
./scripts/doctor.sh
```

The following items are checked:

| Check | What is verified |
|-------|------------------|
| Disk space | At least 1GB free at the venv path |
| Hardware profile | `~/.machine_setting_profile` exists and is valid |
| Cloud environment | Cloud/container environment detection (Docker, K8s, cloud VM, sudo availability) |
| NVIDIA driver | Driver is loaded and `nvidia-smi` works |
| CUDA toolkit | `nvcc` exists, and its version (distinguishes a stub from the real nvcc) |
| cuDNN | dpkg install state + **torch.backends.cudnn runtime verification** |
| NCCL | dpkg install state + **torch.cuda.nccl runtime verification** |
| GPU kernel tuning | Whether the sysctl parameters are applied (skipped automatically in cloud mode) |
| GPU persistence | Whether the 6 gpu-persist-fix items are applied |
| CPU frequency | Detects CPU frequency throttling |
| Memory | Free memory ratio + swap state |
| Disk SMART health | Disk SMART health status (requires smartmontools) |
| uv | uv is installed, and its version |
| Python | A uv-managed Python exists |
| Virtual environment | venv directory, bin/python, and bin/activate exist |
| Key packages | Import check for **26** key packages (torch, transformers, anthropic, fastapi, pandas, etc.) |
| GPU packages | **7** GPU-only packages (vllm, deepspeed, bitsandbytes, pytorch_lightning, etc.) |
| GPU functional | **GPU compute test**: matmul, cuDNN conv2d, NCCL availability |
| Node.js | NVM + Node install state (if selected for installation) |
| Java | SDKMAN + Java install state (if selected for installation) |
| Shell integration | Marker block exists in .bashrc/.zshrc |
| Platform | Xcode CLT (macOS) |

Example output (cloud/container environment):

```
=== Machine Setting Doctor ===

[OK] Disk space (1497GB free)
[OK] Hardware profile
[OK] Cloud environment (container detected)
[OK] NVIDIA driver (535.129.03, NVIDIA H100 PCIe)
[WARN] CUDA Toolkit (stub nvcc 12.2 - runtime only, no JIT compile)
[OK] cuDNN (system: 8.9.6, torch: enabled v91002)
[OK] NCCL (system: 2.19.3, torch: 2.27.5)
[SKIP] GPU kernel tuning (cloud/container - managed by host)
[OK] GPU persistence (all 6 fixes in place)
[OK] CPU frequency (2899MHz / 3500MHz, 82%)
[OK] Memory (57GB free, 91%)
[SKIP] Disk SMART health (smartmontools not installed - apt install smartmontools)
[OK] uv (uv 0.10.10)
[OK] Python (Python 3.12.13)
[OK] Virtual environment (~/ai-env, 379 packages)
[OK] Key packages (OK 26/26)
[OK] GPU packages (OK 7/7)
[OK] GPU functional (OK (NVIDIA H100 PCIe, cuDNN 91002, NCCL ok))
[SKIP] Node.js (not installed)
[SKIP] Java (not installed)
[OK] Shell integration (.bashrc)
[OK] Platform (Linux)

Summary: 12 ok, 0 failed, 0 warnings, 4 skipped
All checks passed!
```
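
A report of this shape can be produced by a small tally helper. The sketch below illustrates only the pattern of status lines followed by a summary; the names `report`, `check_tool`, and `summary` are hypothetical and are not taken from `doctor.sh`.

```bash
# Counters for each status class
OK=0; WARN=0; FAIL=0; SKIP=0

# report STATUS NAME DETAIL: print one line and bump the matching counter
report() {
    case "$1" in
        OK)   OK=$((OK + 1)) ;;
        WARN) WARN=$((WARN + 1)) ;;
        FAIL) FAIL=$((FAIL + 1)) ;;
        SKIP) SKIP=$((SKIP + 1)) ;;
    esac
    printf '[%s] %s (%s)\n' "$1" "$2" "$3"
}

# check_tool NAME COMMAND: OK if the command is on PATH, otherwise SKIP
check_tool() {
    if command -v "$2" >/dev/null 2>&1; then
        report OK "$1" "$(command -v "$2")"
    else
        report SKIP "$1" "not installed"
    fi
}

summary() {
    echo "Summary: $OK ok, $FAIL failed, $WARN warnings, $SKIP skipped"
}
```

The check functions have to be called directly rather than inside `$(...)`, because a command substitution runs in a subshell and the counter updates would be lost.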

### Auto-recover

```bash
# Automatically recover every failed item
make recover
# or
./scripts/doctor.sh --recover

# Recover a specific component only
./scripts/doctor.sh --recover nvidia
./scripts/doctor.sh --recover python
./scripts/doctor.sh --recover venv
./scripts/doctor.sh --recover shell
```

Available recovery targets: `disk`, `hardware`, `nvidia`, `gpu_persistence`, `uv`, `python`, `venv`, `packages`, `node`, `java`, `shell`, `platform`

Recovery action for each component:

| Component | Recovery action |
|-----------|-----------------|
| hardware | Re-run `detect-hardware.sh` |
| nvidia | Re-run `install-nvidia.sh` (driver, CUDA, cuDNN, NCCL) |
| gpu_persistence | Run `gpu-persist-fix.sh` (applies the 6 items) |
| uv | Reinstall uv (`curl ... \| sh`) |
| python | Install uv first if it is missing, then run `uv python install` |
| venv | Recreate the venv + reinstall packages |
| packages | Reinstall the entire venv (same as venv recovery) |
| node | Reinstall NVM + Node.js |
| java | Reinstall SDKMAN + Java |
| shell | Re-run `install-shell.sh` |
| platform | macOS: prints Xcode CLT instructions |
| disk | Prints manual cleanup instructions |

### Package Verification (package integrity check)

```bash
make verify
# or
./scripts/doctor.sh --verify-packages
```

Checks that every package listed in the requirements files is installed:

```
=== Package Verification ===

Missing packages (required but not installed):
- some-package

Extra packages (installed but not in requirements): 43
(This is normal - they may be transitive dependencies)

Result: 1 missing package(s)
Run './scripts/doctor.sh --recover venv' to install missing packages.
```
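
The comparison behind such a report can be approximated with standard text tools. In the sketch below, the helper names `req_names` and `missing_packages` are hypothetical, and the name normalization (lowercase, `_` treated as `-`, extras and version specifiers dropped) is a simplification; the project's real check may differ.

```bash
# Reduce requirement or freeze lines to bare, normalized package names.
req_names() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e '/^-/d' \
        -e 's/[][<>=!~;[:space:]].*//' "$@" |
        tr 'A-Z_' 'a-z-' | grep -v '^$' | sort -u
}

# missing_packages REQ_FILE INSTALLED_FILE
# Prints names that are required but absent from the installed list.
missing_packages() {
    local req inst
    req=$(mktemp)
    inst=$(mktemp)
    req_names "$1" > "$req"
    req_names "$2" > "$inst"
    comm -23 "$req" "$inst"     # lines that appear only in the first file
    rm -f "$req" "$inst"
}
```

The installed list can be any file in `name==version` form, such as saved `uv pip freeze` output.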

---

## Cross-Machine Sync

A Git-based sync system for keeping the same package set across multiple machines.

### Push (current machine → remote)

```bash
make push
```

Steps:
1. Export the current package list from the active venv to the requirements files
2. Stage the changes with `git add -A`
3. Generate an automatic commit message: `update: sync from at `
4. Run `git pull --rebase`, then `git push`

### Pull (remote → current machine)

```bash
make update
```

Steps:
1. `git pull --rebase`
2. Detect whether the requirements files changed
3. If they changed, print a prompt to run `scripts/setup-venv.sh`
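
The pull steps can be sketched as follows. The function names and the message text are hypothetical, and the `packages/requirements-*.txt` pattern is an assumption based on the repository layout; `scripts/sync.sh` may detect changes differently.

```bash
# Read changed paths (one per line, as printed by `git diff --name-only`)
# and succeed if any of them is a requirements file under packages/.
requirements_changed() {
    grep -Eq '^packages/requirements-[a-z]+\.txt$'
}

# Pull, then compare the commits before and after the pull.
pull_and_check() {
    local before after
    before=$(git rev-parse HEAD) || return 1
    git pull --rebase || return 1
    after=$(git rev-parse HEAD)
    if git diff --name-only "$before" "$after" | requirements_changed; then
        echo "requirements changed: run scripts/setup-venv.sh"
    fi
}
```

Keeping the path filter separate from the Git calls means the detection rule can be exercised on plain text, without a repository.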

### Status

```bash
make status
```

Shows local changes, the number of commits ahead of/behind the remote, and the last commit.

### Export

```bash
make export
```

Classifies the packages in the current venv by category and exports them to the matching requirements files:
- GPU packages → `requirements-gpu.txt`
- Data packages → `requirements-data.txt`
- Web packages → `requirements-web.txt`
- Everything else → `requirements-core.txt`
- The CPU/MPS files are maintained manually

---

## Disk Health & Monitoring

A collection of utility scripts for checking disk health on NAS and server machines. All scripts are **read-only** (they do not modify data) and require `smartmontools` and `e2fsprogs`.

```bash
# Collect detailed SMART data (all disks)
sudo ./scripts/disk-check-smart.sh [output-dir]

# Start the SMART Extended Self-Test (parallel, takes several hours)
sudo ./scripts/disk-check-smart-long.sh

# Bad-sector scan (parallel, read-only, several hours to tens of hours)
sudo ./scripts/disk-check-badblocks.sh [output-dir]

# Monitor bad-sector scan progress
./scripts/disk-check-progress.sh [output-dir]
watch -n 60 ./scripts/disk-check-progress.sh   # refresh every minute

# Convert .badblocks files to 512-byte sector ranges (for partition planning)
./scripts/disk-badblocks-to-sectors.sh [sector-margin]
```

| Script | Purpose | sudo |
|--------|---------|------|
| `disk-check-smart.sh` | Detailed SMART collection + summary (Health, Reallocated, Pending) | Yes |
| `disk-check-smart-long.sh` | Run the SMART Extended Self-Test in parallel | Yes |
| `disk-check-badblocks.sh` | Parallel read-only bad-sector scan | Yes |
| `disk-check-progress.sh` | Parse and display bad-sector scan progress | No |
| `disk-badblocks-to-sectors.sh` | Convert badblocks results to sector ranges | No |
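
The block-to-sector conversion is simple arithmetic: a block numbered `b` with block size `bs` covers 512-byte sectors `b*bs/512` through `(b+1)*bs/512 - 1`. The sketch below merges adjacent blocks into ranges and optionally widens each range by a margin. The function name, the argument order, and the 1024-byte default (the default block size of `badblocks`) are assumptions; the script in this repository may use different parameters.

```bash
# blocks_to_sector_ranges [BLOCK_SIZE] [MARGIN] < list-of-bad-block-numbers
# Prints "start-end" ranges in 512-byte sectors, merging adjacent blocks.
blocks_to_sector_ranges() {
    local bs="${1:-1024}" margin="${2:-0}"
    sort -n | awk -v bs="$bs" -v m="$margin" '
        NF == 0 { next }                       # ignore blank lines
        {
            s = $1 * bs / 512 - m              # first sector, minus margin
            if (s < 0) s = 0
            e = ($1 + 1) * bs / 512 - 1 + m    # last sector, plus margin
            if (have && s <= end + 1) {        # touches the previous range
                if (e > end) end = e
                next
            }
            if (have) printf "%.0f-%.0f\n", start, end
            start = s; end = e; have = 1
        }
        END { if (have) printf "%.0f-%.0f\n", start, end }'
}
```

For example, bad blocks 10 and 11 at a 1024-byte block size cover sectors 20 through 23, which a partition layout can then avoid.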

---

## Shell Integration Details

### Lazy Loading

NVM and SDKMAN are implemented with **lazy loading**, so they do not affect shell startup time:

```bash
# 30-nvm.sh: load NVM only the first time node/npm is run
for cmd in nvm node npm npx; do
    eval "${cmd}() { unset -f nvm node npm npx; _load_nvm; ${cmd} \"\$@\"; }"
done
```

NVM is loaded the first time you actually run `node --version`; after that, the command runs directly.
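
The same pattern generalizes to any slow-loading tool. The sketch below is an illustration rather than the contents of `30-nvm.sh`: `lazy_wrap` is a hypothetical helper, and the `_load_nvm` body assumes NVM's standard install location (`$HOME/.nvm`).

```bash
# lazy_wrap LOADER CMD...
# Each CMD becomes a stub that removes all stubs, runs LOADER once,
# then re-dispatches to the real command with the original arguments.
lazy_wrap() {
    local loader="$1" cmd
    shift
    for cmd in "$@"; do
        eval "${cmd}() { unset -f $*; ${loader}; ${cmd} \"\$@\"; }"
    done
}

# Assumed loader: source nvm.sh from the standard install location.
_load_nvm() {
    export NVM_DIR="${NVM_DIR:-$HOME/.nvm}"
    [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
}
```

With these defined, `lazy_wrap _load_nvm nvm node npm npx` reproduces the loop shown earlier. Because the stubs unset themselves before calling the loader, the loader runs at most once per shell.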

### Background Update Check

Running `aienv` triggers an update check in the background:

1. Check the `~/.last-update-check` timestamp
2. Skip if the last check was within 24 hours
3. Run `git fetch origin main --quiet` (in the background)
4. If local ≠ remote, print an update notice
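
The 24-hour throttle in steps 1 and 2 amounts to a timestamp comparison. The sketch below assumes the stamp file holds a Unix timestamp in seconds, and both function names are hypothetical; the actual format used by `50-ai-env.sh` may differ.

```bash
# update_check_due STAMP_FILE [NOW]
# Succeeds when no check was recorded in the last 24 hours (86400 s).
update_check_due() {
    local stamp="$1" now="${2:-$(date +%s)}" last=""
    [ -r "$stamp" ] && last=$(cat "$stamp" 2>/dev/null)
    last="${last:-0}"            # a missing or empty stamp counts as "never"
    [ $((now - last)) -ge 86400 ]
}

# record_update_check STAMP_FILE [NOW]
record_update_check() {
    echo "${2:-$(date +%s)}" > "$1"
}
```

A caller would run the fetch in a background subshell only when `update_check_due` succeeds, then call `record_update_check` so that later activations skip the network round trip.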

### Secrets

Store secrets such as API keys in `~/.bashrc.local` (or `~/.zshrc.local`):

```bash
# ~/.bashrc.local (example)
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export WANDB_API_KEY="..."
```

This file is sourced automatically at shell startup and is **never committed to Git**.

---

## Directory Structure

```
machine_setting/
├── setup.sh                          # Single-entry bootstrap (7-stage pipeline)
├── Makefile                          # make setup/update/push/status/doctor/uninstall
├── config/
│   ├── default.conf                  # Default settings (Python 3.12, Node LTS, Java 21)
│   ├── machine.conf.example          # Machine-specific override template
│   └── gpu-index-urls.conf           # PyTorch CUDA index URL mapping
├── packages/
│   ├── requirements-core.txt         # Platform-independent AI/ML
│   ├── requirements-gpu.txt          # NVIDIA CUDA packages
│   ├── requirements-mps.txt          # Apple Silicon MPS packages
│   ├── requirements-cpu.txt          # CPU-only fallback
│   ├── requirements-data.txt         # Data/DB packages
│   └── requirements-web.txt          # Web/API packages
├── scripts/
│   ├── detect-hardware.sh            # GPU/CUDA/MPS/RAM/CPU detection
│   ├── install-nvidia.sh             # NVIDIA driver/CUDA/cuDNN/NCCL/enterprise tools
│   ├── install-python.sh             # uv + Python install
│   ├── setup-venv.sh                 # venv creation + package install
│   ├── install-node.sh               # NVM + Node.js
│   ├── install-java.sh               # SDKMAN + Java
│   ├── lib-checkpoint.sh             # Checkpoint/rollback library (7-stage)
│   ├── dry-run.sh                    # Full-system dry-run diagnostics (7 stages)
│   ├── preflight.sh                  # Pre-flight system check (includes NVIDIA)
│   ├── doctor.sh                     # Health check & recovery (includes NVIDIA checks)
│   ├── uninstall.sh                  # Component uninstaller (includes NVIDIA)
│   ├── sync.sh                       # Git sync (push/pull/status)
│   ├── export-packages.sh            # venv → requirements export
│   ├── check-env.sh                  # AI environment verification
│   ├── check-secrets.sh              # Secret leak scanner
│   ├── gpu-doctor.sh                 # GPU-specific health diagnostics (6 sections + summary mode)
│   ├── gpu-persist-fix.sh            # Permanent GPU stability fix (GRUB, udev, modprobe, persistenced, watchdog, PCIe)
│   ├── disk-check-smart.sh           # Detailed SMART collection
│   ├── disk-check-smart-long.sh      # SMART Extended Self-Test
│   ├── disk-check-badblocks.sh       # Parallel bad-sector scan
│   ├── disk-check-progress.sh        # Bad-sector scan progress monitor
│   └── disk-badblocks-to-sectors.sh  # badblocks → sector range conversion
├── shell/
│   ├── install-shell.sh              # Shell RC installer
│   └── bashrc.d/                     # Modular shell config (bash + zsh)
│       ├── 00-path.sh                # PATH (CUDA, Homebrew, uv)
│       ├── 10-aliases.sh             # Common aliases
│       ├── 20-env.sh                 # Environment variables
│       ├── 30-nvm.sh                 # NVM lazy loader
│       ├── 40-sdkman.sh              # SDKMAN lazy loader
│       ├── 50-ai-env.sh              # aienv/aienv-off + update check
│       └── 90-local.sh.example       # Secrets template
├── profiles/                         # Pre-configured machine profiles
│   ├── gpu-enterprise.conf           # A100/H100/B200 + enterprise tools (DCGM, FM)
│   ├── gpu-workstation.conf
│   ├── cloud-server.conf
│   ├── mac-apple-silicon.conf
│   ├── ngc-container.conf
│   ├── cpu-server.conf
│   ├── laptop.conf
│   └── minimal.conf
└── docs/                             # System documentation
    └── package-candidates-analysis.md  # Package candidate analysis (recommended/optional/not recommended)
```

---

## State & Configuration Files

### Runtime state files (outside Git)

| File | Location | Purpose |
|------|----------|---------|
| `install.state` | `~/.machine_setting/` | Progress of the 7-stage installation (STAGE_1~7) |
| `backups/` | `~/.machine_setting/backups/` | Automatic .bashrc/.zshrc backups (one per timestamp, created when shell integration is installed or updated) |
| `.machine_setting_profile` | `~/` | Hardware detection results |
| `.last-update-check` | Inside the repository | Timestamp of the last update check |
| `.preflight_plan` | `env/` | Pre-flight plan (temporary, deleted after installation) |

### Configuration files

| File | Location | Purpose | Tracked in Git |
|------|----------|---------|----------------|
| `default.conf` | `config/` | Default settings | Yes |
| `machine.conf` | `config/` | Machine-specific overrides | No (.gitignore) |
| `gpu-index-urls.conf` | `config/` | CUDA → PyTorch URL mapping | Yes |
| `*.conf` | `profiles/` | Preset profiles | Yes |
| `.bashrc.local` | `~/` | User secrets | No |

---

## Troubleshooting

If you run into problems while setting up the environment, see [docs/troubleshooting.md](docs/troubleshooting.md).

### Quick diagnosis

```bash
# Full health check
make doctor

# Package integrity check
make verify

# Inspect system state (installs nothing)
make plan

# Detailed environment check (GPU, package versions)
make check
```

### Common problems

| Symptom | Fix |
|---------|-----|
| `aienv: command not found` | Run `source ~/.bashrc` or open a new terminal |
| `No venv at ~/ai-env` | `make venv` or `./setup.sh --from 4` |
| GPU is not detected | `make detect`, then `make doctor` |
| Package import fails | `make verify` → `make recover` |
| Installation fails partway through | `./setup.sh --resume` |
| Shell configuration is broken | `./scripts/doctor.sh --recover shell` (restores from backup) |
| GPU falls off the bus (Xid 79) | Run `sudo ./scripts/gpu-persist-fix.sh`, then reboot |
| GPU is unstable | Diagnose with `./scripts/gpu-doctor.sh` |
| `make venv` exits immediately after the core stage | Leftover state file from the old 6-stage layout. Run `mv ~/.machine_setting/install.state{,.bak}` and retry ([7-A](docs/troubleshooting.md#7-a-체크포인트-state-파일-스키마-mismatch--make-venv-즉시-종료-stage-rename-잔재)) |
| cx-Oracle build fails on `pkg_resources` | setuptools 70+ separation issue. Fixed automatically; to fix it manually, run `uv pip install "setuptools<70" wheel`, then install with `--no-build-isolation` ([7-B](docs/troubleshooting.md#7-b-cx-oracle-빌드-실패--modulenotfounderror-no-module-named-pkg_resources)) |
| Verification step falsely reports `torch: not installed` | bash quote-escaping bug, fixed automatically. Confirm the actual install state with `~/ai-env/bin/python -c 'import torch'` ([7-C](docs/troubleshooting.md#7-c-setup-venvsh-검증-단계가-torch-not-installed-false-negative)) |

---

## Security

- Secrets go in `~/.bashrc.local` or `~/.zshrc.local` (never committed)
- Pre-commit hook blocks AWS keys, GitHub PATs, API keys
- Repository is **PRIVATE**
- Run `make secrets` to scan for leaked credentials
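
As a rough picture of what such a scan looks for, the sketch below greps for three well-known token shapes: AWS access key IDs (`AKIA` plus 16 uppercase alphanumerics), classic GitHub personal access tokens (`ghp_` plus 36 alphanumerics), and keys with the `sk-ant-` prefix. The function name and the exact patterns are assumptions; the rules used by `scripts/check-secrets.sh` and the pre-commit hook may be broader.

```bash
# scan_secrets FILE...
# Prints "file:line:match" for each suspicious token and, like grep,
# exits 0 only when at least one match was found.
scan_secrets() {
    grep -HnoE \
        'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|sk-ant-[A-Za-z0-9_-]{20,}' \
        "$@"
}
```

In a hook, a zero exit status from `scan_secrets` on the staged files would be the signal to abort the commit.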