https://github.com/victorgu-github/eks-gpu-sharing-scaling-tf
- Host: GitHub
- URL: https://github.com/victorgu-github/eks-gpu-sharing-scaling-tf
- Owner: victorgu-github
- Created: 2023-04-21T01:45:22.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-04-25T02:56:51.000Z (almost 3 years ago)
- Last Synced: 2025-06-21T08:04:56.735Z (10 months ago)
- Language: HCL
- Size: 48.8 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Deploy Generative AI models to Amazon EKS cost efficiently with GPU nodes auto scaling and sharing
## Requirements
| Name | Version |
|------|---------|
| [terraform](#requirement\_terraform) | >= 1.0.0 |
| [aws](#requirement\_aws) | >= 3.67.0 |
| [helm](#requirement\_helm) | >= 2.4 |
| [kubectl](#requirement\_kubectl) | >= 1.14 |
| [random](#requirement\_random) | >= 2.1.2 |
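The version constraints above can be expressed as a `terraform` block along the following lines. This is a minimal sketch rather than the repository's actual `versions.tf`: the `gavinbunney/kubectl` source is taken from the resource links in this README, while the `hashicorp/*` source addresses are assumed.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws" # assumed source address
      version = ">= 3.67.0"
    }
    helm = {
      source  = "hashicorp/helm" # assumed source address
      version = ">= 2.4"
    }
    kubectl = {
      source  = "gavinbunney/kubectl" # matches the kubectl_manifest link under Resources
      version = ">= 1.14"
    }
    random = {
      source  = "hashicorp/random" # assumed source address
      version = ">= 2.1.2"
    }
  }
}
```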
## Providers
| Name | Version |
|------|---------|
| [aws](#provider\_aws) | >= 3.67.0 |
| [helm](#provider\_helm) | >= 2.4 |
| [kubectl](#provider\_kubectl) | >= 1.14 |
## Modules
| Name | Source | Version |
|------|--------|---------|
| [ebs\_csi\_driver\_irsa](#module\_ebs\_csi\_driver\_irsa) | terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks | ~> 5.14 |
| [eks](#module\_eks) | terraform-aws-modules/eks/aws | 18.26.6 |
| [eks\_blueprints\_kubernetes\_addons](#module\_eks\_blueprints\_kubernetes\_addons) | github.com/aws-ia/terraform-aws-eks-blueprints-addons | 3e64d809ac9dbc89aee872fe0f366f0b757d3137 |
| [karpenter\_irsa](#module\_karpenter\_irsa) | terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks | ~> 4.21.1 |
| [vpc](#module\_vpc) | terraform-aws-modules/vpc/aws | ~> 3.0 |
| [vpc\_cni\_irsa](#module\_vpc\_cni\_irsa) | terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks | ~> 5.14 |
| [vpc\_endpoint\_security\_group](#module\_vpc\_endpoint\_security\_group) | terraform-aws-modules/security-group/aws | ~> 4.0 |
| [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 3.0 |
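As an illustration of how one of these modules might consume the inputs documented in this README, here is a hedged sketch of the `vpc` module call. The source and version come from the table; the wiring of `var.*` inputs, the availability-zone selection, and the NAT gateway settings are assumptions, not the repository's confirmed configuration.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = var.vpc_name
  cidr = var.cidr

  # Assumption: one private and one public subnet per availability zone,
  # using the first three zones returned by the data source.
  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets

  # Assumption: a single shared NAT gateway gives private subnets outbound access.
  enable_nat_gateway = true
  single_nat_gateway = true

  tags = var.tags
}
```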
## Resources
| Name | Type |
|------|------|
| [aws_iam_instance_profile.karpenter](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_instance_profile) | resource |
| [aws_launch_template.gpu](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template) | resource |
| [aws_prometheus_workspace.amp](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_workspace) | resource |
| [helm_release.karpenter](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [kubectl_manifest.karpenter_provisioner_gpu](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [aws_ami.eks_node](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source |
| [aws_ami.eks_node_gpu](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ami) | data source |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
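The `aws_ami.eks_node_gpu` data source presumably resolves the EKS-optimized GPU AMI for the chosen Kubernetes version. A sketch of what such a lookup could look like follows; the name filter pattern and the `amazon` owner alias are assumptions based on the usual naming of EKS-optimized accelerated AMIs, not values confirmed by this README.

```hcl
data "aws_ami" "eks_node_gpu" {
  owners      = ["amazon"] # assumed: AMIs published by Amazon
  most_recent = true

  filter {
    name = "name"
    # Assumed naming pattern for the EKS-optimized GPU AMI of a given version.
    values = ["amazon-eks-gpu-node-${var.eks_version}-*"]
  }
}
```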
## Inputs
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| [cidr](#input\_cidr) | The CIDR block for the VPC. Default value is a valid CIDR, but not acceptable by AWS and should be overridden | `string` | `"10.1.0.0/16"` | no |
| [cluster\_name](#input\_cluster\_name) | Project Name of the AWS Resources | `string` | `"gpu-share-dev-eks-xyhdb"` | no |
| [eks\_version](#input\_eks\_version) | EKS version | `string` | `"1.25"` | no |
| [private\_subnets](#input\_private\_subnets) | A list of private subnets inside the VPC | `list(string)` | `["10.1.0.0/18", "10.1.64.0/18", "10.1.128.0/18"]` | no |
| [public\_subnets](#input\_public\_subnets) | A list of public subnets inside the VPC | `list(string)` | `["10.1.192.0/20", "10.1.208.0/20", "10.1.224.0/20"]` | no |
| [region](#input\_region) | Region of the AWS resources | `string` | `"us-east-2"` | no |
| [tags](#input\_tags) | Tags for AWS Resource | `map(string)` | `{"Environment": "dev", "Terraform": "true"}` | no |
| [vpc\_name](#input\_vpc\_name) | Name to be used on all the resources as identifier | `string` | `"gpu-share-dev"` | no |
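All inputs have defaults, so overrides are optional. A `terraform.tfvars` file such as the one below could be used to change them; every value shown is a hypothetical example, chosen only so that the subnets fall inside the overridden VPC CIDR.

```hcl
# terraform.tfvars (all values are illustrative, not recommendations)
region       = "us-west-2"
cluster_name = "gpu-share-demo"
vpc_name     = "gpu-share-demo"
eks_version  = "1.25"

cidr = "10.2.0.0/16"

# Three /18 private subnets and three /20 public subnets inside 10.2.0.0/16.
private_subnets = ["10.2.0.0/18", "10.2.64.0/18", "10.2.128.0/18"]
public_subnets  = ["10.2.192.0/20", "10.2.208.0/20", "10.2.224.0/20"]

tags = {
  Environment = "demo"
  Terraform   = "true"
}
```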
## Outputs
| Name | Description |
|------|-------------|
| [configure\_kubectl](#output\_configure\_kubectl) | Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig |
| [eks\_api\_server\_url](#output\_eks\_api\_server\_url) | Your EKS API server endpoint |
| [vpc\_id](#output\_vpc\_id) | ID of the VPC |
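For orientation, outputs with these names could be declared roughly as follows. This is a sketch under assumptions: it relies on the `cluster_id`, `cluster_endpoint`, and `vpc_id` outputs of the `eks` and `vpc` modules listed above, and the exact expressions in the repository may differ.

```hcl
output "configure_kubectl" {
  description = "Command that updates your kubeconfig for the new cluster"
  # Assumption: module.eks.cluster_id holds the cluster name in this module version.
  value = "aws eks --region ${var.region} update-kubeconfig --name ${module.eks.cluster_id}"
}

output "eks_api_server_url" {
  description = "Your EKS API server endpoint"
  value       = module.eks.cluster_endpoint
}

output "vpc_id" {
  description = "ID of the VPC"
  value       = module.vpc.vpc_id
}
```

After `terraform apply`, running `terraform output configure_kubectl` prints the command to copy into your shell.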