{"id":14384360,"url":"https://github.com/Redactics/http-nas","last_synced_at":"2025-08-23T17:31:46.908Z","repository":{"id":45584276,"uuid":"429327760","full_name":"Redactics/http-nas","owner":"Redactics","description":"File streaming service designed for Kubernetes to provide ReadWriteMany storage support","archived":false,"fork":false,"pushed_at":"2023-07-17T02:04:29.000Z","size":173,"stargazers_count":14,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-08-29T18:34:12.176Z","etag":null,"topics":["airflow","aws","azure","google-cloud","kubernetes"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Redactics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-18T06:57:38.000Z","updated_at":"2024-08-04T02:51:46.000Z","dependencies_parsed_at":"2023-01-21T19:05:15.785Z","dependency_job_id":null,"html_url":"https://github.com/Redactics/http-nas","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Redactics%2Fhttp-nas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Redactics%2Fhttp-nas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Redactics%2Fhttp-nas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Redactics%2Fhttp-nas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Redactics","download_url":"https://codeload.github.com/Redactics/http-nas/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230716481,"owners_count":18269769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","aws","azure","google-cloud","kubernetes"],"created_at":"2024-08-28T18:01:19.912Z","updated_at":"2024-12-21T12:30:29.913Z","avatar_url":"https://github.com/Redactics.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003cp\u003e\n\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg alt=\"MIT License\" src=\"https://img.shields.io/badge/License-MIT-yellow.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n### About\n\nThis is a very lightweight Node.js HTTP-based file streaming service that functions as NAS (network-attached storage), built with Kubernetes and Airflow usage in mind. We make heavy use of it at [Redactics](https://www.redactics.com). You can learn more about it from [this blog post](https://medium.com/p/f68d88f548fe).\n\n### Why Does the World Need This?\n\nJob pipelines running in Kubernetes (for example, Airflow `KubernetesPodOperator` DAG steps) that require persistent storage face some challenges. Here are some common tactics and their associated issues:\n\n* `Mounting Kubernetes persistent volumes/persistent volume claims in the container`: this option does not really support concurrent access from multiple pods, and we've seen timeouts/timing issues when doing so in AWS. 
AWS recommends installing the [EBS CSI driver](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) to mitigate these issues, but this requires several extra steps, plus figuring out how to maintain it in Terraform/IaC. This approach also doesn't work with AWS Fargate, for which the AWS EFS CSI driver is recommended instead; that driver presents the same problems, as well as a dependency on NFS.\n* `Running an NFS server in Kubernetes`: this can work, but it tends to create a black box with poor visibility, and solutions that create new NFS storage classes will not run on serverless Kubernetes platforms such as AWS Fargate or GKE Autopilot.\n* `Using an AWS S3 bucket for storage`: this works well, but downloading and uploading files to and from the cloud has bandwidth and performance costs.\n* `Skipping persistent storage altogether`: the storage attached to a pod is ephemeral (i.e. it will not persist), and storage capacity can only be controlled at the node level.\n\n### How Does it Work?\n\nYou can install this service via the Helm chart included in this repo, or via:\n\n```\nhelm repo add redactics https://redactics.github.io/http-nas\nhelm install http-nas redactics/http-nas --set \"pvc.size=10Gi\" --set \"storagePath=/mnt/storage\"\n```\n\nThis deploys the file streaming service to your Kubernetes cluster with a 10Gi PersistentVolumeClaim for storing files (you can of course enlarge this to whatever size you want). `storagePath` sets the path inside the container used for storage, here `/mnt/storage`; it can be anything so long as its parent directory exists (`/mnt` and `/tmp` are good parent directories for `storagePath`). Then, from inside your cluster, you can run any of the following commands to interact with the service:\n\n* Stream a file to the service (via HTTP POST): `cat /path/to/cat.jpg | curl -X POST -H \"Transfer-Encoding: chunked\" -s -f -T - http://http-nas:3000/file/cat.jpg`. 
Note that the file is streamed via `cat` and a Linux pipe (i.e. `|`) rather than being read in its entirety and loaded into memory; the entire service is HTTP stream-based. You don't have to use cURL to post the file; any library or tool that can send an HTTP POST will work.\n* Append to a file (via HTTP PUT): `printf \"test append\" | curl -X PUT -H \"Transfer-Encoding: chunked\" -s -f -T - http://http-nas:3000/file/mydata.csv`.\n* Stream a file from the service (via HTTP GET): `curl http://http-nas:3000/file/cat.jpg`. This output can also be piped elsewhere (e.g. `curl http://http-nas:3000/file/cat.jpg | aws s3 cp - s3://yourbucket/cat.jpg`).\n* Stream the line count of a file from the service, i.e. the equivalent of Linux `wc -l` (via HTTP GET): `curl http://http-nas:3000/file/mydata.csv/wc` (returns a string).\n* Stream a file to the service within a directory structure: `cat /path/to/cat.jpg | curl -X POST -H \"Transfer-Encoding: chunked\" -s -f -T - http://http-nas:3000/file/mydirectory%2Fcat.jpg`. `%2F` is the URL encoding of the path separator `/`. Because this service is HTTP-based, an unencoded `/` would change the URL and therefore how the request is routed within the service. If `mydirectory` doesn't exist in the underlying storage volume, it will be created automatically.\n* Stream a file inside a directory from the service: `curl http://http-nas:3000/file/mydirectory%2Fcat.jpg`\n* Move (mv) a file (via HTTP PUT): `curl -X PUT http://http-nas:3000/file/cat.jpg --header 'Content-Type: application/json' --data-raw '{ \"path\": \"newcat.jpg\" }'`. This renames/moves `cat.jpg` to `newcat.jpg`, sending the new filename in a JSON payload.\n* Delete a file (via HTTP DELETE): `curl -X DELETE http://http-nas:3000/file/cat.jpg`\n\n### Usage in Serverless Kubernetes\n\nThis service works as-is in GKE Autopilot. On AWS EKS Fargate, this Helm chart will not deploy because of its dependency on a persistent volume. 
We recommend deploying this into a dedicated Kubernetes namespace whose pods are scheduled onto a physical/managed node rather than onto Fargate (i.e. a namespace not selected by any of your Fargate profiles). This node can be the smallest available instance type, since this service is very light on resource usage. Then, assuming your namespace is named `nas`, you can make cross-namespace HTTP requests by replacing `http://http-nas:3000` with `http://http-nas.nas.svc.cluster.local:3000`, where `http-nas` is the Kubernetes service name and `nas` is the namespace.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRedactics%2Fhttp-nas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRedactics%2Fhttp-nas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRedactics%2Fhttp-nas/lists"}