{"id":29440008,"url":"https://github.com/uttam-li/dfs","last_synced_at":"2026-05-16T21:01:41.053Z","repository":{"id":303404867,"uuid":"1015361608","full_name":"uttam-li/dfs","owner":"uttam-li","description":"An implementation of Google File System (GFS) in Go, featuring centralized metadata management, chunk-based storage, and fault-tolerant replication.","archived":false,"fork":false,"pushed_at":"2025-07-07T12:07:06.000Z","size":639,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-07T13:25:45.978Z","etag":null,"topics":["distributed-file-system","fuse","golang","grpc"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uttam-li.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-07T11:42:02.000Z","updated_at":"2025-07-07T12:08:46.000Z","dependencies_parsed_at":"2025-07-07T13:26:03.410Z","dependency_job_id":"fdec9b37-33d1-40aa-912b-7c98b7412769","html_url":"https://github.com/uttam-li/dfs","commit_stats":null,"previous_names":["uttam-li/dfs"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/uttam-li/dfs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uttam-li%2Fdfs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uttam-li%2Fdfs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uttam-li%2Fdfs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uttam-li%2Fdfs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uttam-li","download_url":"https://codeload.github.com/uttam-li/dfs/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uttam-li%2Fdfs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265122420,"owners_count":23714547,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-file-system","fuse","golang","grpc"],"created_at":"2025-07-13T10:01:51.834Z","updated_at":"2026-05-16T21:01:36.014Z","avatar_url":"https://github.com/uttam-li.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Distributed File System (GFS Implementation)\n\nAn implementation of Google File System (GFS) in Go, featuring centralized metadata management, chunk-based storage, and fault-tolerant replication.\n\n## 🏗️ Architecture Overview\n\nThis implementation follows the original GFS paper design with three main components:\n\n![Architecture Diagram](arch.png)\n\n### Master Server (Single Node)\n\n- **Centralized Metadata Management**: Maintains file system namespace and chunk metadata\n- **Chunk Allocation**: Assigns unique chunk handles and manages replica placement\n- **Load Balancing**: Distributes chunks across chunkservers for performance and fault tolerance\n- **Persistence**: Operation logs with periodic checkpointing for crash recovery\n- **Location**: [`pkg/master/`](pkg/master/)\n\n### ChunkServer\n\n- **Chunk Storage**: Manages 64MB fixed-size chunks on local disk\n- **Replica Coordination**: Handles primary replica duties for assigned chunks\n- **Heartbeat Protocol**: Reports chunk status and health to master\n- **Location**: [`pkg/chunkserver/`](pkg/chunkserver/)\n\n### Client\n\n- **FUSE Interface**: Provides POSIX-like file system operations (implementation choice)\n- **Metadata Caching**: Caches chunk locations to reduce master load\n- **Direct Data Path**: Communicates directly with chunkservers, bypassing master for data operations\n- **Location**: [`pkg/client/`](pkg/client/)\n\n## ✅ Features Status\n\n### **Implemented \u0026 Working**\n\n- ✅ **File System Operations**: Create, read, write, delete files and directories\n- ✅ **Chunk Management**: 64MB chunks with unique handles and distribution\n- ✅ **Replication**: Configurable chunk replication across multiple servers\n- ✅ **Fault Tolerance**: Heartbeat monitoring and automatic re-replication\n- ✅ **Persistence**: Operation logging with crash recovery\n- ✅ **FUSE Interface**: POSIX-like file system mounting and operations\n- ✅ **gRPC Communication**: Master-ChunkServer and Client-Master protocols\n- ✅ **Metadata Management**: Hierarchical namespace with file attributes\n\n### **Partially Implemented**\n\n- 🟡 **Consistency Control**: Basic version tracking and write coordination\n- 🟡 **Performance Optimization**: Limited caching and write buffering\n- 🟡 **Advanced Replication**: Primary-secondary coordination needs refinement\n\n### **Not Implemented**\n\n- ❌ **Security**: Authentication, authorization, and access control\n- ❌ **Production Features**: Shadow master, monitoring\n- ❌ **Advanced Fault Tolerance**: Basic implementation, no automatic recovery\n- ❌ **Enterprise Features**: Snapshots, quotas, cross-datacenter replication\n\n## 📊 Performance Benchmarks\n\nPerformance characteristics on a typical development setup:\n\n### **File Operations**\n\n| Operation Type              | Throughput    |\n|-----------------------------|---------------|\n| Small File Write (1KB×1000) | 0.10 MB/s     |\n| Small File Read (1KB×1000)  | 1.34 MB/s     |\n| Large File Write (10MB×10)  | 65.52 MB/s    |\n| Large File Read (10MB×10)   | 216.40 MB/s   |\n\n### **Sequential Access**\n\n| Pattern                     | Throughput    |\n|-----------------------------|---------------|\n| Sequential Write (100MB)    | 44.89 MB/s    |\n| Sequential Read (100MB)     | 218.48 MB/s   |\n\n### **Random Access**\n\n| Pattern                     | Operations/sec |\n|-----------------------------|----------------|\n| Random Read (50MB dataset)  | 3,899 ops/sec  |\n\n**Note**: Results may vary based on hardware and configuration.\n\n## ⚠️ Important Disclaimer\n\n**This project is built for learning purposes and is not production-ready.**\n\nWhile I've strived to implement core GFS concepts as faithfully as possible, some aspects are simplified or incomplete. The original GFS paper describes an enormously complex system - implementing every detail would be a multi-year endeavor. This implementation captures the essential distributed file system principles while remaining a manageable educational project.\n\nIf you notice missing features or rough edges, that's intentional scope limitation rather than oversight.\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Go 1.21+\n- Protocol Buffers compiler (`protoc`)\n- FUSE library (for client mounting)\n\n### Operating System Compatibility\n\n- **Linux**: Tested and working.\n- **Windows**: Might work with **WSL 2**.\n- **macOS**: Not tested. [macFUSE](https://osxfuse.github.io/) could potentially be used.\n\n### Installation\n\n```bash\n# Install dependencies\nsudo apt-get install fuse3 libfuse3-dev protobuf-compiler\ngo install google.golang.org/protobuf/cmd/protoc-gen-go@latest\ngo install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest\n```\n\n### Build\n\n```bash\n# Generate protobuf code\nmake proto\n\n# Build all components\nmake build\n```\n\n### Configuration (Optional)\n\nCopy and modify the example environment file:\n\n```bash\ncp .env.example .env\n# Edit .env with your configuration\n```\n\n**Note**: If `.env` file is not found, default values will be used.\n\n### Run the System\n\n#### Option 1: Start All Services with One Command (Recommended)\n\nFor quick testing and development, start the complete DFS system with a single command:\n\n```bash\nmake all\n```\n\nThis will automatically start:\n\n- 1 Master server (port 8000)\n- 5 ChunkServers (ports 9081-9085)\n- 1 FUSE Client (mounted at `./mnt`)\n\nThe script provides real-time monitoring and graceful shutdown with `Ctrl+C`.\n\n#### Option 2: Start Services Individually\n\n1. **Start Master Server**:\n\n    ```bash\n    make master\n    ```\n\n2. **Start ChunkServers**:\n\n    ```bash\n    # ChunkServer 1\n    make chunkserver PORT=8081 STORAGE=chunk_1\n\n    # ChunkServer 2  \n    make chunkserver PORT=8082 STORAGE=chunk_2\n\n    # ChunkServer 3\n    make chunkserver PORT=8083 STORAGE=chunk_3\n    ```\n\n3. **Mount Client**:\n\n    ```bash\n    make client\n    ```\n\n## 📁 Project Structure\n\n```sh\n├── api/                  # gRPC API definitions and generated code\n│   ├── generated/        # Generated gRPC code\n│   │   ├── chunkserver/  # ChunkServer service bindings\n│   │   ├── common/       # Shared type bindings\n│   │   ├── master/       # Master service bindings\n│   │   └── persistence/  # Persistence service bindings\n│   └── proto/            # Protocol buffer definitions\n├── bin/                  # Compiled binaries\n├── checkpoints/          # Master metadata checkpoints\n├── cmd/                  # Main application entry points\n│   ├── chunkserver/      # ChunkServer executable\n│   ├── client/           # FUSE client executable\n│   └── master/           # Master server executable\n├── logs/                 # Service log files\n├── mnt/                  # FUSE mount point\n├── pkg/                  # Reusable packages\n│   ├── chunkserver/      # ChunkServer implementation\n│   ├── client/           # Client implementation\n│   ├── common/           # Shared utilities and types\n│   └── master/           # Master server implementation\n├── scripts/              # Automation scripts\n├── storage/              # ChunkServer data storage\n├── tests/                # System tests and benchmarks\n├── .env.example          # Environment configuration template\n└── Makefile              # Build and run automation\n```\n\n## 📚 References\n\n- [The Google File System (2003)](https://www.google.com/url?sa=t\u0026source=web\u0026rct=j\u0026opi=89978449\u0026url=https://research.google.com/archive/gfs-sosp2003.pdf\u0026ved=2ahUKEwjt1IGM5t-NAxWKamwGHavnJdMQFnoECEAQAQ\u0026usg=AOvVaw2bOjX6TrilZNxIhFKWZtBo)\n- [gRPC Go Documentation](https://grpc.io/docs/languages/go/)\n- [FUSE Documentation](https://github.com/hanwen/go-fuse)\n- [Protocol Buffers Guide](https://developers.google.com/protocol-buffers)\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futtam-li%2Fdfs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Futtam-li%2Fdfs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Futtam-li%2Fdfs/lists"}