{"id":28643536,"url":"https://github.com/samdvr/samsa","last_synced_at":"2025-10-18T02:56:06.295Z","repository":{"id":297553377,"uuid":"997149716","full_name":"samdvr/samsa","owner":"samdvr","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-07T03:25:19.000Z","size":217,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-13T08:41:48.129Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samdvr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-06T03:29:51.000Z","updated_at":"2025-06-08T15:10:35.000Z","dependencies_parsed_at":"2025-06-06T04:39:33.183Z","dependency_job_id":null,"html_url":"https://github.com/samdvr/samsa","commit_stats":null,"previous_names":["samdvr/samsa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/samdvr/samsa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samdvr%2Fsamsa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samdvr%2Fsamsa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samdvr%2Fsamsa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samdvr%2Fsamsa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samdvr","download_url":"https://codeload.github.com/samdvr/samsa/tar.gz/refs/heads/main","sbom_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samdvr%2Fsamsa/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267998368,"owners_count":24178530,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-12T23:06:53.117Z","updated_at":"2025-10-18T02:56:06.202Z","avatar_url":"https://github.com/samdvr.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Samsa - Distributed Storage System\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org)\n\nA distributed streaming storage system built in Rust with gRPC, PostgreSQL, and etcd. 
This implementation focuses on scalability, observability, and operational excellence while providing a simple yet powerful API for stream-based data operations.\n\n## Architecture\n\nSamsa is a distributed streaming storage system with a unified architecture:\n\n- **Unified Node**: Single server process that handles all operations (accounts, buckets, streams, data)\n- **PostgreSQL**: Metadata storage with full ACID guarantees\n- **Object Storage**: Configurable object storage backend (S3-compatible)\n- **etcd**: Service discovery and coordination\n- **Observability**: Comprehensive metrics, tracing, and logging\n\n## Features\n\n### Core Capabilities\n\n- **Account Management**: Bucket lifecycle and access token management\n- **Stream Operations**: Create, configure, and manage data streams\n- **Data Operations**: High-performance append and read operations with batching\n- **Service Discovery**: Automatic node discovery and health monitoring via etcd\n- **Observability**: Prometheus metrics, OpenTelemetry tracing, structured logging\n\n### Stream Processing\n\n- **UUID v7 Sequence IDs**: Time-ordered, globally unique record identifiers\n- **Flexible Read Patterns**: Read by sequence ID, timestamp, or tail offset\n- **Streaming APIs**: Real-time subscriptions with session management\n- **Batched Operations**: High-performance batching for both reads and writes\n- **Configurable Storage**: Multiple storage classes and retention policies\n\n### Enterprise Features\n\n- **PostgreSQL Integration**: Reliable metadata storage with full ACID compliance\n- **Configuration Management**: Layered configuration system with validation\n- **Access Control**: Token-based authentication with fine-grained permissions\n- **Metrics \u0026 Monitoring**: Comprehensive observability with Prometheus integration\n- **Health Checks**: Built-in health monitoring and graceful degradation\n\n## Quick Start\n\n### Prerequisites\n\n1. **etcd**: For service discovery and coordination\n2. 
**PostgreSQL**: For metadata storage\n3. **Rust**: For building the project (1.70+)\n\n### Running the System\n\n1. **Start Dependencies**:\n\n   ```bash\n   # Start etcd using Docker\n   docker run -d --name etcd-samsa \\\n     -p 2379:2379 -p 2380:2380 \\\n     quay.io/coreos/etcd:v3.5.14 \\\n     /usr/local/bin/etcd \\\n     --data-dir=/etcd-data \\\n     --listen-client-urls=http://0.0.0.0:2379 \\\n     --advertise-client-urls=http://0.0.0.0:2379 \\\n     --listen-peer-urls=http://0.0.0.0:2380 \\\n     --initial-advertise-peer-urls=http://0.0.0.0:2380 \\\n     --initial-cluster=default=http://0.0.0.0:2380 \\\n     --initial-cluster-token=etcd-cluster-1 \\\n     --initial-cluster-state=new\n\n   # Or use docker-compose for testing\n   docker-compose -f docker-compose.test.yml up -d\n\n   # Setup PostgreSQL (example with local installation)\n   createdb samsa_metadata\n   ```\n\n2. **Configure the System**:\n\n   ```bash\n   # Copy example configuration\n   cp config/example.toml config/app.toml\n\n   # Edit configuration for your environment\n   # See CONFIG_GUIDE.md for detailed options\n   ```\n\n3. 
**Start the Server**:\n\n   ```bash\n   # Build and run\n   cargo build --release\n\n   # Start with default configuration\n   ./target/release/server\n\n   # Or with custom configuration\n   SERVER_PORT=50052 \\\n   ETCD_ENDPOINTS=http://localhost:2379 \\\n   cargo run --bin server\n   ```\n\n### Using the CLI\n\nThe CLI provides comprehensive access to all Samsa functionality:\n\n```bash\n# Build the CLI\ncargo build --bin samsa-cli --release\n\n# Create a bucket\n./target/release/samsa-cli bucket create my-bucket --auto-create-on-append\n\n# Create a stream\n./target/release/samsa-cli stream -b my-bucket create my-stream\n\n# Append data\necho \"Hello, World!\" | ./target/release/samsa-cli data -b my-bucket -s my-stream append --from-stdin\n\n# Read data\n./target/release/samsa-cli data -b my-bucket -s my-stream read\n```\n\nFor detailed CLI documentation, see [CLI_README.md](src/cli/CLI_README.md).\n\n## Configuration\n\nSamsa uses a comprehensive, layered configuration system. Configuration can be provided via:\n\n- **Configuration files** (TOML format)\n- **Environment variables**\n- **Command-line arguments**\n\n### Key Configuration Sections\n\n- **Server**: Node address, port, etcd endpoints, heartbeat settings\n- **Storage**: Batch sizes, flush intervals, cleanup policies\n- **Database**: PostgreSQL connection settings\n- **Observability**: Metrics, logging, and tracing configuration\n\nSee [CONFIG_GUIDE.md](CONFIG_GUIDE.md) for comprehensive configuration documentation.\n\n### Example Configuration\n\n```toml\n[server]\naddress = \"0.0.0.0\"\nport = 50052\netcd_endpoints = [\"http://localhost:2379\"]\nheartbeat_interval_secs = 30\nlease_ttl_secs = 60\n\n[storage]\nbatch_size = 100\nbatch_max_bytes = 1048576\nbatch_flush_interval_ms = 5000\n\n[observability]\nmetrics_port = 9090\nlog_level = \"info\"\nenable_otel_tracing = false\n\n[database]\nhost = \"localhost\"\nport = 5432\nusername = \"postgres\"\ndatabase_name = \"samsa_metadata\"\n```\n\n## 
Development\n\n### Building\n\n```bash\n# Build all binaries\ncargo build\n\n# Build specific binaries\ncargo build --bin server\ncargo build --bin samsa-cli\n\n# Build for release\ncargo build --release\n```\n\n### Testing\n\n```bash\n# Run unit tests\ncargo test\n\n# Run integration tests (requires infrastructure)\nmake test\n\n# Run specific test categories\nmake test-basic\nmake test-performance\nmake test-errors\n\n# Development testing (faster)\nmake test-dev\n```\n\n### Project Structure\n\n```\nsamsa/\n├── src/\n│   ├── cli/                    # Command-line interface\n│   │   ├── main.rs            # CLI application\n│   │   └── CLI_README.md      # CLI documentation\n│   ├── common/                # Shared utilities and services\n│   │   ├── config.rs          # Configuration management\n│   │   ├── error.rs           # Error types\n│   │   ├── etcd_client.rs     # etcd service discovery\n│   │   ├── storage.rs         # Storage layer with batching\n│   │   ├── metadata.rs        # Metadata repository interface\n│   │   ├── postgres_metadata_repository.rs # PostgreSQL implementation\n│   │   ├── observability.rs   # Metrics and tracing\n│   │   ├── metrics.rs         # Metrics collection\n│   │   └── ...               
# Additional common modules\n│   ├── server/                # Unified server implementation\n│   │   ├── main.rs           # Server binary entry point\n│   │   ├── mod.rs            # Server core logic\n│   │   └── handlers/         # gRPC service handlers\n│   │       ├── account.rs    # Account service implementation\n│   │       ├── bucket.rs     # Bucket service implementation\n│   │       └── stream.rs     # Stream service implementation\n│   └── lib.rs                # Library root\n├── proto/\n│   └── samsa.proto              # gRPC service definitions\n├── config/                   # Configuration files\n│   ├── example.toml          # Example configuration\n│   ├── development.toml      # Development settings\n│   └── production.toml       # Production settings\n├── tests/                    # Integration tests\n├── examples/                 # Usage examples\n├── migrations/               # Database migrations\n└── scripts/                  # Utility scripts\n```\n\n### API Documentation\n\nThe system exposes three main gRPC services:\n\n- **AccountService**: Bucket and access token management\n- **BucketService**: Stream lifecycle management\n- **StreamService**: Data append and read operations\n\nSee [proto/samsa.proto](proto/samsa.proto) for complete API definitions.\n\n## Performance\n\nThe batched storage system is designed around the following performance characteristics:\n\n- **High Throughput**: Configurable batching optimizes for bulk operations\n- **Low Latency**: Tunable flush intervals balance latency vs. 
throughput\n- **Memory Efficiency**: Streaming operations with bounded memory usage\n- **Scalability**: Horizontal scaling through multiple nodes\n\n### Performance Tuning\n\nKey configuration parameters for performance:\n\n```toml\n[storage]\nbatch_size = 1000              # Records per batch\nbatch_max_bytes = 10485760     # 10MB max batch size\nbatch_flush_interval_ms = 1000 # 1 second flush interval\n```\n\n## Monitoring and Observability\n\n### Metrics\n\nPrometheus metrics are exposed on port 9090 by default:\n\n- Request rates and latencies\n- Storage utilization and performance\n- etcd connectivity and health\n- Database connection pool status\n\n### Logging\n\nStructured logging with configurable levels:\n\n```bash\n# Enable debug logging\nRUST_LOG=samsa=debug ./target/release/server\n\n# JSON formatted logs for production\nLOG_LEVEL=info SERVICE_NAME=samsa-prod ./target/release/server\n```\n\n### Tracing\n\nOptional OpenTelemetry integration for distributed tracing:\n\n```toml\n[observability]\nenable_otel_tracing = true\notel_endpoint = \"http://localhost:4317\"\n```\n\n## Deployment\n\n### Production Requirements\n\n1. **Infrastructure**:\n\n   - etcd cluster (3+ nodes recommended)\n   - PostgreSQL database (with appropriate sizing)\n   - S3-compatible object storage\n   - Load balancer (for multiple Samsa nodes)\n\n2. **Configuration**:\n\n   - Appropriate batch sizes for workload\n   - Database connection pooling\n   - Monitoring and alerting setup\n\n3. **Security**:\n   - TLS encryption for all communications\n   - Proper access token management\n   - Network security policies\n\n### Docker Deployment\n\n```dockerfile\nFROM rust:1.87 as builder\nWORKDIR /app\nCOPY . 
.\nRUN cargo build --release\n\nFROM debian:bookworm-slim\nRUN apt-get update \u0026\u0026 apt-get install -y ca-certificates \u0026\u0026 rm -rf /var/lib/apt/lists/*\nCOPY --from=builder /app/target/release/server /usr/local/bin/\nCOPY --from=builder /app/target/release/samsa-cli /usr/local/bin/\nEXPOSE 50052 9090\nCMD [\"server\"]\n```\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Database Connection Issues**:\n\n   - Verify PostgreSQL is running and accessible\n   - Check database credentials and permissions\n   - Ensure database exists and migrations are applied\n\n2. **etcd Connectivity**:\n\n   - Verify etcd endpoints are correct and accessible\n   - Check network connectivity and firewall rules\n   - Monitor etcd cluster health\n\n3. **Performance Issues**:\n   - Adjust batch sizes based on workload patterns\n   - Monitor database performance and indexing\n   - Check object storage latency and throughput\n\n### Debug Mode\n\n```bash\n# Enable comprehensive debug logging\nRUST_LOG=samsa=debug,sqlx=debug cargo run --bin server\n\n# Enable specific module debugging\nRUST_LOG=samsa::storage=debug,samsa::etcd_client=trace cargo run --bin server\n```\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes with tests\n4. Run the full test suite: `make test`\n5. Submit a pull request\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamdvr%2Fsamsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamdvr%2Fsamsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamdvr%2Fsamsa/lists"}